More than 20 years ago, the human genome was first sequenced. While the first version was full of “holes” representing missing DNA sequences, the genome has been gradually improved in successive rounds. Each has increased the quality of the genome and, in so doing, resolved most of the blank spaces that prevented us from having a complete reading of our genetic material.
The fundamental difficulty researchers faced in reading the genome from end to end is the enormous number of repeated sequences that populate it. The 20,000 or so genes we humans have occupy barely 2% of the entire genome. The remaining 98% is essentially made up of these families of repeated sequences, mobile elements known as transposons and retrotransposons, and – to a lesser but functionally important extent – gene expression regulatory sequences. These function as switches that determine when and where genes are turned on and off.
In March 2022, a major revision of the genome was published in the journal Science. An international consortium of researchers known as “T2T” (telomere to telomere, telomeres being the ends of chromosomes) used a novel strategy based on a type of cell (CHM13) that retains only one copy of each chromosome.
Combined with the latest techniques for sequencing DNA, the researchers managed to add some 200 million letters to the human genome, resolving most of the holes in chromosomes 1 to 22.
The only one left out was one of the smallest chromosomes we humans have: Y. It is an exclusively male chromosome and also the most complex, packed with repeated sequences of all kinds.
The Y chromosome, finally complete
Each of us has 46 chromosomes in our cells, arranged in 23 pairs: 22 pairs of autosomal chromosomes (1 to 22) and one pair of sex chromosomes (which can be X or Y).
From each pair of chromosomes we inherit one from our father and one from our mother. Most females have the 46XX chromosome configuration – the last pair of chromosomes, 23, is made up of two copies of the X chromosome. Most males have the 46XY chromosome configuration, meaning that the sex chromosome pair consists of an X and a Y chromosome.
The Y chromosome, present only in males, contains the genes responsible for the development of the male sex organs, in particular the master gene SRY, which triggers a cascade of events that eventually converts an initial undifferentiated gonad into the testes, where sperm are produced. In the absence of the SRY gene (as in 46XX females), this primordial gonad eventually develops into the ovaries, where eggs are produced.
The T2T consortium solved the technical problems that prevented the completion of the Y chromosome sequence and, in so doing, uncovered 41 additional protein-coding genes. As detailed in an article in the journal Nature, this adds more than 30 million letters to the total length of the human genome, which now stands at 3.23 billion letters. The authors of the study have made the new reference genome, called T2T-CHM13+Y, available to the entire research community.
Alongside the complete sequence of the Y chromosome, Nature has published a second study that assembled 43 Y chromosomes from men whose paternal lineages span the last 183,000 years of human evolution. The analysis reveals great diversity in both the size and structure of the Y chromosome over the course of evolution. Among other things, the researchers detected large sequence inversions: DNA fragments that have been flipped and reinserted in the opposite orientation.
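To make the idea of an inversion concrete, here is a minimal sketch in Python. It is not the method used in either study: the function names and the ten-letter toy sequence are invented for illustration. It assumes the usual convention that an inverted segment, read along the same strand, appears as the reverse complement of the original.

```python
# Toy illustration of a sequence inversion (hypothetical names and data).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read in the usual direction."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(seq: str, start: int, end: int) -> str:
    """Flip the segment seq[start:end] in place, leaving the flanks untouched."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


reference = "AAACCGTTTT"                      # made-up reference sequence
inverted = invert_segment(reference, 3, 7)    # flips the "CCGT" segment
print(inverted)                               # AAAACGGTTT
```

Applying the same inversion a second time restores the original sequence. The inversions reported on real Y chromosomes can span very long stretches of DNA rather than four letters.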
That we know more about the Y chromosome is great news. Just about a year ago we saw another scientific breakthrough correlating the common loss of the Y chromosome in many cells with a shorter life expectancy for men compared to women. And it is clear that much more valuable information is hidden in the genes.
The pangenome initiative
These two new studies significantly increase our knowledge of human DNA, filling in what had remained unknown about one of the smallest but most complex chromosomes in our genome. They come on the heels of the pangenome initiative, which aims to capture the genetic variability that exists among human beings. While we all share a large part of our genome, we differ by approximately 0.1%. This corresponds to a difference of more than 3 million pairs of letters between any two individuals.
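The figure of more than 3 million differences follows from simple arithmetic, sketched below in Python. The genome length is the 3.23 billion letters quoted earlier in this article and the 0.1% is the approximate figure given above, so the result is an order-of-magnitude estimate rather than a measured value. The variable names are illustrative.

```python
# Back-of-the-envelope estimate using the article's approximate figures.
genome_length = 3_230_000_000   # letters in the genome, as quoted above
one_in = 1_000                  # 0.1% means 1 letter in every 1,000

differing_letters = genome_length // one_in
print(f"{differing_letters:,}")  # 3,230,000
```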
With the pangenome initiative, we will no longer have a single reference genome, but hundreds that will more reliably illustrate our genetic similarities and differences. Among other things, this should help us more easily detect gene mutations associated with the thousands of congenital diseases.