This web page was produced as an assignment for an undergraduate course at Davidson College.

Article Link

The Simons Genome Diversity Project: 300 genomes from 142 diverse populations

Summary:

The 1000 Genomes Project was carried out to catalogue human genetic diversity by sequencing at least 1,000 human genomes (it ultimately sequenced 2,504 individual genomes). While the number of individuals in that study provides sampling depth, only 26 distinct human populations were included. Mallick et al. aimed to broaden our understanding of human genetic diversity by sequencing the genomes of an additional 300 individuals from 142 distinct human populations. This information has numerous implications. From it, the researchers estimated the genetic drift across populations and the time of divergence of populations from an African common ancestor. They also aimed to answer questions about the divergence of Australian, New Guinean, and Andamanese populations from an African common ancestor and about the relative mutation rates of African and non-African subgroups. Understanding the extent of human genetic diversity is a critical step in studying human genetic disease. Genome sequencing to identify disease-related mutations is an increasingly useful tool in medicine. However, to identify a "mutation", an individual genome is compared to a reference, and all base pairs that do not match the reference are considered mutations. Because normal genetic diversity accounts for a significant amount of variation that is not disease-causing, knowledge of this sequence variation is critical for identifying the mutations that actually contribute to disease. For this reason, continuing to expand upon the 1000 Genomes Project and improving the reference genome to represent individuals of all populations are critical steps in extending genomic medicine across populations.
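To make the reference-comparison idea concrete, here is a minimal Python sketch. It is a toy illustration only, not the method used by Mallick et al. or by any real variant caller: the sequences, the `known` catalogue of common variants, and both function names are hypothetical, and the two sequences are assumed to be already aligned.

```python
def call_variants(reference, sample):
    """Return (position, reference_base, sample_base) for every mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


def filter_known_variation(variants, known_population_variants):
    """Drop mismatches already catalogued as common population variation."""
    return [
        v for v in variants
        if (v[0], v[2]) not in known_population_variants
    ]


# Toy, made-up data: two aligned 10-base sequences.
reference = "ACGTACGTAC"
sample = "ACGAACGTTC"

# Hypothetical catalogue: an 'A' at position 3 is a common, benign variant.
known = {(3, "A")}

variants = call_variants(reference, sample)
candidates = filter_known_variation(variants, known)

print(variants)    # every mismatch against the reference
print(candidates)  # mismatches not explained by known diversity
```

The larger and more diverse the catalogue of known variation, the fewer harmless differences are left over as disease candidates, which is the practical argument for sampling more populations.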

Explanation of figures:

Figure 1:

Figure 1a: This figure is a neighbor-joining tree in which subpopulations of the African, American, East Asian, Oceanian, South Asian, and West Eurasian groups are placed into a phylogenetic tree based on genomic sequence similarity. The tree was constructed starting from the Khoe-San African group, which is believed to be the first distinct African population to diverge from a common ancestor (Schlebusch et al., 2012). Because this group is believed to have become a distinct genetic population before the migration out of Africa, it is commonly used as a stand-in for the common ancestor of African and non-African populations. The figure shows the divergence of each population from the Khoe-San population in terms of individual nucleotide changes (pairwise divergence). It shows a progression from African to West Eurasian, then to South Asian and East Asian simultaneously, and finally to Oceanian and American.
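Pairwise divergence, the quantity the tree is built from, can be sketched in a few lines of Python. This is a toy example under stated assumptions: the three sequences and the population labels are made up, the sequences are assumed to be aligned, and the code only builds the distance table that a tree-building method such as neighbor joining would take as input; it does not build the tree itself.

```python
from itertools import combinations


def pairwise_divergence(seq_a, seq_b):
    """Fraction of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def divergence_matrix(sequences):
    """Map each unordered pair of labels to its pairwise divergence."""
    return {
        (a, b): pairwise_divergence(sequences[a], sequences[b])
        for a, b in combinations(sorted(sequences), 2)
    }


# Hypothetical aligned sequences, one per population.
toy_sequences = {
    "pop_A": "ACGTACGTAC",
    "pop_B": "ACGTACGTAT",
    "pop_C": "ACGAACGTTT",
}

matrix = divergence_matrix(toy_sequences)
for pair, divergence in matrix.items():
    print(pair, divergence)
```

In this toy table pop_A and pop_B differ at the fewest sites, so a distance-based tree would place them closest together.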

Figure 1b: This figure details the ratio of X to autosome diversity in different populations. The X to autosome diversity ratio can be used to estimate the selection pressure on X-linked versus autosomal genes (Hammer et al., 2010). In this figure, the X to autosome diversity ratio is lower in non-Africans than in Africans and lower in Pygmy populations than in other African populations. One explanation offered for this is male-driven admixture, which occurs when males from another genetic population reproduce with females from a given population. Because a male contributes one X chromosome to his daughters and only a Y chromosome to his sons, incoming males introduce less diversity on the X chromosome than on the autosomes, where every offspring receives one copy of each chromosome from the father with a different genetic background. Therefore, populations that experience more male-driven cross-population reproduction have lower X to autosome diversity ratios. A potential explanation for the lower ratio in Pygmy populations is that these groups were largely hunter-gatherers, so they were very mobile and may have mixed frequently with genetically distinct populations.
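The chromosome-counting argument above can be checked with a small Python sketch. It uses the standard textbook formulas for effective population size with unequal numbers of breeding males and females; these formulas, the function names, and the population numbers are assumptions added here for illustration and are not taken from the paper, which measures diversity directly from sequence data.

```python
def autosomal_effective_size(n_males, n_females):
    """Textbook effective size for autosomes with unequal sex numbers."""
    return 4 * n_males * n_females / (n_males + n_females)


def x_linked_effective_size(n_males, n_females):
    """Textbook effective size for the X (males carry one X, females two)."""
    return 9 * n_males * n_females / (4 * n_males + 2 * n_females)


def expected_x_to_autosome_ratio(n_males, n_females):
    """Expected X to autosome diversity ratio under neutrality."""
    return (
        x_linked_effective_size(n_males, n_females)
        / autosomal_effective_size(n_males, n_females)
    )


# Hypothetical breeding population sizes.
equal_sexes = expected_x_to_autosome_ratio(1000, 1000)
more_males = expected_x_to_autosome_ratio(2000, 1000)
more_females = expected_x_to_autosome_ratio(1000, 2000)

print(equal_sexes, more_males, more_females)
```

With equal numbers of breeding males and females the expected ratio is 3/4, because each breeding pair carries three X chromosomes for every four autosomal copies. An excess of breeding males pushes the ratio below 3/4 and an excess of breeding females pushes it above, which is the direction of the male-driven effect described for this figure.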

Figure 1c: A heatmap of Neanderthal ancestry across sample populations. Populations ranged from 0 to 3% Neanderthal ancestry. The highest percentage of shared sequence with Neanderthals is found in populations from East Asia. This is consistent with what is known about Neanderthals: they lived in Eastern Europe and Northwest Asia, and populations migrating out of Africa likely intermixed with them before continuing into East and Southeast Asia.

Figure 1d: A heatmap of Denisovan ancestry across sample populations. Here, higher sequence similarity to Denisovans is found in Southeast Asia and Oceania. Denisovans are thought to have inhabited a region overlapping with and east of that of Neanderthals, from North to Southeast Asia. Thus, the higher percentage of Denisovan sequence similarity shown in this figure is good verification that the sampled individuals have long-rooted ancestry in the region, as Denisovan sequence similarity has been linked to these regions in other studies (Sankararaman et al., 2016).


Figure 2:

Figure 2a-c: These figures examine the cross-coalescence rate between various populations over time. Cross-coalescence rate is essentially a measure of genetic drift over time, obtained by comparing the sequence similarity of specific genes, SNPs, STRs, etc. between populations and calculating a time of divergence from a most recent common ancestor (MRCA). In this way, researchers can identify trends in divergence from an MRCA among different populations over time based on present-day genomic sequence. A higher cross-coalescence rate suggests that the populations are approaching a common ancestor quickly (when moving from left to right on the x axis). Therefore, populations with higher cross-coalescence rates at lower kya (thousands of years ago) have diverged less from the MRCA. In figure 2a, the populations converge around 200 thousand years ago. Because this figure compares present-day African populations to a number of other genetically distinct populations, the researchers proposed from these data that the most recent common ancestor of present-day human populations lived around 200,000 years ago. Figure 2b shows the cross-coalescence rate of present-day African hunter-gatherer populations, suggesting that populations diverged within Africa between 50 and 100 thousand years ago, with an MRCA living around 100 thousand years ago. Figure 2c shows the cross-coalescence rate of non-Africans over time, demonstrating that the genetic divergence among these groups occurred largely within the last 50 thousand years. It is predicted that much of this divergence arose while these populations were migrating out of Africa (estimated around 50,000 years ago).
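As a rough illustration of how a cross-coalescence curve can be read, the Python sketch below uses the relative cross-coalescence rate as it is commonly defined in MSMC-style analyses: twice the between-population coalescence rate divided by the sum of the two within-population rates. Treating this as the quantity plotted in figure 2 is an assumption on my part, and the rate values, the time points, and the 0.5 threshold used to pick a split time are all hypothetical.

```python
def relative_cross_coalescence(within_1, within_2, between):
    """Relative rate per time bin: 2 * between / (within_1 + within_2)."""
    return [
        2 * b / (w1 + w2)
        for w1, w2, b in zip(within_1, within_2, between)
    ]


def approximate_split_time(times_kya, rates, threshold=0.5):
    """Most recent time bin at which the relative rate reaches the threshold."""
    for time, rate in zip(times_kya, rates):
        if rate >= threshold:
            return time
    return None


# Made-up coalescence rates for five time bins, ordered from recent to old.
times_kya = [10, 25, 50, 100, 200]
within_pop_1 = [4.0, 4.0, 4.0, 4.0, 4.0]
within_pop_2 = [4.0, 4.0, 4.0, 4.0, 4.0]
between_pops = [0.0, 0.4, 1.6, 3.2, 4.0]

rates = relative_cross_coalescence(within_pop_1, within_pop_2, between_pops)
split_kya = approximate_split_time(times_kya, rates)

print(rates)
print(split_kya)
```

A value near 0 means lineages from the two populations almost never share an ancestor in that time bin (the populations are separate), while a value near 1 means they coalesce as freely as lineages within one population (the populations have not yet split).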

Figure 2d-f: Figures 2d, 2e, and 2f display the effective population sizes of the populations shown in figures 2a, 2b, and 2c, respectively. Effective population size is an estimate of the number of individuals in a population that contribute to the next generation by producing offspring (Kliman, 2008). It is estimated using the pairwise sequentially Markovian coalescent (PSMC) model, which infers how many reproducing individuals would be required to produce the genetic drift observed in a population over time. In each of these figures (d-f), the time at which the population sizes converge corresponds to the predicted time of population divergence from figures 2a-c. For example, in figure 2d the population sizes appear to be the same around 200,000 years ago, which is when figure 2a shows the populations beginning to diverge. This suggests that the sizes match at this time because the lineages have converged to a single ancestral population.
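The link between genetic diversity and effective population size can be sketched with the simplest textbook relationship, in which expected heterozygosity is approximately 4 x Ne x mu for a population of constant size. This is not PSMC, which infers how Ne changes through time from the pattern of heterozygous sites along a genome; the sketch only shows the basic idea. The heterozygosity value and the mutation rate below are assumed, illustrative numbers, not values from the paper.

```python
def effective_size_from_diversity(heterozygosity, mutation_rate):
    """Solve heterozygosity = 4 * Ne * mu for Ne (constant-size assumption)."""
    if mutation_rate <= 0:
        raise ValueError("mutation rate must be positive")
    return heterozygosity / (4 * mutation_rate)


# Assumed inputs: about 1 heterozygous site per 1,000 bases, and a
# per-base, per-generation mutation rate of 1.25e-8.
heterozygosity = 0.001
mutation_rate = 1.25e-8

estimated_ne = effective_size_from_diversity(heterozygosity, mutation_rate)
print(round(estimated_ne))
```

With these assumed inputs the estimate is on the order of 20,000 individuals. More diversity implies a larger effective size and less diversity implies a smaller one, which is why effective population size can be read from sequence data at all.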


Figure 3:

The main portion of figure 3 is a flow chart representing predicted genetic drift from most recent common ancestors. Arrows represent genetic drift from an MRCA, and nodes represent populations (red -> ancient, green -> inferred ancestral, blue -> present day). The figure, which was generated using genetic sequence data from the 300 sequenced individuals, suggests a migration pattern similar to the one already predicted: a migration out of Africa into Europe and then through Asia to Australia. This tree also shows the effect of admixture events between the migrating groups and archaic humans such as Neanderthals and Denisovans. The inset shows the likelihood that this graph is accurate under the assumption of a dispersal event at different times before the migration out of Africa. The three tested assumptions are a dispersal event 10 thousand, 20 thousand, and 30 thousand years ago (drift 0.01, 0.02, and 0.03, respectively). The likelihood of this flow chart being accurate is reduced drastically the sooner the dispersal event is assumed to have occurred, suggesting that the dispersal out of Africa had a negligible effect on most of the genetic diversity found among the present-day populations in this flow chart, which is more likely the result of earlier genetic drift among African populations.

Conclusions/ My Opinion:

A major conclusion from this paper comes from figure 3. The figure suggests that the genetic diversity found in present-day populations is much less the result of the migration out of Africa and much more likely the result of genetic diversity that existed before the dispersal out of Africa occurred. The group used these data, in addition to cross-population allele frequency differentiation, to examine the possibility that a few recent mutations (within the past 50,000 years) contribute largely to differences in modern human behavior. After an extensive search using multiple approaches, the researchers found no significant genetic differences that could explain behavioral variation. This suggests that among modern humans there is no population that is genetically superior or that has evolved in a way that differentiates it from others with regard to behavior. While many physical changes have arisen through natural selection, this does not extend to significant changes in neurological genes.

I found this paper to be generally well-written and the data to be well-supported. I thought it was very effective to use a number of different methods in order to support their findings (e.g. figures 2a-c supported by figures 2d-f) and I thought that the findings were very interesting with a number of implications as mentioned in the previous paragraph. However, I found that on a number of occasions, the authors failed to explain key terms and define key portions of figures, leaving it up to the reader to seek out definitions and methods from outside sources. A few examples include the failure to explain cross-coalescence rate and effective population size calculation methods (figure 2) and drift 0.01, 0.02 and 0.03 in the inset in figure 3. The key in figure 3 is titled "Present-day populations have negligible ancestry from an early dispersal of modern humans out of Africa" and I found it difficult to draw that conclusion given the data provided in the figure, particularly without a clear definition of the meaning of the drift values in the inset.



References:

Mallick, Swapan et al. “The Simons Genome Diversity Project: 300 Genomes from 142 Diverse Populations.” Nature 538.7624 (2016): 201–206. www.nature.com. Web.


Return to Nick Balanda's Home Page


Email Questions or Comments: nibalanda@davidson.edu