Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude

This web page was produced as an assignment for an undergraduate course at Davidson College.

The Tibetan Plateau during the winter season. Image taken from China Daily. Permission Pending


This webpage summarizes the article "Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude" by Yi et al. (2010) and explains the figures presented in the article.


Preliminary Thoughts

I found this article interesting because it analyzes an important example of adaptation: adaptation to altitude. The topic is relevant to my own experience because I travel to Quito, Ecuador (which ranges from 2850 to 4200m above sea level) every winter break with the Timmy Global Health organization, and I often experience high-altitude sickness. I agree with the authors that populations living at high altitude have evolved adaptive alleles that allow them to function normally when less oxygen is available. I believe the data support reliable conclusions because the methods controlled for potential confounding variables, such as background genetic variation between the populations. Additionally, the study provides a wide range of converging evidence, which strengthens its conclusions.

Article Summary

As humans expanded into different areas, many cultural and biological adaptations took place; the adaptation of the inhabitants of the Tibetan Plateau is one of them. They live over 4000m above sea level, where there is about 40% less oxygen than at sea level. This study highlights an instance of this kind of genetic adaptation by natural selection in humans.


Fifty unrelated inhabitants from two villages in the Tibet Autonomous Region of China had their exomes sequenced (to a mean depth of 18X). A separate sample of 40 Han individuals from Beijing had their genomes sequenced. Beijing lies a little less than 50m above sea level, compared with over 4000m for the Tibetan sample. The Han and Tibetan samples were ideal for comparison because the genetic differences between the two groups are relatively small, which makes the results more reliable: large background genetic variability would not be a confounding factor.


Preliminary tests demonstrated (see Figure 1) that:

    1. The allele frequencies of the two populations were strongly and tightly correlated; however, an allele of the EPAS1 gene had a noticeably greater frequency in the Tibetan sample than in the Han sample.
    2. Tibetans and Hans were estimated to have diverged a little over 2750 years ago.

However, according to the article, “Genes with strong frequency differences between populations are potential targets of natural selection” (Yi et al., 2010). Therefore, to detect whether it was the Han or the Tibetan population that was affected by selection, a more distant sample of 200 Danish people had their exomes sequenced and analyzed. With these data, genomicists could determine how much allele frequencies changed in the Tibetan sample since its divergence from the Han. In other words, a more distant outgroup provides a baseline for comparing changes in frequency, which helps determine whether natural selection was really taking place.
Using the population branch statistic (PBS), which measures how much allele frequencies have changed along one population's branch and so can reveal signals of natural selection, it was demonstrated that:

    1. Thirty-four genes in the collected data had significantly greater PBS values, and interestingly these genes were associated with a “response to hypoxia” (hypoxia is a lack of oxygen reaching regions of the body).
    2. The strongest signal of natural selection was found in the endothelial Per-Arnt-Sim (PAS) domain protein 1 (EPAS1) gene, even when compared with one million simulations derived from an estimated population model. The frequency of the allele at this gene was 9% in the Han sample, whereas in the Tibetan sample it was 87%. EPAS1 is a hypoxia-inducible factor that is expressed in adult and fetal lungs, placenta, and vascular endothelial cells. Percy et al. (2008) also suggested a link between the EPAS1 gene and the regulation of red blood cell production.
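The PBS calculation described above can be sketched in a few lines. This is a minimal illustration, assuming the definition given by Yi et al. (2010): each pairwise FST is transformed as T = -log(1 - FST), and the Tibetan PBS is (T_TH + T_TD - T_HD) / 2. The function names and all FST values below are hypothetical and chosen only to show how a gene that is strongly differentiated in Tibetans stands out; they are not taken from the paper.

```python
import math

def divergence_time(fst):
    """Transform FST into the log-scaled divergence measure T = -log(1 - FST)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("FST must be in the range [0, 1)")
    return -math.log(1.0 - fst)

def pbs(fst_th, fst_td, fst_hd):
    """Population branch statistic for the Tibetan branch.

    fst_th: FST between Tibetan and Han
    fst_td: FST between Tibetan and Danish
    fst_hd: FST between Han and Danish
    """
    t_th = divergence_time(fst_th)
    t_td = divergence_time(fst_td)
    t_hd = divergence_time(fst_hd)
    return (t_th + t_td - t_hd) / 2.0

# Hypothetical FST values (illustrative only, not from the paper).
typical_gene = pbs(fst_th=0.02, fst_td=0.11, fst_hd=0.10)
outlier_gene = pbs(fst_th=0.55, fst_td=0.60, fst_hd=0.10)

print(f"typical gene PBS: {typical_gene:.3f}")
print(f"outlier gene PBS: {outlier_gene:.3f}")
```

In this sketch the outlier gene gets a much larger PBS because it differs from both the Han and the Danish samples, while the Han and Danish samples differ from each other no more than usual. That pattern places the change on the Tibetan branch.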



The EPAS1 allele found at high frequency in Tibetans was shown to be associated with low erythrocyte counts and hemoglobin levels, which is the opposite of what would be expected of an adaptation to hypoxia. This led genomicists to conclude that carriers of the “Tibetan” allele are able to maintain sufficient tissue oxygenation at high altitude. Also of note, stronger changes in frequency were seen in regions flanking candidate genes. This pattern would be expected if adaptive mutations were targeting candidate gene regions, and many of the other alleles found in the study were indeed near candidate genes. Another study analyzed adaptation to high altitude in a sample of individuals from an Andean community. That study also supported the assumption that these populations must have experienced substantial evolutionary divergence to adapt to high altitude. Because the Han and Tibetan populations diverged only about 2750 years ago, the change in frequency must have occurred extremely fast: faster, in fact, than the rise of the lactase persistence allele in Europe, making it potentially the fastest allele frequency change observed in humans.
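To get a feel for how fast this change is, the back-of-the-envelope calculation below estimates the selection advantage implied by a rise from 9% to 87% in 2750 years. It is only a rough sketch under several assumptions that are not from the article: the allele started at the present-day Han frequency, a generation lasts 25 years, there is no genetic drift, and selection follows a simple deterministic model in which the log-odds of the allele frequency increase by roughly s each generation (for small s). The function names are hypothetical.

```python
import math

def logit(p):
    """Log-odds of an allele frequency p."""
    return math.log(p / (1.0 - p))

def implied_selection_coefficient(p_start, p_end, years, years_per_generation):
    """Approximate per-generation selection coefficient s.

    Under the simple model assumed here, logit(p) rises by about s
    per generation, so s is the total logit change divided by the
    number of generations.
    """
    generations = years / years_per_generation
    return (logit(p_end) - logit(p_start)) / generations

# Frequencies and divergence time are from the article;
# the 25-year generation time is an assumption.
s = implied_selection_coefficient(
    p_start=0.09, p_end=0.87, years=2750, years_per_generation=25
)
print(f"implied selection coefficient per generation: {s:.3f}")
```

Under these assumptions the implied advantage is a few percent per generation. A longer generation time or a lower starting frequency would raise the estimate, so the number should be read as an order of magnitude only.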


Breaking Down The Figures and Tables

This graph compares SNP frequencies in the Han population (on the y-axis) with those in the Tibetan population (on the x-axis). The frequencies are correlated, since the points fit rather tightly about the diagonal line. The color code is based on a logarithmic scale with red at its lower end and 10000 (dark pink) at its upper end; the color indicates the number of SNPs from the sequenced exomes found at each combination of frequencies. The remarkable feature is the outlier at the bottom right of the graph, where the SNP frequency differs greatly between the two populations (high in the Tibetan sample, low in the Han sample); the gene indicated by this outlier is EPAS1.



This table shows the 30 genes with the highest PBS values in the Tibetan sample. PBS measures the divergence of one population from the others, and high values suggest positive selection. The higher the value, the stronger the indication that the gene has been a target of natural selection. As expected, the EPAS1 gene is at the top of the list, confirming its importance in the oxygen-related adaptation to high altitude.

The graph in panel A shows the FST-based population branch statistic (PBS) for the Tibetan population on the y-axis. FST, the fixation index, is calculated by dividing the average number of pairwise differences within populations by the average number of pairwise differences between populations, then subtracting that ratio from 1. This number indicates how differentiated the populations are, with 0 meaning complete interbreeding and 1 meaning completely distinct. The x-axis marks the number of SNPs (single nucleotide polymorphisms) found in each gene. Plotted this way, the graph shows which genes are most distinct between the population samples relative to how many SNPs they contain. The greatest outlier is the EPAS1 gene, whose high PBS value indicates that the Tibetan sample differs greatly from the other population samples at this gene. The DISC1 gene is the next most prominent outlier, followed by the FANCA gene. The outliers are indicated in red.
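The FST calculation described above (one minus the ratio of within-population to between-population pairwise differences) can be illustrated with a toy example. This is a sketch of that textbook definition only; the estimator actually used in the paper may differ, and the short sequences and function names below are made up for illustration.

```python
from itertools import combinations, product

def mean_differences(pairs):
    """Average number of differing sites over a collection of sequence pairs."""
    pairs = list(pairs)
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)

def fst(pop_a, pop_b):
    """FST = 1 - (mean within-population differences / mean between-population differences)."""
    within_pairs = list(combinations(pop_a, 2)) + list(combinations(pop_b, 2))
    pi_within = mean_differences(within_pairs)
    pi_between = mean_differences(product(pop_a, pop_b))
    if pi_between == 0:
        raise ValueError("populations are identical; FST is undefined")
    return 1.0 - pi_within / pi_between

# Made-up sequences: each string is one individual's alleles at four sites.
pop_a = ["AAAA", "AAAT"]
pop_b = ["TTTT", "TTTA"]
print(f"FST between toy populations: {fst(pop_a, pop_b):.3f}")
```

Here individuals differ by one site within each population but by three or four sites between populations, so the ratio is small and FST is close to 1, meaning the two toy populations are highly distinct.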

The trees in panel B show the FST-based branch lengths between the populations. The small tree on the left shows the branch lengths for the genomic average. The Tibetan (T) and Han (H) branches are similar in length because their genomes are similar to each other; however, the Danish (D) branch is longer because the Danish genomes differ considerably more from those of the other two samples. This indicates that the Danish population is evolutionarily more distant from both the Tibetan and Han populations. The larger tree on the right shows the branch lengths for the EPAS1 gene. The Tibetan branch is much longer than the other populations’ branches. This indicates a very strong signal of selection at this gene, meaning that the change was a product of natural selection, in this case for surviving at high altitudes.
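The two trees in panel B can be mimicked numerically. With three populations there is only one possible unrooted tree, so each branch length follows directly from the three pairwise distances. The sketch below assumes the distances are already on an additive scale (such as a log-transformed FST); the distance values and the function name are hypothetical and only imitate the shapes of the two trees.

```python
def branch_lengths(d_th, d_td, d_hd):
    """Branch lengths of the unrooted Tibetan/Han/Danish tree from pairwise distances."""
    return {
        "T": (d_th + d_td - d_hd) / 2.0,
        "H": (d_th + d_hd - d_td) / 2.0,
        "D": (d_td + d_hd - d_th) / 2.0,
    }

# Hypothetical distances (illustrative only, not from the paper).
genome_average = branch_lengths(d_th=0.02, d_td=0.12, d_hd=0.11)
epas1_like = branch_lengths(d_th=0.80, d_td=0.90, d_hd=0.11)

for name, tree in [("genome average", genome_average), ("EPAS1-like gene", epas1_like)]:
    print(name, {pop: round(length, 3) for pop, length in tree.items()})
```

With the genome-average style distances, the T and H branches are short and the D branch is the longest. With the EPAS1-style distances, the T branch becomes by far the longest while H and D are unchanged, which is the pattern the right-hand tree displays.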


Final Thoughts

I find these data remarkable because they show the powerful influence genetics plays in human development and in our ability to adjust and adapt to environmental changes over time. Equally striking is the speed at which this process can occur. It took many years for modern humans to evolve, and some adaptive traits took thousands of years to spread; the lactase persistence allele, for example, took around 7500 years to rise in frequency in Europe. However, the evidence suggests that this EPAS1 allele rose in frequency in around 2750 years. Natural selection has once again shown that it is a powerful force in the survival of humankind and other organisms.


Largest glacier group on Tibetan Plateau uncovered. (2007). [Web Photo]. Retrieved from

M. J. Percy et al., N. Engl. J. Med. 358, 162 (2008).

Yi, X., Liang, Y., Huerta-Sanchez, E., Jin, X., Cuo, Z. X. P., Pool, J. E., . . . Wang, J. (2010). Sequencing of 50 human exomes reveals adaptation to high altitude. Science, 329(5987), 75-78.






Copyright 2012 Department of Biology, Davidson College, Davidson, NC 28035