Population genomics is the large-scale comparison of DNA sequences of populations. The term is a neologism associated with population genetics. Population genomics studies genome-wide effects to improve the understanding of microevolution, so that the phylogenetic history and demography of a population can be inferred.[1]


Population genomics has been of interest to scientists since Darwin. Some of the first methods used for studying genetic variability at multiple loci included gel electrophoresis and restriction enzyme mapping.[2] Previously, genomics was restricted to the study of a small number of loci. However, recent advances in sequencing and in computer storage and power have allowed the study of hundreds of thousands of loci from populations.[3] Analysis of these data requires identifying non-neutral, or outlier, loci that indicate selection in that region of the genome. The researcher can then either remove these loci to study genome-wide effects or focus on them if they are of interest.
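The outlier step described above can be illustrated with a minimal sketch. It assumes that a per-locus differentiation value (for example FST) has already been computed, and it simply flags loci above an empirical quantile. The function name, the input dictionary and the 95% cutoff are hypothetical choices made for illustration; published outlier scans generally rely on model-based methods rather than a fixed cutoff.

```python
def flag_outlier_loci(stat_by_locus, upper_quantile=0.95):
    """Split loci into putative outliers and putatively neutral loci.

    stat_by_locus: dict mapping a locus name to a precomputed per-locus
    statistic such as FST (hypothetical input format).
    Loci whose value lies strictly above the empirical quantile are flagged.
    """
    if not stat_by_locus:
        raise ValueError("no loci supplied")
    ordered = sorted(stat_by_locus.values())
    # Empirical quantile taken by rank; a deliberately simple assumption.
    threshold = ordered[int(upper_quantile * (len(ordered) - 1))]
    outliers = sorted(name for name, v in stat_by_locus.items() if v > threshold)
    neutral = sorted(name for name, v in stat_by_locus.items() if v <= threshold)
    return outliers, neutral


# Toy data: 19 weakly differentiated loci and one strongly differentiated locus.
toy = {f"locus{i}": 0.01 + 0.001 * i for i in range(19)}
toy["locus19"] = 0.60
outliers, neutral = flag_outlier_loci(toy)
print("putative outliers:", outliers)
print("loci kept for genome-wide analysis:", len(neutral))
```

Returning both lists mirrors the two uses mentioned in the text: the neutral set can be carried forward for genome-wide analyses, while the outlier set can be examined on its own.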

Research applications

In the study of Schizosaccharomyces pombe (more commonly known as fission yeast), a popular model organism, population genomics has been used to understand the causes of phenotypic variation within the species. Because genetic variation within this species was previously poorly understood owing to technological restrictions, population genomics has made it possible to learn about the species' genetic differences.[4] In humans, population genomics has been used to study genetic change since humans began to migrate away from Africa approximately 50,000–100,000 years ago. It has been shown not only that genes related to fertility and reproduction were highly selected for, but also that the further humans moved away from Africa, the greater the presence of lactase.[5]

A 2007 study by Begun et al. compared the whole-genome sequences of multiple lines of Drosophila simulans to the assemblies of D. melanogaster and D. yakuba. This was done by aligning DNA from whole-genome shotgun sequences of D. simulans to a standard reference sequence before carrying out whole-genome analysis of polymorphism and divergence. The analysis revealed a large number of proteins that had experienced directional selection. The authors discovered previously unknown, large-scale fluctuations in both polymorphism and divergence along chromosome arms. They found that the X chromosome had faster divergence and significantly less polymorphism than previously expected. They also found regions of the genome (e.g. UTRs) that signaled adaptive evolution.[6]
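As a rough illustration of what measuring polymorphism and divergence involves, the sketch below computes nucleotide diversity (the average proportion of sites that differ between pairs of sequences within a sample) and the mean per-site divergence from an outgroup sequence. It is not the pipeline used by Begun et al.; the function names and toy sequences are hypothetical, and the code assumes short, pre-aligned, equal-length sequences with no gaps or missing data.

```python
from itertools import combinations


def count_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def nucleotide_diversity(seqs):
    """Average pairwise differences per site within a sample (polymorphism)."""
    pairs = list(combinations(seqs, 2))
    total = sum(count_differences(a, b) for a, b in pairs)
    return total / (len(pairs) * len(seqs[0]))


def mean_divergence(seqs, outgroup):
    """Average per-site differences between each sample sequence and an outgroup."""
    total = sum(count_differences(s, outgroup) for s in seqs)
    return total / (len(seqs) * len(outgroup))


# Toy alignment (hypothetical data): four ingroup lines and one outgroup sequence.
ingroup = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAA"]
outgroup = "ACGAACGTCC"
print("polymorphism (pi):", nucleotide_diversity(ingroup))
print("divergence:", mean_divergence(ingroup, outgroup))
```

In a genome-wide analysis, statistics of this kind would typically be computed in windows along each chromosome arm so that regional fluctuations become visible.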

In 2014, Jacquot et al. studied the diversification and epidemiology of endemic bacterial pathogens by using the Borrelia burgdorferi species complex (the bacteria responsible for Lyme disease) as a model. They also wished to compare the genetic structure of B. burgdorferi with that of the closely related species B. garinii and B. afzelii. They began by sequencing samples from a culture and then mapping the raw reads onto reference sequences. SNP-based and phylogenetic analyses were used at both the intraspecific and interspecific levels. When looking at the degree of genetic isolation, they found that the intraspecific recombination rate was ~50 times higher than the interspecific rate. They also found that, when most of the genome was used, conspecific strains did not cluster in clades, raising questions about previous strategies used when investigating pathogen epidemiology.[7]

In 2014, Moore et al. conducted a genomic assessment of a group of Atlantic salmon populations that had previously been analyzed with traditional population genetic analyses (microsatellites, SNP-array genotyping, and BayeScan, which uses the Dirichlet-multinomial distribution) to place them into defined conservation units. This genomic assessment mostly agreed with previous results but identified more differences between regionally and genetically discrete groups, suggesting that there may be an even greater number of conservation units of salmon in those regions. These results supported the usefulness of genome-wide analysis for improving the accuracy of future designation of conservation units.[8]

In highly migratory marine species, traditional population genetic analyses often fail to identify population structure. In tunas, traditional markers such as short-range PCR products, microsatellites and SNP-arrays have struggled to distinguish fish stocks from separate ocean basins. However, population genomic research using RAD sequencing in yellowfin tuna[9][10] and albacore[11][12] has been able to distinguish populations from different ocean basins and reveal fine-scale population structure. These studies identify putatively adaptive loci that reveal strong population structure, even though these sites represent a relatively small proportion of the overall DNA sequence data. In contrast, the majority of sequenced loci, which are presumed to be selectively neutral, do not reveal patterns of population differentiation, matching results for traditional DNA markers.[9][10][11][12] The same pattern, in which putatively adaptive loci from RAD sequencing reveal population structure while traditional DNA markers provide limited insight, is also observed for other marine fishes, including striped marlin[13] and lingcod.[14]

Mathematical models

Understanding and analyzing the vast amounts of data produced by population genomics studies requires various mathematical models. One method of analysis is QTL mapping, which has been used to help find the genes responsible for adaptive phenotypes.[15] To quantify genetic differentiation between populations, a value known as the fixation index, or FST, is used. When used together with Tajima's D, FST has been used to show how selection acts upon a population.[16] The McDonald-Kreitman test (or MK test) is also favored when looking for selection because it is less sensitive to changes in a species' demography that would bias other selection tests.[17]
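The three statistics named above can be sketched in a few lines each. The example below is a simplified illustration under stated assumptions, not a reference implementation: FST is computed for a single biallelic locus from allele frequencies in equally sized subpopulations using the heterozygosity form (HT - HS) / HT; Tajima's D is computed from short, gap-free aligned sequences; and the MK comparison is summarized by the neutrality index, without a significance test. All function names and input values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def wright_fst(subpop_freqs):
    """FST at one biallelic locus, assuming equally sized subpopulations."""
    mean_p = sum(subpop_freqs) / len(subpop_freqs)
    h_total = 2 * mean_p * (1 - mean_p)  # expected heterozygosity, pooled
    h_within = sum(2 * p * (1 - p) for p in subpop_freqs) / len(subpop_freqs)
    if h_total == 0:
        raise ValueError("locus is monomorphic; FST is undefined")
    return (h_total - h_within) / h_total


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences without gaps."""
    n = len(seqs)
    # S: number of segregating (variable) sites in the alignment.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    pairs = list(combinations(seqs, 2))
    # k: mean number of pairwise differences.
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def mk_neutrality_index(pn, ps, dn, ds):
    """Neutrality index (Pn/Ps) / (Dn/Ds) from McDonald-Kreitman counts.

    pn, ps: nonsynonymous and synonymous polymorphisms within a species.
    dn, ds: nonsynonymous and synonymous fixed differences between species.
    """
    if ps == 0 or dn == 0 or ds == 0:
        raise ValueError("counts in a denominator must be non-zero")
    return (pn / ps) / (dn / ds)


# Illustrative inputs only (not taken from any cited study).
print("FST:", wright_fst([0.2, 0.8]))
print("Tajima's D:", tajimas_d(["AAAA", "CAAA", "ACAA", "AACA"]))
print("Neutrality index:", mk_neutrality_index(pn=2, ps=42, dn=7, ds=17))
```

An FST near 0 indicates little differentiation between subpopulations and a value near 1 indicates strong differentiation. A negative Tajima's D reflects an excess of rare variants, and a neutrality index below 1 reflects an excess of fixed nonsynonymous differences, a pattern commonly read as evidence of positive selection. In practice the MK counts are also tested for significance, for example with a contingency-table test.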

Future developments

Most developments within population genomics stem from advances in sequencing technology. For example, restriction-site associated DNA sequencing, or RADSeq, is a relatively new technology that sequences a reduced-complexity portion of the genome and delivers higher resolution at a reasonable cost.[18] High-throughput sequencing technologies are also developing rapidly and allow more information to be gathered on genomic divergence during speciation.[19] High-throughput sequencing is also very useful for SNP detection, which plays a key role in personalized medicine.[20] Another relatively new approach is reduced-representation library (RRL) sequencing, which discovers and genotypes SNPs and does not require a reference genome.[21]

References


  1. Luikart, G.; England, P. R.; Tallmon, D.; Jordan, S.; Taberlet, P. (2003). "The Power and Promise of Population Genomics: From Genotyping to Genome Typing". Nature Reviews Genetics. 4 (12): 981–994.
  2. Charlesworth, B. (2011). "Molecular population genomics: A short history". Genetics Research. 92 (5–6): 397–411. doi:10.1017/S0016672310000522. PMID 21429271.
  3. Schilling, M. P.; Wolf, P. G.; Duffy, A. M.; Rai, H. S.; Rowe, C. A.; Richardson, B. A.; Mock, K. E. (2014). "Genotyping-by-Sequencing for Populus Population Genomics: An Assessment of Genome Sampling Patterns and Filtering Approaches". PLOS ONE. 9 (4): e95292. Bibcode:2014PLoSO...995292S. doi:10.1371/journal.pone.0095292. PMC 3991623. PMID 24748384.
  4. Fawcett, J. A.; Iida, T.; Takuno, S.; Sugino, R. P.; Kado, T.; Kugou, K.; Mura, S.; Kobayashi, T.; Ohta, K.; Nakayama, J. I.; Innan, H. (2014). "Population Genomics of the Fission Yeast Schizosaccharomyces pombe". PLOS ONE. 9 (8): e104241. Bibcode:2014PLoSO...9j4241F. doi:10.1371/journal.pone.0104241. PMC 4128662. PMID 25111393.
  5. Lachance, J.; Tishkoff, S. A. (2013). "Population Genomics of Human Adaptation". Annual Review of Ecology, Evolution, and Systematics. 44: 123–143. doi:10.1146/annurev-ecolsys-110512-135833. PMC 4221232. PMID 25383060.
  6. Begun, D. J.; Holloway, A. K.; Stevens, K.; Hillier, L. W.; Poh, Y. P.; Hahn, M. W.; Nista, P. M.; Jones, C. D.; Kern, A. D.; Dewey, C. N.; Pachter, L.; Myers, E.; Langley, C. H. (2007). "Population Genomics: Whole-Genome Analysis of Polymorphism and Divergence in Drosophila simulans". PLOS Biology. 5 (11): e310. doi:10.1371/journal.pbio.0050310. PMC 2062478. PMID 17988176.
  7. Jacquot, M.; Gonnet, M.; Ferquel, E.; Abrial, D.; Claude, A.; Gasqui, P.; Choumet, V. R.; Charras-Garrido, M.; Garnier, M.; Faure, B.; Sertour, N.; Dorr, N.; De Goër, J.; Vourc'h, G. L.; Bailly, X. (2014). "Comparative Population Genomics of the Borrelia burgdorferi Species Complex Reveals High Degree of Genetic Isolation among Species and Underscores Benefits and Constraints to Studying Intra-Specific Epidemiological Processes". PLOS ONE. 9 (4): e94384. Bibcode:2014PLoSO...994384J. doi:10.1371/journal.pone.0094384. PMC 3993988. PMID 24721934.
  8. Moore, Jean-Sébastien; Bourret, Vincent; Dionne, Mélanie; Bradbury, Ian; O'Reilly, Patrick; Kent, Matthew; Chaput, Gérald; Bernatchez, Louis (December 2014). "Conservation genomics of anadromous Atlantic salmon across its North American range: outlier loci identify the same patterns of population structure as neutral loci". Molecular Ecology. 23 (23): 5680–5697. doi:10.1111/mec.12972. PMID 25327895. S2CID 12251497.
  9. Grewe, P.M.; Feutry, P.; Hill, P.L.; Gunasekera, R.M.; Schaefer, K.M.; Itano, D.G.; Fuller, D.W.; Foster, S.D.; Davies, C.R. (2015). "Evidence of discrete yellowfin tuna (Thunnus albacares) populations demands rethink of management for this globally important resource". Scientific Reports. 5: 16916. Bibcode:2015NatSR...516916G. doi:10.1038/srep16916. PMC 4655351. PMID 26593698.
  10. Pecoraro, Carlo; Babbucci, Massimiliano; Franch, Rafaella; Rico, Ciro; Papetti, Chiara; Chassot, Emmanuel; Bodin, Nathalie; Cariani, Alessia; Bargelloni, Luca; Tinti, Fausto (2018). "The population genomics of yellowfin tuna (Thunnus albacares) at global geographic scale challenges current stock delineation". Scientific Reports. 8 (1): 13890. Bibcode:2018NatSR...813890P. doi:10.1038/s41598-018-32331-3. PMC 6141456. PMID 30224658.
  11. Anderson, Giulia; Hampton, John; Smith, Neville; Rico, Ciro (2019). "Indications of strong adaptive population genetic structure in albacore tuna (Thunnus alalunga) in the southwest and central Pacific Ocean". Ecology and Evolution. 9 (18): 10354–10364. doi:10.1002/ece3.5554. PMC 6787800. PMID 31624554.
  12. Vaux, Felix; Bohn, Sandra; Hyde, John R.; O'Malley, Kathleen G. (2021). "Adaptive markers distinguish North and South Pacific Albacore amid low population differentiation". Evolutionary Applications. 14 (5): 1343–1364. doi:10.1111/eva.13202. ISSN 1752-4571. PMC 8127716. PMID 34025772.
  13. Mamoozadeh, Nadya R.; Graves, John E.; McDowell, Jan R. (2020). "Genome‐wide SNPs resolve spatiotemporal patterns of connectivity within striped marlin (Kajikia audax), a broadly distributed and highly migratory pelagic species". Evolutionary Applications. 13 (4): 677–698. doi:10.1111/eva.12892. PMC 7086058. PMID 32211060.
  14. Longo, Gary C.; Lam, Laurel; Basnett, Bonnie; Samhouri, Jameal; Hamilton, Scott; Andrews, Kelly; Williams, Greg; Goetz, Giles; McClure, Michelle; Nichols, Krista M. (2020). "Strong population differentiation in lingcod (Ophiodon elongatus) is driven by a small portion of the genome". Evolutionary Applications. 13 (10): 2536–2554. doi:10.1111/eva.13037. PMC 7691466. PMID 33294007.
  15. Stinchcombe, J. R.; Hoekstra, H. E. (2007). "Combining population genomics and quantitative genetics: Finding the genes underlying ecologically important traits". Heredity. 100 (2): 158–170. doi:10.1038/sj.hdy.6800937. PMID 17314923.
  16. Hohenlohe, P. A.; Bassham, S.; Etter, P. D.; Stiffler, N.; Johnson, E. A.; Cresko, W. A. (2010). "Population Genomics of Parallel Adaptation in Threespine Stickleback using Sequenced RAD Tags". PLOS Genetics. 6 (2): e1000862. doi:10.1371/journal.pgen.1000862. PMC 2829049. PMID 20195501.
  17. Harpur, B. A.; Kent, C. F.; Molodtsova, D.; Lebon, J. M. D.; Alqarni, A. S.; Owayss, A. A.; Zayed, A. (2014). "Population genomics of the honey bee reveals strong signatures of positive selection on worker traits". Proceedings of the National Academy of Sciences. 111 (7): 2614–2619. Bibcode:2014PNAS..111.2614H. doi:10.1073/pnas.1315506111. PMC 3932857. PMID 24488971.
  18. Davey, J. W.; Blaxter, M. L. (2011). "RADSeq: Next-generation population genetics". Briefings in Functional Genomics. 9 (5–6): 416–423. doi:10.1093/bfgp/elq031. PMC 3080771. PMID 21266344.
  19. Ellegren, H. (2014). "Genome sequencing and population genomics in non-model organisms". Trends in Ecology & Evolution. 29 (1): 51–63. doi:10.1016/j.tree.2013.09.008. PMID 24139972.
  20. You, N.; Murillo, G.; Su, X.; Zeng, X.; Xu, J.; Ning, K.; Zhang, S.; Zhu, J.; Cui, X. (2012). "SNP calling using genotype model selection on high-throughput sequencing data". Bioinformatics. 28 (5): 643–650. doi:10.1093/bioinformatics/bts001. PMC 3338331. PMID 22253293.
  21. Greminger, M. P.; Stölting, K. N.; Nater, A.; Goossens, B.; Arora, N.; Bruggmann, R. M.; Patrignani, A.; Nussberger, B.; Sharma, R.; Kraus, R. H. S.; Ambu, L. N.; Singleton, I.; Chikhi, L.; Van Schaik, C. P.; Krützen, M. (2014). "Generation of SNP datasets for orangutan population genomics using improved reduced-representation sequencing and direct comparisons of SNP calling algorithms". BMC Genomics. 15: 16. doi:10.1186/1471-2164-15-16. PMC 3897891. PMID 24405840.