Linkage disequilibrium (LD) is a somewhat rudimentary but effective way to measure the evolution of a population. It is, essentially, a quantification of the association between alleles (which, in this case, includes single nucleotide variants) as they exist in a population. In an idealized, non-evolving population there should, theoretically, be no association, and alleles at different sites should assort randomly into the gamete pool. However, if there is selective pressure on a population, one or many sites may confer an advantage. These would appear together more frequently, and a statistical “dependence” would be observed. By observing this link, one can infer that the population is evolving: that some pressure is acting on the population to elicit non-random assortment of alleles. Selection is not the only possible cause, though; genetic drift, admixture, and low recombination between nearby sites also produce LD.
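To make the idea of "association between alleles" concrete, here is a minimal sketch of the two standard LD statistics, D and r², computed from haplotype counts at two biallelic sites. The counts and the function name `ld_stats` are made up for illustration and are not part of the analysis in this post.

```r
# Hypothetical haplotype counts for two biallelic sites (alleles A/a and B/b).
hap_counts <- c(AB = 50, Ab = 10, aB = 10, ab = 30)

ld_stats <- function(counts) {
  p  <- counts / sum(counts)       # haplotype frequencies
  pA <- p[["AB"]] + p[["Ab"]]      # frequency of allele A
  pB <- p[["AB"]] + p[["aB"]]      # frequency of allele B
  D  <- p[["AB"]] - pA * pB        # deviation from independent assortment
  r2 <- D^2 / (pA * (1 - pA) * pB * (1 - pB))  # squared correlation between sites
  c(D = D, r2 = r2)
}

ld_stats(hap_counts)
```

When the two sites assort independently, the AB haplotype frequency equals the product of the allele frequencies, so both D and r² are zero.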
One factor that may impose such selection in human populations is disease, which is why there is so much variation at the HLA locus. Malaria is one such disease; it has led to increased rates of sickle cell disease in African populations. Malaria (or, really, Plasmodium falciparum) infects red blood cells, where it multiplies until the cell ruptures. As a result, hemoglobin mutations that protect against malaria, but cause sickle cell disease when inherited in two copies, became common in African populations. My hypothesis is that an African population will have more linkage disequilibrium among SNPs in the hemoglobin beta gene (HBB) than, say, a European population.
In this markdown, I will walk through the process of mapping linkage disequilibrium using an open data source, Ensembl (1000 Genomes), and R.
I got this data from Ensembl using their Data Slicer app. The gene of interest, HBB, is located on chromosome 11; the slice I actually downloaded, chromosome 6 at 29909037-29913639, covers the HLA-A gene instead (hence the HLAA file names below). In this study, I will use Kenya (LWK) for the African population, as it is the population with the highest malaria infection rate that is included in Ensembl, and Finland (FIN) for the comparison population because it is geographically far from Kenya. After downloading a variant call format (VCF) file for each population, I used vcftools to convert the files to a “0-1-2” matrix. This recodes each genotype as the number of non-reference alleles it carries: instead of “A/A” or “A/C”, each site is 0, 1, or 2.
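The recoding step can be sketched in a few lines of R. This is only an illustration of the idea, not how vcftools is implemented; the function name `gt_to_012` is hypothetical, and the -1 code for missing genotypes is assumed to mirror the vcftools output.

```r
# Recode VCF-style genotype strings as counts of non-reference alleles.
gt_to_012 <- function(gt) {
  # Split "0|1" (phased) or "0/1" (unphased) into its two allele codes.
  alleles <- strsplit(gt, "[|/]")
  vapply(alleles, function(a) {
    # Assumption: missing calls are coded -1, as in the vcftools matrix.
    if (length(a) != 2 || any(a == ".")) return(-1L)
    sum(a != "0")  # number of non-reference alleles: 0, 1, or 2
  }, integer(1))
}

gt_to_012(c("0|0", "0|1", "1/1", "./."))
```

Each row of the resulting matrix is one individual and each column is one site, which is the layout the code below reads in.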
From here, I can use R to map the disequilibrium.
Mapping Linkage Disequilibrium
Loading some packages that will help with the analysis:
    library(snpStats)
    library(Matrix)
    library(LDheatmap)
Next, convert the data to a SnpMatrix that LDheatmap can read:
    Dat.Fin = read.table("Data/FIN/HLAA_FIN_Matrix.012") # Read the 0-1-2 matrix as a table
    Dat.Fin = Dat.Fin[,-1]                # Remove the index column
    Dat.Fin = as.matrix(Dat.Fin)          # Convert the table to a numeric matrix
    Dat.Fin[Dat.Fin < 0] = NA             # vcftools codes missing genotypes as -1
    Dat.Fin = as(Dat.Fin,"SnpMatrix")     # Convert the matrix to a SnpMatrix
    # Repeat for the Kenyan population
    Dat.Kenya = read.table("Data/LWK/HLAA_LWK_Matrix.012")
    Dat.Kenya = Dat.Kenya[,-1]
    Dat.Kenya = as.matrix(Dat.Kenya)
    Dat.Kenya[Dat.Kenya < 0] = NA
    Dat.Kenya = as(Dat.Kenya,"SnpMatrix")
The vcftools software also outputs position information in a separate file, which I will use as the reference for genetic distance. (The sites should be the same for both populations, so I only have to do this once. You could also use these positions to label the SNPs, but I think that would look cluttered.)
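    labels = readLines("Data/FIN/HLAA_FIN_Matrix.012.pos") # Read position information, one "chromosome<TAB>position" line per site
    labels = substr(labels, start=3, stop=11)              # Keep the position field (drops the one-character chromosome and the tab)
    labels = strtoi(labels)                                # Convert to integer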
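The fixed character offsets above only work while the chromosome name is a single character. As a hedged alternative, the position file can be read as a two-column table. The sketch below writes a tiny stand-in file to a temporary path so it runs on its own; the middle position and the column names `chrom` and `pos` are made up for illustration.

```r
# Write a tiny stand-in for a vcftools .012.pos file (chromosome, tab, position).
pos_file <- tempfile(fileext = ".pos")
writeLines(c("6\t29909037", "6\t29909100", "6\t29913639"), pos_file)

# Read both columns by name instead of slicing characters by offset.
pos_table <- read.table(pos_file, header = FALSE,
                        col.names = c("chrom", "pos"),
                        colClasses = c("character", "integer"))
positions <- pos_table$pos  # integer vector, one entry per site
positions
```

Pointing `pos_file` at the real position file should give the same integer vector as the `substr` approach, and it would keep working for two-character chromosome names.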
Now generate the heatmaps:
    color_spectrum = colorRampPalette(c("Red", "Yellow")) # Create a color spectrum
    LDheatmap(Dat.Fin, genetic.distances = labels, color = color_spectrum(5), title = "Finnish HBB LD")
    LDheatmap(Dat.Kenya, genetic.distances = labels, color = color_spectrum(5), title = "Kenyan HBB LD")
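Comparing two heatmaps by eye is subjective, so a numeric summary can help. The sketch below estimates pairwise r² as the squared correlation between 0-1-2 genotype columns, which is a common approximation and not necessarily the estimator LDheatmap uses internally. The function names, the cutoffs, and the toy matrix are all assumptions for illustration.

```r
# Approximate pairwise r^2 from a 0-1-2 genotype matrix (rows = individuals).
pairwise_r2 <- function(geno) {
  v <- apply(geno, 2, var, na.rm = TRUE)
  geno <- geno[, !is.na(v) & v > 0, drop = FALSE]  # drop monomorphic sites
  r2 <- cor(geno, use = "pairwise.complete.obs")^2
  r2[upper.tri(r2)]                                # keep each site pair once
}

# Mean r^2 and the fraction of site pairs at or above each cutoff.
summarise_ld <- function(r2, cutoffs = c(0.2, 0.8)) {
  fracs <- sapply(cutoffs, function(k) mean(r2 >= k))
  c(mean_r2 = mean(r2), setNames(fracs, paste0("frac_r2_ge_", cutoffs)))
}

# Toy data: site1 and site2 are identical, site3 is uncorrelated, site4 never varies.
toy <- cbind(site1 = c(0, 1, 2, 0, 1, 2),
             site2 = c(0, 1, 2, 0, 1, 2),
             site3 = c(0, 0, 0, 2, 2, 2),
             site4 = rep(0, 6))
toy_r2 <- pairwise_r2(toy)
summarise_ld(toy_r2)
```

Running `summarise_ld(pairwise_r2(...))` on each population's numeric matrix (before it is converted to a SnpMatrix) would give two short vectors that can be compared directly.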
Overall, I see less difference than I thought I would. There are more low-to-moderately linked sites (yellow) in the Kenyan population, but the highly associated sites (red) are about the same. Maybe the hemoglobin gene is so well conserved that sickle cell disease is the result of a single point mutation…? Perhaps you can try this out with some other loci and populations and see if you find anything interesting.
- Karlsson, Elinor K., Dominic P. Kwiatkowski, and Pardis C. Sabeti. “Natural Selection and Infectious Disease in Human Populations.” Nature reviews. Genetics 15.6 (2014): 379–393.
- Dendrou C. “HLA variation and disease.” Nature reviews. Immunology (2018).
- Grosse SD, Odame I, Atrash HK, Amendah DD, Piel FB, Williams TN. Sickle Cell Disease in Africa: A Neglected Cause of Early Childhood Mortality. American Journal of Preventive Medicine. 2011;41(6):S398-S405. doi:10.1016/j.amepre.2011.09.013.
- Mohandas N, An X. Malaria and Human Red Blood Cells. Medical microbiology and immunology. 201(4):593-598 2012.
- Gouagna L, et al. “Genetic variation in human HBB is associated with Plasmodium falciparum transmission.” Nature Genetics. 42, 328-331 2010.
- Ensembl 2017. Nucleic Acids Research 45 Database issue:D635-D642 2017.
- Center for Disease Control and Prevention. “Malaria Maps.” Malaria and Travelers.
- Petr Danecek, 1000 Genomes Project Analysis Group, et al. Bioinformatics, 2011