Those who have followed this blog or participated in this project will be used to seeing the Chromopainter-Finestructure linked PCA plot of European variation with its characteristic ">" or "<" shape, like the one below. The more south-western European populations usually cluster close to the root, the Central and East European populations form one branch, and Scandinavians, Finns and Saami form the other.
Many have probably wondered why it has this shape and how old it is. I have now done more analysis with my ancient La Braña dataset, including an unlinked MDS plot in PLINK (more analysis is under way). The dataset was pruned in PLINK down to 69k SNPs, using all participants and reference populations. As we can see, there is *absolutely* no doubt where the ancient La Braña individuals cluster on the two main dimensions.
So this probably means that the ancient La Braña Iberians carried a major genetic variation resembling that of Finns and Saamis, a variation that today is found only in Fennoscandia and has largely disappeared further south and west. However, as the previous post indicates, there are probably components that further separate La Braña from the Finns in the direction of the Saamis.
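The MDS step described above can be illustrated with a small sketch. It applies classical (Torgerson) MDS to a pairwise distance matrix such as 1 minus IBS similarity. This is not PLINK's own code, and the four-individual distance matrix is made up purely for illustration.

```python
import numpy as np

def classical_mds(dist, n_dims=2):
    """Embed points so that Euclidean distances approximate `dist`."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances to obtain an inner-product matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against small negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Hypothetical distances: individuals 0 and 1 are close, as are 2 and 3.
toy_dist = np.array([
    [0.00, 0.10, 0.30, 0.32],
    [0.10, 0.00, 0.28, 0.30],
    [0.30, 0.28, 0.00, 0.08],
    [0.32, 0.30, 0.08, 0.00],
])
coords = classical_mds(toy_dist, n_dims=2)
print(coords)
```

In the toy output the first dimension separates the two pairs, which is the same kind of reading applied to the two main dimensions of the MDS plot.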
All grandparents born in Norway, Sweden or Finland? Email your 23andme/FamilyFinder/deCODEme genotype file zipped to tjaaehkere at yahoo.no. NEW! Estonians, Lithuanians, Latvians, Germans, Dutch, Belgians, Luxembourgers, Austrians, Czechs, Slovaks, Balkans, Danes, Poles, Icelanders and Russians also welcome! ANALYSIS STATUS: FILE PREPARATIONS!
Thursday, 23 May 2013
Tuesday, 21 May 2013
La Braña and the Saamis
The La Braña dataset of 183k SNPs (those matching the 1000G dataset) and my current standard dataset of 289k SNPs overlap each other at only 4k SNPs. This is a very small number of SNPs to work with, so I tried to find a way around the problem that would make it possible to use all the 183k SNPs genotyped for La Braña. The fact that 183k of the La Braña SNPs and 288k of my current standard dataset SNPs overlap with the 1000 Genomes Project appears to give the solution: SNP haplotype imputation, which is widely used in genetic research to combine datasets genotyped on different platforms using a common reference panel.
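The overlap reasoning above can be sketched with plain set operations. The SNP identifiers and the function name `overlap_report` are hypothetical. The point is only that two panels can share very few SNPs with each other while both are well covered by a common reference panel.

```python
def overlap_report(ancient, standard, reference):
    """Count direct and reference-mediated SNP overlap between two panels."""
    ancient, standard, reference = set(ancient), set(standard), set(reference)
    return {
        # SNPs genotyped in both panels (the small direct overlap).
        "direct": len(ancient & standard),
        # SNPs of each panel that are also present in the reference panel.
        # If the standard panel is imputed up to the reference, the ancient
        # SNPs counted here become comparable.
        "ancient_in_reference": len(ancient & reference),
        "standard_in_reference": len(standard & reference),
    }

# Hypothetical toy identifiers, not real rsIDs from either dataset.
ancient_snps = ["rs1", "rs2", "rs3", "rs4"]
standard_snps = ["rs4", "rs5", "rs6", "rs7"]
reference_snps = ["rs%d" % i for i in range(1, 9)]

report = overlap_report(ancient_snps, standard_snps, reference_snps)
print(report)
```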
The imputation was done in BEAGLE. For the record, I tested whether a non-imputed dataset differs from an imputed one, since most SNPs in this analysis are imputed, and I noticed a certain "drift" that makes imputed individuals more similar to each other than to non-imputed individuals such as the 1000G individuals. I have therefore excluded all non-imputed individuals except the La Braña individuals, on the assumption that if the same minor error affects all individuals, the analysis will not be far off. All the genotypes used for La Braña are actually observed genotypes (no imputation).
As we can see below, the Chromopainter-Finestructure run using a selected "world" panel shows structure that makes sense at both the world and the European level, suggesting that the imputation has worked well and can be used for further analysis.
CC "World" 183k linked PCA
CC "World" 183k heatmap
Here we see again that the La Braña individual separates strongly from the modern Europeans, as in the earlier post using non-imputed haplotypes. In the world view the reason appears to be much stronger African ancestry than in the rest of the Europeans. The East Asian, Siberian and Native American affiliation appears similar to that of today's Finns.
The next question is of course which present-day population is closest to La Braña. I first ran a simple IBS (identical by state) clustering in PLINK and got these distances: diploid mode puts Lithuanians and Finns at the top, while haploid mode shows total domination by Lithuanians.
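For readers unfamiliar with IBS, here is a minimal sketch of the similarity measure in diploid and haploid mode. It assumes genotypes coded as counts of one allele (0/1/2 for diploid, 0/1 for haploid) with `nan` for missing calls. The toy vectors are made up and are not taken from the dataset.

```python
import numpy as np

def ibs_similarity(g1, g2, ploidy=2):
    """Mean proportion of alleles shared identical by state over non-missing SNPs."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    ok = ~(np.isnan(g1) | np.isnan(g2))  # skip SNPs missing in either sample
    if not ok.any():
        return float("nan")
    # Per SNP: 1 when the calls match, 0 when they differ completely,
    # and 0.5 for a diploid pair sharing one of two alleles.
    return float(np.mean(1.0 - np.abs(g1[ok] - g2[ok]) / ploidy))

# Diploid mode: allele counts 0, 1 or 2.
diploid_sim = ibs_similarity([0, 1, 2, 2], [0, 1, 2, 0], ploidy=2)
# Haploid mode: a single observed allele per SNP, coded 0 or 1.
haploid_sim = ibs_similarity([0, 1, 1, 0], [0, 1, 0, 0], ploidy=1)
print(diploid_sim, haploid_sim)
```

In haploid mode a heterozygous site contributes either a full match or a full mismatch depending on which allele was observed, which is one possible reason the two modes can rank populations differently.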
However, from my own experience these direct IBS comparisons cannot be fully trusted on their own, as many factors may affect the similarity. I therefore made a new Chromopainter-Finestructure run using only European populations to see whether there is more information in the data.
CC "Europe" 183k heatmap
As we can see from the heatmap, there appears to be affiliation with the Lithuanians, Finns and Basques, and a more distant relationship to the Saamis. However, the large asymmetry between CC received from and CC donated to other populations makes La Braña the earliest branch from the rest of the European panel, probably to some extent because of the minor African admixture detected earlier.
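The asymmetry mentioned above can be made concrete with a small sketch. It assumes a square ChunkCount matrix in which row i holds what individual i receives (copies) from each donor in the columns. Both the layout and the numbers are assumptions for illustration, not output from the actual run.

```python
import numpy as np

def donor_recipient_asymmetry(chunkcounts):
    """Return total chunks received minus total chunks donated per individual."""
    counts = np.asarray(chunkcounts, dtype=float).copy()
    np.fill_diagonal(counts, 0.0)   # ignore self-copying
    received = counts.sum(axis=1)   # row sums: what each individual copies from others
    donated = counts.sum(axis=0)    # column sums: what others copy from each individual
    return received - donated

# Hypothetical 3x3 matrix: individual 2 receives a lot but donates little.
toy_counts = [
    [0, 30, 10],
    [28, 0, 12],
    [25, 25, 0],
]
asymmetry = donor_recipient_asymmetry(toy_counts)
print(asymmetry)
```

An individual with a large positive value behaves as an outlier relative to the panel, which is the kind of pattern that can push a sample to branch off first in the tree.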
In the principal component analysis (PCA), however, we can dissect the different influences seen in the ChunkCount (CC) data. Dimension 1 (the X-axis), which shows the largest variance, places the La Braña individual on the far left and the Lithuanians on the far right. On Dimension 1 there is no doubt that the Saamis appear closest to La Braña, while on Dimension 2, which shows the second-largest variance, La Braña is closest to the Lithuanians. Note that the Saamis and the Basques form opposites here, suggesting a north-east vs south-west component.
CC PCA Euro 183k D1-D2
On Dimensions 1 and 3 (Y-axis) we see that La Braña is closest to the Finns and to some extent also the Vologda Russians. On Dimension 3 we also see that the Basques and the Lithuanians show opposite variation, with the Saamis intermediate between the two.
CC PCA Euro 183k D1-D3
La Braña appears to have a major component that among modern populations is found most strongly in the Saamis but appears largely absent in the rest of the Europeans. The Saamis also appear to have a second component that follows a north-south gradient and is also found in larger amounts among the Finns, but here La Braña appears much further "south", at the lower end of the Lithuanians in the direction of the Basques, suggesting that La Braña has considerable southern ancestry. The components in both Dimension 1 and Dimension 2 are strongest in the Saamis, the northernmost population in Europe, suggesting that there is a second "northern" component here that is not found in the ancient La Braña individuals. The third component seems more difficult to explain. Here Finns and Vologda Russians appear to show the closest variation to La Braña, which lies closer to the Lithuanians, while the Saamis appear to pull toward the Basques.
Summary: La Braña shows a "northern" variation seen to a large extent only among the Saamis, and at the same time shows affiliations toward more southern populations like Lithuanians and Basques. This probably suggests that they had both considerable "northern" and "southern" ancestry.
These findings appear at least partly to agree with Vadim Verenich's earlier finding of a connection between Saamis and Mesolithic hunter-gatherers:
Wednesday, 15 May 2013
La Braña individuals and the 1000G European populations
I have finally managed to extract 183k SNPs from the La Braña individuals that match the 1000 Genomes Project SNPs (via the dbSNP hg18 database). I also believe I have managed to a large extent to follow the quality-filter procedures used by the authors of the original research paper. As also seen in the earlier analysis of the ancient Gotlanders, the La Braña genotypes are "haploid", meaning that we have only half of the actual diploid genotypes, as shown in the example below:
Example, with haplotypes 1 and 2 read vertically:
AC -> C
GC -> G
TT -> T
This means that we cannot phase the haplotypes. However, the La Braña individuals do appear to be very similar, so there is reason to believe that they share long runs of similar haplotype segments (ROH), meaning that to some extent we do have haplotypes.
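To compare diploid reference samples with haploid calls of this kind, their genotypes have to be reduced in the same way. The following is a minimal sketch, assuming one allele is drawn at random per genotype. The post does not state which rule was actually used, and the `"00"` missing code and the function name are assumptions for illustration.

```python
import random

MISSING = "00"  # assumed code for a missing diploid genotype

def to_haploid(genotypes, seed=None):
    """Reduce two-letter diploid genotypes to one randomly chosen allele each."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    haploid = []
    for genotype in genotypes:
        if genotype == MISSING:
            haploid.append("0")  # keep missing calls as missing
        else:
            haploid.append(rng.choice(genotype))  # pick one of the two alleles
    return haploid

diploid = ["AC", "GC", "TT", "00"]
haploid = to_haploid(diploid, seed=1)
print(haploid)
```

Homozygous genotypes such as `TT` always give the same allele, while heterozygous ones lose half of their information, which is why phasing is not possible afterwards.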
Knowing that I do not really have the La Braña haplotypes, I still ran La Braña through the Chromopainter-Finestructure pipeline against the 1000 Genomes Project individuals and got the following result for ChunkCounts. Note that all other individuals have been converted from diploid to haploid to allow a proper comparison.
The result first seems to indicate a minority African-like influence (first column on the left) for this composite individual. It also appears to have a minority East Asian-like influence (last column on the right) somewhat similar to that of the Finns. The coloring against the Europeans seems to indicate that Finns are closest (lower right blue box), Iberians second (up and to the left of the Finns) and Brits third (upper left blue box). The relationship to the Tuscans (down and to the left of the Brits) appears to be the most distant among the 1000 Genomes European populations.
The worldwide PCA seems to indicate the same as seen above. The La Braña individual is seen here as the green dot a little outside the European cluster, pulling toward the Africans to the left and upwards toward the East Asians. The plot is similar to the one in the original research paper, but in the paper the Finns (red) were more contracted and La Braña was alone in the space towards the East Asians.
The ChunkLength heatmap and PCA do not appear to provide any useful information (weird clustering on the trees and PCA) because of the very low linkage (c=0.05) between the markers used.
I also ran the Chromopainter-Finestructure unlinked model to check whether the linked model managed to capture any linkage. As shown below, we see much the same here, meaning that the genotype data are practically unlinked.
Finally, I did a quick and dirty unsupervised and supervised ADMIXTURE run for this dataset. In the former it appears that there is no minority admixture, although some minority admixture is detected in some of the Finns. In the latter, a supervised run estimating only the La Braña ancestry and taking the ancestry of all others as given, La Braña clusters with the Finns, as also seen in the Chromopainter-Finestructure analysis.
ADMIXTURE unsupervised K=3
ADMIXTURE supervised assuming all others ancestry as given (K=6).
ADMIXTURE supervised assuming British, East-Asians and Africans as given (K=3).