Wednesday, 15 May 2013

La Braña individuals and the 1000G European populations

I have finally managed to extract 183k SNPs from the La Braña individuals that match the 1000 Genomes Project SNPs (via the dbSNP hg18 database). I also believe I have, to a large extent, managed to follow the quality-filter procedures used by the authors of the original research paper. As in the earlier ancient Gotlander analysis, the La Braña genotypes are "haploid", meaning that we have only half of the actual diploid genotypes, as shown in the example below:
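The matching step can be pictured roughly as follows. This is only a minimal sketch, not the procedure actually used for the post: the function name `shared_snps`, the toy rsIDs and the allele-consistency check are all assumptions made for illustration.

```python
def shared_snps(ancient_calls, reference_sites):
    """Keep ancient haploid calls that also exist in the reference panel.

    ancient_calls:   dict mapping SNP ID -> single called allele (hypothetical layout)
    reference_sites: dict mapping SNP ID -> tuple of the two panel alleles
    A call is kept only if its allele is one of the panel's alleles
    (an assumed sanity check, not something stated in the post).
    """
    kept = {}
    for snp_id, allele in ancient_calls.items():
        panel_alleles = reference_sites.get(snp_id)
        if panel_alleles is not None and allele in panel_alleles:
            kept[snp_id] = allele
    return kept


# Toy data with made-up SNP IDs
ancient = {"rs1": "C", "rs2": "G", "rs3": "T", "rs4": "A"}
panel = {"rs1": ("A", "C"), "rs2": ("G", "C"), "rs4": ("C", "T")}

shared = shared_snps(ancient, panel)
print(shared)
```

Here `rs3` is dropped because it is missing from the panel, and `rs4` is dropped because the called allele does not match either panel allele.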

Example, with haplotypes 1 and 2 read vertically:

AC -> C
GC -> G
TT -> T

This means that we can't phase the haplotypes. However, the La Braña individuals do appear to be very similar, so there is reason to believe that they share large runs of similar haplotype segments, or runs of homozygosity (ROH), meaning that to some extent we do have haplotypes.

Knowing that I don't really have the La Braña haplotypes, I still ran the La Braña individuals through the Chromopainter-Finestructure pipeline against the 1000 Genomes Project individuals. I got the following result for ChunkCounts. Note that all other individuals have been converted from diploid to haploid to allow a proper comparison.
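The diploid-to-haploid conversion, like the AC -> C example earlier, can be sketched as below. How the retained allele was actually chosen is not stated in this post; picking one allele at random per site is an assumption, and `pseudo_haploidize` is a hypothetical helper name.

```python
import random


def pseudo_haploidize(genotypes, seed=0):
    """Reduce each two-letter diploid genotype to a single allele.

    One of the two alleles is drawn at random per site (assumed rule);
    a fixed seed keeps the result reproducible.
    """
    rng = random.Random(seed)
    haploid = []
    for gt in genotypes:
        if len(gt) != 2:
            raise ValueError(f"expected a two-letter genotype, got {gt!r}")
        haploid.append(rng.choice(gt))
    return haploid


diploid = ["AC", "GC", "TT"]
haploid = pseudo_haploidize(diploid)
print(list(zip(diploid, haploid)))
```

A homozygous site such as TT always yields T, while a heterozygous site loses one of its two alleles, which is why the resulting data cannot be phased.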

The result seems first to indicate a minority African-like influence (first column on the left) for this composite individual. Further, it appears to have a minority East-Asian-like influence (last column on the right) somewhat similar to that of the Finns. The coloring against the Europeans seems to indicate that Finns are closest (lower right blue box), Iberians second (up and to the left of the Finns) and Brits third (upper left blue box). The relationship to the Tuscans (down and to the left of the Brits) appears to be the most distant among the 1000 Genomes European populations.

The worldwide PCA seems to indicate the same as above. The La Braña individual is seen here as the green dot a little outside the European cluster, pulling toward the Africans to the left and upward toward the East-Asians. The plot is similar to the one in the original research paper, but in the paper the Finns (red) were more contracted and La Braña was alone in the space toward the East-Asians.

The ChunkLength heatmap and PCA do not appear to provide any useful information (weird clustering on the trees and in the PCA) because of the very low linkage (c=0.05) between the markers used.

I also ran the Chromopainter-Finestructure unlinked model to check whether the linked model managed to capture any linkage. As shown below, we see much the same here, meaning the genotype data are practically unlinked.

Finally, I also did a quick and dirty unsupervised and supervised ADMIXTURE run for this dataset. In the former, there appears to be no minority admixture, although some minority admixture was detected in some of the Finns. In the latter, a supervised run estimating only the ancestry of La Braña with all others' ancestry taken as given, La Braña clusters with the Finns, as also seen in the Chromopainter-Finestructure analysis.

ADMIXTURE unsupervised K=3

ADMIXTURE supervised assuming all others' ancestry as given (K=6).

ADMIXTURE supervised assuming British, East-Asians and Africans as given (K=3).

