
Tuesday, 21 February 2012

Finestructure analysis Chr 1 to 22

I have now updated the analysis to cover the whole autosomal genome, chromosomes 1 to 22. The resolution has increased, but most of the structure remains the same.

There has been limited sub-structuring compared with the analyses reported by other genome bloggers using a somewhat similar population panel. One possible reason is that the inclusion of Saami individuals has stolen the "show", smoothing over smaller differences within other populations.

In general the heat maps and plots confirm the earlier findings from the MDS plots, which used only genotype data, and provide more detail about the clustering seen there. The data seem to be in accordance with earlier scientific research showing that the genetic landscape of Europe is mostly one of small genetic distances, with the exception of Saami and Finns.

The plots for the number of shared segments (chunkcounts), the total length of these segments (chunklengths) and the shared mutations are mostly consistent with the MDS plots done earlier.

CHUNKCOUNT AGGREGATED


The map clearly shows Saami, Finns, Lithuanians and Italians as populations with a large number of shared segments. A higher number of shared segments within a population can be due to more recent population events such as population expansion, genetic drift or a limited gene pool caused by a founder effect, or it can be the result of unique mutations within the population that give higher matching within the population than outside it.

The Saami separate into one group. The Finns separate into two groups. The first group consists exclusively of Finns, basically eastern Finns. The second group appears to be affiliated with the Bothnia region and consists mostly of Finns plus some individuals from Sweden with at least partly Finnish ancestry and at least partly known Saami ancestry. Scandinavians cluster in one group.

CHUNKCOUNT AGGREGATED POPULATION


CHUNKCOUNT PAIRWISE

CHUNKCOUNT PCA


The PCA plot is similar to the earlier MDS plots, but mirrored.
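The mirroring is harmless: the sign of a principal component axis is arbitrary, so flipping an axis changes how the plot looks but not the distances between the points. A minimal Python sketch with invented coordinates (the labels and numbers below are purely illustrative and are not taken from the analysis):

```python
import math

def pairwise_distances(points):
    """Return the Euclidean distance for every unordered pair of labelled points."""
    names = sorted(points)
    return {
        (a, b): math.dist(points[a], points[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical 2D plot coordinates (component 1, component 2); not real results.
coords = {
    "Saami": (-3.0, 1.5),
    "Finn": (-1.5, 0.5),
    "Swede": (1.0, -0.5),
    "Italian": (2.5, -1.0),
}

# Mirror the plot by flipping the sign of the first axis.
mirrored = {name: (-x, y) for name, (x, y) in coords.items()}

original_d = pairwise_distances(coords)
mirrored_d = pairwise_distances(mirrored)

# Every pairwise distance is unchanged, so the two plots carry the same information.
same = all(math.isclose(original_d[pair], mirrored_d[pair]) for pair in original_d)
print(same)
```

So a mirrored PCA plot and the earlier MDS plot can be compared directly by the relative positions of the groups.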

CHUNKLENGTH AGGREGATED




The map shows that Saami, Finns and Lithuanians share longer segments than other European populations. Larger sharing of the genome usually, but not always, means more recent shared history. It can indicate events in more recent genetic history such as population expansion, genetic drift or a gene pool that has stayed limited since the founding of the population because of isolation.

CHUNKLENGTH AGGREGATED POPULATIONS



CHUNKLENGTH PAIRWISE


CHUNKLENGTH PCA


The PCA plot is in line with the earlier MDS-based analysis and has a very similar structure.

MUTATION MATRIX AGGREGATED


The mutation matrix shows a genetic landscape of largely closely related mutations, with the exception of Saami and Finns, who separate at the highest level of the tree. Scandinavians appear similar to continental Europeans in this heat map. Saami and Finns appear to separate more clearly from each other in this plot than in the chunkcount and chunklength maps.

As I understand the program, each individual haplotype chunk is compared with all the other haplotype chunks as potential donors to determine which one is its closest neighbour. The program first finds the closest neighbouring donor haplotype. If this donor haplotype is identical to the recipient haplotype, no mutations are counted and nothing is registered in the mutation matrix. If the donor is not identical but differs from the recipient by one or more mutations, these are counted as mutations between the recipient haplotype and the donor haplotype.

So the mutation matrix does not show identical haplotypes but the counts of differing SNPs between related but non-identical haplotypes.
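The counting rule as I describe it above can be sketched in a few lines of Python. This is only a simplification of my own reading of the output. It is not how the program itself is implemented, and the haplotype strings, the names `R1`, `D1` and so on, and the function names are all made up for illustration:

```python
def mismatches(hap_a, hap_b):
    """Count the positions where two equal-length haplotypes differ."""
    return sum(1 for x, y in zip(hap_a, hap_b) if x != y)

def nearest_donor(recipient, donors):
    """Return (donor_name, differences) for the donor with the fewest differences."""
    return min(
        ((name, mismatches(recipient, hap)) for name, hap in donors.items()),
        key=lambda item: item[1],
    )

def mutation_counts(recipients, donors):
    """Record differences to the nearest donor; identical matches are not recorded."""
    counts = {}
    for r_name, r_hap in recipients.items():
        d_name, diff = nearest_donor(r_hap, donors)
        if diff > 0:  # identical haplotypes contribute nothing
            counts[(r_name, d_name)] = counts.get((r_name, d_name), 0) + diff
    return counts

# Invented toy chunks over five SNPs.
recipients = {"R1": "ACGTA", "R2": "TTGCA"}
donors = {"D1": "ACGTA", "D2": "TTGCC", "D3": "ACGTT"}

counts = mutation_counts(recipients, donors)
print(counts)
```

In this toy case R1 has an identical donor (D1) and so leaves no trace, while R2 is closest to D2 and the single differing SNP is what gets counted.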

MUTATION MATRIX AGGREGATED POPULATIONS


MUTATION MATRIX PAIRWISE


This is probably the most useful plot for the individual. It shows the degree of mutational relationship between individuals, in other words who your closest related haplotype matches are in the cases where you do not share identical haplotype segments. It also gives an impression of the group relationships, which appear to correlate with the distances seen between the groups in the MDS plots; for example, Italians are the most distant from the Vologda Russians.

The Saami and Finns do not appear to have any clear relationship with these groups, only within themselves and somewhat between each other. The two Finnish groups appear to have limited mutational sharing.

MUTATION MATRIX PCA
The mutation matrix PCA plot resembles the structure seen in the PCA plots for chunkcounts and chunklengths and in the earlier MDS plot analysis.

The Bothnian Finns appear closer to Scandinavians and Vologda Russians than the eastern Finns and the Saami do.

DISCUSSION

SAAMI AND FINNS

The results for the Saami and Finns appear consistent across all maps and plots, including the earlier MDS plot analysis. They appear to have a clearly separate history from the other Europeans in the panel, and mutations not seen in other populations appear to be at least part of the explanation for why they separate genetically from the rest of Europe. It is still possible that at least some of these unique mutations belong to populations further east that are not included here; covering them may be a goal for the next round of analysis. Haplotypes from populations missing from the panel may also be the reason why two Italians appear very distinct from the others in the mutation matrix but not in the other plots.

There appears to be some inconsistency between the maps for Saami and Finns versus the others. In the mutation matrix maps they separate from other Europeans at the highest branch, but in the chunkcount and chunklength heat maps they fall under the same second-highest branch as Vologda Russians, Lithuanians and Belorussians. This may indicate some recent gene flow into the Saami and Finns from these populations, or from other populations for which they act as proxies, resulting in a higher number of shared haplotypes of greater length. These recently introduced haplotypes do not appear to have been present long enough to accumulate new mutations that would show up as a closer relationship in the mutation matrix maps.

There is also an inconsistency between Finns and Saami that could have the same explanation. Chunkcount and chunklength sharing appears similar between Saami, Finns and the mixed individuals, but the mutation matrix shows a sharper differentiation between Saami and Finns. At the same time, mutational sharing between Saami and Finns is somewhat higher than in the comparison with other Europeans described above, maybe reflecting a partly common, older shared genetic history.

The Finns also appear to divide into two clusters. The first cluster consists exclusively of Finns from eastern Finland, while the other, containing Finns from the coastal areas, also includes 3 individuals from Sweden with at least partly Finnish ancestry.

SCANDINAVIANS

The Scandinavians appear to affiliate much more with continental Europeans in all maps and plots than the Saami and Finns do, but they also show distinctions (as they do in the earlier MDS plots), and they relate to other continental groups consistently. The closest continental groups appear to be French, Italians, Hungarians and Romanians.

The mutation matrix indicates a close relationship with continental Europe, but some distinction in mutated haplotypes indicates that some time has passed since the separation from continental Europe.

The more "normal" chunkcounts and chunklengths among Scandinavians seem consistent with what is seen among continental Europeans.

(Updated 28/5/2012)








Monday, 13 February 2012

Finestructure of Fennoscandia - Preliminary result Chr 1-6

INTRODUCTION

In the previous analyses on this blog we have been using autosomal genotypes, not autosomal haplotypes. A genotype is a collection of unordered genetic data. The raw data file you get from 23andMe or Family Finder is a collection of genotype data. The file is ordered by physical position on the chromosomes, but the order of the two alleles (A, C, G or T) reported at each SNP is random or simply alphabetical.

You do not inherit your autosomal genotype in random or alphabetical order. You receive your autosomes from each of your parents as smaller or larger segments of haplotypes. A haplotype is basically your autosomal genotype split into two, where each half usually comes from one of your parents. To turn an autosomal genotype into autosomal haplotypes, the genotype must go through a procedure called phasing, in which segments or blocks of haplotypes are reconstructed. These segments or blocks consist of a range of SNPs that lie close to each other, and because they are close together they are unlikely to be split by recombination in any one generation.
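To make the difference between a genotype and a haplotype concrete, here is a small Python sketch. The allele values and the function names are invented for illustration, and real phasing is done statistically by dedicated software, which is not shown here. The sketch only shows what information is lost when two haplotypes are collapsed into a genotype:

```python
# Two phased haplotypes over four SNPs, one inherited from each parent.
# The alleles are invented for illustration only.
maternal = ["A", "C", "G", "T"]
paternal = ["G", "C", "A", "T"]

def to_genotype(hap1, hap2):
    """Collapse two haplotypes into unordered genotypes (alleles sorted alphabetically)."""
    return ["".join(sorted(pair)) for pair in zip(hap1, hap2)]

def compatible_haplotype_pairs(genotype):
    """Count the distinct haplotype pairs that would give exactly this genotype."""
    het = sum(1 for g in genotype if g[0] != g[1])  # heterozygous SNPs
    return 1 if het == 0 else 2 ** (het - 1)

genotype = to_genotype(maternal, paternal)
print(genotype)
print(compatible_haplotype_pairs(genotype))
```

With two heterozygous SNPs the genotype alone fits two different haplotype pairs, and the number doubles with every further heterozygous SNP. Phasing is the step that picks the most plausible pair.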

You probably share these haplotypes with many fellow Fennoscandians and Europeans, but you likely share more haplotypes with people or populations of the same ethnicity, close in geography, or close in history through known historical or prehistorical events. Segment length has a similar connection: you share longer segments with people or populations for the reasons above than with others. Finally, you likely share more mutations with these populations than with others.

NEW SOFTWARE CHROMOPAINTER AND FINESTRUCTURE:

There is new software, ChromoPainter and fineSTRUCTURE, that exploits autosomal haplotype information in genetic analysis. I have now done a preliminary analysis using chromosomes 1 to 6. The goal is to use all 22 autosomes. The software generates heat maps, a type of plot that may be new to many of you. These maps use colours to show the relationships between individuals and groups, with yellow being the most distant genetically and blue the closest. Genetically similar people usually form groups based on their similarity. The program generates 3 basic heat maps: the chunkcount map, the chunklength map and the mutation map.

RESULTS: CHUNKCOUNTS


The chunkcount map shows the number of haplotypes shared between individuals, including related but non-identical haplotypes. It can tell us more about common, maybe ancient, ancestry between groups. In Fennoscandia the program structures the participants into 5 main groups: 1) North Saami except SA3, 2) South Saami with SA3 and SWE7, 3) Finns, 4) Swedes and Norwegians, 5) a mixed group consisting of mixed Scandinavian-Saami individuals, Swedish Finns and some Ostrobothnian Finns.

The tree shows that North Saami and Finns share a subbranch, while the South Saami and the mixed group similarly share a closely related subbranch. At the next higher level this subbranch is shared with the Lithuanians, Belorussians and Vologda Russians. The Scandinavians, on the other hand, share a higher-level branch with the French, Italians, Hungarians and Romanians. Please note that the Scandinavians and the Saami/Finns split from each other at the highest level.




This map can also be presented with population labels, which can be useful when reading the PCA plot:




For individual assessment of relationships, the pairwise plot should be used:



The identified populations can be presented in a PCA plot:

RESULTS: CHUNKLENGTHS

The chunklengths show the length of the shared segments counted in the chunkcounts above. Longer segments usually mean more recent common ancestry, shorter segments the opposite. The tree structure is much the same as described for chunkcounts, but the South Saami group has disappeared, divided between the North Saami and the mixed group. There has also been some minor movement of Finns between the Finnish group and the mixed group. The Finns also appear to have stronger internal sharing in segment length than in number of segments. Otherwise the heat map appears much the same as for chunkcounts.
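One way to see how the two measures relate is to divide the total shared length by the number of shared segments, which gives an average segment length per pair. The Python sketch below assumes both measures are available as simple tables keyed by pairs of individuals. The individual labels, the numbers and the function name are invented, and the unit is simply whatever unit the lengths are reported in:

```python
def mean_chunk_length(chunkcounts, chunklengths):
    """Average shared segment length per pair: total length divided by segment count."""
    result = {}
    for pair, count in chunkcounts.items():
        if count > 0:  # skip pairs that share no segments at all
            result[pair] = chunklengths[pair] / count
    return result

# Invented toy values, not taken from the analysis.
chunkcounts = {("SA1", "SA2"): 40, ("SA1", "FR1"): 20, ("SA1", "IT1"): 0}
chunklengths = {("SA1", "SA2"): 120.0, ("SA1", "FR1"): 30.0, ("SA1", "IT1"): 0.0}

means = mean_chunk_length(chunkcounts, chunklengths)
print(means)
```

In this toy case the first pair shares both more segments and longer segments on average, which is the pattern that would usually point to more recent common ancestry.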



This map can also be presented with population labels, which can be useful when reading the PCA plot:



For individual assessment of relationships, the pairwise plot should be used:


The identified populations can be presented in a PCA plot:

RESULTS: MUTATIONS

The mutation matrix shows the number of SNPs that have mutated compared with other haplotypes. This map is not as clean-cut as the earlier ones and should be considered strictly preliminary; it probably needs more chromosomes to get a better assignment and tree structure. In this plot most Scandinavians are spread out on a large branch together with the panel of continental Europeans such as French, Italians, Hungarians, Romanians and Belorussians, and only a few fall with Lithuanians and Vologda Russians. The North Saami are all in one group here, while the South Saami and Finns have dispersed into what appear to be two mixed groups.


This map can also be presented with population labels, which can be useful when reading the PCA plot:


For individual assessment of relationships, the pairwise plot should be used:


The identified populations can be presented in a PCA plot:

PRELIMINARY END DISCUSSION:
FINNS AND THE SAAMI

The number of shared segments and the length of these segments appear extreme among the Saami and, to a lesser extent, among the Finns (continental populations such as Lithuanians and Italians show something similar).

There can be several reasons for these higher numbers and lengths of shared segments: 1) a founder effect giving a pool of similar haplotypes, 2) genetic drift that kills off or fixes haplotypes, 3) lack of gene inflow and/or outflow, reducing the diversity of haplotypes that can match other populations, 4) mutations found only within the group, reducing the matching of segments with other populations.

Given the large number of shared haplotypes and their great length, you may be tempted to suggest that any outlier status among the Saami, and secondarily the Finns, is due to a shallow founder effect followed by genetic drift. However, the mutation matrix may suggest, at least partly, a different history.

This brings us back to the mutation heat map, where the Saami separate at a higher branch. Here it appears that the Saami have their own mutations and share them only to some extent with Finns or the mixed group. The mutations may in part explain the high internal sharing among the Saami and Finns. They have their own separate mutational history, but the Finns and the mixed group have closer ties to the surrounding populations, at least partly through individuals in these groups. It is tempting to think that the sharing between Finns and Saami is due to Saami admixture, or alternatively to a common source or to Finnish admixture from before the Finns' genetic affiliation with continental populations. This is in accordance with the placement of Finns and Saami as outliers in the MDS plot analysis.

The possibility that "foreign" haplotypes not seen in the panel have been introduced should also be investigated further. It may explain the two strange Italians in the upper left corner of the mutation matrix.

SCANDINAVIANS

In this analysis the Scandinavians clearly separate from other populations in the number and length of shared haplotypes, with internal sharing not so different from most of the continental populations. They are always connected to continental European populations such as the French. They do not, however, appear different from the continental European populations when it comes to the mutation matrix. Here they appear to share as much with them as the Saami share among themselves. It is currently difficult to know how old this large mutational sharing actually is, but it is apparently old enough for them to separate into their own group when it comes to shared haplotypes and haplotype length. In the MDS plot Scandinavians appear close to continental European populations such as French, Italians and so on.

BIG PICTURE

If we look at the bigger picture, we see that most of continental Europe is tied together through shared mutations more than the other populations are, making the continental populations harder to separate even at this level (6 chromosomes). The Lithuanians seem to have a stronger affiliation with the large continental European cluster including Scandinavians, while this affiliation is weaker for Vologda Russians. The connection is even weaker for Finns and almost non-existent for Saami. This is in accordance with the MDS plot.

(Updated 28/5/2012)