
Thursday, 3 May 2012

Geneflows in Fennoscandia

After some trial and error I have perhaps arrived at a useful, and maybe even powerful, way to analyze the project participants' genetic history using the Chromopainter software (see the earlier posting on using the Chromopainter and Finestructure software). The analysis provides an in-depth insight that cannot be obtained with software like ADMIXTURE and MDS/PCA, which simply compare allele frequencies from genotypes at SNPs assumed to be independent.

In this analysis I will show that ADMIXTURE and MDS/PCA analyses are not wrong as such, but that these programs miss interesting parts of the genetic history.

Technical info:

289k SNPs from 22 chromosomes. Genotypes were phased to haplotypes in BEAGLE. A set of 70 Fennoscandian recipients was run through Chromopainter against 10 donor populations of 8 individuals each. The HapMap recombination map was used. Chromopainter was run in donor mode with 10 iterations.

INITIAL ANALYSIS:

Proportions:

The main highlight of the proportions is that the Swedes show the highest proportions to the French and British. This agrees with the earlier MDS plot analysis. They also show the lowest proportions to the Koryak and the Chuvash. The Saami show the highest proportions to the Vologda Russians and the Chuvash, and the lowest to the Koryak and the Romanians. Note, however, that the Saami still have the highest Koryak proportion of any group in the panel. The Norwegians, like the Swedes, have the highest proportions to the French and British and the lowest to the Koryak and the Chuvash. The four Lithuanian project participants naturally show the strongest relationship to the Lithuanian donor population, with the Belorussians and Vologda Russians as runners-up. We will see that these Lithuanians are very useful as controls. Like the Saami, the Finns have the Vologda Russians at the top of their proportion list, but instead of the Chuvash they have the Lithuanians as second runner-up. The Finns have the Koryak and the Italians at the bottom of their list. The Estonians appear closest to the Lithuanians and the Belorussians. Like the Finns, they have the Koryak and the Italians at the bottom of their list.


So it is clear from the proportions alone that the Swedes and Norwegians appear very similar to each other in their sources of influence, and that the Finns and Saami differ both from the Scandinavians and from each other. The positions of these groups on the earlier MDS plot support this conclusion.

To emphasise these differences more strongly, I have ranked the recipient individuals by influence from 1 to 70 (70 being the total number of project participants). A low rank means lower influence and a high rank higher influence relative to the other Fennoscandians. Note again the differences between the Scandinavians (Norwegians, Swedes) and the Finns, Lithuanians and Estonians. The Saami in particular rank low for most of the other populations in the panel.
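The ranking step can be sketched in Python. This is a minimal illustration under stated assumptions, not the exact procedure used for the tables: the proportions are held in a hypothetical dict mapping each participant ID to a dict of donor-population proportions, the participant IDs and numbers are made up, and ties simply keep their input order. Rank 1 is the lowest proportion for a given donor and rank N the highest, matching the convention above.

```python
def rank_by_donor(proportions, donor):
    """Rank participants 1..N by their proportion for one donor population.

    proportions: hypothetical dict {participant_id: {donor: proportion}}.
    Rank 1 = lowest proportion, rank N = highest (ties keep input order,
    because Python's sort is stable).
    """
    ordered = sorted(proportions, key=lambda ind: proportions[ind][donor])
    return {ind: rank for rank, ind in enumerate(ordered, start=1)}


# Made-up illustrative values, not the study's data.
example = {
    "SWE1": {"French": 0.20, "Koryak": 0.01},
    "SAM1": {"French": 0.08, "Koryak": 0.05},
    "FIN1": {"French": 0.12, "Koryak": 0.02},
}

print(rank_by_donor(example, "French"))  # {'SAM1': 1, 'FIN1': 2, 'SWE1': 3}
print(rank_by_donor(example, "Koryak"))  # {'SWE1': 1, 'FIN1': 2, 'SAM1': 3}
```

Ranking per donor column, rather than comparing raw proportions, puts every donor on the same 1 to N scale, which makes the low-ranking pattern for a group easier to spot across the whole panel.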


The following tables support the observations described above.

Number of shared segments (ChunkCounts):


Please note that a higher number of shared segments usually means a closer relationship, while a lower number usually means a more distant relationship.

However, a lower number of shared segments could also reflect a more recent shared history, if those fewer segments are long. To check this, you also have to take the total shared length into account to calculate the actual average segment size. If the segments are large, the shared history is more recent. See the next two tables.


Total length of shared segments (ChunkLength):


Please note that a higher total shared length usually means a closer relationship, while a lower total shared length usually means a more distant relationship.

However, the same total length may be more fragmented, that is, split into a larger number of shared segments. This would mean that the shared segments are older. To check this, you also have to take the number of shared segments into account to calculate the actual average segment size. See the tables above and below.

Segment size (ChunkLength/ChunkCount):


Please note that a smaller segment size should usually mean older segments, while larger segments should usually mean more recent ones, because recombination breaks segments up over the generations.
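The segment-size column is simply ChunkLength divided by ChunkCount for each recipient and donor. A minimal Python sketch follows, assuming the per-donor counts and lengths for one recipient have already been read from the Chromopainter output into two hypothetical dicts (file parsing is not shown, the numbers are illustrative only, and the length unit is whatever the ChunkLength table uses). Donors with zero shared segments are returned as None rather than dividing by zero.

```python
def mean_segment_size(chunk_lengths, chunk_counts):
    """Average shared segment size per donor: ChunkLength / ChunkCount.

    Both arguments are hypothetical dicts {donor: value} for one recipient.
    Donors with zero shared segments get None instead of dividing by zero.
    """
    sizes = {}
    for donor, count in chunk_counts.items():
        length = chunk_lengths[donor]
        sizes[donor] = length / count if count > 0 else None
    return sizes


# Illustrative numbers only: same total length, different fragmentation.
lengths = {"Lithuanian": 120.0, "French": 120.0, "Koryak": 0.0}
counts = {"Lithuanian": 40, "French": 80, "Koryak": 0}

sizes = mean_segment_size(lengths, counts)
print(sizes)  # {'Lithuanian': 3.0, 'French': 1.5, 'Koryak': None}
```

In this toy case both donors share the same total length, but the French segments are half as long on average, which under the reasoning above would point to an older shared history.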

As we can see here, the Estonians, Lithuanians and Finns appear to have the largest segments vs the donor populations. The Lithuanian controls confirm this. The Scandinavians and the Saami appear to have the smallest segments. This suggests that the Estonians, Lithuanians and Finns have a more recent connection to the continental donor populations.

Number of related but not identical haplotypes (MutProb):


Please note that a higher count of related but not identical haplotypes should usually mean a closer relationship, as it correlates with higher proportions and with the other observations above. There may, however, be more to the data.

As we can see here, the Estonians, Lithuanians and Finns appear on average to have the lowest number of counted mutations vs the donor populations. The Scandinavians and the Saami appear to have the largest number of counted mutations. This suggests that the Estonians, Lithuanians and Finns have a more recent connection to the continental donor populations.

Proportions Correlations:

Correlations 70 participants vs donor populations:


Positive correlations suggest complementary gene flows: if you are high in Koryak proportion, you will likely also have a high Chuvash proportion. Negative correlations mean the opposite: if you have a high Chuvash proportion, it will come at the expense of your French proportion. The correlations suggest clear affiliations between the Fennoscandians and the donor populations.
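Each cell in the correlation table compares two donor columns across the participants. The sketch below assumes a plain Pearson correlation (the post does not say which coefficient was used), and the four-participant proportions are made up so that one pair rises together and the other trades off exactly.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up proportions for four participants (one value per participant).
koryak = [0.01, 0.02, 0.04, 0.05]
chuvash = [0.03, 0.04, 0.06, 0.07]
french = [0.20, 0.18, 0.14, 0.12]

print(round(pearson(koryak, chuvash), 3))  # 1.0: the two rise together
print(round(pearson(chuvash, french), 3))  # -1.0: one at the other's expense
```

Real data would of course give values between these extremes; the sign is what carries the "complementary" versus "at the expense of" reading described above.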


LOOKING MORE IN-DEPTH INTO THE NUMBERS

As the tables above show, there is no doubt that different groups in Fennoscandia have received influences from different directions. Could the data be manipulated in a way that lets us infer more about the genetic history behind these influences?


In the last table we saw that the counted mutations for related but not identical haplotypes are distributed very similarly to the observations in the other tables. There is obviously a high correlation. But how well do they correlate with, for example, the proportions? The correlation is not perfect. Could the counted mutations be denser or sparser per unit of proportion for some groups than for others, and could that tell us something about their history?

Let's try it. We simply take the number of counted mutations for related but not identical haplotypes and divide it by the observed proportion.

A simple example to illustrate what I am trying to do:

Ind 1: Counted mutations 10. Proportion 100% (1.0). 10/1 = 10
Ind 2: Counted mutations 10. Proportion 50% (0.5). 10/0.5 = 20

As we can see, even though the number of counted mutations for related but not identical haplotypes is the same for both individuals, Ind 2 actually has a 100% higher density of mutations than Ind 1. I call this correcting for proportion.
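The correction is a single division, so a short Python sketch reproduces the two-individual example exactly. The function name is hypothetical, and it assumes the proportion is given as a fraction between 0 and 1 for the same donor population as the mutation count.

```python
def mutations_per_proportion(counted_mutations, proportion):
    """'Correct for proportion': counted mutations divided by the painting
    proportion (a fraction, 0 < proportion <= 1) for the same donor."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    return counted_mutations / proportion


ind1 = mutations_per_proportion(10, 1.0)  # 10.0
ind2 = mutations_per_proportion(10, 0.5)  # 20.0
print(ind1, ind2, f"{ind2 / ind1 - 1:.0%} higher density for Ind 2")
# 10.0 20.0 100% higher density for Ind 2
```

Rejecting a zero proportion matters in practice: a donor that contributes nothing to a recipient has no meaningful mutation density, and dividing by a tiny proportion would inflate the value enormously.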

Let's check it against the real data:


Let's first check the controls.

The Lithuanian participants have the lowest number of mutations for related but not identical haplotypes relative to the Lithuanian donor population, even though, before correcting for proportion, they have both the highest proportions and the highest counted mutations towards it. This is because the Lithuanians in our panel are the closest to the donor Lithuanians, which confirms that they have the shortest divergence time, that is, the least time to develop differentiating mutations relative to the donor population.

This means the proportion-corrected values can tell us something about the divergence times behind the observed proportions. It can be a powerful tool.

Overview:

First, we see two groups standing out at opposite extremes. The Estonians appear, in general, to be by far the closest to the continental donor populations, even to the Koryak, and closest of all to the Vologda Russians. This also makes sense given the MDS plots published earlier. At the other extreme are the Saami, who appear closest to the Belorussians but in general have the largest mutational distances to all the other donor populations, except the Koryak, where the Norwegians exceed the Saami by a few points.

This suggests that the Estonians have the shortest divergence time to the donor populations and the Saami the longest.

Discussion:

Let's first look at some observations:

Koryak: Although the Saami have the highest Koryak proportions, they also have the second-highest divergence, after the Norwegians and ahead of the Swedes. The lowest divergence is found among the Estonians, Finns and Lithuanians. This suggests that the Koryak influence among the Scandinavians and the Saami is older than the one found in south-eastern Fennoscandia and the Baltics.

Chuvash: Similarly, the Saami appear to have not only the largest Chuvash proportion but also the largest divergence from the Chuvash. The divergence appears similar for the Scandinavians. South-eastern Fennoscandia and the Baltics appear to have had a more recent influx of Chuvash-like haplotypes. Note that the Finns reach the "yellow" colour code for the Chuvash. This may indicate that the Finns are in an intermediate zone, with partly more divergent haplotypes than populations further south, such as in Lithuania and Estonia.

Estonians: The Estonians appear to show the shortest divergence times to the Vologda Russians, Belorussians, Chuvash and Lithuanians. This makes sense, as the Estonians are close neighbours of these populations or live in the same wider region. Note that the Estonians appear as close to the Lithuanian donor population as the Lithuanian project participants are. This may be due to the margin of error (I do not know the exact figures) or to a somewhat different history for the project Lithuanians compared with the donor Lithuanians.

Finns: The Finns appear to show somewhat lower divergence to the Lithuanians and Hungarians. The first observation makes sense, as Lithuania is geographically close. Why the Hungarians appear closer is a mystery, although the Lithuanian participants show a similar pattern, with even lower divergence, towards these populations.

Norwegians: The Norwegians seem to show the lowest divergence to the Hungarians and Belorussians. This is also odd and a mystery, but it may have something to do with the high frequency of Y-chromosome haplogroup R1a in Norway.

Swedes: The Swedes show the lowest divergence times to the Belorussians and Lithuanians.

Saami: The Saami, who have the longest divergence times to almost all populations except the Koryak, appear to have a somewhat shorter divergence time to the Belorussians. This is a mystery, but it may be connected to the similarly shorter divergence times seen for the Scandinavians.

Summary of observations: In order of intensity of influence, the Estonians, Lithuanians and Finns appear to have the shortest divergence times to ALL the continental donor populations. At the other end are the Swedes, Norwegians and Saami, with the longest divergence times.

Geneflows:

The data, then, clearly suggest that the Lithuanians and Estonians are indeed continental populations. What is perhaps more unexpected is that the Finns also appear to have been strongly influenced by more recent immigration or gene flow from continental Europe. This inflow appears to have come through Lithuania and Estonia.

This recent inflow to the Finns even appears to be somewhat stronger than the inflow from continental Europe to Norway and Sweden. Please note that we do not have any donor population from Denmark, Germany or Poland, which may affect this result somewhat for the Scandinavians. The Hungarian and Belorussian influence may be indirect, acting as a proxy for Germany or Poland. Still, it is intriguing that the Finns in general appear to show lower divergence to continental European populations, while the Scandinavians show this somewhat less.

The Saami appear to have received the least gene flow, at least in more recent times, and have therefore kept a longer divergence time. Their generally higher density of mutations towards almost all populations suggests that the Saami mutations are largely European in origin. If we remove the Koryak and even the Chuvash from the average counted mutations, the Saami still hold their position as the most divergent population (see table). Why the Saami appear to show the shortest divergence times to the Belorussians and Italians is a mystery.
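The robustness check described above, averaging the proportion-corrected mutation values with and without the Koryak and Chuvash columns and seeing which group comes out most divergent, can be sketched like this. The group names come from the post, but the values are placeholders invented for illustration, not the figures from the table, and the helper name is hypothetical.

```python
def mean_excluding(values, exclude=()):
    """Mean of the per-donor values, skipping any donor listed in `exclude`."""
    kept = [v for donor, v in values.items() if donor not in exclude]
    if not kept:
        raise ValueError("no donors left after exclusion")
    return sum(kept) / len(kept)


# Placeholder corrected values (mutations per proportion), not real data.
groups = {
    "Saami": {"Koryak": 30.0, "Chuvash": 28.0, "French": 26.0, "Lithuanian": 24.0},
    "Estonian": {"Koryak": 22.0, "Chuvash": 18.0, "French": 20.0, "Lithuanian": 16.0},
}

for exclude in [(), ("Koryak", "Chuvash")]:
    means = {g: mean_excluding(v, exclude) for g, v in groups.items()}
    most_divergent = max(means, key=means.get)
    print(exclude or "all donors", means, "->", most_divergent)
```

If the same group stays on top after dropping the eastern donors, its high average is not driven by those columns alone, which is the point the paragraph above makes about the Saami.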

It is also interesting to see the intermediate position the Scandinavians occupy when it comes to divergence times. They almost look like the Saami minus what we observe further south.

This is how I see it at present. I cannot guarantee that what is published here is absolutely correct; it may be wrong. Please keep that in mind when reading this post.

END.

NB! Please note that the observed connections may be proxies for populations not included in the panel. Having Chuvash scores does not mean that you are Chuvash. It means that you have something that resembles the Chuvash.








