
Tuesday, 29 November 2011

Geographic MDS maps of Fennoscandia and Europe (updated)

Finns share some similarity with Lithuanians that is not seen in the Saamis, so adding the Lithuanians moved the Finns out of the North Saami cluster compared with the previous analysis. For this reason the Lithuanians were added. In addition, FI5 and FI6 were second cousins, and together they pulled the Finns down into the North Saami cluster. Removing either one on its own had the same effect, suggesting that their relatedness was affecting the analysis. FI6 was therefore removed.

The MDS plots can be confusing for some, so I made a map showing the average geographic distribution of the 3 main components, and it may reveal an intriguing history. Dimension 1 accounts for most of the variation, dimension 2 for the second largest share and dimension 3 for the third largest. The dimensions therefore do not carry equal weight when shown together in a 3D view.
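To make the unequal weighting concrete, here is a minimal sketch of classical (Torgerson) MDS in Python with NumPy. It assumes an already computed n x n matrix of pairwise genetic distances between individuals. The function name `classical_mds` is hypothetical, and this illustrates the general technique rather than the exact pipeline behind these maps. The eigenvalues come out sorted from largest to smallest, and each axis is scaled by the square root of its eigenvalue, which is why D1 spans a wider range than D2 and D3 in a 3D view.

```python
import numpy as np


def classical_mds(dist, k=3):
    """Classical (Torgerson) MDS: embed n items in k dimensions from an
    n x n matrix of pairwise distances. Returns (coords, eigenvalues)."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d2 @ j
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]          # largest first: D1, D2, D3, ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    top = np.clip(eigvals[:k], 0.0, None)      # guard against tiny negative values
    coords = eigvecs[:, :k] * np.sqrt(top)     # scale each axis by sqrt(eigenvalue)
    return coords, eigvals
```

For points lying on a line, the first dimension alone reproduces every pairwise distance, and the remaining eigenvalues are close to zero.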

The map appears to show Saamis as extremes in all three dimensions.

This is how much of the genetic variation each dimension explains:

 D1: 20 604 SNPs, 4.2%
 D2: 10 417 SNPs, 2.1%
 D3: 8 560 SNPs, 1.8%
 Total (D1 + D2 + D3): 8.1%
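The percentages above are shares of the total variance. One common convention divides each dimension's eigenvalue by the sum of the positive eigenvalues. That convention is assumed here and is not confirmed for the software used for these maps. Below is a short Python sketch (the helper name `percent_explained` is hypothetical), followed by a check of the reported figures:

```python
def percent_explained(eigvals, k=3):
    """Percentage of total variance carried by each of the k largest dimensions.

    Assumes the total is the sum of the positive eigenvalues; negative
    eigenvalues (possible with non-Euclidean distances) are ignored.
    """
    positive = sorted((v for v in eigvals if v > 0), reverse=True)
    total = sum(positive)
    return [100.0 * v / total for v in positive[:k]]


# The reported shares for D1, D2 and D3, in percent.
reported = {"D1": 4.2, "D2": 2.1, "D3": 1.8}
total = round(sum(reported.values()), 1)      # 8.1, the total row
d1_vs_d2 = reported["D1"] / reported["D2"]    # D1 carries about twice D2
d1_vs_d3 = reported["D1"] / reported["D3"]    # and a bit more than twice D3
```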

D1: Saamis, Finns and Vologda Russians appear to form a far-eastern bridge. Earlier analyses seem to suggest that this is a Siberian component seen in Europe. It is at its very lowest among the Italians. Note that in this respect the Italians appear "isolated" in Europe, with much higher levels seen in neighbouring populations. Scandinavians appear closest to Belorussians and Hungarians in this dimension.

D2 also appears very high among the Saami and seems to have a western European distribution, as it is also high among Finns, Swedes and Norwegians. It is also higher in western Europe, as seen among the French and Italians. However, it drops very sharply from the Finns to the Lithuanians, Estonians and Vologda Russians. The level rises again in the more westerly Eastern European populations such as the Hungarians and Romanians.

D3 also appears very high among the Saami but shows an intriguing pattern. It is very low among Scandinavians, Lithuanians and Estonians, while the next highest levels are found among the Italians, then the Romanians and the Vologda Russians, and finally the Finns, just somewhat above intermediate populations such as the French, Hungarians and Belorussians.

Could this be the "eastern component" that researcher Tambets mentions in her abstract for the forthcoming paper on the genetics of Uralic speakers? If so, why is it found at its second highest level among the Italians?

The opposite clustering of this component, seen locally only among Norwegians, Swedes, Lithuanians and Estonians, is also intriguing and suggests that they have something in common.

As the results indicate, the Saami outlier position appears to be the sum of these 3 different dimensions, or genetic components. The Italians appear to be the opposite outlier in D1, the Lithuanians in D2 and the Swedes in D3.

6 comments:

  1. D3 is indeed very strange: something (relatively) shared by all but NW Europeans (uh?). The geographic "vector" seems to point towards the Balkans or West Asia.

    On the other hand, D1 and D2 are pretty straightforward to read once we take out the Saami/Finnish extreme: D2 (highest in Scandinavia, then France and Italy) clearly points to Western Europe, while D1 points towards Eastern Europe.

    I like the way you described the importance of the dimensions in the map: very pedagogical and clarifying, good idea. On the other hand, I miss a measure of the "weight" of each component, so that we can more easily judge the relationship between them (sometimes the difference is minor, other times brutal).

    So we have three vectors pointing towards the NE (D1), the West (or maybe slightly NW: D2), and the SE (D3). What we don't know is how much they weigh: how relevant each one is.

    A bit puzzling is that the Saami-Finnish group always scores extremely high in all dimensions. This is probably caused by a combination of two factors: (1) the strong peculiarity of the Finnish-Saami population and (2) the fact that these populations, and the somewhat related Scandinavians and Eastern Europeans, are oversampled. They are oversampled for good reason, but this may cause the genetic compass to always point north here (allow me the metaphor).

  2. How much does each dimension or component count? I have been asked this before, and I have been wondering about it myself too.

    I have estimated F, the inbreeding coefficient, but it didn't show much difference for the Saami and Finns compared with the rest of the panel. The Finns are not even close to, for example, the Basque inbreeding coefficient.

    I have also tried removing SNPs with a minor allele frequency below 5%, and the structure didn't change much.

    I also tried pruning the SNPs for linkage disequilibrium (LD), but that didn't change much either.

    I also tried the minor allele frequency filter and LD pruning together, but the structure remained much the same.

    So if anyone has more suggestions for solving the "problem", let me know.

  3. Here are the main driving SNPs and how much variance they account for:

    Dimension 1: 20 604 SNPs, 4.2%
    Dimension 2: 10 417 SNPs, 2.1%
    Dimension 3: 8 560 SNPs, 1.8%
    Total (dimensions 1 to 3): 8.1%

  4. Thanks Anders, that is of some interest, as D1 counts about double each of the other two. D2 and D3, however, are very similar in weight.

    I guess it makes sense if we consider that Finns (and Saami) generally tend to cluster with Eastern Europe, which is where D1 points.

  5. Variance? It's a statistical term roughly equivalent to diversity. Why?

    Per Wikipedia: "In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out".

    Read the full article; I'm not too good at maths, so the fine detail may elude me.
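Comment 2 mentions estimating F, the inbreeding coefficient. One widely used estimator is the method-of-moments form reported by PLINK's `--het`: F = (observed homozygous count minus expected homozygous count) divided by (number of called genotypes minus expected homozygous count). The Python sketch below makes three assumptions:

- genotypes are coded as 0/1/2 allele counts, with `np.nan` for missing calls;
- allele frequencies are estimated from the same panel;
- the helper name `inbreeding_f` is hypothetical.

PLINK's exact expected counts may differ slightly from this simplified version.

```python
import numpy as np


def inbreeding_f(genotypes):
    """Method-of-moments inbreeding coefficient F for each individual.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts,
    with np.nan for missing calls. Monomorphic SNPs add nothing to F,
    but if every SNP is monomorphic the denominator is zero.
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.nanmean(g, axis=0) / 2.0               # allele frequency per SNP
    called = ~np.isnan(g)
    het_exp = 2.0 * p * (1.0 - p)                 # Hardy-Weinberg heterozygosity
    # Observed homozygous genotypes (0 or 2) among the called SNPs
    obs_hom = np.sum(called & ((g == 0) | (g == 2)), axis=1)
    # Expected homozygous count under Hardy-Weinberg, over the called SNPs
    exp_hom = np.sum(called * (1.0 - het_exp), axis=1)
    n_called = called.sum(axis=1)
    return (obs_hom - exp_hom) / (n_called - exp_hom)
```

An individual who is homozygous everywhere gets F = 1, and one who is heterozygous at every SNP with allele frequency 0.5 gets F = -1.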
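The 5% minor allele frequency filter from comment 2 can be sketched as follows. It assumes 0/1/2 genotype coding, with rows as individuals and columns as SNPs. The function name `maf_filter` is hypothetical; PLINK's `--maf 0.05` applies the same kind of threshold.

```python
import numpy as np


def maf_filter(genotypes, threshold=0.05):
    """Keep only the SNP columns whose minor allele frequency is >= threshold.

    Returns the filtered matrix and the boolean mask of kept columns.
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.nanmean(g, axis=0) / 2.0       # frequency of the counted allele
    maf = np.minimum(p, 1.0 - p)          # fold to the minor allele
    keep = maf >= threshold
    return g[:, keep], keep
```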
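LD pruning removes SNPs that are strongly correlated with nearby SNPs, so that tightly linked blocks do not dominate the MDS. Below is a simplified greedy variant in Python. It rests on several illustrative assumptions:

- SNP columns are sorted by position;
- there are no missing genotypes;
- the window is counted in SNPs;
- the r² cutoff is 0.2;
- the function name `ld_prune` is hypothetical.

It is not identical to PLINK's `--indep-pairwise`, which works with a sliding window and a step size.

```python
import numpy as np


def ld_prune(genotypes, window=50, r2_max=0.2):
    """Greedy LD pruning: walk SNPs in order and drop any SNP whose squared
    correlation with an already kept SNP within the preceding `window`
    SNPs exceeds r2_max. Returns the pruned matrix and kept column indices."""
    g = np.asarray(genotypes, dtype=float)
    kept = []
    for j in range(g.shape[1]):
        x = g[:, j]
        if np.std(x) == 0:
            continue                      # a monomorphic SNP carries no information
        ok = True
        for k in reversed(kept):          # most recent kept SNPs first
            if j - k > window:
                break                     # all earlier kept SNPs are outside the window
            r = np.corrcoef(x, g[:, k])[0, 1]
            if r * r > r2_max:
                ok = False
                break
        if ok:
            kept.append(j)
    return g[:, kept], kept
```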
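For reference, the variance in the Wikipedia definition quoted in comment 5 is the average squared deviation from the mean; this is the population form, and the sample version divides by n - 1. In PCA or classical MDS, the share of variance that dimension k explains is its eigenvalue λ_k divided by the sum of the eigenvalues. Conventions differ on how negative eigenvalues are handled.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\operatorname{Var}(x) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
\text{share}_k = \frac{\lambda_k}{\sum_{j}\lambda_j}
```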