
Wednesday, 29 January 2014

Is there an "East-Asian" influence in Continental Europeans? Part II

This post continues the previous post, "Is there an "East-Asian" influence in Continental Europeans?". It elaborates further on the separate Finestructure run using European, Siberian and East-Asian samples shown in the lower part of that post. There we only looked at dimensions 1 and 2; here we move on to the higher dimensions.

The third dimension is actually the same dimension we have seen many times in the PCA plots for the European panel in this project. It gives the characteristic "V" shape, where South Europeans cluster at the root, Finns and Saamis cluster on one branch, and Eastern Europeans on the other. This dimension is the Saami-Finnish branch variation vs South Europe. The variation peaks on one side among Saamis and Finns, and as we can see from the gradient map it also exists consistently among the Siberians but not among the East-Asians. On the other side it peaks among Sardinians, Basques and Italians, and the East-Asians cluster here with the South-Europeans. I am very unsure about the interpretation, but as neither Siberians nor East-Asians are at the extreme on either side of the variation, I tend to believe it may represent gene flow from Europe towards Siberia and East-Asia. Maybe the northern spread represents a gene flow of Saami/Finnish-like hunter-gatherers eastward into Siberia, and the lower part a gene flow from Europe towards East-Asia through today's India.


Dimension 3 - peaks among Saami, Finns vs Sardinians and Basque

Dimension 4 is the equivalent of the other branch of the "V" in Europe. At one extreme we find the Lithuanians, Mordovians and other Eastern European populations (actually there is a Chukchi individual here that has a very slightly higher value than the top Lithuanian). At the other extreme are Basques, Western Europeans, Saamis, Scandinavians and also the East-Asians. Here too I am unsure about the interpretation, but it appears to show the same consistency as dimension 3, this time with a different spread. It may be a gene flow spreading from Eastern Europe eastward through Siberia.


 Dimension 4 - peaks among Lithuanians vs Basque 

Dimension 5 appears to peak among East-Asians on one side and Siberians on the other, with the Europeans in between. As we can see, there is a tendency towards Siberian-like influence in the western part of Europe.

  Dimension 5 - peaks among Siberians vs East-Asians

This dimension appears interesting with regard to the question of whether there is any East-Asian influence among continental Europeans. So if we zoom in on Europe and remove the PCA elements from outside Europe, leaving only the European PCA elements, we get a more detailed view.


Dimension 5 - peaks among Siberians vs East-Asians

What is very striking here is that it appears to peak among Eastern Europeans and to some degree also Finns, but appears least among the Basques, Western and South-West Europeans, and among Norwegians and Saamis. It may suggest a gene flow from East-Asia that has divided in half an earlier haplotype distribution that may have stretched from Western Europe to Siberia but now only remains among Western Europeans, Scandinavians and Siberians. This dimension bears a striking resemblance to another dimension in Europe that I earlier believed to be internal European variation.

The PCA coordinates for all individuals and all dimensions can be downloaded here.

55 comments:

  1. So, if I'm reading correctly your dim. 5 maps, SW Europeans from Italy to Norway and peak among Basques are "more Siberian" than Eastern Europeans.

    On the other hand, dim. 4 (which should actually weight more) tells exactly the opposite story: Eastern Europeans are "more Siberian" than SW ones. Much of the same for dim. 3, which again should weight quite more than the former two.

    What conclusions can be obtained, if any? All I get is that dim. 3 is very similar to the distribution of yDNA N1.

    1. Dimension 3 may look indicative of the usual Siberian affinity at first, but that is contradicted by especially Mordovians and to some degree Russians being on the opposite "red" side to Finns. In a typical admixture analysis Vologda Russians and Mordovians (Volga Finns) show at least as much "Siberian" as eastern Finns, here it's clearly different.

      Though Y-dna doesn't have effect in an autosomal run, it might be worth mentioning that Haplogroup N is common in both of these groups. It's also unlikely that Russians or Mordovians would have received any Turkic or Mongol influence that would explain the difference here, because in aforementioned admixture runs they tend to show only Siberian and not East Asian when both components are present, unlike Turkic Chuvashes who are neighbours to Mordovians, or Central Asians for that matter.

    2. That's the difficult thing. In unlinked-SNP PCA plots the variation is usually of the sort AA at one extreme, AG in between and GG at the other extreme. The chunkcount haplotype variation is similar: there are two very different haplotypes at the extreme points of the PCA plot. In the first case you refer to, the haplotype variation of Siberians and Western Europeans appears more similar to each other, and in the second the Eastern European haplotype variation is more similar to the Siberian haplotype variation. As I understand it, PCA dimensions are supposed to be independent of each other, so each one shows a different story.

      Maju wrote: "What conclusions can be obtained, if any? All I get is that dim. 3 is very similar to the distribution of yDNA N1."

  2. Dim. 5 has a very similar distribution among northern Europeans to the East Asian component in 23andMe. It peaks among Vologda Russians and Lithuanians.

    1. So do you think that blue is higher and red/brown lower? I was reading the dimensions in the exact opposite way, based on the previous entries, in which brown/red is always considered high score and "peak".

      However I don't have it very clear because then he also talks about the negative blue side as having some personality, when I read it as mere opposition, not definitory. For example if dim. 1 peaks among circum-Baltic populations, Native Australasians appear as the most "anti-Baltic" within Greater Eurasia.

      Dim. 2 describes West Eurasians (negative: East Asians), dim. 3 describes Palestinians and Arabs (negative: NE Asians). And now that I realize, dim. 3 does not peak among Sámi, Finns, as described above, but actually among Sardinians, so I retract my previous comment re. N1.

      As I understand it negative dimensions in PCA are not as descriptive as positive scores. They mean just: "less akin to the positive pole than most".

      Anyhow, now that I think about it, Anders: why dim.3, 4, 5 here are not the same as dim. 3,4,5 in this previous entry? It is because you have removed all West and South Asian samples, as well as many East Asian ones, in a good demonstration that sampling strategy is hyper-important.

      However I can only disagree: how do you expect to measure East Asianness without enough East Asian samples? In fact the best would be to project European samples into an Asian-only plot rather than use trickstery to inflate the European sample and pretend to measure Easternness when you are actually measuring only variations of Westernness.

      Now I got what it means: you are measuring a few Asians in European terms:
      → dim. 3: Sardinian: no influence in Siberia
      → dim. 4: Siberian: clear influence in Eastern Europe
      → dim. 5: Basque/West Euro: some influence in Siberia (why?)

      In all cases East Asians (or Siberians) score negative, because all these dimensions measure only European variation.

    2. Brown and blue are the extreme opposite haplotype variation; the colours in between are the intermediate values. I thought that was obvious to most, but maybe not to some. Otherwise there is nothing more "positive" or more "negative" about any of these dimensions. Those are your own words only.

    3. The samples have not been removed but put into the superindividual group called "Others". This is an approach developed by the Finestructure authors to be able to analyze the individuals of interest and their relationship.

      I know that in dimensions 3 and 4 the Siberian and East-Asian samples are plotted *within* the European extremes in the characteristic "V" shape, where South-West Europeans typically cluster at the root while Eastern Europeans are on one branch and Scandinavians, Saamis and Finns on the other. What is interesting here is that this haplotype variation continues in both cases far eastwards beyond Europe.

      Maju wrote

      "Anyhow, now that I think about it, Anders: why dim.3, 4, 5 here are not the same as dim. 3,4,5 in this previous entry? It is because you have removed all West and South Asian samples, as well as many East Asian ones, in a good demonstration that sampling strategy is hyper-important."

    4. In PCA there's always a positive score and a negative score and these dimensions are, if I understand correctly, PCA dimensions, right? I always understand that the negative score is not too informative because it can lump various different elements only united because of their low score re. the positive pole.

      That's the way I read PCAs and similar autosomal analysis results (eigenvectors, etc.) and I believe it is the correct way to read them. So, in my understanding, unless there is a shared positive score, there's no clear affinity.

      The typical "V" shape is usually caused by two positive scores (one in each dimension). In the typical global case, depending on sampling strategies, either of the three main continental populations take the peaks in dimensions 1 and 2, with the remaining one taking the negative score. In this example: Africans take PC1, Europeans PC2 and East Asians are in the extreme negative for both dimensions. The "V" shape is caused by intermediate scores between Europeans and the other two major populations. As there's no real admixture or intermediate scoring between East Asians and Africans (for mere geographical reasons), a full triangle is not formed (but it would in a hypothetical "perfect" case in which geography would be more neutral, generating clines between the three poles).

      Continuing with the global example, East Asian personality is not yet characterized in dim. 1 and dim. 2 but would need of a third dimension (or more). Here they are only characterized as "strongly neither this nor that". Instead MXL (Mexicans) are characterized as "strongly non-African" and "weakly European". An East Asian "weak" affinity can be inferred but cannot be confirmed unless more dimensions are plotted.

      "Brown and blue are the extreme opposite haplotype variation"...

      There's only one positive score (brown if I'm correct), the other can be anything because you are not measuring a single binomial SNP but a lot of them. For example, if the positive score tends to a simplified sequence GATTACA, the negative pole would equally include CTATACA and GATTTGT, which are not related among them but are equally non-related to the positively scoring sequence. Therefore if you read the negative pole as meaning something, you may commit these kind of "mindless lumping" errors.

    5. In my standard Europe analysis the Basque and Sardinians typically end up at the root of the PCA plot, and I now see they are negative in both dimensions, while the two branches with Finns/Saami and Eastern Europeans at the other edges of the "V" are positive? Does it make the clustering of the Basque and Sardinians at the root a "fluke"?

      Some time ago I tested which SNPs are driving this V-shaped MDS plot using Plink. Whatever branch of the "V" you test the most significant SNP always shows a variation in the dimension. F.ex if Finns have the AA genotype, Scandinavians have the AG genotype and the Basque have the GG genotype, the Finns will branch to one extreme, Scandinavians in the middle and Basque at the other extreme. The same holds for whatever other dimension has variation. The same also applies to these chunkcount haplotypes. As you can see from the gradient maps, it's not noise you are seeing but actual structure, as in dimensions 3-5, where East Asians and Siberians are clearly separated.

      It is only when the higher dimensions begin to show individual rather than group variation that the PCA is no longer useful, at least for this purpose.

      Maju wrote:

      "In PCA there's always a positive score and a negative score and these dimensions are, if I understand correctly, PCA dimensions, right? I always understand that the negative score is not too informative because it can lump various different elements only united because of their low score re. the positive pole.

      That's the way I read PCAs and similar autosomal analysis results (eigenvectors, etc.) and I believe it is the correct way to read them. So, in my understanding, unless there is a shared positive score, there's no clear affinity."
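The AA/AG/GG argument in this comment can be sketched with a toy calculation. This is only a minimal illustration under stated assumptions, not the Plink or Finestructure procedure: the genotypes are made up, the group names (`finn_like`, `scand_like`, `basque_like`) and the helper `first_pc_scores` are hypothetical, and the first principal component is found by plain power iteration on the covariance matrix.

```python
import math

def first_pc_scores(rows, iterations=300):
    """Score each row on the first principal component.

    Uses power iteration on the sample covariance matrix of the columns.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    axis = [float(j + 1) for j in range(p)]  # arbitrary non-zero starting vector
    for _step in range(iterations):
        nxt = [sum(cov[a][b] * axis[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        axis = [x / norm for x in nxt]
    return [sum(c[j] * axis[j] for j in range(p)) for c in centred]

# Made-up genotypes, coded as the number of G alleles: AA=0, AG=1, GG=2.
# Columns 0-2 are "driving" SNPs that differ between groups;
# columns 3-5 vary in the same way inside every group.
background = [[0, 2, 1], [1, 1, 0], [2, 0, 1]]
genotypes, labels = [], []
for label, dosage in [("finn_like", 0), ("scand_like", 1), ("basque_like", 2)]:
    for extra in background:
        genotypes.append([dosage] * 3 + extra)
        labels.append(label)

scores = first_pc_scores(genotypes)
group_mean = {
    label: sum(s for s, l in zip(scores, labels) if l == label) / labels.count(label)
    for label in dict.fromkeys(labels)
}
for label, mean in group_mean.items():
    print(f"{label:12s} PC1 mean = {mean:+.3f}")
```

In this toy setup the heterozygous group ends up midway between the two homozygous groups on the first dimension, which is the pattern described above. Which end comes out with the plus sign depends only on the arbitrary starting vector.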

    6. It's the highest positive score, and the other is the lowest negative score. If all are negative, the least negative will be the brown one.

      Maju wrote:

      "There's only one positive score (brown if I'm correct), "

    7. Wouldn't you say that the difference between Finns and Mordovians (both Finno-ugric groups with apparent "Siberian" affinity) in every dimension from 1 to 4 is a certain indicator that something actually is going on here?

      Dimension 5 is the only dimension where they seem similar, but there Poles also look like them, and Saami behave like southwest Europeans, so it isn't clearly representing much - except maybe some Lithuanian-likeness.

    8. Dimension 5 is interesting. I now see that it peaks among the Chukchi and some other northern Siberians but also among some Western Europeans. I have run this same panel plus some Native Americans, and I get some very surprising clustering in at least one dimension.

    9. I agree that dim. 5 seems very intriguing indeed. Reminds me of some "old" claims about an alleged Tungusic-Basque connection. Problem is that I can't recall the reference but it was a paper that attempted to connect a diverse global sample one on one using some odd methodology that I have not seen continued.

      Another possible connection might be the, likely Neanderthal, B006 X-chromosome haplotype, which peaks among Basques and Native Americans but seems most diverse (from memory, unsure) in Central Asia.

      It may be a residual signal of some very ancient connection from Aurignacian or Gravettian times, can't say.

      Inversely, it could be understood as less apportion of the East Asian tenuous influence in Europe but then, why are Siberians high in it. So I'd rather discard this interpretation, at least in any straightforward manner.

  3. "Does it make the clustering of the Basque and Sardinians at the root a "fluke"?"

    Not necessarily but it needs confirmation, IMO. In this case there's such confirmation in dim. 3 of this entry, with both Basques and Sardinians scoring highest in it. Similar kind of "positive" clustering has also shown up in other analysis. Anyhow, it's also notable your dimension 5 which peaks among Basques but only includes Sardinians as any other West/SW European population.

    "Whatever branch of the "V" you test the most significant SNP always shows a variation in the dimension. F.ex if Finns have the AA genotype, Scandinavians have the AG genotype and the Basque have the GG genotype, the Finns will branch to one extreme, Scandinavians in the middle and Basque at the other extreme".

    It's very possible but you also see occasionally populations that we know are not related lumping together in the same negative or neutral position rel. to an axis or even bidimensional area of a PC graph. Recently I saw Gujaratis and Mexicans forming a cluster for example, and we do know they have no direct relation whatsoever (but they score similarly between East Asians and Europeans, and highly negative towards Africans, so their plotting ends up being almost the same in a global PCA).

    So your negative scores may speak of real affinity among those populations involved but still require confirmation: they can also be mere flukes. For example in dim. 4, many Europeans and East Asians score similarly low (blue) and in this case it is almost certainly a mere fluke or coincidence, being both highly negative towards the positive "Sibero-Lithuanian" component (which may well be related to "ANE").

    1. I am trying to read up on it to confirm it, but meanwhile, if what you hold makes sense (and it may for dim 4), then:

      Dim 3 - The Finnish/Saami vs Siberian relationship is in the "blue" and highly negative so it may not be real, however still negative but much less negative vs the East-Asians therefore in the "red".

      Dim 4 - Link between Eastern Europeans ("red") and Siberians, especially between the Lithuanians and the Chukchi. Negative for Western Europeans, Scandinavians, Saamis and East-Asians.

      Dim 5 - Red and blue among Siberians especially Chukchi, weak red in Western Europe. Blue in East-Asia. In Europe Blue among Eastern Europeans and red among Western Europeans, Norwegians and Saamis.

    2. As I wrote before, I do not feel convinced, as I have experimented with the MDS engine and have an idea of what's going on in the "engine room": variation. I have read that negative eigenvalues are not good, but I only see these in the very highest dimensions, which I don't use anyway.

    3. It makes sense that common ancestry, even quite distant, would manifest itself this way. However, here we are not working with single-SNP variation but with haplotype variation, which should have far more discrimination power.

      Maju wrote:

      "Recently I saw Gujaratis and Mexicans forming a cluster for example, and we do know they have no direct relation whatsoever (but they score similarly between East Asians and Europeans, and highly negative towards Africans, so their plotting ends up being almost the same in a global PCA)."

    4. More data should have more discrimination power but only if it is properly analyzed. A 2D PCA does not have that power because what it does is similar (not identical) to a K=2 in Admixture-like analysis (actually a K3 and maybe even some other component can often be inferred but are not directly measured in any case).

      Also, while Mexicans can be loosely considered to be European-East Asian admixture, Gujaratis are definitely not, they just tend to score intermediate in 2D global PCAs because that's the "lesser evil" the limitations of the system allows. When an Indian-specific PC is detected (usually at higher PC values) they logically align towards that component. Similarly if there is a Native American component distinct from that of East Asians, Mexicans will tend to it and not at all towards East Asians.

      PCAs can be very misleading if its workings are not well understood. And in my experience even some professional geneticists do not really seem to understand them too well. The same can be said, to at least some extent, about other methods of mass genetic comparison.

      For all these reasons, I personally tend to be cautious about strange nDNA results until I see the same or very similar results replicated by several independent studies (different methods preferably). Although I reckon that what is "strange" is subjective and I can equally commit errors by accepting upfront a result that to my eyes appears not suspicious.

      The best test should be the "formal tests" performed by direct point comparison of the sample (X) with other references, such as when various Europeans are compared one on one with MA-1 and a Yoruba control, etc.

      In this case it'd require East Asian (and maybe separately Central-East Siberian) comparisons, judging on your PCAs. Just exposing the method, I have no idea how to do it in technical terms (informatics is a bit complicated).

  4. "Also, while Mexicans can be loosely considered to be European-East Asian admixture, Gujaratis are definitely not"

    When you go far back in time, they are something like that. Ultimately the "South Asian" components are a drifted blend of generic East and West Eurasian, which is also visible in admixture runs such as one in Lazaridis et al @ k3 and k4. It's plausible that some type of analyses would put them in same place as recent east/west eurasian mixes.

    1. In the best performed analysis South Asians only very very weakly show any East Asian admixture: they are not any simple East-West mix but actually a mix of a local genetic pool (ASI) with some important (but variable) Western input (ANI) and only very weak East Asian influences, concentrated in some specific areas and populations. So their allocation between Europeans and East Asians in a global plot is actually an artifact, a product of the limitations of the method (and possibly of sampling strategies). In a Eurasian only plot ASI typically takes the third position, at the vertex of the "V" shape, while true East-West Eurasian mixes like Uyghurs are located directly in the Europe-EA axis, far away from South Asians. But in a global PCA, Africans invariably take one of the three positions, so South Asian distinctiveness becomes invisible.

      In many aspects this is similar to the typical global K=2 graph (when Africans are sufficiently sampled) that places Europeans and West Asians between Africa and East Asia in ways that are not at all even remotely realistic. It's just that they are "neither this nor that", so they are placed in between. In the global PC1-PC2 graph South Asians are "neither WEA nor EA" but "much more like WEA and EA than like Africans", so they end up sitting between the WEA and EA extremes.

    2. In the Lazaridis graph, the South Asian specific component does not appear until K=8, so everything before is pretty much irrelevant when analyzing South Asians.

      Why doesn't it appear until such a high figure? Partly because SAs are relatively undersampled, not being the focus of the study. It is notable also that the analysis could not detect any distinction between Pygmies and other Africans until K=13, when we know it is an important distinction. Instead the Hadza component shows up rather early, maybe in part because it was "supported" by several other East Africans (and surely also because the Hadzas are very drifted).

      Discerning why each component shows up earlier or later in an Admixture-like analysis is difficult but, in short, there are two main factors at play:

      1. Distinctiveness (which may be caused by ancient separation or much more recent extreme drift, usually because of low population levels and endogamy)

      2. Demographic weight (which is determined by sample sizes).

      In order to be reasonably sure that a given K-level is the most meaningful one, cross-validation check must be performed and the lowest scoring K value selected for the analysis. It is often a quite high number (and almost always low K values are totally meaningless).
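Picking the K with the lowest cross-validation error is easy to automate. The sketch below assumes log lines of the form `CV error (K=4): 0.5391`, which is how ADMIXTURE reports cross-validation when run with `--cv`; other programs may print something different. The helper `best_k` and the numbers in `example_log` are invented purely for illustration and are not results from any of the analyses discussed here.

```python
import re

# Pattern for lines such as "CV error (K=4): 0.5391" (assumed log format).
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")

def best_k(log_text):
    """Return (K, error) for the lowest cross-validation error in the log text."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    k = min(errors, key=errors.get)
    return k, errors[k]

# Hypothetical log excerpt; the values are made up.
example_log = """\
CV error (K=2): 0.5520
CV error (K=3): 0.5413
CV error (K=4): 0.5391
CV error (K=5): 0.5402
"""

print(best_k(example_log))  # prints (4, 0.5391)
```

With real runs the log lines would be collected from one output file per K value before being passed to the function.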

    3. I have not done this with a world MDS plot using Africans, South Asians, East-Asians and Europeans, but in my Europe panel analysis, if we look at the first dimension showing independent SNP variation between Finns/Saami at one end and Basque and Sardinians at the other, Scandinavians would typically come in between. Not because they were forced there, but because their genotype variation actually lies between these two extremes.

      Like the example before: if Finns had almost only AA and Basque/Sardinians almost only GG, the Scandinavians would be placed between them because they had mostly AG. I have done this search for "driving" SNPs for the European panel, and Scandinavians typically have SNP variation between those two extremes.

      It actually makes sense when looking at geography. So we would also expect to see some more "East-Asian" alleles among South-Asians for those SNPs that vary between these populations.

      Maju wrote:

      "It's just that they are "neither this nor that", so they are placed in between."

    4. I don't know how it is in Europe, sorry. It may well be as you say.

    5. Maju: I have been in contact with the Finestructure authors about this. There is no difference between negative and positive PCA coordinates. They could be reversed if you added an additional sample. I think what you are talking about is the eigenvalue, but that is another thing. So what these PCA plots describe is nothing but haplotype variation. The largest variation is found in the lowest dimension, and each further dimension shows a lower amount of haplotype variation. So in the case of dimension 4, for example, the South-East Asians and Western Europeans do show a similar haplotype variation. The challenge is to know why.
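The difference between the sign of a PCA coordinate and the eigenvalue can be sketched as follows. This is a minimal illustration with made-up numbers, not the Finestructure code: all function names are hypothetical, and the axis is flipped here by changing the starting vector of a power iteration rather than by adding a sample. If an axis v satisfies C v = λ v for the covariance matrix C, then so does -v, so either orientation is an equally valid result.

```python
import math

def covariance(rows):
    """Return (column-centred rows, sample covariance matrix)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    return centred, cov

def power_iteration(cov, start, iterations=300):
    """Find the leading eigenvector of cov, beginning from `start`."""
    axis = list(start)
    for _step in range(iterations):
        nxt = [sum(c * x for c, x in zip(row, axis)) for row in cov]
        norm = math.sqrt(sum(x * x for x in nxt))
        axis = [x / norm for x in nxt]
    return axis

def project(centred, axis):
    return [sum(c * a for c, a in zip(row, axis)) for row in centred]

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

# Made-up data: six "individuals" measured on three variables.
data = [[2.0, 0.0, 1.0], [1.8, 0.2, 1.1], [0.1, 2.0, 0.9],
        [0.0, 1.9, 1.0], [1.0, 1.0, 1.2], [0.9, 1.1, 0.8]]
centred, cov = covariance(data)

axis_a = power_iteration(cov, start=[1.0, 0.0, 0.0])
axis_b = power_iteration(cov, start=[-1.0, 0.0, 0.0])  # same search, opposite start

scores_a = project(centred, axis_a)
scores_b = project(centred, axis_b)
eigenvalue_a = variance(scores_a)  # variance along the axis, never negative
eigenvalue_b = variance(scores_b)

for a, b in zip(scores_a, scores_b):
    print(f"{a:+.3f}  {b:+.3f}")
print("eigenvalues:", round(eigenvalue_a, 6), round(eigenvalue_b, 6))
```

Every coordinate changes sign between the two runs while the eigenvalue, the amount of variation along the dimension, stays the same. The sketch does not settle whether two groups at the same end of a dimension are actually related; it only shows that "positive" and "negative" are interchangeable labels.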

    6. It is very difficult for me to accept that claim even if it comes from "higher places", sincerely. As I said before it is very possible that in many particular cases positive and negative values are interchangeable and that the clusters depicted in the negative zone are real but I don't accept that this tendency can be universalized. Your dimension 4 map is a very clear example of two populations (West Europeans and East Asians) dumped together by no apparent relationship but rather just a shared opposition to the positive pole.

      Maybe you can track the particular SNPs involved and find if in fact there is an actual relationship between East Asians and West Europeans locus on locus across a random sample, but my impression is that it is not the case at all and that performing such locus on locus analysis should prove me right. But it can also prove me wrong, so please, if you can, do it.

      On the other issue, PCAs and eigenvalue graphs are in essence the same thing, eigenvectors and eigenvalues are the core of the PCA (only the naming convention and some non-relevant topographic details change as far as I can discern). According to Wikipedia: "PCA is the simplest of the true eigenvector-based multivariate analyses", and:

      "The eigendecomposition of a symmetric positive semidefinite (PSD) matrix yields an orthogonal basis of eigenvectors, each of which has a nonnegative eigenvalue. The orthogonal decomposition of a PSD matrix is used in multivariate analysis, where the sample covariance matrices are PSD. This orthogonal decomposition is called principal components analysis (PCA) in statistics".

      If not exactly the same thing, they are very intimately related, just like a square and a trapezoid are.

    7. Maju: I would believe that people who can actually program know the fundamentals of PCA, and my own check of the driving SNPs in my earlier European MDS plots agrees. What drives the PCA is the SNPs showing the largest variation, that is, from AA to GG at each extreme of a dimension, with AG anywhere in between.

      By the way, I have got a tip from the author about an approach for finding the SNPs in question, but it must wait until later and would require massive hard drive space, as it will produce a file with over 800 haplotypes of 289k SNPs each.

    8. By the way, I have tried to use the Finestructure PCA coordinates in Plink to see if I could find the SNPs behind them. Plink did find a SNP with much variation, but it didn't have the structure seen in the chunkcount haplotype variation. It shows that single SNPs can't explain this particular case but haplotypes can.

    9. But a haplotype is just a contiguous sequence of loci, of SNPs (much as a sentence is a contiguous sequence of letters, and some other characters like spaces). How do you think you know that the haplotype (and which haplotype) is similar between East Asians and West Europeans? I really don't understand the reasoning behind your conviction: if there are no affinities between the component letters, there should be no affinity in the sentence either (or vice versa).

    10. "Your dimension 4 map is a very clear example of two populations (West Europeans and East Asians) dumped together by no apparent relationship but rather just a shared opposition to the positive pole. "

      This is unlikely to be the case, the dimension opposite to West Euros and East Asians peaks in very different populations (Belorussians and East Siberians) so it is not a singular pole.

    11. The case is that in dim. 4, at least as I interpret it, what we see is some sort of genetic affinity (ANE-like?) that is shared by Eastern Europeans and Siberians (positive pole) but strongly not by both West Europeans and many East Asians (negative pole). My stand is that these two populations are probably not sharing anything in spite of sharing polarity because negative scoring alone is not evidence of genetic sharing but needs confirmation by other means and in this case it seems extremely unlikely.

      In case of single SNPs this can't be the case because they are almost always binomial. But when talking of larger amounts of genetic data, what is what these algorithms actually process, there are a very high number of possible combinations of SNP states, so "strongly not akin" to the positive pole does not mean by default that they must be akin among themselves (they may or not but needs confirmation in some other test). This is my stand, although Anders (and apparently some guys behind the FineStructure algorithm) disagree.

      In this example most likely we are before an RGB-like case in which "strongly not red" actually has two unrelated states "green" and "blue", approx. equidistant among them and towards the red pole.

      These are the kind of nuances that can go unnoticed in shallow readings of autosomal statistical processing unless careful control methods are ensured. For example in Lazaridis et al., they show us a PCA and say something on it, but then proceed to make carefully designed formal tests (supp. materials mostly) to reach to their main conclusions with clear hardly questionable support.

      I have minor issues with the Lazaridis paper but certainly not with this careful methodology, which are the same kind of "triangulation" tests that allowed for the detection of Neanderthal ancestry, etc. I have never seen in any other genetic study that West Europeans and East Asians have any kind of affinity other than the generic Eurasian, so I'm very convinced that in this particular case it's not A vs B but A vs (B+C), where B and C are only coincident in strongly not being akin to A.
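The "A vs (B+C)" situation described in this comment can be reproduced in a deliberately artificial setup. This is a sketch under strong assumptions, not a model of the real data: three made-up groups each match only their own reference profile, so every pair of groups is equally distant, and group A is simply sampled twice as heavily. The helper `principal_axes` (power iteration with deflation) and all names are hypothetical.

```python
import math

def principal_axes(rows, n_axes=2, iterations=300):
    """Top principal axes by power iteration with deflation.

    Returns (column-centred rows, list of unit-length axes).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    axes = []
    for _axis_index in range(n_axes):
        axis = [float(j + 1) for j in range(p)]  # arbitrary starting vector
        for _step in range(iterations):
            nxt = [sum(cov[a][b] * axis[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in nxt))
            axis = [x / norm for x in nxt]
        eigenvalue = sum(axis[a] * sum(cov[a][b] * axis[b] for b in range(p))
                         for a in range(p))
        # Deflate: remove the variance already explained by this axis.
        cov = [[cov[a][b] - eigenvalue * axis[a] * axis[b] for b in range(p)]
               for a in range(p)]
        axes.append(axis)
    return centred, axes

# Made-up sharing profiles: each group matches only its own reference.
profile = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
sizes = {"A": 6, "B": 3, "C": 3}  # A is sampled twice as heavily
rows, labels = [], []
for name, size in sizes.items():
    rows += [profile[name]] * size
    labels += [name] * size

centred, (pc1_axis, pc2_axis) = principal_axes(rows)

def group_score(name, axis):
    row = centred[labels.index(name)]  # all members of a group are identical here
    return sum(c * a for c, a in zip(row, axis))

def distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

for name in sizes:
    print(name,
          "PC1 = %+.3f" % group_score(name, pc1_axis),
          "PC2 = %+.3f" % group_score(name, pc2_axis))
print("distance B-C:", distance(profile["B"], profile["C"]),
      "distance A-B:", distance(profile["A"], profile["B"]))
```

In this toy case B and C receive the same score on the first dimension even though they are exactly as far from each other as each is from A, and it takes the second dimension to pull them apart. Whether dimension 4 in the post behaves this way is a separate empirical question that the sketch cannot answer.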

    12. If the red-brown dimension represents sharing between Siberians and East Europeans, it's something shared only by Siberians and Balto-Slavic (and Mordovian) populations, and thus not ANE but something else.

      How do we know this? In Lazaridis et al the Saami sample showed one of the highest affinities to Mal'ta in Europe, yet in dim 4 they behave like Iberians.

    13. AFAIK there is no Sámi sample in the Lazaridis paper, not even Finnish, just Estonians (who do indeed score the highest in MA-1 affinity, followed by Scots, Hungarians and Lithuanians in this order). There is a small Sámi sample in the more populated Admixture graph but that's not a formal test (nor directly detects the ANE vector, just various generic affinities).

      Regardless, I said "ANE-like", i.e. similar to ANE. I do not pretend it is exactly the same thing, because it's a different kind of measure. That's something we have to accept re. the mind-boggling complexity of autosomal DNA analysis: that different kinds of tests or algorithms, and different kinds of samples, often produce somewhat different results, all of which are usually "true": complementary truths that need to be processed in holistic ways.

      I know this is difficult because we want answers and often these analyses just give us more questions. But it is what it is. I tend to store the data, wait, compare with the next batch of tests, etc. and, taking them as a whole, try to discern the patterns that repeat in several or even many independent analyses (and keep on hold or just discard those that don't). Otherwise the information can end up being very contradictory.

      This map is another data point in the file of "Siberian affinities of Europeans", together with the issue of ANE and other stuff. I agree in any case that it is a bit odd that Finns and Sámi (and even Scandinavians) score low in this dimension and I admit I have no explanation for it.

    14. Maju: About this "negative" and "positive" thing on the PCA you claim exists: do you have any source to substantiate it? Not even programmers of PCA have heard about this.

    15. No, I don't have a source but rather a conviction based on what I understand I have seen in the results, year after year for a decade or so. However I may well be wrong. Let's leave it at that for the moment because I don't have time to study everything on PCA, eigenvectors, etc. Feel free to totally disregard my ideas and sorry for bothering you.

    16. A sequence has more differentiating power than a single SNP. In other words, you have more information in a sequence of SNPs than in a single SNP. As far as I know there are two ways a sequence can change in autosomes: it can mutate, it can recombine, or both. The reason Western Europeans/Scandinavians and East-Asians show similar variation in dimension 4 could be that the "blue" is the ancestral sequence while the "brown" represents a changed sequence (by mutation and/or ancestral recombination) that later spread from Eastern Europe eastward or from Eastern Siberia westward.





      Maju wrote:

      "But a haplotype is just a contiguous sequence of loci, of SNPs (much as a sentence is a contiguous sequence of letters, and some other characters like spaces). How do you think you know that the haplotype (and which haplotype) is similar between East Asians and West Europeans? I really don't understand the reasoning behind your conviction: if there are no affinities between the component letters, there should be no affinity in the sentence either (or vice versa)."
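The disagreement above, whether a sequence of SNPs can carry information that the individual SNPs do not, can be made concrete with a small toy example. The sketch below uses invented two-SNP haplotypes (hypothetical data, not from any panel discussed here) in which two populations have identical allele frequencies at every SNP but share no haplotype at all.

```python
from collections import Counter

# Hypothetical toy data: each string is a two-SNP haplotype (alleles 0/1).
pop_a = ["00", "11"] * 50
pop_b = ["01", "10"] * 50

def snp_frequencies(haplotypes):
    """Frequency of allele 1 at each SNP position, ignoring linkage."""
    n = len(haplotypes)
    length = len(haplotypes[0])
    return [sum(h[i] == "1" for h in haplotypes) / n for i in range(length)]

def haplotype_frequencies(haplotypes):
    """Frequency of each full haplotype (the SNPs taken together)."""
    n = len(haplotypes)
    return {h: count / n for h, count in Counter(haplotypes).items()}

snp_a = snp_frequencies(pop_a)
snp_b = snp_frequencies(pop_b)
hap_a = haplotype_frequencies(pop_a)
hap_b = haplotype_frequencies(pop_b)

# Haplotypes present in both populations.
shared_haplotypes = set(hap_a) & set(hap_b)
```

Here the per-SNP view cannot tell the two populations apart, while the haplotype view separates them completely. Real data sit somewhere between these extremes, so this says nothing about which reading of dimension 4 is right; it only shows that the two levels of analysis can disagree.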

    17. In the Lazaridis paper's supplementary data there is a Saami sample in the f4 test, and there it's even more shifted towards Mal'ta than Estonians and Lithuanians. Saami having high "ANE" is certain.

      The Mal'ta shift was also high in Scandinavians, similar to Eastern Europeans, but here in dimension 4 we see no such thing. There are Balto-Slavic populations and Siberians on one side, and Saami, Southeast Asians and West Europeans on the other, and Mal'ta cannot explain this because Scots, Saami and Scandinavians are all opposite to Belorussians and Lithuanians.
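For readers unfamiliar with the f4 test mentioned here, the following is a minimal sketch of the basic statistic: the average over SNPs of (a - b) * (c - d), where a, b, c and d are allele frequencies in four populations. The frequencies below are invented for illustration, and the sketch leaves out everything a real analysis needs (the paper's actual population configuration, standard errors, significance testing).

```python
def f4(p_a, p_b, p_c, p_d):
    """Average over SNPs of (a - b) * (c - d) for four allele-frequency lists."""
    products = [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]
    return sum(products) / len(products)

# Hypothetical allele frequencies at four SNPs; not real data.
p_a = [0.10, 0.40, 0.70, 0.25]
p_b = [0.30, 0.35, 0.50, 0.45]
p_c = [0.20, 0.60, 0.80, 0.30]
p_d = [0.40, 0.55, 0.65, 0.50]

value = f4(p_a, p_b, p_c, p_d)
# Swapping the first pair flips the sign of the statistic.
swapped = f4(p_b, p_a, p_c, p_d)
# Comparing a population with itself gives zero.
null_case = f4(p_a, p_a, p_c, p_d)
```

A value near zero means the frequency differences between the first pair are uncorrelated with those between the second pair; a clearly non-zero value points to shared drift between one member of each pair.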

    18. Which section of the supplementary data? It's a very long file and I don't find it.

    19. Maju: It's here (not a big pdf)

      http://biorxiv.org/highwire/filestream/383/field_highwire_adjunct_files/0/001552-1.pdf

    20. Based on the vertical positioning in the f4 test that includes the Saami and other European populations, I would guess that, at least in a European context, this gradient map of dimension 1 from a wider Eurasian panel than the one in this analysis could represent ANE. Notice that here the brown peaks among Saami, Finns and Estonians while the other peak is among Malays. I suspect the latter reflects traces of ancient coastal migrations, today maybe best represented by the Papuans and Melanesians.

      https://sites.google.com/site/fennobga/Euroasia191213-D1.png

    21. I was looking in the other file: in the supplementary data, which includes some 15 mini-papers on very specific issues, not the extended data figures, which is the one you're sending me to. Just to be sure, I located the Saami sample in fig. 7, where they score slightly above Estonians on MA1 affinity but also very strongly in Han affinity.

      I suspect that this graph is relevant for your question of "is there East Asian admixture in Europe"? I think that the answer, based on this fig. 7 is:

      1. Yes
      2. For most populations East Asian (Han) affinity correlates closely (but at lower values) with ANE affinity, so it's likely that whichever flow brought the ANE component (IEs, Uralics or whatever) also brought some East Asian affinity product of ancient admixture in Central Asia.
      3. For some populations however East Asian affinity drifts away from the general ANE-associated pan-European pattern, suggesting a second source of East Asian admixture. These populations are Saami and Chuvash in the extreme, and also Mordovian, Russian and Finnish in a more attenuated form.
      4. At less obvious levels (maybe just noise?), some, mostly Mediterranean, populations seem to have some more ANE admixture (or affinity) than what would correspond to the main pan-European pattern of ANE/Han correlation. The most notable are Maltese but Sicilians, Spaniards and English also show some of that tendency. If not mere noise, this may suggest a secondary source of ANE admixture with less East Asian incorporation, (maybe a West Asian source or maybe WHGs?)

      "I would guess that, at least in a European context, this gradient map of dimension 1 from a wider Eurasian panel than the one in this analysis could represent ANE."

      Maybe but my question would be: how does it correspond in the Siberian context? I'd be surprised if there's no or very low ANE in Siberia and Central Asia, really.

      I'm not very sure that the brown area correlates well enough with where in Europe ANE peaks either. It's true that you did not sample some non-Eastern high ANE populations like Greeks or Scots but is that enough explanation? Also Sardinians are the lowest ANE population of Europe, yet they score higher than Basques, who have double ANE scores per Lazaridis.

      Finally, ANE also scores high in parts of West Asia (notably Iran), something your map does not show.

      That said, ANE is nothing but a comparison with MA-1, not any absolute category (maybe an alternative comparison with Afontova would show noticeably different results, as would most likely happen if a different WHG sample were chosen), so discrepancies when actual populations are only compared against each other are to be expected. I'd rather suggest not expecting to find ANE as such by merely comparing modern populations. That's the virtue and the limitation of comparing with ancient or otherwise archetypal specimens: they point to something but they are not exactly that something, just a proxy.

    22. "I'd be surprised if there's no or very low ANE in Siberia and Central Asia, really."

      http://www.nature.com/nature/journal/vaop/ncurrent/extref/nature12736-s1.pdf
      page 95's f3

      Some Siberians like Nganasans, Evenki and Yakuts, as well as Central Asians, have much less shared drift with Mal'ta than East Europeans (Finns, Balts, Russians, Mordovians), so they obviously have less ANE. Some Siberians (Chukchi and Ket) have a similar level of Mal'ta affinity to these Europeans, but Kets (like Selkups) have some European admixture and are less "Siberian" than Nganasans, who seem to have less ANE than the French.
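As a rough illustration of what "shared drift" measured by an f3 statistic means, here is a minimal sketch assuming the outgroup form f3(O; X, Y), the average over SNPs of (o - x) * (o - y). Which exact configuration the cited figure uses is an assumption on my part, and all frequencies below are invented.

```python
def outgroup_f3(p_out, p_x, p_y):
    """Average over SNPs of (o - x) * (o - y): drift shared by X and Y since O."""
    products = [(o - x) * (o - y) for o, x, y in zip(p_out, p_x, p_y)]
    return sum(products) / len(products)

# Hypothetical allele frequencies at four SNPs; not real data.
outgroup = [0.5, 0.5, 0.5, 0.5]
ancient = [0.9, 0.1, 0.8, 0.2]       # stand-in for an ancient sample
related_pop = [0.8, 0.2, 0.7, 0.3]   # drifted in the same direction as `ancient`
distant_pop = [0.5, 0.5, 0.6, 0.4]   # barely shares that drift

f3_related = outgroup_f3(outgroup, ancient, related_pop)
f3_distant = outgroup_f3(outgroup, ancient, distant_pop)
```

The population whose frequencies moved away from the outgroup in the same direction as the ancient sample gets the higher value, which is the sense in which the populations in the figure are ranked by affinity to Mal'ta.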

    23. Thanks for pointing me to the exact page. :-)

      Many Siberians score there as high or higher than Europeans: Naukans, Kets, Chukchi, Khanti, Selkup, Koryak, Sors and Tundra Nentsi, all score higher than Hungarians (who are second in the ANE score per Lazaridis' supp. mat. section 12). It is true that some Siberian populations score lower than most Europeans however.

      I'm not sure if shared drift and formal affinity are identical concepts anyhow, because you can see some notable differences between the Lazaridis formal tests' scores and this graph.

      ... "but Kets (like Selkups) have some European admixture"...

      European? Or is it something more specifically Central Asian/West Siberian? See: http://leherensuge.blogspot.com/2010/07/central-eurasian-genetic-specifity.html

      In synthesis: Hui Li 2009 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2790568/) spotted a Central Asian specific autosomal component which peaks among the Khanty (neighbors of the Kets). Once considered this component (K=6 for an all-Eurasia sample) the residual European admixture in West Siberians is very low.

      This component is anyhow intermediate in affinity to West and East Eurasians, so it probably originated in some ancient admixture episode (but with enough time passed since then so that it has homogenized enough to show up as a distinct component, much like happens with the Ethiopian component in some analyses, which shows up as mixed West Asian-African in others). I'm not sure if this component is akin in some aspects to ANE or rather a more recent product (probably this) but it is interesting to consider in any case.

      It is also very important in Central Asia (Khazars, Uyghurs and to lesser extent Hazaras). It also shows up at minor values in the Komi, Chuvash and Mongol. Its apparent presence in Pakistan/India instead I'd consider a mere effect of the lack of resolution of a South Asian specific component because of undersampling (i.e. noise) but in the other populations it'd seem a solid component, as two West Eurasian and three East Asian components are also described.

    24. Anyway, I plan to use the Upper Paleolithic individual in my next run if this individual shares enough markers with my current standard SNP panel. The most amazing thing is that they have managed to extract the diploid genome of this individual, so it would be possible to phase it together with the rest of the standard panel, giving far better resolution. All the ancient genome analyses I have done so far have been "haploid", so I have been forced to analyse them using the lower-resolution unlinked model in Chromopainter.

    25. "Central Asian specific autosomal component which peaks among the Khanty (neighbors of the Kets). Once considered this component (K=6 for an all-Eurasia sample) the residual European admixture in West Siberians is very low. "

      I don't think that's relevant at all, it's just a product of Khanty being a drifted homozygous group, commonly created by isolates even if they are only some centuries old. It shows up in distant groups at certain K values which is also nothing special, and it should disappear from Central Asians at higher K values.

      http://img507.imageshack.us/img507/4446/admixneareastveuropeesk.png
      Compare here, the drift component of North Italian Illegio isolate considerably shows up in Basques at k7 but diminishes later.

      The "Khanty component" has a spread west of Urals which seems to stop at Komi, which may be result of some mixing or old ethnogenesis of that group. Not much more can be gathered from it I think.

    26. Then why is the "Khanty" (or rather Central Asian) component found at high doses in so many Central Asian populations? I do think it is very relevant and not at all a Khanty-only thing.

      Of course that all Far North populations are highly drifted, even Finns (and certainly Komis), but that is only part of the picture and does not seem to justify the widespread presence of a component that peaks at the Khanty but is by no means Khanty-exclusive. If nothing else it shows that Khanty and Central Asians are closely related and that a lot of the W-E admixture found at shallowest K levels is absorbed by this Central Asian specific component.

    27. The drift components of isolates commonly show up in populations that have not had contact with the isolate. It's mostly the shortcomings of admixture runs that cause this seeming similarity. Kalash are one of the most infamous examples.

      Drift components are just a sum of their parts, so you'd have to look at earlier K values to see the real situation. It's obvious that Basques don't really have 20% admixture from Illegio, and (in the same run) it's equally unlikely that Orcadians really have similar amount of late settlement Finnish component. It's just a quirk of chance that makes drift components show up in populations that don't have real connections, but somewhat similar basal admixture. With Central Asians and Khanty this is very likely what's going on.

    28. "It's obvious that Basques don't really have 20% admixture from Illegio, and (in the same run) it's equally unlikely that Orcadians really have similar amount of late settlement Finnish component."

      In general terms these affinities make good sense. If Basques are partly of Neolithic ancestry and that ancestry (Cardium Pottery) arrived via Italy, it makes perfect sense that we are connected with some of the oldest surviving Italian populations (which now appear to be of Neolithic formation), the same as happens with Sardinians. On the other hand Orkney's yDNA appears half Norwegian (Capelli 2003), so it's almost certain that they have a good dose of Medieval Viking ancestry (which is much less clear in the case of mainland Britons, whose apparent Danish/Saxon affinity could have arrived in many different periods beginning in the Epipaleolithic), so Orcadians should have a fraction of the Norwegian Finnish affinity, whichever that is.

      "It's just a quirk of chance that makes drift components show up in populations that don't have real connections"...

      I'm really not familiar with the concept of "drift components" (not by that name at least) but, if I understand it correctly, it indicates drift towards similar genotypes, and that is almost certainly not caused by chance. The effects of genetic drift as such are random and, precisely for that reason, if two drifts go strongly in the same direction, that smells of shared ancestry, at least at first look. So I'm quite perplexed by your latest argument, Anonym.

    29. "I'm really not familiar with the concept of "drift components" (not by that name at least) but, if I understand it correctly, it indicates drift towards similar genotypes, and that is almost certainly not caused by chance."

      It certainly can be caused by chance. The admixture software in this case seems to assign closest approximations at certain K values, and may change them again at higher K values, as seen with Basques and Illegio. This would likely happen with the Khanty and Central Asians.

      Late Settlement drift in Finns is less than 500 years old, much more recent than Viking era, so that component's appearance in Orcadians could well be a fluke too.

      How do we know a shared admixture component that peaks in a drifted, isolated population is not just noise but a true indicator of shared ancestry, then? Easy. It must stick over multiple K values in that same run and other tests. This happens with the Finnish drift component in, say, Estonians in every admixture run that has Finns, Estonians and a component that peaks in Finns. The Khanty/Central Asian component appears in one test, at just one K value (K6), so I think more evidence that it is real is needed.

    30. "Late Settlement drift in Finns is less than 500 years old, much more recent than Viking era, so that component's appearance in Orcadians could well be a fluke too".

      I'm not sure what you're talking about re. Finns but what I said about Orcadians is just Y-DNA, so no relevance of drift.

      "How do we know a shared admixture component that peaks in a drifted, isolated population is not just noise but a true indicator of shared ancestry, then? Easy. It must stick over multiple K values in that same run and other tests".

      I would rather run cross-validation tests to be sure you are at a good K value. What you describe are also indications, but they are useless if you are at K values with a clearly suboptimal cross-validation score.

      Also I'd segregate highly drifted populations from analysis, unless you want to specifically focus on them, in which case there are several possibilities using supervised runs and such.
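The cross-validation check suggested above can be sketched as follows. The error values are invented for illustration (in practice they would be read from the output of the clustering runs), and the tolerance used to call a K "clearly suboptimal" is an arbitrary assumption.

```python
# Hypothetical cross-validation errors per K; not from any real run.
cv_error_by_k = {2: 0.5710, 3: 0.5642, 4: 0.5611, 5: 0.5603, 6: 0.5609, 7: 0.5625}

def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)

def is_clearly_suboptimal(cv_errors, k, tolerance=0.0005):
    """True if the CV error at K exceeds the minimum by more than `tolerance`.

    The tolerance is an arbitrary cut-off chosen for this sketch.
    """
    return cv_errors[k] - min(cv_errors.values()) > tolerance

chosen_k = best_k(cv_error_by_k)
low_k_is_poor = is_clearly_suboptimal(cv_error_by_k, 2)
```

Under this reading, a component seen only at a K with a poor cross-validation score would carry little weight, whereas one that appears at or near the best K would deserve a closer look.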

  5. The so-called ASI component would contain significant "eastern non african". It's only drift that creates the present South Asian components. When fitting populations into "west eurasian", "ENA" and "african", Gujarati likely would come out close to Mexicans. Pygmy is mainly basal "african" with old drift into its own separate way.

  6. I can't agree. For example, when you study only Asians, as in the HUGO paper, South Asians invariably show early divergence from Circum-Pacific populations, which should mean that they have only a low, remote affinity.

    But it may be even more apparent in the Admixture graph from Lazaridis et al. which you mentioned: if you measure the SA-specific (~ASI) component at K=7 it scores ~67%, the rest being stable WEA affinity (~ANI). Almost half of it comes from the East Asian + Australasian pseudo-affinity scored in lower K values, and the other half (slightly greater) comes from the WEA affinity now greatly reduced. That I interpret as the SA-specific component being almost equidistant between WEA and EA/Aus and not because of admixture but because it is a distinct basic Eurasian component which scores as "neither this nor that" (not surprising because South Asia was the heart of the Eurasian colonization, with some help of SE Asia, according to all haploid DNA evidence, with some archaeological support). The only meaningful admixture SAs have is WEA Neolithic (and maybe some post-Neolithic) immigration, otherwise they are their own fundamental population.

    There must be some sort of formal test to perform in order to confirm this interpretation but in my experience everything fits with this reading: SAs are not the product of WEA-EA/Aus admixture but merely the product of their own local history since the times of Toba or even before, plus some important "recent" WEA influence (and lesser EA one).

    I have already commented other similar examples of "neither this nor that" false intermediate scores. It's not like "discovering America".

  7. ENA is not East Asian though, it's supposed to be basal to non-western eurasian components and, the way I see it, predates ASI. ASI is still more "western" than "eastern", but closer to modern East Asian than any of the western components. Therefore while majority ASI populations diverge from East Asians, they still should have greater affinity to them than a mainly West Eurasian population with some "present-day" East Asian, like Turks.


    Interestingly Eurogenes K13 fst distances place South Asian component equidistant to West Med and East Asian.

  8. As I understand it MA-1 (single-man measure of "ENA") is a West Eurasian element, at least in essence (i.e. barring possible minor East Asian admixture). When the core ancestors of West Eurasians moved West from Pakistan/India, they went also to Altai, from where they later flowed to the East, spreading the UP ("mode 4") tech and repeatedly mixing with already established NE Asian populations (leading to Native Americans and maybe some other Siberians) but keeping the Western patrilineage Q1 in dominant way (much as after the LGM, N1 did in the opposite direction).

    The Lazaridis study's modeling is, in my understanding, incorrect: the "Basal Eurasian" concept is an artifact of mere African admixture in EEF (notwithstanding the likely lesser persistence of true "basal Eurasian" mtDNA and nDNA in Arabia, but this one is much smaller). In the Mesolithic, part of the E1b "clan" expanded quite dynamically from the Nile area and reached the Levant, where it participated, jointly with pre-existent locals (J1, G2, etc.), in the Natufian→PPNA transition (and later in the CAPC and Semitic genesis). For all we know, a branch of these somehow reached mainland Greece, where it is still very important. It was in Greece (and not directly West Asia) where the European Neolithic (both mainstream waves) began, so we are most probably looking at a founder effect in EEFs of African (Egyptian) origin, a founder effect that also affected West Asia but in a very irregular way.

    So there is only one Eurasian (or non-African) branch leading to all Eurasians, Oceanians and Native Americans; West Eurasians also have some minor but significant African admixture, mostly via "the E1b clan", which confuses things. I must say, to be more precise, that there's also some minor OoA remnant in Arabia, judging from mtDNA and arguably also nDNA, and another distinct minor African founder effect in West Iberia with some spread to other parts of Europe (E1b-M81, etc.)

    Also I must say that Onge is not ASI. There seems to be some very distant affinity but they are not the same by any means. If you use an Onge proxy instead of ASI, you don't get the same results at all but something much more similar to when we use Papuans or East Asians as proxy. The Andamanese, whatever their exact origins (which seem related to both the Far East and India, but more to the former), are a distinct population isolate derived from the Eurasian colonization in general, remaining different from all their continental neighbors.

    Replies
    1. MA-1 is supposed to be "ANE" (mostly western component), not "ENA" (which is an envisioned dump for all non-western Eurasian), despite sharing more drift with Siberian Kets, Greenlanders and Amerindians than with any West Eurasians. A more complete sequence on MA-1 would be nice to see some day.

      I agree about "Basal Eurasian" admixture being questionable, in the study they even mention that they managed successful modeling without assuming its existence.

    2. "MA-1 is supposed to be "ANE" (mostly western component), not "ENA""...

      My bad, I guess. I was not aware of the acronym and confused it with ANE. Is that acronym used by Lazaridis? If so, I must have missed that detail.

      Whatever the case, I really don't think there is any phylogenetically consistent population (in essence, because there's always some minor admixture that complicates things) that includes non-Africans to the exclusion of West Eurasians. West Eurasians are essentially a subset of non-Africans (plus minor OoA "Arabian" remnants and minor "recent" African admixture via North Africa).

      So IMO, the existence of "ENA" and "Basal Eurasian" populations are quite arbitrary theoretical concepts that I question for the very same reasons. If "Basal Eurasian" is an artifact, so is "ENA" (and vice versa).
