Here is the official version (PNAS Early Edition), and here a pre-print PDF version as well.
Summary
This paper, which was published in the Proceedings of the National Academy of Sciences of the USA (PNAS) on 30 May 2007, has attracted a fair amount of press coverage. Since newspaper stories are often cut to fit the amount of space available and since the published paper goes into a lot of technical detail about both the genetic background and the statistical techniques we used, we have posted the following description of our work for interested readers.
Our paper reports a statistical study of the relationship between the geographical distribution of two genes and the geographical distribution of tone languages.
The two genes, ASPM and Microcephalin, have attracted a lot of attention in the last couple of years, following two papers (1 and 2) published in Science in 2005 by a Chicago research group led by Bruce Lahn. Lahn’s group showed that there are two variants (alleles), one for each of these two genes, which emerged fairly recently (estimated 6,000 years ago for ASPM and 37,000 years ago for Microcephalin) and that these new alleles seem to be spreading quickly in the human species (and are therefore probably “adaptive”, or favoured by natural selection). They also showed that these “derived” alleles (as they are known) are unevenly distributed in the world’s populations, being especially rare in sub-Saharan Africa and most common in Europe, North Africa and Western Asia.

The distribution of the "derived" allele of ASPM in the Old World populations we studied in our paper.
Each circle represents one population and the intensity of blue reflects the allele frequency (min 0%, max 60%).

The distribution of the "derived" allele of Microcephalin in the Old World populations we studied in our paper.
Each circle represents one population and the intensity of green reflects the allele frequency (min 3%, max 100%).
Tone languages are languages (like Chinese, Thai, Yoruba, and Zulu) in which the pitch or “tone” of words and syllables makes a difference to word meaning. For example, in Chinese huār (with a high level pitch) means ‘flower’ and huàr (with a falling pitch) means ‘picture’. In non-tonal languages (like English or Spanish), pitch is only used at the sentence level, for emphasis and overall meanings like questioning. Roughly half the languages in the world are tonal and half are non-tonal, but they’re fairly unevenly distributed: tone languages are the norm in sub-Saharan Africa and are common in Southeast Asia and among Native American languages especially in parts of Central and South America. Non-tone languages are the norm in Europe and Central, South and West Asia, and among the aboriginal languages of Australia. For more details about their distribution you can consult, for example, the entry on tone in the World Atlas of Language Structures.
(Please go here for another Chinese example, with sound files. In Yoruba, igba spoken with different tones means different things (recordings courtesy of Dr. Lawrence Olufemi Adewole of Ile-Ife University, Nigeria): LowHigh = a kind of tree, MidMid = '200', MidHigh = 'gourd' and LowLow = 'time'.)

The distribution of tone languages in the Old World populations we studied in our paper.
Each square represents one population: yellow stands for non-tone languages and gray for tone languages.
(But what about the Americas?)
Superficially, the distribution of the older (i.e., non-"derived") alleles, as reported by Lahn’s group, resembles the distribution of tone languages. Because the two genes in question are known to be involved in brain growth and development, and because there is some evidence that differences in performance on language-related experimental tasks can be linked to differences in brain structure, we hypothesised that the proportion of the older alleles of ASPM and Microcephalin in a given population would correlate with whether the language spoken by the population is tonal.
This means that our approach is different from the well-known work of Cavalli-Sforza and his colleagues, which aims to correlate genetic and linguistic classifications of populations, using known or hypothesised historical relations between languages and language families (do genetically similar populations also tend to be linguistically similar? Here genetic similarity involves many independent loci and linguistic similarity involves historical, ancestor-descendant relationships). Our work investigates correlations between genetic markers and typological features of languages (do populations having certain alleles tend to speak languages using the same feature? This question makes no reference to overall genetic similarity or linguistic historical classifications).
Language typology studies the ways in which languages can differ. Some of this is fairly familiar: for example, in French and English adjectives and nouns go in the opposite order - that’s word order typology. But there are typological differences in sound structure and word structure, too. In most Australian aboriginal languages, there are no fricative sounds (sounds like S or SH or F), whereas in most European languages there are lots - yet most Australian languages have lots of different N and L and R sounds that many English speakers struggle to tell apart. Or again: in many languages (e.g. Turkish, Inuktitut (Eskimo) and Swahili) the verb forms have lots of prefixes or suffixes to indicate the subject, the object, the tense, and so forth; in English or Chinese there’s hardly any of this kind of marking. All these kinds of differences are what language typology is about.
By comparing nearly 1000 genetic markers and 26 linguistic features (the linguistic data with details on our sources and methods can be found here), we were able to show that, as most people would expect, there is generally no correlation between population genetics and language typology, but the relation between tone and the two genes under study was confirmed to be especially strong in all our analyses. It’s because there generally isn’t a correlation between population genetics and language typology that the correlation we’ve found may be interesting.
This relationship remains important and statistically highly significant even when we consider the correlation between tone and ASPM and Microcephalin simultaneously, and after we take into account the fact that neighbouring populations tend to share both genes and languages, along with some further tests. (Go here for more details of what we did.)
The distribution of the correlations between all pairs of genetic markers and linguistic features in our database.
The horizontal axis represents the strength of the correlation (Pearson's r, between -1 and +1; 0 means no correlation).
It can be seen that most correlations are around zero, but that the correlations between tone and ASPM and between tone and Microcephalin are very improbable (stronger than 98.6% of all the correlations).
It must be noted that both of these correlations are also highly significant.
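For readers who would like to see the arithmetic behind a figure of this kind, here is a minimal sketch in Python. It is an illustration only, not the code used for the paper: the function names `pearson_r` and `fraction_weaker` are hypothetical, all the numbers are invented toy values, and reading "stronger" as "larger in absolute value" is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def fraction_weaker(target_r, all_rs):
    """Share of correlations whose absolute value is below |target_r|.

    Assumption: 'stronger' is taken to mean 'larger in absolute value'.
    """
    return sum(abs(r) < abs(target_r) for r in all_rs) / len(all_rs)


# Invented toy data, NOT taken from the paper:
# one "derived" allele frequency and one tone code (1 = tonal) per population.
derived_freq = [0.05, 0.10, 0.20, 0.40, 0.45, 0.55]
is_tonal = [1, 1, 1, 0, 0, 0]
r_tone = pearson_r(derived_freq, is_tonal)

# Invented stand-in for the correlations of all other marker/feature pairs.
background_rs = [0.02, -0.10, 0.15, -0.05, 0.30, -0.01, 0.08]

print(f"r = {r_tone:.2f}")
print(f"stronger than {fraction_weaker(r_tone, background_rs):.0%} of the others")
```

With real data, `background_rs` would hold one correlation for every pairing of a genetic marker with a linguistic feature, and the second function would give the kind of percentage quoted in the caption above.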
The distribution of tone and non-tone languages as a function of the population frequency of the
"derived" alleles of ASPM (horizontal axis) and Microcephalin (vertical axis).
Tone languages are represented by empty squares and non-tone languages by black squares.
It can be seen that in the bottom-left quadrant there are only tone languages, in the top-right quadrant only non-tone languages,
while in the top-left quadrant there is a balanced mixture (the Americas fit here, supporting our prediction).
The bottom-right quadrant contains no populations in our sample and the reason is not known.
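The quadrant reading of this plot can be sketched as a simple tabulation. The Python below is a hedged illustration rather than our actual analysis: the cut-off values `aspm_cut` and `mcph_cut` are hypothetical parameters (the caption does not state where the quadrant boundaries lie), and the sample populations are invented.

```python
from collections import Counter


def quadrant(aspm, mcph, aspm_cut, mcph_cut):
    """Name the plot quadrant for one population.

    ASPM frequency is on the horizontal axis, Microcephalin on the vertical.
    The cut-offs are hypothetical; the caption does not specify them.
    """
    vertical = "top" if mcph >= mcph_cut else "bottom"
    horizontal = "right" if aspm >= aspm_cut else "left"
    return f"{vertical}-{horizontal}"


def tabulate(populations, aspm_cut, mcph_cut):
    """Count tone and non-tone languages in each quadrant.

    `populations` is an iterable of (aspm_freq, mcph_freq, is_tonal) tuples.
    """
    counts = Counter()
    for aspm, mcph, is_tonal in populations:
        label = "tone" if is_tonal else "non-tone"
        counts[(quadrant(aspm, mcph, aspm_cut, mcph_cut), label)] += 1
    return counts


# Invented toy populations, NOT taken from the paper.
toy_populations = [
    (0.05, 0.20, True),   # low ASPM, low Microcephalin, tonal
    (0.10, 0.90, True),   # low ASPM, high Microcephalin, tonal
    (0.15, 0.85, False),  # low ASPM, high Microcephalin, non-tonal
    (0.45, 0.95, False),  # high ASPM, high Microcephalin, non-tonal
]

for key, n in sorted(tabulate(toy_populations, aspm_cut=0.3, mcph_cut=0.5).items()):
    print(key, n)
```

A quadrant that holds only one language type, or none at all, shows up in such a table as a missing or zero entry.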
We believe that this correlation may reflect some sort of predisposition or cognitive bias induced by the two genes in question. We don’t have any detailed idea of what this bias might consist of, but we assume it is very small and would only manifest itself in language change over many generations. We know, of course, that any normal human infant can learn the language of any human community that it’s brought up in – genes don’t play any role at the individual level. But subtle differences in the way children acquire language might lead to changes in the long run. All languages change over time (as anyone who has struggled with Shakespeare knows), and computer simulations and mathematical models have suggested that small differences in the way children acquire language could, over enough generations, give rise to big differences in the way a language is structured. And if those subtle differences are influenced by a child’s genetic make-up, that could explain the kind of correlation we’ve found.
The next step is to do experiments in which we look for evidence of the nature of the predisposition or bias. The work of Patrick Wong and his colleagues provides one possible lead here: they have shown that some monolingual adults find it much harder than others to learn an artificial language vocabulary that makes use of tone or pitch distinctions, and that the differences between these groups show up in subtle differences of brain structure as well. If we could show that these differences also reflect differences in genetic make-up, it would go some way to showing that the correlation we have found is based on a real causal link.
Our work has no immediate practical implications, but its longer term interest would lie in discovering that there’s a causal link between population genetics and language typology. (Again, we haven’t found that: we’ve just demonstrated some very unlikely correlations that suggest there might be such a link.) If that link can be found, then it will fit into the rapidly growing scientific understanding of how genetic make-up influences behaviour and cognitive development. That’s important work with lots of practical ethical dimensions: as science finds out more and more about specific genetic influences, society is really going to have to start dealing with a lot of policy questions that have only been theoretical up till now. But at this point all our paper does is report something that might be a piece of the overall jigsaw puzzle.
What the paper doesn't show or claim
First, we are not claiming that there is any direct connection between an individual’s genes and an individual’s language. We’re talking about small individual biases adding up to group effects over many generations of language change. People acquire the language(s) they’re exposed to in early childhood, regardless of their genes.
Second, we’re not making any suggestion of “superiority” or “selective advantage” for one language over another. Our work provides absolutely no reason to think that non-tonal languages are easier or “more advanced” than tonal languages (or vice-versa). There’s also no reason to think that there’s any evolutionary advantage to non-tonal languages: Chinese society developed advanced technology and politics and philosophy with a tonal language just as successfully as Eastern Mediterranean societies at about the same time with non-tonal languages.
Third, we’re not offering any new findings about the effects of these genes on brain development. We make only very limited suggestions about the detailed neurocognitive mechanisms that might be involved. Not much is known about the functions of these genes in brain development anyway, though this is certainly a hot topic in genetics. Since we’re not geneticists, we’re not involved in the front-line biochemical research, so we are not really in a position to speculate about what exactly might be going on in the brain.
Finally, we’re not suggesting that language is involved in the selective pressure for the "derived" alleles of ASPM and Microcephalin. Nobody really knows what the selective pressures were (although a lot of people would certainly like to find out). Bruce Lahn’s group were very explicit that they didn’t know what the selective advantage might be. Some people have even argued that there is no selective advantage and that the whole story is just a matter of genetic drift. We assume that the “cognitive bias” we propose could be an accidental by-product of whatever it is that these genes are doing.
Last updated: 25 June 2007. D.R. Ladd & Dan Dediu