The Language Mosaic and its Evolution

James R Hurford,
Language Evolution and Computation Research Unit,
Linguistics Department, University of Edinburgh

(Note: In Language Evolution, edited by Morten Christiansen and Simon Kirby, Oxford University Press, 2003, pp.38-57. This HTML version may differ slightly from the eventually printed version; the printed version is the `authorized' version.)

The human capacity for language and the structures of individual languages can best be understood from an evolutionary perspective. Both the biological capacity and languages owe their shape to events far back in the past. Biological steps toward language-readiness involved preadaptations for modern phonetics, syntax, semantics and pragmatics. Once humans were language-ready, ever more complex language systems could grow, relatively fast, by cultural transmission, generation after generation. This latter process is profitably studied by grammaticalization theory and computer modelling.

0.1 What language evolution research is about

It is natural to ask fact-demanding questions about the evolution of language, such as `Did Homo erectus use syntactic language?', `When did relative clauses appear?', and `What language was spoken by the first Homo sapiens sapiens who migrated out of Africa?'. One function of science is to satisfy a thirst for such answers to questions comprehensible in everyday terms, summarized as `What happened, and when?'. Such questions are clearly genuinely empirical; there is (or was) a fact of the matter. A time-travelling investigator could do fieldwork among the Homo erectus and research the first question, and then make forward jumps in time and research the other questions. I believe, however, that study in the evolution of language will not yield answers to such questions in the near future. Therefore, finding answers to such empirical-in-principle questions cannot be the purpose of language evolution research. The goal is, rather, to explain the present.

Evolutionary linguistics does not appeal to an apparatus of postulated abstract principles specific to the subject to explain language phenomena. Language is embedded in human psychology and society, and is ultimately governed by the same physical principles as galaxies and mesons. Not being physicists, or even chemists, we can take for granted what those scientists give us. In the hierarchy of sciences `up' from physics, somewhere around biochemistry, and, on a parallel track, in mathematics and computational theory, facts begin to appear which can be brought to bear on the goal of explaining language. These facts are not in themselves linguistic facts, but linguistic facts are distantly rooted in them. The basic linguistic facts needing explanation are these: there are thousands of different languages spoken in the world; these languages have extremely complex structure; and humans uniquely (barring a tiny minority of pathological cases) can learn any of these languages. These broad facts subsume an army of more detailed phenomena pertaining to individual languages. Such facts are, of course, the standard goals of linguistics. But modern mainstream linguistics has ignored the single most promising dimension of explanation, the evolutionary dimension.

Linguistic facts reflect acquired states of the brains of speakers. Those brains were bombarded in childhood with megabytes of information absorbed from the environment through various sensory channels, and influencing (but not wholly determining) neurogenesis. The grown neurons work through complex chemistry, sending information at various speeds and with varying fidelity buzzing around the brain and out into the muscles, including those of the vocal tract. This is a synchronic description, reduced almost to caricature, of what happens in an extremely complex organism, a human, giving rise to linguistic facts, our basic explananda. Facts of particular languages are themselves partly the result of specific historical contingencies which we cannot hope to account for in detail. Collecting such facts is work at the indispensable descriptive coalface of linguistics. The theoretical goals of linguistics must be set at a more general level, accounting for the range, boundaries and statistical distribution of language-specific facts.

Much of biology, like most of linguistics, is devoted to wholly descriptive synchronic accounts of how living organisms work. But at one time in the world's history there were no living organisms. The evolutionary branch of biology aims to explain how the observed range of complex organisms arose. With the unravelling of the structure of DNA, evolutionary theory began the reductive breakthrough, still incomplete, from postulating its own characteristic abstract principles to a sound basis in another science, chemistry. Any evolutionary story of how complex organisms arose must now be consistent with what we know about the behaviour of molecules. The evolutionary story must also be consistent with knowledge from another new and independent body of theory, represented in the early work of D'Arcy Thompson (1961) and recently by such work as Kauffman (1993, 1995) and West, Brown and Enquist (1997). This work emphasizes that the environment in which natural selection operates is characterized by mathematical principles, which constrain the range of attractor states into which evolution can gravitate.

The evolutionary biologists Maynard Smith and Szathmáry (1995) have identified eight `major transitions in evolution'. Their last transition is the emergence of human societies with language. Chomsky has stressed that language is a biological phenomenon. But prevalent contemporary brands of linguistics neglect the evolutionary dimension. The present facts of language can be understood more completely by adopting an evolutionary linguistics, whose subject matter sits at the end of a long series of evolutionary transitions, most of which have traditionally been the domain of biology. With each major transition in evolution comes an increase in complexity, so that a hierarchy of levels of analysis emerges, and research methods necessarily become increasingly convoluted, and extend beyond the familiarly biological methods. Evolution before the appearance of parasitism and symbiosis was simpler. Ontogenetic plasticity, resulting in phenotypes which are not simply predictable from their genotypes, and which may in their turn affect their own environments, further complicates the picture. The advent of social behaviour necessitates even more complex methods of analysis, many not susceptible to mathematical modelling, due to the highly nonlinear nature of advanced biosocial systems. With plasticity (especially learning) and advanced social behaviour comes the possibility of culture, and a new channel of information transfer across generations. Cultural evolution, mediated by learning, has a different dynamic from biological evolution, and, to make matters even more complex, biological and cultural evolution can intertwine in a coevolutionary spiral.

The key to explaining the present complex phenomena of human language lies in understanding how they could have evolved from less complex phenomena. The fact that human language sits at the end (so far!) of a long evolutionary progression certainly poses a methodological challenge. Nevertheless it is possible to dissect out components of the massively complex whole, and to begin to relate these in a systematic way to the present psychological and social correlates of language and what we can infer of their evolutionary past. Modern languages are learned by, stored in, and processed online by evolved brains, given voice by evolved vocal tracts, in evolved social groups. We can gain an understanding of how languages, and the human capacity for language, came into existence by studying the material (anatomical, neural, biochemical) bases of language in humans, related phenomena in less evolved creatures, and the dynamics of populations and cultural transmission.

A basic dichotomy in language evolution is between the biological evolution of the language capacity and the historical evolution of individual languages, mediated by cultural transmission (learning). In the next section I will give a view of relevant steps in the biological evolution of humans towards their current fully-fledged linguistic capacity.

0.2 Biological steps to language-readiness -- preadaptations

In this section, I review some of the cognitive preadaptations which paved the way for the enormously impressive language capacity in humans. While these preadaptations do not in themselves fully explain how the full, uniquely human ability finally emerged, they do give us a basis for beginning to understand what must have happened.

A preadaptation is a change in a species which is not itself adaptive (i.e. is selectively neutral) but which paves the way for subsequent adaptive changes. For example, bipedalism set in train anatomical changes which culminated in the human vocal tract. Though speech is clearly adaptive, bipedalism is not itself an adaptation for speech; it is a preadaptation. This example involves the hardware of language, the vocal tract. Many changes in our species' software, our mental capacities, were necessary before we became language-ready; these are cognitive preadaptations for language. Preadaptations for language involved the following capacities or dispositions:

A pre-phonetic capacity to perform speech sounds or manual gestures.
A pre-syntactic capacity to organize longer sequences of sounds or gestures.
Pre-semantic capacities:
1. to form basic concepts,
2. to construct more complex concepts (e.g. propositions),
3. to carry out mental calculations over complex concepts.
Pre-pragmatic capacities:
1. to infer what mental calculations others can carry out,
2. to act cooperatively,
3. to attend to the same external situations as others,
4. to accept symbolic action as a surrogate for real action.
An elementary symbolic capacity to link sounds or gestures arbitrarily with basic concepts, such that perception of the action activates the concept, and attention to the concept may initiate the sound or gesture.

If some capacity is found in species distantly related to humans, this can indicate that it is an ancient, primitive capacity. Conversely, if only our nearest relatives, the apes, possess some capacity, we can conclude that it is a more recent evolutionary development. Twin recurring themes in the discussion of many of these abilities are learned, as opposed to innate behaviour, and voluntary control of behaviour.

Voluntary control is a matter of degree, ranging from involuntary reflexes to actions whose internal causes are obscure to us. Probably all vertebrates can be credited with some degree of voluntary control over their actions. In some sense, and in some circumstances, they `decide' what to do. In English, `voluntary' is reserved for animate creatures. Only jokingly do we say of a machine that it `has a mind of its own', but this is precisely when we don't know what complex internal states lead to some unpredicted behaviour. `Voluntary' is used to describe whole actions. If actions are simple, they may, like reflex blinking, be wholly automatic, and involuntary. If an action is complex, although the whole action may be labelled `voluntary', it is likely to have an automatic component and a non-automatic component. Both the automatic and the non-automatic component may be determined by complex processes obscure to us. What singles humans out from other species is the capacity to acquire automatic control, in the space of a few years, of the extremely complex syntactic and phonological processes underlying speaking and understanding language. Such automatization must involve the laying down of special neural structures. It seems reasonable to identify some subset of these neural structures with what linguists call a grammar. The sheer size of the information thus encoded (languages are massive) testifies to the enormous plasticity, specialized to linguistic facts, of the human brain.

Human languages are largely learned systems. The more ways a species is plastic in its behaviour, the more complex are the cultural traditions, including languages, that can emerge. Our nearest relatives, the chimpanzees, are plastic in a significantly wider range of behaviours than any other non-human animals; their cultural traditions are correspondingly more multi-faceted, while falling far short of human cultural diversity and complexity. Combined with plasticity, voluntary control adds more complexity, and unpredictability, to patterns of behaviour. Much of the difference between humans and other species can be attributed to greatly increased plasticity and voluntary control of these preadaptive capacities.

0.2.1 Pre-phonetic capacity.

Chimpanzees cannot speak. They typically have little voluntary breath control. To wild chimpanzees, voluntary breath control does not come naturally. On the other hand, chimpanzees have good voluntary control over their manual gestures, although they are not as capable as humans at delicate manual work. A preadaptation that was necessary for the emergence of modern spoken language was the extension of voluntary control from the hands to the vocal tract.

Learning controlled actions by observation entails an ability to imitate. Imitation involves an impressive `translation' of sensory impressions into motor commands. Think of a smile. Without mirrors or language, one has no guarantee that muscle contractions produce the effect one perceives in another's face. Given the required voluntary control, and the anatomical hardware, imitation of speech sounds should be easier than imitation of facial gestures, because one can hear one's own voice. A capacity for imitation is found in a perplexing range of species. Some birds can imitate human speech, and many other sounds, as well. Dolphins can be trained to imitate human movements. A capacity for imitation can evolve separately in different species, with or without the other necessary preadaptive requirements for human language. A neural basis of imitation has been found in monkeys, in the form of `mirror neurons', which fire both when an animal is carrying out a certain action, such as grasping, and when it observes that same action carried out by another animal. A recurrent theory in phonetics is the `motor theory of speech perception', which claims that speech sounds are represented in the brain in terms of the motor commands required to make them.

Although they cannot speak, our ape cousins have no trouble in recognizing different spoken human words. The capacity to discriminate the kinds of sounds that constitute speech evidently preceded the arrival of speech itself.

0.2.2 Pre-syntactic capacity.

Syntax involves the stringing together of independent subunits into a longer signal. We are concerned in this section with what Marler (1977) calls `phonological syntax', as opposed to `lexical syntax'. In phonological syntax, the units, like the letters in a written word, have no independent meaning. In lexical syntax, the units, such as the words in an English sentence, have meanings which contribute to the overall meaning of the whole signal. Many bird species can learn songs with phonological syntax. Oscine birds, which learn complex songs, are very distant relatives of humans. Many other birds, and more closely related species, including most mammals, do not produce calls composed of independent subunits. Our closest relatives, the apes, do produce long calls composed of subunits. The long calls of gibbons are markers of individual identity, for advertising or defending territory. The subunit notes, used in isolation, out of the context of long calls, are used in connection with territorial aggression, and it is not clear whether the meanings of these notes can be composed by any plausible operation to yield the identity-denoting meaning of the whole signal.

Male gibbon singing performances are notable for their extreme versatility. Precise copies of songs are rarely repeated consecutively, and the song repertoires of individual males are very large. Despite this variability, rules govern the internal structure of songs. Male gibbons employ a discrete number of notes to construct songs. Songs are not formed through a random assortment of notes. The use of note types varies as a function of position, and transitions between note types are nonrandom (Mitani and Marler, 1989:35).
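
The claim that transitions between note types are nonrandom can be made concrete with a small sketch. The Python fragment below is an illustration only: it assumes a first-order model (each note depends only on the note before it), and the songs are invented toy strings, not Mitani and Marler's data. The function name `transition_probabilities` is likewise hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(songs):
    """Estimate P(next note | current note) from a list of note sequences."""
    counts = defaultdict(Counter)
    for song in songs:
        # Pair each note with its immediate successor in the same song.
        for current, nxt in zip(song, song[1:]):
            counts[current][nxt] += 1
    return {
        note: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for note, following in counts.items()
    }


# Invented toy data: each string is one song, each letter one note type.
songs = ["abac", "abab", "abc"]
probs = transition_probabilities(songs)
```

If note order were random, the estimated probabilities out of each note type would be roughly equal across the possible following notes. In the toy data, note `a` is followed by `b` far more often than by `c`, which is the kind of skew that `phonological syntax' implies.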

Although it is fair to call such abilities in apes `pre-syntactic', they are still far removed from the human ability to organize sequences of words into complex hierarchically organized sentences. Little is known about the ability of apes to learn hierarchically structured behaviours, although all researchers seem to expect apes to be less proficient at it than humans; see Byrne and Russon (1998) and Whiten (2000) for some discussion.

0.2.3 Pre-semantic capacities

Basic concept formation. Many species lead simple lives, compared to humans, and even to apes, and so may not possess very many concepts, but they do nevertheless possess them. ``Perceptual categorisation and the retention of inner descriptions of objects are intrinsic characteristics of brain function in many other animals apart from the anthropoid apes.'' (Walker, 1983:378) The difference between humans and other animals in terms of their inventories of concepts is quantitative. Animals have the concepts that they need, adapted to their own physiology and ecological niche. What is so surprising about humans is how many concepts they have, or are capable of acquiring, and that these concepts can go well beyond the range of what is immediately useful. Basic concrete concepts, constituting an elementary pre-semantic capacity, were possessed by our remote ancestors. (A good survey appears in chapter 18 of Jolly (1985); see also Allen and Hauser (1991).)

Something related to voluntary control is also relevant to pre-semantic abilities. We need not be stimulated by the presence of an object for a concept of it to be evoked. Some animals may have this to a limited degree. When an animal sets off to its usual foraging ground, it knows where it is going, because it can get there from many different places, and even take new routes. So the animal entertains a concept of a place other than where it currently is. But for full human language to have taken off, a way had to evolve of mentally reviewing one's thoughts in a much more free-ranging way than animals seem to use.

Complex concept formation. The ability to form complex conceptual structures, composed systematically of parts, is crucial to human language. Logical predicate-argument structure underlies the messages transmitted by language. The words comprising human sentences typically correspond to elements of a conceptual/logical representation. While apes may perhaps not be capable of storing such complex structures as humans, it seems certain that they have mental representations in predicate-argument form. Simply attending to an object is analogous to assigning a mental variable to it, which functions as the argument of any predicate expressing a judgement made by the animal. The two processes of attending to an object and forming some judgement about it are neurologically separate, involving different pathways (dorsal and ventral) in the brain. This is true not only for humans, but also for apes and closely related monkeys as well. (See argument and references in Hurford (2002).) It seems certain that all species closely related to humans, and many species more distantly related, have at least this representational capacity, which is a pre-semantic preadaptation for language.

Mental calculation. Humans are not the only species capable of reasoning from experienced facts to predictions about non-experienced states of affairs. There is a large literature on problem-solving by animals, leading to ranking of various species according to how well they perform in some task involving simple inference from recent experience. See Krushinsky (1965) for a well-known example. Apes and monkeys perform closest to humans in problem-solving, but their inferential ability falls short of human attainment.

0.2.4 Pre-pragmatic capacities

Mind-reading and manipulation. When a human hears an utterance, he has to figure out what the speaker intended; this is mind-reading. When a human speaks, she does so with some estimation of how her hearer will react; this is social manipulation. Humans have especially well developed capacities for social manipulation and mind-reading, and these evolved from similar abilities in our ancestors, still visible in apes. Social intelligence, a well developed ability to understand and predict the actions of fellow members of the group, was a necessary prerequisite for the emergence of language. Recent studies amply demonstrate these manipulation and mind-reading abilities in chimpanzees (Byrne and Whiten (1988), de Waal (1982, 1989), Hare, Call and Tomasello (2001)).

Cooperation. People can understand the intended import of statements whose literal meanings are somehow inappropriate, such as `It's cold in here', intended as a request to close the window. To explain how we cope with such indirectness, traditional logic has to be supplemented by the Cooperative Principle (Grice, 1975), which stipulates that language users try to be helpful in specified ways. The use of language requires this basis of cooperativeness. No such complex communication system could have evolved without reliable cooperativeness between users.

Humans are near the top of the range of cooperativeness. The basis of cooperation in social insects is entirely innate, and the range of cooperative behaviours is small. In humans, building onto a general natural disposition to be cooperative, cooperation on a wide range of specific group enterprises is culturally transmitted. Children are taught to be `team players'. No concerted instruction in cooperation exists outside humans, but there are reports of cases where an animal appears to be punished for some transgression of cooperativeness (Hauser, 1996:107-109). So the basis for cooperative behaviour, and for the instilling of such behaviour in others, exists in species closely related to humans. Common chimpanzees and bonobos, in particular, frequently engage in reconciliation and peacemaking behaviour (de Waal, 1988, 1989). Dispositions to cooperation and maintenance of group cohesion are pragmatic cognitive preadaptations for language.

Joint attention. Cats are inept at following a pointing finger; dogs are better. Language is also used to `point at' things, both directly and indirectly. Linguists and philosophers call this `reference'. When a speaker refers to some other person, say by using a personal pronoun, the intention is to get the hearer to attend to this other person. Successful use of language demands an ability to know what the speaker is talking about. A mechanism for establishing joint attention is necessary. Human babies and children are adept at gaze- and finger-following (Franco and Butterworth, 1996; Morales et al., 2000; Charman et al., 2000). The fact that humans, uniquely, have whites to their eyes probably helps us to figure out what other people are looking at.

Primates more closely related to humans are better at following human gaze than those less closely related (Itakura, 1996). Chimpanzees follow human gaze cues, while non-ape species such as macaques fail to follow human gaze cues. But experiments on rhesus macaques interacting with other rhesus macaques show that these animals do follow the gaze of conspecifics (Emery et al., 1997). Spontaneous pointing has also been observed in captive common chimpanzees (who had not received language training) (Leavens, Hopkins and Bard, 1996) and in young free-ranging orangutans (Bard, 1992). It thus appears that animals close to humans possess much of the cognitive apparatus for establishing joint attention, which is the basis of reference in language.

Ritualized action. Short greetings such as Hello! and Hi! are just act-performing words; they don't describe anything, and they can't be said to be true or false. We can find exactly such act-performing signals in certain ritualized actions of animals. The classic example of a ritualized action is the snarling baring of the teeth by dogs, which need not (now) precede an imminent attack, and is a sign of hostility. Human ritualized expressions such as Hello are relics of ancient animal behaviour, mostly now clothed in the phonemes of the relevant language. But some human ritualized expressions, such as the alveolar click, `tsk', indicating disapproval, are not assimilated into the phonology of their language (in this case English). The classic discussion of ritualization in animal behaviour is Tinbergen (1952), who noted the signal's `emancipation' from its original context. This process of dissociation between the form of the signal and its meaning can be seen as the basis of the capacity to form arbitrary associations between signals and their meanings, discussed in the next section. (See Haiman (1994) for a more extended argument that ritualization is a central process in language evolution.)

0.2.5 Elementary symbolic capacity.

The sound of the word tree, for instance, has no iconic similarity with any property of a tree. This kind of arbitrary association is central to language. Linguistic symbols are entirely learned. This excludes from language proper any possible universally instinctive cries, such as screams of pain or whimpers of fear. In the wild, there are many animals with limited repertoires of calls indicating the affective state of the animal. In some cases, such calls also relate systematically to constant aspects of the environment. The best-known example is the vervet monkey alarm system, with distinctive calls for different classes of predator. There is no evidence that such calls are learned to any significant degree. Thus no animal calls, as made in the wild, can, as yet, be taken as showing an ability to learn an arbitrary mapping from signal to message.

Trained animals, on the other hand, especially apes, have been shown to be capable of acquiring arbitrary mappings between concepts and signals. The acquired vocabularies of trained apes are comparable to those of four-year-old children, with hundreds of learned items. An ape can make a mental link between an abstract symbol and some object or action, but the circumstances of wild life never nurture this ability, and it remains undeveloped.

The earliest use of arbitrary symbols in our species was perhaps to indicate personal identity (Knight, 1998, 2000). They replaced non-symbolic indicators of status such as physical size, and involuntary indexes such as plumage displays. In gibbons, territorial calls also have features which can indicate sex, rank and (un)mated condition (Cowlishaw, 1992; Raemaekers, Raemaekers and Haimoff, 1984).

The duetting long call behaviour of chimpanzees and bonobos, where one animal matches its call to that of another, indicates some transferability of the calls between individuals, and an element of learning. But such duetting is probably `parrot-like', in that the imitating animal is not attempting to convey the `meaning' (e.g. rank, identity) of the imitated call. The duetting behaviour is not evidence of transfer of symbolic behaviour from one individual to another. Probably the duetting behaviour itself has some social/pragmatic significance, perhaps similar to grooming.

In humans the ability to trade conversationally in symbols comes naturally. Even humans have some difficulty when the symbol clashes with its meaning, for example if the word red is printed in green. Humans can overcome such difficulties and make a response to the symbol take precedence over the response to the thing. But chimpanzees apparently cannot suppress an instinctive response to concrete stimuli in favour of response to symbols. With few exceptions, even trained apes only indulge in symbolic behaviour to satisfy immediate desires. The circumstances of wild chimpanzee life have not led to the evolution of a species of animal with a high readiness or willingness (as with humans) to use symbols, even though the rudiments of symbolic ability are present.

All of these preadaptations illustrate cases where some ability crucial to developed human language was present, if to a lesser degree, in our prelinguistic ancestors. Note that the levels of linguistic structure where language interfaces with the outside world, namely phonetics, semantics and pragmatics, were (apart from motor control of speech) in all likelihood relatively closer to modern human abilities than the `core' levels of linguistic structure, namely phonology and morphosyntax. The elaborated phonology and syntax so characteristic of full human language came late to the scene. In modern humans, syntactic and phonological organization of utterances, though learned, is largely automatic, not under conscious control. In a sense, then, language evolved `from the outside in'; the story is of a widening gap, bridged by learnable automatic processes, between a signaller's intentions (meanings) and the signal itself. Near the beginning, there were only simple calls analogous to English Hello, in which an atomic signal is directly mapped onto an atomic social act. Every human utterance is still a speech act of some sort. We now have the possibility of highly sophisticated speech acts, whose interpretation involves decoding of a complex signal into a complex conceptual representation, accompanied by complex calculations to derive the likely intended social force of the utterance. The crucial last biological step towards modern human language capacity was the arrival of a brain capable of acquiring a much more complex mapping between signals and conceptual representations, giving rise to the possibility of the signals and the conceptual representations themselves growing in complexity. In the first generations after the arrival of a brain capable of acquiring such a complex mapping, communication was not necessarily much more complex. The actual complex structures that we now find in the communication systems (i.e. languages) of populations endowed with such brains may have taken some time to emerge. The mechanisms by which languages grew in biologically language-ready populations will be discussed in the next section.

0.3 Cultural evolution of languages

0.3.1 The Two-Phase Nature of Language Transmission

I have referred earlier to the `phenomena of human language'. Modern linguistics focusses equally, if not more, on the noumena of language. A noumenon/phenomenon distinction pervades linguistics from Saussure's langue and parole, through Chomsky's competence and performance to his later I(nternal)-language and E(xternal)-language. Chomsky's postulation of competence attributes psychological reality to the language system, held in individual minds. This contrasts with Saussure's characterization of langue as an entity somehow belonging to the language community. The move to individual psychological reality paved the way for an explanatory link between the evolution of language and biological evolution. Modern linguistics, preoccupied with synchronic competence, has yet to realize the potential for explaining both linguistic phenomena and linguistic noumena in terms of a cyclic relationship between the two, spiralling through time.

Spoken utterances and particular speech acts located in space and time are produced by speakers guided by knowledge of grammatical well-formedness, paraphrase relations and ambiguity. This knowledge was in turn formed in response to earlier spoken utterances and speech acts, as users acquired their language. Modern linguistics has tended to characterize the overt phenomena of language, the spatio-temporal events of primary linguistic data (PLD), as `degenerate', and of little theoretical interest. The burden of maintaining the system of a language, as it is transmitted across generations, has been thrust almost wholly onto the postulated innate cognitive apparatus, which makes sense of the allegedly chaotic data in similar ways in all corners of the globe, resulting in linguistic universals.

Clearly humans are innately equipped with unique mental capacities for acquiring language. Language emerges from an interaction between minds and external events. The proportions of the innate cognitive contribution and the contribution due to empirically available patterns in the stimuli remain to be discovered. Methodologically, it is much harder to study performance data systematically, as this requires copious corpus-collecting, and it is not a priori obvious what to collect and how to represent it. In transcribing the linguistic data input to a child, it is highly probable that the transcriber imposes decisions informed by his own knowledge, and thus the true raw material which a child processes is not represented. This difficulty contrasts with the study and systematization of adult linguistic intuitions, accomplished from the armchair. But the intractability of the data giving rise to adult linguistic intuitions does not imply that the only proper object of study is linguistic competence. Because language emerges from the interaction of minds and data, linguistics must concern itself with both phases in this life-cycle.

This view of language as a cyclic interaction across generations between I-language and E-language has been taken up by historical linguists. Rather than postulating abstract laws of linguistic change, they (e.g. Andersen, 1973; Lightfoot, 1999) appeal to principles relating the spoken output of one generation to the acquired knowledge of the next. This is a healthy development. Historical linguistics, however, is concerned with explaining language change as it can be observed (or deduced) from extant data, either ongoing changes or reconstructed changes hypothesized from comparing related languages and dialects. Historical linguistics is not, in general, concerned with accounting for the emergence of modern complex forms of language from earlier simpler forms. As such, historical linguistics typically makes `uniformitarian' assumptions (see Newmeyer (2002) and Deutscher (1999) for a discussion of uniformitarianism). By contrast, one task of evolutionary linguistics is to work out how modern complex linguistic systems could have arisen from simpler origins, using the cyclic interaction between spatiotemporal data and acquired grammars as its central explanatory device. This task has been undertaken from two quite different directions, by theorists of grammaticalization and computer modellers working with the `iterated learning model' (ILM). I discuss these briefly below.

0.3.2 Grammaticalization

At the heart of grammaticalization theory is the idea that syntactic organization, and the overt markers associated with it, emerges from nonsyntactic, principally lexical and discourse, organization. The mechanism of this emergence is the spiralling interaction of the two phases of a language's existence, I-language and E-language. Through frequent use of a particular word, that word acquires a specialized grammatical role that it did not have before. And in some cases this new function of the word is the first instance of this function being fulfilled at all, in the language concerned. Clear examples are seen in the emergence of Tok Pisin, the Papua New Guinea creole. In Tok Pisin, -fela (or -pela) is a suffix indicating adjectival function, as in niupela `new', retpela `red', gutpela `good'. This form is clearly derived from the English noun fellow, a noun not originally identified with any particular grammatical function, other than those associated with all nouns. Grammaticalization occurs in the histories of all languages, not just in the creolization process.

Grammaticalization theory has largely been pursued by scholars concerned with relatively recent changes in languages (Traugott and Heine, 1991; Hopper and Traugott, 1993; Traugott, 1994; Pagliuca, 1994). In keeping with a general reluctance to speculate about the remote past, most grammaticalization theorists have not theorized about the very earliest languages and the paths from them to modern languages. Nevertheless, a recurrent central theme in grammaticalization studies is unidirectionality. The general trend of grammaticalization processes is all in one direction. Occasionally there may be changes in the opposite direction, but these are infrequent, and amply outnumbered by changes in the typical direction. It follows that the general nature of languages must have also changed over time, as languages accumulated more and more grammaticalized forms. Heine is one of the few grammaticalization theorists who have speculated about what this implies for the likely shape of the earliest languages.

`` ... on the basis of findings in grammaticalization studies, we have argued that languages in the historically non-reconstructible past may have been different -- in a systematic way -- from present-day languages. We have proposed particular sequences of the evolution of grammatical structures which enable us to reconstruct earlier stages of human language(s). ... such evolutions lead in a principled way from concrete lexical items to abstract morphosyntactic forms. [This] suggests, on the one hand, that grammatical forms such as case inflections or agreement and voice markers did not fall from heaven; rather they can be shown to be the result of gradual evolutions. Much more importantly, [this] also suggests that at the earliest conceivable stage, human language(s) might have lacked grammatical forms such as case inflections, agreement, voice markers, etc. so that there might have existed only two types of linguistic entities: one denoting thing-like time stable entities (i.e. nouns), and another one for non-time stable concepts such as events (i.e. verbs).'' (Heine and Kuteva, 2002:394)

To stimulate discussion, I will be at least as bold as Heine, and offer the following suggestions about what earlier stages of human languages were like, based on the unidirectionality of grammaticalization processes. The origin of all grammatical morphemes (function words, inflections) is in lexical stems. This leads one to hypothesize that the earliest languages had: no articles (modern articles typically originate in demonstratives, or the numeral `one'); no auxiliaries (these derive from verbs); no complementizers (which may originate from verbs); no subordinating conjunctions (also likely to derive from verbs); no prepositions (deriving from nouns); no agreement markers (deriving from pronouns); no gender markers (deriving from noun classifiers, which in their turn derived from nouns); no numerals (from adjectives and nouns); no adjectives (from verbs and nouns).

In addition, I speculate that the earliest languages had: no proper names (but merely definite descriptions); no illocution markers (such as please); no subordinate clauses, or hypotaxis; no derivational morphology; less differentiation of syntactic classes (perhaps not even noun and verb); and less differentiation of Subject and Topic. All this is characteristic of (unstable) pidgins and reminiscent of Bickerton's construct `protolanguage', a crude pidgin-like form of communication with no function words or grammatical morphemes. Still in the syntactic domain, Newmeyer (2000) has theorized that all the earliest languages were SOV (once they had the noun/verb distinction).

In keeping with ideas from grammaticalization theory about meaning, the earliest languages would have had, in their semantics: no metaphor; no polysemy; no abstract nouns; fewer subjective meanings (e.g. epistemic modals); less lexical differentiation (e.g. hand/arm, saunter/stroll/amble); fewer hyponyms and superordinate terms.

One can apply similar ideas in phonology. Probably the earliest languages had simple vowel systems and only CV syllable structure. See the next subsection for mention of computer modelling of the emergence of phonological structure, via the cyclic two-phase mechanism of language transmission.

0.3.3 Computer modelling of language evolution

Grammaticalization theorists work backward from modern languages, via known processes of linguistic change, toward earlier, simpler stages of language. By contrast, computer modellers of emerging language start from simulated populations with no language at all, and their simulations can lead to interesting results in which the populations have converged on coordinated communicative codes which, though still extremely simple, share noteworthy characteristics with human language. Some examples of such work are Batali (1998, 2002), Kirby (2000, 2002), Hurford (2000), Teal and Taylor (1999), and Tonkes and Wiles (2002). A survey of some of these works, analyzing their principal dimensions, and the issues they raise, appears in Hurford (2002). Hurford refers to this class of computer models as `Expression/Induction' (E/I) models; Kirby has rechristened this general class `iterated learning models' (ILMs), a term which seems likely to gain currency. There is a noticeable trend in recent computer simulations of language evolution away from modelling of the biological evolution of features of the language acquisition device (e.g. Hurford (1989, 1991), Batali (1994)). More recent simulations (e.g. those cited earlier in this paragraph) typically model the evolution of languages, via iterated learning. Such studies, moreover, do not typically attempt to `put everything together' and reach a full language-like outcome; rather they explore the interactions between pairs of strictly isolated factors relevant to the iterated learning model. An example of such work is Brighton and Kirby (2001).

Language has not always existed. Hence there is a puzzle concerning what behaviour the first speakers of a language used as a model in their learning. Computer modelling studies have addressed this problem, using simulations in which individuals have a limited capacity for random invention of linguistic forms corresponding to given (pre-existing) meanings. Massive advances in computing power make it possible to simulate the complex interactive dynamics of language learning by children and their subsequent language behaviour as adults, which in turn becomes the model for learning by the next generation of children. It is now possible to simulate not only the learning of a somewhat complex communication system by a single individual, on the basis of a corpus of presented examples of meaning-form pairs, but to embed such individual learning processes in a population of several hundred individuals (each of whose learning is also simulated) and to simulate the repetition of this population-wide process over many historical generations.

The cited research has implemented such simulations with some success in evolving syntactic systems which resemble natural language grammars in basic respects. This research can be seen as a step up from the preceding paradigm of generative grammar. In early generative grammar, the researcher's task was to postulate systems of rules generating all and only the grammatical sentences of the language under investigation. Early generative grammars were somewhat rigorously specified, and it was possible in some cases to check the accuracy of the predictions of the grammar. But, whether rigorously specified or not, the grammars were always postulated. How the grammars themselves came to exist was not explained, except by the quite vague claim that they were consistent with the current theory of the innate Language Acquisition Device. The recent simulation studies, while still in their infancy, can legitimately claim to embody rigorous claims about the precise psychological and social conditions in which grammars themselves evolve.

This strand of computational simulation research has the potential to clarify the essentials of the interaction between (a) the psychological capacities of language learners and (b) the historical dynamics of populations of learners giving rise to complex grammars resembling the grammars of real natural languages. In such simulations, a population of agents begins with no shared system of communication. The agents are `innately' endowed with certain competencies, typically including control of a space of possible meanings, an inventory of possible signals, and a capacity for acquiring grammars of certain specified sorts on exposure to examples of meaning-signal pairs. The simulations typically proceed with each generation learning from its predecessor, on the basis of observation of its communicative behaviour. At first, there is no coherent communicative behaviour in the simulated population. Over time, a coherent shared syntactic system emerges. The syntactic systems which have been achieved in this research paradigm are all, of course, simpler than real attested languages, but nevertheless possess many of the central traits of natural language syntactic organization, including recursivity, compositionality of meaning, asymmetric distribution of regular and irregular forms according to frequency, grammatical functional elements with no denotational meaning, grammatical markers of passive voice and of reflexivity, and elementary partitioning into phrases.
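As an illustration only, the generational loop just described can be sketched in a few lines of Python. This is not the code of any of the cited studies. The 3-by-3 meaning space, the syllable inventory, the `Agent` class and the `bottleneck` parameter are all assumptions made for this sketch; the learner is a pure rote-memorizer with random invention, and a single parent-child chain stands in for a population.

```python
import random

# Assumed toy meaning space: (agent, action) pairs. Not taken from any cited model.
MEANINGS = [(agent, action) for agent in range(3) for action in range(3)]
# Assumed signal inventory: forms are built from two of these syllables.
SYLLABLES = ["ka", "mo", "ti", "lu", "ne", "sa"]


def invent(rng):
    """Random invention of a form for a meaning the speaker has no form for."""
    return rng.choice(SYLLABLES) + rng.choice(SYLLABLES)


class Agent:
    """A rote learner: its I-language is a private store of meaning-form pairs."""

    def __init__(self):
        self.lexicon = {}

    def speak(self, meaning, rng):
        # Use the stored form if there is one; otherwise invent and remember one.
        if meaning not in self.lexicon:
            self.lexicon[meaning] = invent(rng)
        return self.lexicon[meaning]

    def learn(self, observations):
        # Memorize each observed meaning-form pair, with no generalization.
        for meaning, form in observations:
            self.lexicon[meaning] = form


def next_generation(parent, bottleneck, rng):
    """The child observes the parent's utterances for `bottleneck` meanings only."""
    observed = rng.sample(MEANINGS, bottleneck)
    data = [(m, parent.speak(m, rng)) for m in observed]  # E-language sample
    child = Agent()
    child.learn(data)
    return child


def language_of(agent, rng):
    """Elicit a form for every meaning (inventing where the agent has none)."""
    return {m: agent.speak(m, rng) for m in MEANINGS}


def run_chain(generations, bottleneck, seed=0):
    """Return the list of languages spoken by successive generations."""
    rng = random.Random(seed)
    agent = Agent()
    history = [language_of(agent, rng)]
    for _ in range(generations):
        agent = next_generation(agent, bottleneck, rng)
        history.append(language_of(agent, rng))
    return history


if __name__ == "__main__":
    for lang in run_chain(generations=3, bottleneck=4):
        print(sorted(lang.items()))
```

With `bottleneck` equal to the number of meanings, the first generation's randomly invented language is handed down unchanged. With a narrower bottleneck, only the observed pairs are guaranteed to survive a generation, since a rote-only learner has to reinvent the rest.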

There has been less computer simulation of the evolution of phonological systems, but what exists is impressive. De Boer (2001) manages to approximate to the distribution of vowel systems in the languages of the world through a model in which individual agents exchange utterances and learn from each other. An early computational study (Lindblom, MacNeilage and Studdert-Kennedy, 1984) can be interpreted as modelling the processes by which syllables become organized into structured CV sequences of segments, where the emergent selected consonants and vowels are drawn from economical symmetrical sets, as is typical of actual languages.

Computer simulations, within the iterated learning framework, starkly reveal what Keller (1994) has called `phenomena of the third kind', and Adam Smith (1786) attributed to an `Invisible Hand'. Languages are neither natural kinds, like plants and animals, nor artefacts, deliberate creations of humans, like houses and cars. Phenomena of the third kind result from the summed independent actions of individuals, but are not intentionally constructed by any individual. Ant trails and bird flocks are phenomena of the third kind, and so, Keller persuasively argues, are languages. Simulations within the ILM framework strip the interaction between individuals down to a bare minimum from which language-like systems can be shown to emerge. The key property of these models is that each new generation learns its language from a restricted set of exemplars produced by the preceding generation.

One of the most striking results of this work is this: in a population capable of both rote-learning and acquisition of rules generalizing over recurrent patterns in form-meaning mapping, a pressure exists toward an eventual emergent language that expresses meanings compositionally. No calculation of an individual agent's fitness is involved, nor does any consideration of the communicative efficacy of the language play a part. The convergence on `efficient' languages is essentially a mathematical outcome of the framework, analogous to the hexagonal cells of honeycombs. At least some of the regular compositional patterning we see in languages is the result, not of humans having an inbuilt bias towards learning languages of a certain type, but of the simple fact that languages are passed on from one generation to the next via a limited channel, a `bottleneck'. As Daniel Dennett has remarked (personal communication), this turns the familiar `poverty of the stimulus' argument in relation to language acquisition on its head. The poverty of stimulus argument appealed to an intuition that human languages are learned from surprisingly scanty data. Work in the iterated learning framework shows that in order for regular compositional language to emerge, a bottleneck between the adult providers of exemplary data and the child learner is necessary. Interesting experiments show that in these models, overprovision of data (i.e. practically no bottleneck) results in no convergence on a regular compositional language.
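The logic of the bottleneck can be made concrete with a deliberately crude sketch. Everything in it is an assumption made for illustration: meanings are pairs of two components, forms are four letters long, and the learner is simply given the knowledge that the first half of a form may go with the first meaning component and the second half with the second. The cited models induce such segmentations rather than assuming them, so this shows only the shape of the argument, not any published result.

```python
import random

# Assumed toy meaning space and forms; none of this is from the cited models.
N = 3
MEANINGS = [(a, b) for a in range(N) for b in range(N)]
LETTERS = "abcdefghijklmnopqrstuvwxyz"
STEMS = ["ka", "mo", "ti"]     # hypothetical markers for the first component
ENDINGS = ["lu", "ne", "sa"]   # hypothetical markers for the second component


def random_form(rng):
    """An unanalysable four-letter form."""
    return "".join(rng.choice(LETTERS) for _ in range(4))


def learn(observations):
    """Rote-store each observed pair, and also note the first half-form seen
    with each first component and the second half-form seen with each second."""
    rote, first, second = {}, {}, {}
    for (a, b), form in observations:
        rote[(a, b)] = form
        first.setdefault(a, form[:2])
        second.setdefault(b, form[2:])
    return rote, first, second


def produce(grammar, meaning, rng):
    """Prefer a memorized form, then a generalization, then random invention."""
    rote, first, second = grammar
    a, b = meaning
    if meaning in rote:
        return rote[meaning]
    if a in first and b in second:
        return first[a] + second[b]
    return random_form(rng)


def transmit(language, observed_meanings, rng):
    """One generation: learn from the observed pairs, then speak every meaning."""
    grammar = learn([(m, language[m]) for m in observed_meanings])
    return {m: produce(grammar, m, rng) for m in MEANINGS}


def fidelity(old, new):
    """Proportion of meanings whose form survived transmission unchanged."""
    return sum(old[m] == new[m] for m in MEANINGS) / len(MEANINGS)


rng = random.Random(1)
compositional = {(a, b): STEMS[a] + ENDINGS[b] for a, b in MEANINGS}
holistic = {m: random_form(rng) for m in MEANINGS}
bottleneck = [(0, 0), (1, 1), (2, 2)]  # the learner sees 3 of the 9 meanings

if __name__ == "__main__":
    regularized = transmit(holistic, bottleneck, rng)
    print(fidelity(compositional, transmit(compositional, bottleneck, rng)))
    print(fidelity(holistic, regularized))
    print(fidelity(regularized, transmit(regularized, bottleneck, rng)))
    print(fidelity(holistic, transmit(holistic, MEANINGS, rng)))
```

Under these assumptions, a compositional language passes intact through a bottleneck of three exemplars, whereas a holistic one does not: the learner fills the gaps by generalization, and the regularized result is then itself stable under the same bottleneck. When the learner instead observes every meaning, the holistic language is memorized whole and nothing pushes it toward regularity.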

These two strands of research, grammaticalization studies and computer modelling within the ILM, are at present quite distinct, and followed by non-overlapping research communities. Computer modellers typically come from backgrounds in artificial intelligence, and know little Latin and less Greek (to put it kindly); grammaticalization theorists come predominantly from humanities backgrounds, and have difficulty conceptualizing computer models. These two research strands will ultimately converge. When they do converge, they should also converge on the attested facts of historical change and creolization.

0.4 Last Words

In the last two decades, new techniques, such as gene sequencing, massive computer simulation, and the various brain imaging methods, have flashed light on intriguing features scarcely contemplated before. But these flashlights are highly selective in their illumination, each gathering reflections from only a few dimensions of the hugely multidimensional space of language structure and use. Language evolution research must continue to feed, voraciously and eclectically, on the results from a very wide range of disciplines. The study of language origins and evolution is harder than molecular biology, physical anthropology or language acquisition research, for example, because, at various levels, it draws on all of these, and more. We now understand far more about questions of language origins and evolution than has ever been understood before. But precisely because we can now begin to grasp the nature of the questions better, we also know that good answers are even more elusive than we thought.

Key further readings

As background readings relevant to many of the issues raised under the heading of `Preadaptation' in this paper, I suggest the papers in the three edited collections, Hurford, Studdert-Kennedy and Knight (1998), Knight, Studdert-Kennedy and Hurford (2000) and Wray (2002). For work on grammaticalization and related theoretical positions, I suggest de Boer (2001), Hopper and Traugott (1993), Pagliuca (1994), and Traugott and Heine (1991). For work on computational simulations of language evolution, I suggest Briscoe (2002) and Parisi and Cangelosi (2001).

References

Allen, C. and Hauser, M.D. 1991. Concept attribution in nonhuman animals: theoretical and methodological problems in ascribing complex mental processes. Philosophy of Science, 58:221-240.

Andersen, H. 1973. Abductive and deductive change. Language, 49:765-793.

Bard, K.A. 1992. Intentional behavior and intentional communication in young free-ranging orangutans. Child Development, 63(5):1186-1197.

Batali, John. 1994. Innate biases and critical periods: Combining evolution and learning in the acquisition of syntax. In R.Brooks and P.Maes (Eds.), Artificial Life 4: Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems. Redwood City, CA: Addison-Wesley. 160-171.

----- 1998. Computational simulations of the emergence of grammar. In J.R. Hurford, M.Studdert-Kennedy, and C.Knight (Eds.), Approaches to the Evolution of Language: social and cognitive bases. Cambridge: Cambridge University Press. 405-426.

----- 2002. The negotiation and acquisition of recursive grammars as a result of competition among exemplars. In E.Briscoe (Ed.), Linguistic Evolution through Language Acquisition: Formal and Computational Models. Cambridge: Cambridge University Press.

Brighton, Henry, and Kirby, Simon. 2001. The survival of the smallest: Stability conditions for the cultural evolution of compositional language. In J.Kelemen and P.Sosik (Eds.), Advances in Artificial Life (Proceedings of the 6th European Conference on Artificial Life). Heidelberg: Springer-Verlag.

Briscoe, E. (Ed.) 2002. Linguistic Evolution through Language Acquisition: Formal and Computational Models. Cambridge: Cambridge University Press.

Byrne, Richard W., and Russon, A.E. 1998. Learning by imitation: A hierarchical approach. Behavioral and Brain Sciences, 21(5):667-721.

----- and Whiten, A. 1988. Machiavellian Intelligence: Social Expertise and the Evolution of Intellect in Monkeys, Apes and Humans. Oxford: Clarendon Press.

Charman, T., Baron-Cohen, S., Swettenham, J., Baird, G., Cox, A., and Drew, A. 2000. Testing joint attention, imitation, and play as infancy precursors to language and theory of mind. Cognitive Development, 15(4):481-498.

Cowlishaw, G. 1992. Song function in gibbons. Behaviour, 121:131-153.

de Boer, Bart. 2001. The Origins of Vowel Systems. Oxford: Oxford University Press.

de Waal, Frans B.M. 1982. Chimpanzee Politics. London: Jonathan Cape.

----- 1988. The communicative repertoire of captive bonobos (Pan paniscus), compared to that of chimpanzees. Behaviour, 106:183-251.

----- 1989. Peacemaking among Primates. Cambridge, MA: Harvard University Press.

Deutscher, Guy. 1999. The different faces of uniformitarianism. Paper read at the 14th International Conference on Historical Linguistics, Vancouver.

Emery, N.J., Lorincz, E.N., Perrett, D.I., Oram, M.W. and Baker, C.I. 1997. Gaze following and joint attention in rhesus monkeys (Macaca mulatta). Journal of Comparative Psychology, 111(3):286-293.

Franco, F. and Butterworth, G.E. 1996. Pointing and social awareness: declaring and requesting in the second year of life. Journal of Child Language, 23:307-336.

Grice, H.Paul. 1975. Logic and conversation. In P.Cole (Ed.), Syntax and Semantics, Volume 3. New York: Academic Press. 41-58.

Haiman, John. 1994. Ritualization and the development of language. In W.Pagliuca (Ed.), Perspectives on Grammaticalization. Amsterdam: John Benjamins. Series: Current Issues in Linguistic Theory, Vol.109. 3-28.

Hare, B., Call, J., and Tomasello, Michael. 2001. Do chimpanzees know what conspecifics know? Animal Behaviour, 61(1):139-151.

Hauser, Marc D. 1996. The Evolution of Communication. Cambridge, MA: MIT Press.

Heine, Bernd, and Kuteva, Tania. 2002. On the evolution of grammatical forms. In A.Wray (Ed.), The Transition to Language. Oxford: Oxford University Press. 376-397.

Hopper, Paul J. and Traugott, Elizabeth C. 1993. Grammaticalization. Cambridge: Cambridge University Press.

Hurford, James R. 1989. Biological evolution of the Saussurean sign as a component of the language acquisition device. Lingua, 77:187-222.

----- 1991. The evolution of critical period for language acquisition. Cognition, 40:159-201.

-----, Studdert-Kennedy, M. and Knight, C. (Eds) 1998. Approaches to the Evolution of Language: social and cognitive bases. Cambridge: Cambridge University Press.

----- 2000. Social transmission favours linguistic generalization. In C.Knight, M.Studdert-Kennedy, and J.Hurford (Eds.), The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form. Cambridge: Cambridge University Press. 324-352.

----- 2003. The neural basis of predicate argument structure. Submitted to Behavioral and Brain Sciences.

----- 2002. Expression/induction models of language evolution: dimensions and issues. In T.Briscoe (Ed.), Linguistic Evolution Through Language Acquisition: Formal and Computational Models. Cambridge: Cambridge University Press. 301-344.

Itakura, S. 1996. An exploratory study of gaze-monitoring in nonhuman primates. Japanese Psychological Research, 38(3):174-180.

Jolly, A. 1985. The Evolution of Primate Behavior (second ed.). New York: Macmillan.

Kauffman, S.A. 1993. The origins of order: self organization and selection in evolution. Oxford: Oxford University Press.

----- 1995. At home in the universe: the search for laws of self-organization and complexity. Oxford: Oxford University Press.

Keller, R. 1994. On Language Change: the Invisible Hand in Language. London: Routledge.

Kirby, Simon. 2000. Syntax without natural selection: how compositionality emerges from vocabulary in a population of learners. In C.Knight, M.Studdert-Kennedy, and J.R. Hurford (Eds.), The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, Cambridge: Cambridge University Press. 303-323.

----- 2002. Learning, bottlenecks and the evolution of recursive syntax. In T.Briscoe (Ed.), Linguistic Evolution through Language Acquisition: Formal and Computational Models. Cambridge: Cambridge University Press. 173-204.

Knight, C. 1998. Ritual/speech coevolution: a solution to the problem of deception. In J.R. Hurford, M.Studdert-Kennedy, and C.Knight (Eds.), Approaches to the Evolution of Language: Social and Cognitive Bases, Cambridge: Cambridge University Press. 68-91.

----- 2000. Play as precursor of phonology and syntax. In C.Knight, M.Studdert-Kennedy, and J.R. Hurford (Eds.), The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, Cambridge: Cambridge University Press. 99-119.

-----, Studdert-Kennedy, M. and Hurford, J.R. (Eds) 2000. The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, Cambridge: Cambridge University Press.

Krushinsky, L.V. 1965. Solution of elementary logical problems by animals on the basis of extrapolation. Progress in Brain Research, 17:280-308.

Leavens, D.A., Hopkins, W.D., and Bard, K.A. 1996. Indexical and referential pointing in chimpanzees (Pan troglodytes). Journal of Comparative Psychology, 110(4):346-353.

Lightfoot, D. 1999. The Development of Language: Acquisition, Change, and Evolution. Oxford: Blackwell.

Lindblom, B., MacNeilage, P., and Studdert-Kennedy, M. 1984. Self-organizing processes and the explanation of phonological universals. In B.Butterworth, B.Comrie, and Östen Dahl (Eds.), Explanations for Language Universals, Berlin: Mouton. 181-203.

Marler, P. 1977. The structure of animal communication sounds. In T.H. Bullock (Ed.), Recognition of Complex Acoustic Signals (Report of Dahlem Workshop). Berlin: Dahlem Konferenzen.

Maynard Smith, J. and Szathmáry, E. 1995. The major transitions in evolution. Oxford: Oxford University Press.

Mitani, J.C. and Marler, P. 1989. A phonological analysis of male gibbon singing behavior. Behaviour, 109:20-45.

Morales, M., Mundy, P., Delgado, C.E.F., Yale, M., Neal, R., and Schwartz, H.K. 2000. Gaze following, temperament and language development in 6-month-olds: A replication and extension. Infant Behavior and Development, 23(2):231-236.

Newmeyer, F.J. 2000. On reconstructing ``proto-world'' word order. In C.Knight, M.Studdert-Kennedy, and J.R. Hurford (Eds.), The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, Cambridge: Cambridge University Press. 372-388.

----- 2002. Uniformitarian assumptions and language evolution research. In A.Wray (Ed.), The Transition to Language. Oxford: Oxford University Press. 359-375.

Pagliuca, W. (Ed.) 1994. Perspectives on Grammaticalization. Amsterdam: John Benjamins. Series: Current Issues in Linguistic Theory, Vol.109.

Parisi, D., and Cangelosi, A. (Eds) 2001. Simulating the Evolution of Language. Berlin: Springer Verlag.

Raemaekers, J.J., Raemaekers, P.M., and Haimoff, E.H. 1984. Loud calls of the gibbons (Hylobates lar): repertoire, organization and context. Behaviour, 91:146-189.

Smith, Adam. 1786. An inquiry into the nature and causes of the wealth of nations: in three volumes. London: A. Strahan, and T. Cadell. (fifth edition).

Teal, T. and Taylor, C. 1999. Compression and adaptation. In D.Floreano, J.D. Nicoud, and F.Mondada (Eds.), Advances in Artificial Life, Number 1674, in Lecture Notes in Computer Science. Springer.

Thompson, D. 1961. On Growth and Form. Cambridge: Cambridge University Press. Abridged edition, edited by J.T.Bonner.

Tinbergen, N. 1952. `Derived' activities: their causation, biological significance, origin and emancipation during evolution. Quarterly Review of Biology, 27:1-32.

Tonkes, B. and Wiles, J. 2002. Minimally biased learners and the emergence of language. In A.Wray (Ed.), The Transition to Language. Oxford: Oxford University Press. 226-251.

Traugott, E.C. 1994. Grammaticalization and lexicalization. In R.E. Asher and J.M.Y. Simpson (Eds.), The Encyclopedia of Language and Linguistics, Oxford: Pergamon Press. 1481-1486.

Traugott, E.C. and Heine, B. (Eds.) 1991. Approaches to Grammaticalization, Volumes I and II. Amsterdam: John Benjamins. In Series: Typological Studies in Language.

Walker, S. 1983. Animal Thought. London: Routledge and Kegan Paul.

West, G.B., Brown, J.H., and Enquist, B.J. 1997. A general model for the origin of allometric scaling laws in biology. Science, 276:122-126.

Whiten, A. 2000. Primate culture and social learning. Cognitive Science, 24(3):477-508.

Wray, A. (Ed.) 2002. The Transition to Language. Oxford: Oxford University Press.