Genetics and language: a neurobiological perspective on the missing link (-ing hypotheses)

The paper argues that both evolutionary and genetic approaches to studying the biological foundations of speech and language could benefit from fractionating the problem at a finer grain, aiming not to map genetics to “language”—or even subdomains of language such as “phonology” or “syntax”—but rather to link genetic results to component formal operations that underlie processing the comprehension and production of linguistic representations. Neuroanatomic and neurophysiological research suggests that language processing is broken down in space (distributed functional anatomy along concurrent pathways) and time (concurrent processing on multiple time scales). These parallel neuronal pathways and their local circuits form the infrastructure of speech and language and are the actual targets of evolution/genetics. Therefore, investigating the mapping from gene to brain circuit to linguistic phenotype at the level of generic computational operations (subroutines actually executable in these circuits) stands to provide a new perspective on the biological foundations in the healthy and challenged brain.


The empirical bases of the biology of language
The study of the biological foundations of speech and language processing is a lively and controversial area of research. Broadly speaking, there are three areas of biology that are widely discussed: evolutionary biology (under which one can include "evo devo" as well as comparative ethology), genetics, and neurobiology. All these areas of investigation are stimulating vigorous debate (e.g., Rice 1996, 1997; Pinker 1999; Jackendoff 2003; Chomsky 2007; Larson et al. 2010; Fitch 2010; DiSciullo and Boeckx 2011; Maggie and Gibson 2012). Notwithstanding the enthusiasm for the questions and controversies, the empirical basis of much of the research is relatively thin, at least if one is looking at evolutionary biology and genetics. I outline some of the challenges to empirical research below.
The empirical restrictions for evolutionary biology are predictably complicated. The evolutionary biologist Richard Lewontin has argued that we do not have access to earlier forms (at least with respect to overt, measurable, and quantifiable behavior or output, since the individuals are dead and their central nervous systems inaccessible). As a result, basically all accounts remain post hoc narrative interpretations of potential evolutionary scenarios, often called "just so stories" (borrowing from the author Rudyard Kipling), and are totally unsatisfying from a mechanistic, explanatory point of view (Lewontin 1998). The spectrum of evolutionary scenarios that are entertained in this literature is wide (e.g., we first imitated animal sounds, or first made noises of cooperation, or first felt a need to gossip about conspecifics, or social motivation drives changes in the system, etc.), and the stories range from the plausible to the charming to the ridiculous (in part reviewed in Pinker and Bloom 1992; for current perspectives, see Larson et al. 2010). Recent considerations from evolutionary developmental biology provide a new angle (Chomsky 2010); nevertheless, these are largely data about structural properties of the biological system and structural similarities across species, and we remain largely in the dark about what putatively early functional forms of linguistic representation or communication did in fact look like. This area of research holds significant promise, but there are as yet no major results that one can point to as foundational for an understanding of representation and computation in language.
A different angle on evolution, comparative ethology, has a relatively rich body of data, namely the extensive research on animal communication and cognition, but the results remain puzzling (and often equally speculative) in light of the system that is ultimately under investigation, human speech and language processing. This work is, to be sure, a potentially highly informative source of data on the biological infrastructure underpinning the human system; in particular, as researchers converge on what are the kinds of questions about speech/language that one can plausibly investigate in other species, this work will become increasingly influential. For example, it stands to reason that attributes of the input and output systems (i.e., sensory and motor processing) can be profitably tested in nonhuman preparations, on the view that much of the machinery is highly conserved across evolution (e.g., Hauser 1998). On the other hand, there are aspects of language processing that appear highly idiosyncratic, and it is not at all obvious how study of nonhuman species will inform us in this regard (for example, the operations underlying inflectional and derivational morphology, the representation of structurally conditioned long-distance relationships between reflexives and their antecedents, the nature of the words/concepts that are the "atoms" of language, and so on).
One strategy that has yielded results in comparative ethological research is focused on identifying the inventory of elementary operations underlying language processing. Assuming (reasonably) that the broader domains we use to talk about language processing (e.g., phonology, syntax, semantics) can be decomposed into more elementary operations (say, concatenation, labeling, construction of a constituent; see Poeppel and Embick 2005), one can begin to assess whether this inventory of elementary operations is available, usable, and used in different animal species. (The strategy to decompose language processing into elementary subroutines is further discussed below.) That is to say, perhaps we are not prepared (or interested) to attribute "syntax" to some species, but we would not be offended to characterize that animal as using, say, "concatenation" as an elementary operation to link units. A concrete example of research in this vein concerns recursion as a hypothesized primitive computational operation (Hauser et al. 2002; Fitch 2010). Similarly, research on the sensorimotor foundations of speech has tested parallels between human and animal systems.
In summary, the evolution of speech and language is widely discussed and hotly debated, but the empirical basis of the arguments remains problematic. One issue that has been an obstacle to more rapid progress is that there is no agreement on how to evaluate evolutionary hypotheses in this context because there is serious disagreement about the nature of the linguistic system itself. It is fair to say that the vast majority of scholars assume that the system (the "faculty of language") is quintessentially designed for communication. On this view, whatever parts make up the language system (phonology, morphology, syntax, lexical and compositional semantics), their major role is in the service of communication. Therefore, precursors and evolutionary scenarios are constructed with communication (a concept that itself is rather underspecified) as the critical causal centerpiece. In contrast, some research arising from a more linguistic theoretical tradition ("biolinguistics"; for a recent perspective, see Chomsky 2007) does not presuppose that communication lies at the core; rather, the computational and representational mental inventory is considered to be the key question. What is the "parts list" that forms the basis of the language system? What are putatively language-specific (domain-specific) aspects (say, morphology), and what aspects are arguably generic and likely to be shared with other species (say, a long-term memory system)? On this kind of view, some of the subroutines may be used for communication whereas others may be serving other purposes (e.g., externalization of thought). Language is used for communication, but it is not necessarily "optimally designed" for this purpose. Indeed, some parts may make it more ill-suited for communication, e.g., the rampant ambiguity that exists in virtually every aspect of language processing.
Its properties and quirks may derive from other aspects of psychology, say being able to combine constituent thoughts, generating new internal representations, etc. (Chomsky 2000). It goes without saying that the construction of alternative evolutionary scenarios is closely linked to the underlying theories about the system itself, and as a consequence, the hypotheses that have been articulated differ sharply between scholars adopting such different foundational perspectives, i.e., language as an internal computational system that can be used for communication versus language as a system evolved to optimize communication. To adjudicate between these options, one will want to consider the arguments for the two theoretical positions as well as hope for a considerable enrichment of the available evidence.
A different area of biological research in which the empirical basis is steadily growing is genetics. A number of papers in the special issue elegantly deal with the advances in this domain. Just a few general remarks are in order, to situate the work and motivate the perspective that is outlined below. The toolboxes of genetics have been used in different ways to study language. There is a small, albeit growing, amount of psycholinguistically motivated research focused on twin and adoption studies (Stromswold 2001). The bulk of the data, however, comes, predictably, from the study of speech and language disorders, which offer a useful entry point into genetic analyses. With regard to the input and output systems, there is interesting research on dyslexia and stuttering (e.g., Drayna, this volume, on stuttering); with regard to more central language systems, there is increasing interest in characterizing the genetics of specific language impairment (Rice, this volume). These areas of research are ably reviewed and discussed in the papers. The question that is briefly raised here concerns the relationship between potential results in genetic research and the online representation and computation of speech and language. Specifically, let us assume that the genetic studies succeed perfectly and that we have available a detailed map of the genetic foundations of the linguistic phenotype. Something very foundational is missing for causal models: an understanding of the actual circuits executing the processing.
Genes do not speak; people do. In particular, it is "the brain part of people" that does the speaking (and hearing, comprehending, and interpreting…). Therefore, to begin to have an explanatory understanding of how our genetic makeup underlies the structural and functional properties at the basis of speech and language (that is, to have satisfying mechanistic linking hypotheses between linguistic behavior and its genetic foundations), the correlative data mediating between genetics and linguistics (and psychology/cognitive science, more generally) cannot do without an understanding of how the genome/epigenome relates to the neuronal circuitry that is the implementational infrastructure for cognition. I take it to be the goal of this research direction to provide a mapping from genetics to neural circuitry to computational neuroscience to language processing. Presumably, the yearning is to identify the genetic basis of the specific neural circuits (on whatever microscopic, mesoscopic, or macroscopic scale turns out to be relevant) that in turn constitute the basis for the operations that underpin speech and language, i.e., the representations and computations that lie at the foundation of the faculty of language.
The third area of biological inquiry, neurobiology, is obviously closely related to evolutionary biology and genetics (and inconceivable without these) but has a more immediately available basis for testing hypotheses about the relevant biological infrastructure. We now turn to an admittedly parochial perspective on some cognitive neuroscience foundations of speech perception and language comprehension, with the goal of providing examples of the kinds of circuits and operations that both evolutionary theory and genetic characterization ultimately must aim to capture.

Three ideas from the cognitive neuroscience of language
Here I briefly sketch three ideas that illustrate at what level of analysis both evolutionary and genetic considerations might be able to make some substantial progress. First, a more up-to-date functional anatomy of language is briefly presented. Second, some recent physiological data on timing are summarized, highlighting their potential relation to genetic work. Third, we discuss the problem of aligning the analytic constituents of language research and neurobiology: the granularity mismatch problem.
(a) The pathways of speech and language: functional neuroanatomy
Until at least the late 1990s, consultation of a textbook on neurology, neuropsychology, neuroscience, linguistics, or any associated discipline showed the "classical model" of the brain basis of language: on a schematic of the left hemisphere were depicted Broca's area in the left inferior frontal lobe and Wernicke's area in the posterior superior temporal lobe. Typically, the frontal area was associated with aspects of production (and later, syntactic aspects of language processing), and the temporal area was associated with perception (and later, aspects of meaning). The classical intuition (rather behaviorist in nature in terms of its philosophical underpinning, i.e., language processing is reflexive) was that the input (speech) was taken in and processed in the posterior areas and then shuttled forwards to frontal areas for generating an output (production). There was no notion of internal representation or computation in any mentalistic context; rather, the most sophisticated models, such as Wernicke's, described the phenomena in terms of acoustic and motor images of words that were related to each other. It is now uncontroversial that this classical model, while immensely useful historically, especially in a clinical context, is hopelessly underspecified biologically as well as linguistically. Experimental research deriving from noninvasive recording (fMRI, PET, EEG, MEG) as well as from patient research has enriched our understanding of the anatomic basis of speech perception and language comprehension and production.
There are many more cortical and subcortical regions that we now know to play integral roles beyond the two classically defined regions; both left and right hemispheres are implicated across various tasks (e.g., voice recognition; lexical semantics), and the analyses of what these various regions accomplish are now linked to linguistics, psychology, computational neuroscience, and other relevant approaches. Below I very briefly and superficially discuss a few examples that illustrate some of the areas of progress.
One new model attempting to capture these recent developments in neuroscience (in particular, data from imaging) as well as in linguistics and psycholinguistics postulates that there is a dual stream of information processing (Hickok and Poeppel 2004, 2007). (A related model that focuses on the analysis of meaning also emphasizes concurrent pathways; Lau et al. 2008.) Focusing on speech recognition, an incoming signal's acoustic attributes are initially analyzed in the dorsal and posterior superior temporal gyrus and superior temporal sulcus. Importantly, these initial stages of perceptual analysis are computed bilaterally in the superior temporal cortex (Binder et al. 2000), although the left and right cortical regions have important computational specializations (especially with regard to timing properties, as discussed in the next section) that contribute differentially to perceptual analysis (Hickok and Poeppel 2007; Poeppel et al. 2008).
Two processing streams then originate from early cortical fields. A ventral pathway includes the superior temporal sulcus, the anterior temporal lobe, the middle temporal gyrus, inferior temporal sulcus, and perhaps the inferior temporal gyrus. The ventral stream underlies the mapping from sensory/phonological representations to lexical or conceptual representations (i.e., from sound to meaning). A dorsal pathway, including the Sylvian parietotemporal area as well as the inferior frontal gyrus, anterior insula, and premotor cortex, provides the substrate for mapping from sensory/phonological representations to articulatory-motor representations. While early analysis is indisputably bilateral and much of the processing in the ventral stream is more bilateral than previously assumed (Binder et al. 2000; Hickok and Poeppel 2007), the dorsal pathway appears to be more left-lateralized. In addition to positing multiple streams (as well as multiple areas within each stream), it must now also be acknowledged that each cortical region has fine-grained subdivisions. As we know from the study of visual areas, the laminar and columnar structure in the cortex offers intricate micro-circuitry underlying neural coding. Adopting such a model has a variety of consequences for discussing language processing. For example, such a distributed functional anatomy of speech processing proper suggests there must be a representational format for speech sounds that facilitates the rapid transformation between articulatory and auditory coordinate systems. In other words, the anatomy has strong implications for representational theories. Furthermore, it is obviously a new kind of research challenge to assign elementary functions or computational subroutines to these independently defined subregions. The research is not about assigning, say, "syntax" or "phonology" to an entire cortical field.
Rather, it is taken for granted that successful language processing is a consequence of the coordinated action of these distributed elements; each contributes some particular operation, and jointly they form the basis for recognizing words, combining words to form phrases, and so on. Finally, this kind of architecture lends itself naturally to investigating both feedforward and feedback processing, the interaction of which lies at the center of most current theories of language processing. And, with respect to the larger issue at stake, one can now formulate hypotheses about the evolutionary trajectory of an anatomic subregion and its putative functional role. One can dissect the neural circuitry within a region and ask what aspect of the genetic toolbox might yield the observed anatomic cell and circuit structure. A particular region and its function may be implicated only in language processing or, alternatively, may show up in the processing of other cognitive domains. Critically, the functional anatomy reflects that the problem is spatially decomposed into a number of subroutines. The quip that "anatomy is destiny" should at least be taken seriously as a research strategy in this domain. In the next section, we turn to a similar strategy in the time domain, breaking the processing problem down at different time scales.

(b) Multi-time resolution processing
One of the fascinating challenges of perceptual analysis is how to break the input stream (which comes at a listener continuously; one might think of the arrow of time) into chunks of the appropriate size for further processing. In the last 10 years, both behavioral and neurophysiological research have revealed that processing of this type happens on multiple timescales concurrently (Poeppel 2001, 2003; Boemio et al. 2005; Giraud et al. 2007; Giraud and Poeppel 2012). By analogy to breaking down the complex problem of language representation and processing in space (by creating a distributed functional anatomy with "localized expertise" in different regions), we now have good reason to conjecture that the perceptual system breaks the problem down along multiple timescales that have natural relations to the units of processing.
One model that has received some attention and generated controversy, asymmetric sampling in time (AST; Poeppel 2003), posits that the input stream is sampled, simultaneously, at a relatively rapid rate (approximately 25 to 50 Hz) and a significantly slower rate (below 8 Hz). Such sampling over quite distinct time scales would allow the processor to concurrently analyze syllabic/lexical information at one scale and much more rapid segmental/phonemic information at the other. Both the rationale and the empirical support for this model have been reviewed extensively (Giraud and Poeppel 2012), but recent research has begun thinking about this multi-time resolution notion in the context of genetic work. In particular, one of the more provocative hypotheses suggests that a selective dysfunction of the circuits for slow versus fast sampling could underlie deficits such as dyslexia. Interestingly, both timescales have been separately implicated in behavioral and genetic studies. On one hypothesis, a selective dysfunction of the slow sampling scale leads to phonological representations that are poorly mapped to orthography, therefore leading to poor reading performance (rise time hypothesis; Goswami 2010). An alternative hypothesis argues that it is the compromised rapid processing system that underlies dyslexic performance. There is a long and complex history of attempts to map elementary processing routines to language performance. It is not yet clear how these theories will relate to both linguistic theory and neurobiology. However, it is encouraging that there exist new models that lend themselves to extensive empirical testing, whether in biological or genetic studies. The central message, again, is that a system that breaks processing down in the time domain offers a further, more granular view of what constitutes the substrate for evolutionary change and what might be a profitable target for genetic investigation.
One reason this area of research holds promise is that there are clear possibilities for linking hypotheses across levels of analysis, as was espoused at the outset. Recent research using optogenetic techniques (e.g., Sohal et al. 2009; Cardin et al. 2009) has identified in great detail how local cell types and circuits of a certain type regulate gamma band activity, including for sensory processing; this gamma band activity has been hypothesized to be important for certain analyses of speech (the rapid sampling rate discussed above), and these rates in turn have been suggested to provide the basis for processing linguistic representations of a particular grain size (segmental and featural information). In short, the work in this domain stands a chance of being able to relate, across levels, genetics, the circuitry enabling specific computations, and the relevant cognitive mechanisms. A research program that outlines how oscillations mediate between spike trains at one end and speech signals at the other is provided in Giraud and Poeppel (2011).
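To make the notion of concurrent analysis on two timescales concrete, the following is a minimal illustrative sketch in Python. It is not an implementation of the AST model, which concerns neuronal oscillations rather than literal windowing; it only shows what it means to cut one and the same input into short and long chunks at the same time. The function names and the window lengths (25 ms, i.e., a 40 Hz rate within the rapid range; 200 ms, i.e., a 5 Hz rate within the slow range) are assumptions chosen for illustration.

```python
def chunk(signal, sample_rate_hz, window_ms):
    """Split a sampled signal into consecutive, non-overlapping windows.

    Trailing samples that do not fill a whole window are dropped.
    """
    size = round(sample_rate_hz * window_ms / 1000)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def multi_time_chunks(signal, sample_rate_hz, fast_ms=25, slow_ms=200):
    """Chunk the same input at two window lengths (hypothetical defaults).

    fast_ms=25 corresponds to a 40 Hz rate (segmental/phonemic grain);
    slow_ms=200 corresponds to a 5 Hz rate (syllabic grain).
    """
    return {
        "fast": chunk(signal, sample_rate_hz, fast_ms),
        "slow": chunk(signal, sample_rate_hz, slow_ms),
    }


# One second of a dummy signal sampled at 1000 Hz.
one_second = list(range(1000))
views = multi_time_chunks(one_second, sample_rate_hz=1000)
print(len(views["fast"]), len(views["slow"]))
```

The point of the sketch is only that both views are computed over the same input stream, so that information at the two grain sizes is available concurrently rather than sequentially.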

(c) The granularity mismatch problem
We finally turn to the problem that lies at the very center of cognitive neuroscience, and we ask how a potential solution can inform both genetic research and evolutionary theorizing. Suppose we ask a large number of neuroscientists to define what are the primitive elements (the "alphabet") of neurobiology. A long list will be generated of putatively elementary units of anatomy and units of function. Presumably, the list will include concepts such as neuron, synapse, oscillation, long-term potentiation, etc. Then let us repeat that exercise with a large cohort of language researchers. There, the primitive elements will include, arguably, concepts such as distinctive feature, syllable, noun phrase, question formation, etc. Now, it is the pretense of these fields of inquiry that there is a mapping between the elementary constituents of neurobiology and the elementary constituents of cognitive science…. But if we are honest, it is quite clear that we do not have the vaguest idea of how the (empirically well-supported) elementary representations or processes from language (say distinctive feature) map to the (well-supported) elementary representations or processes from neuroscience (say dendritic spine)! The fact of the matter is that the mapping from the biological substrates to cognitive representation and processing is simply not understood at a satisfying level. In some sense, this is simply a restatement of the philosophical mind-body problem.
In part, this problem comes from the fact that these domains are studied at very different levels of granularity. In linguistics, the concepts used to study the knowledge of language one has as an adult speaker/listener, its acquisition, and its processing are exceedingly fine grained. In contrast, in cognitive neuroscience research, the typical study seeks to understand phenomena at the level of "where is syntax." This mismatch, the granularity mismatch problem, is not just practical but also speaks to the ontological commitments that different areas of research make to what they take to be the set of primitive elements (Poeppel and Embick 2005).
Is there a productive way forward? Clearly, we need suitable linking hypotheses. These are, incidentally, precisely the linking hypotheses that can potentially provide quite different targets for investigation for evolutionary theorizing and genetic experimentation. What will such linking hypotheses look like, and how can we begin to address the complicated relations between the primitives of biology and the primitives of cognitive science? In earlier work on this problem, it is argued that linking hypotheses between speech/language and brain are most likely to bear fruit if they make use of computational analyses that appeal to generic computational subroutines (Poeppel and Embick 2005;Smolensky and Legendre 2006). The research strategy amounts to "radical conceptual decomposition." The recommendation is to take a given domain of language processing, say word recognition, and decompose it into elements that could be plausibly mapped onto circuits that can be instantiated in nervous tissue. Note that this is not an exercise in reductionism as it is typically understood. While the goal is indeed to determine the basic bits and pieces, the goal is not simply to map what we take to be linguistic primitives on to what we know to be biologically available constituents (say something like a mapping from word to neuron). In part, the problem lies in the fact that we do not know what the mapping should even be. For example, we have no reason to believe that "syllable" maps on to dendrite, or neuron, or ensemble, or cortical column, or some heretofore unknown assembly of parts. Reduction is a sensible game plan if the ontological structure of the underlying domain is known, but I contend that neurobiological "stuff," too, is not yet understood at a level that licenses reduction in the standard sense. 
So, broadly speaking, the aim is to find generic representations/computations of the sort that can at the same time be constitutive of a linguistic representation or process and be a plausible element that can be instantiated in a neuronal circuit. The kinds of generic operations one might have in mind here include concepts such as concatenation (take an X and a Y and generate a chain X-Y) or linearization. A different example might be derived from the temporal analyses discussed above, i.e., how does the circuitry underlying gamma band responses relate to the hypotheses about gamma-mediated analysis in speech? There are interesting research programs going in these directions, so future empirical work will show whether this kind of strategy has legs.
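To illustrate the grain size intended by "generic operations," here is a small sketch in Python of concatenation, labeling, and linearization. It is an assumption-laden toy, not a claim about how linguistic theory formalizes these operations or about how neural circuits implement them; the names Constituent, concatenate, label, and linearize are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Constituent:
    """A labeled chain of parts (a hypothetical toy representation)."""
    label: str
    parts: Tuple["Node", ...]


Node = Union[str, Constituent]


def concatenate(x: Node, y: Node) -> Tuple[Node, Node]:
    """Take an X and a Y and generate the chain X-Y."""
    return (x, y)


def label(chain: Tuple[Node, ...], name: str) -> Constituent:
    """Assign a category label to a chain, yielding a constituent."""
    return Constituent(name, chain)


def linearize(node: Node) -> list:
    """Flatten a nested structure into a left-to-right sequence of words."""
    if isinstance(node, str):
        return [node]
    words = []
    for part in node.parts:
        words.extend(linearize(part))
    return words


# Build a small nested structure from the elementary operations only.
subject = label(concatenate("the", "dog"), "NP")
obj = label(concatenate("a", "cat"), "NP")
predicate = label(concatenate("chased", obj), "VP")
sentence = label(concatenate(subject, predicate), "S")
print(linearize(sentence))
```

Each function here is narrow enough that one could ask, separately for each, whether a given circuit or a given species has it available, which is the kind of question the decomposition strategy is meant to license.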

Conclusion
There are few more fascinating scientific questions than understanding the structure and function of the brain and precisely how it forms the basis for human cognition. The goal must be to develop an explanatory account, such that the circuits that are identified and studied provide a causal explanation for the representations and computations deployed in language comprehension and production. At the moment, it is fair to say that the research is by and large correlative, not explanatory. In imaging studies, we identify local brain activation and quantify the extent to which it correlates with some language task. In genetic studies, we correlate relatively broad patterns of the linguistic phenotype with genetic markers. In evolutionary studies, we construct scenarios over broad categories of experience, such as "syntax" or even "communication." The (admittedly brief and caricature-ish) argument presented here suggests that a more aggressive decomposition of these concepts (specifically the putative primitives of neurobiology and of language research) might provide a new and unconventional view from which to study both evolution and genetics. If the target of investigation is not a broad category such as "syntax," but a narrow category that is plausibly instantiated in neural circuits such as, say, "concatenation" or "rapid sampling," then one can imagine constructing a relatively narrow empirical test. Moreover, one could begin to investigate in which way specific computational subroutines go wrong and lead to pathologies. There are some notable efforts in this direction, especially in the research on specific language impairment (see, e.g., Rice, this volume). Overall, what is advocated is no more than to "be a splitter, not a lumper."
We now have excellent toolboxes to be theoretically well motivated, computationally explicit, and biologically plausible splitters (using the tools of linguistics, computational neuroscience, neurobiology); if we end up splitting into the appropriate granularity, it will provide fresh new ground for evolutionary theories and genetic and epigenetic characterization.