I’m currently reading William H. Calvin’s eccentric little book The Throwing Madonna: Essays on the Brain. I picked it up on Kindle for 99 cents (a great deal). His basic theory seems to be that an adaptation for throwing rocks at small prey sparked the lateralization of the brain, which then drove the evolution of language (our most lateralized brain function). He thinks that brain lateralization began with a rapid, fine-motor skills module that plans and executes sequential motor actions, such as throwing. He hypothesizes (based on a few scattered lines of evidence) that the skills related to throwing changed the brain in such a way that language is some kind of side-effect or offshoot of this process. The side-effect would have had enormous benefits given the growing communicative skills of the group, which were probably based primarily on gesture. The fine-sequence module is theorized to lead to better gestural communication, which provides the neurological grammatical base for more sophisticated linguistic utterances structured by a more complex syntax. Those utterances would have taken advantage of lateralization and the developing skills of fine motor sequencing for moving the mouth and tongue in complex ways. Calvin doesn’t mention this possibility, but singing could have been the intermediary between gesture and the complex vocal utterances that depend on the lateralized fine-motor skills demonstrated in human handedness. It is rather curious that no other animal seems to be as handed as the human, and that handedness is more or less defined in terms of the right arm/hand being capable of executing fine-motor skills in a continuous and intelligent sequence.
I should add that talking about a fine-motor skills “module” should not bring to mind a single component, as if the module were like a gear in a clock, with one unique function demarcated by its spatial structure. A neurological module could (1) have more than one function and (2) be spread out among distinct neurological spaces, perhaps even functionally distributed in populations of neural firing rates. A fine-sequence module could actually be made up of a wide variety of submodules acting together and sharing information in a looped hierarchy. The module could be distributed in space among multiple neural structures, each of which could have different functional roles depending on the immediate neurological context (i.e. the rate of firing of surrounding populations). Since the functionality of the system is probably distributed, at least in some respects, over multiple neurological regions, we have no reason to suppose that the fine-sequence module is realized by a single clump of neural tissue. Nor do we have reason to think that pretty much any large-scale behavior or function, like perception or language, is exhaustively realized by any one clump of tissue. Although some clumps do have functional specialization, this doesn’t mean that a clump’s only function is that one specialized function. The module for anything as complex and multifaceted as language is probably realized by a variety of neural components distributed across many different areas of the brain. Some clusters of populations are probably crucial to the processing loops, such that the function would be completely lost without them, while other components of the loop are less essential, and their removal or malfunction causes only minor processing difficulties.
The modularity is simply a result of thinking about the skills in terms of functions, and realizing that neural tissue clumps can have multiple functions and be made up of multiple subcomponents, which can feed information towards the top and loop around and have a feedback effect on the larger functional module. There is thus no reason to suppose that modularity requires one to imagine a phrenological layout of brain function, with each neural tissue clump having one and only one function. In reality, different neural clumps realize multiple functions based on the patterns of neural activity. These patterns are computationally continuous, so explaining them requires a sui generis concept of computation, one grounded in the real-time constraints of neural population spiking codes.
The “fine-motor sequence” theory of lateralization and language origin seems compatible with the theory recently put forward in Michael Tomasello’s book Origins of Human Communication. Tomasello thinks that the grammar of modern human language got part of its developmental foundation from gesture rather than vocalization. The vocalizations seen in our ape cousins are rather preprogrammed and neurologically specific in what triggers them. The vocalizations seem to be emotional expressions rather than expressions of intentional communication. Ape gestures, on the other hand, have the intentional structure necessary to provide a simple grammar of requesting, which can incorporate attention-grabbing gestural signals like stomping on the ground, requests for food, or requests for play. It is this grammar of requesting that, when coupled with the development of shared communicative ground (joint attention), leads later to a more syntactically developed grammar of informing, which is based less on the individual selfishness of requesting than on a shared, communal reciprocity of information sharing grounded in a shared attentional/motivational context. For apes who have been trained to communicate through sign language, 90% of their communicative intentions are requests, often for simple, immediate bodily desires like food or play. Humans, in contrast, seem to take an intrinsic pleasure in the act of social communication and in sharing helpful information for the sake of sociality. Of course, apes like to play, but human children seem to think it’s fun to communicate and share just for the sake of communicating and sharing (e.g. a child seeing a dog, pointing at it, and saying to his parents “Look! A doggie!”).
This intrinsic shared communicative context is what gets the process of language learning really off the ground in such a way as to develop the more syntactically complex grammars related to providing “bird’s eye view” information to members of the social community within a shared, normatively structured context.