4. Music Cognition and Embodiment

How might we connect the theoretical framework of embodied cognition with the study of music? First, we should examine the role of the body in music perception, cognition, and production, and attempt to take into account the realities of our perceptual systems. Let us address connections between aspects of musical time -- rhythm, timing, meter, phrasing -- and the body. People often speak of a musical groove as something that induces motion. In describing his aesthetic criteria for rhythm tracks, a colleague of mine involved in hip-hop music distinguished between a musical excerpt that "makes me bob my head" and one that doesn't (Bilal 1997). Many of us have witnessed motion induced in infants or toddlers via music, but this behavior is not universal, involuntary, or even reliable. This capacity to entrain to a regular aural pulse may be an evolutionary vestige of a previously useful ability that has more recently fallen into disuse. While nobody can account directly for this phenomenon, it clearly involves regular, rhythmic bodily movement as a kind of sympathetic reaction to regular rhythmic sound -- that is, as a kind of dance.

Recent neuropsychological studies have affirmed the cognitive role of body motion in music perception and production. A meta-analysis of studies of brain-damaged patients with lesions localized in various regions of the brain suggested that the "rhythmic component ... of an auditory image cannot be activated without recruiting neural systems known to be involved in motor activity, especially those involved in the planning of motor sequences" (Carroll-Phelan 1994; see also Peretz 1993). Such neuropsychological data have allowed hypotheses about the induction of a sense of beat or pulse in terms of the so-called sensorimotor loop, which includes the posterior parietal lobe, pre-motor cortex, cerebro-cerebellum, and basal ganglia. In the sensorimotor perspective, a perceived beat is literally an imagined movement; it seems to involve the same neural facilities as motor activity, most notably motor-sequence planning (Todd 1997). Hence, the act of listening to music involves the same mental processes that generate bodily motion.

One might suppose that musical gestures are more efficacious in eliciting such sympathetic behavior if they somehow represent aspects of human motion. Such sounds might include the dynamic swells associated with breathing, the steady pulse associated with walking, and the rapid rhythmic figurations associated with speech. Note that each of these three examples occurs at a different timescale; characteristic frequencies of the first regime might fall in the range of 0.1 to 1 Hz, the second 1 to 3 Hz, and the third 3 to 10 Hz. In fact, it is interesting to observe the correspondences in frequency range among these groups of behaviors:

1) "phrase" (0.1 - 1 Hz): breathing, moderate arm gesture, body sway

2) "tactus" (1 - 3 Hz): heartbeat, sucking/chewing, locomotion, intercourse, head-bob

3) "tatum" (3 - 10 Hz): speech/lingual motion, hand gesture, digital motion

Versions of this list were suggested by Fraisse (1982) and Todd (1997). It is a plausible hypothesis that musical activity on these three timescales might exploit these correspondences.
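As a purely illustrative aid, the three ranges listed above can be written down as a small lookup. The following Python sketch is not drawn from Fraisse or Todd; the names TIMESCALES, classify_rate, and period_to_rate are hypothetical, and treating the approximate ranges as sharp half-open intervals is a simplifying assumption made only for the example.

```python
# Approximate frequency ranges in Hz, taken from the list above. The boundaries
# are rough; modelling them as half-open intervals is an assumption made here.
TIMESCALES = [
    ("phrase", 0.1, 1.0),
    ("tactus", 1.0, 3.0),
    ("tatum", 3.0, 10.0),
]


def period_to_rate(period_seconds):
    """Convert the time between successive events into a frequency in Hz."""
    return 1.0 / period_seconds


def classify_rate(rate_hz):
    """Return the name of the timescale whose range contains rate_hz, else None."""
    for name, low, high in TIMESCALES:
        if low <= rate_hz < high:
            return name
    return None
```

Under these assumptions, a breath cycle lasting four seconds (0.25 Hz) falls in the "phrase" range, a footfall every half-second (2 Hz) in the "tactus" range, and finger taps 125 milliseconds apart (8 Hz) in the "tatum" range.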

A variety of simple truisms support this view. For example, most wind-instrument phrase lengths are naturally constrained by lung capacity. Tactus-heavy urban dance music often makes sonic references to foot-stomping and to sexually suggestive slapping of skin. Blues guitarists, jazz pianists, and quinto players in Afro-Cuban rumba are said to "speak" with their hands and fingers. All such instances involve the embodiment of the musical performer and the listening audience.

Time and timing. Let us return to the domain of expressive timing. In groove-based contexts, rhythmic expression occurs at an extremely fine timescale -- rapid enough to rule out a simple auditory-feedback mechanism for its implementation (Fraisse 1982). This relates to an age-old question in neuroscience, known as the problem of serial order in behavior (Lashley 1951). The question is how to explain our assimilation and production of very fast sequences of events in time, given that human reflexes and neural transmission speeds would seem to be too slow to account for them. Lashley cites the common experience of mistakes in serial order of rapid sequences, such as typing, as evidence of hierarchical organization in this kind of behavior.

There is evidence that temporal, rhythmic, and grouping judgments and productions employ different modes of processing for times under roughly one half-second than they do for longer times (Fraisse 1956: 29-30, cited in Clarke 1999; also Preusser 1972, Michon 1975). These short-time processes are described variously as "pre-cognitive," "sensory," or "immediate" -- as a kind of sensation, recognition, or gestalt perception, rather than a kind of analytical or counting process. It has been suggested that this cutoff corresponds to the transition between so-called echoic and short-term memory, as indicated by the timescales involved and by other experiments (Michon 1975, Brower 1993).

Consequently, these different regimes of memory should distinguish musical rhythms above and below this approximate cut-off as qualitatively different phenomena. For pulse-based music, this cut-off lies in the middle of the tactus range, about 300 to 800 milliseconds; rhythmic material below this is perceived categorically as combinations of subdivisions of a main regulating pulse, and durations above it are considered to be on the level of metric grouping of pulses. By this division, as I shall discuss in the next chapter, echoic memory covers the immediate timescale of rhythmic activity, whereas short-term or working memory covers meter and phrases. These different types of memory involve different kinds of processing. We entrain to a pulse based on the echoic storage of the previous pulse and some matched internal oscillator periodicity; we feel the relationships among strong and weak beats (accentual meter); we count times between phrases or bars (metric grouping); and we recognize sub-pulse rhythms qualitatively (Brower 1993). An embodied account of rhythm perception and cognition would need to factor in these inherent distinctions of human memory.
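To make the idea of entraining to a pulse via the memory of the previous pulse and a matched internal periodicity more concrete, here is a minimal Python sketch of a self-correcting oscillator. It is a generic error-correction scheme offered only as an illustration, not a model proposed by Brower or in this chapter; the function name entrain and the gain values are hypothetical and chosen arbitrarily.

```python
def entrain(onsets, initial_period, phase_gain=0.5, period_gain=0.2):
    """Follow a list of onset times (seconds) with a self-correcting oscillator.

    Returns the predicted beat times and the final internal period.
    The gains are arbitrary illustrative values, not empirical estimates.
    """
    period = initial_period
    prediction = onsets[0] + period  # first expectation follows the first onset
    predictions = []
    for onset in onsets[1:]:
        predictions.append(prediction)
        error = onset - prediction     # positive if the heard pulse came late
        period += period_gain * error  # nudge the internal period toward the pulse
        prediction += phase_gain * error + period  # schedule the next expected beat
    return predictions, period
```

For example, given an isochronous pulse every 500 milliseconds (2 Hz, within the tactus range) and an initial internal period of 600 milliseconds, the predicted beats in this sketch drift toward the heard pulses and the internal period settles near 500 milliseconds after a few dozen onsets.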

The role of different kinds of memory points to the need for different models to explain rhythmic expertise at such a fine scale. A hint comes from bat and owl echolocation, in which neural delay-line architectures serve to give the creatures much higher temporal resolution than neural transmission would seemingly allow (Feldman 1997). One could say that the animals' temporal acuity exists "in" these long neural pathways -- in the physical structure of the perceptual apparatus. A working hypothesis, inspired by the existence of such structures, is that precisely timed rhythmic activity involves the entire body in a complex, holistic fashion, combining audio, visual, and somatosensory channels.

According to the embodiment hypothesis, cognitive structures emerge from reinforced inter-modal sensorimotor coupling. In this view, short-time rhythm cognition might include physical sensation, visual entrainment, and sonic reinforcement, unmediated by a symbolic representation. Cognition on the part of musicians -- especially on polyphonic, multi-limb instruments such as drums or piano -- apparently involves the physical act of making music as a primary ingredient. Consider the components of the sensory-motor image associated with rhythm perception that are rooted in echoic memory: a phrasal/body-sway-oscillator component (respiratory-based), a tactus/foot-tap-oscillator component (locomotor-based) (Todd 1994), and a tatum/multiple-finger-tap-oscillator component (speech- or digit-based). According to the embodied-cognition viewpoint, what have been previously called our internal representations may consist of no more than these very sensorimotor couplings.

Kinesthetics. Words like kinesthetic, haptic, and proprioceptive refer to the psychology of bodily feedback. They all refer to the sensation of bodily position, presence, or movement resulting from tactile sensation and from vestibular input. We rely on such awareness whenever we engage in any physical activity; it helps us hold objects in our hands, walk upright, lean against walls, guide food into our mouths, and swallow it. In these cases, there is a strong interaction between kinesthetic and visual input. Similarly, in the playing of musical instruments, we must treat sonic and kinesthetic dimensions as interacting parameters; we must bear in mind the spatio-motor mode of musical performance.

All too often, theorists and psychologists have treated musical motion in terms of abstract, time-varying auditory images, while ignoring the motions exerted by the performer. Musical motion is seen as bound up with structural abstractions in pitch space or other sound worlds, involving the play of forms against one another. A typical view is evident in the following quote from noted composer-theorist Roger Sessions. "The gestures which music embodies are, after all, invisible gestures; one may almost define them as consisting of movement in the abstract, movement which exists in time but not in space, movement, in fact, which gives time its meaning and its significance for us." (Sessions 1950: 20, quoted in Shove & Repp 1995: 58) A recent review of the concept of musical motion by Shove and Repp (1995) highlights the often overlooked fact that musical motion is, first and foremost, audible human motion. To amplify this view, Shove and Repp make use of Handel's (1989: 181) three levels of event awareness: the raw psychophysical perception of tones, the perception of abstract qualities of the tones apart from their source, and lastly the apprehension of environmental objects that give rise to the sound event. This last level is aligned with the "ecological level" of perception as suggested by Gibson (1979). At this level, "the listener does not merely hear the sound of a galloping horse or bowing violinist; rather, the listener hears a horse galloping and a violinist bowing." (Shove & Repp 1995: 59) In this ecological framework, the source of perceived musical movement is the human performer, as is abundantly clear to the listener attending to music as a performance event (ibid : 60). We connect the perception of musical motion at the ecological level to human motion. This suggests that musical perception involves an understanding of bodily motion -- that is, a kind of empathetic embodied cognition.

For musicians, a major part of musical competence involves the bodily coordination of limbs, digits, and for wind instruments, breathing. Such bodily awareness is most demanding on polyphonic instruments, where multiple sonic streams are generated simultaneously. (In this way, drums and keyboard instruments are the paradigm for body-centered polyphony. The drum set and the organ are the only four-limb instruments; piano should be considered a three-limb instrument including the use of the pedals, which are often coordinated strongly with the sounds generated by the hands.) For musical performers, the difference between musical and human motion collapses; the rhythmic motions of the performer and of the musical object are essentially one and the same (Shove & Repp 1995: 60).

Blacking (1973) raised the issue of kinesthetics in musical performance in comparing two types of kalimba ('thumb piano') music among the Venda community of South Africa. One very physical type, practiced by amateur boys, featured complex melodies that appeared to be secondary artifacts of patterned thumb movements; the regularity of the movements generated the jagged melodic result. The other type, a more popular style practiced by professional musicians, had simpler melodies with small intervals and flowing contours, directed more by an abstract melodic logic than by a spatiomotor one. Baily (1985, 1989) and Baily & Driver (1992) have studied the ergonomic factors that constrain and shape performance and musical structure for various Western and Persian plucked string instruments. They argued that "the spatio-motor mode should be regarded as a legitimate and commonly used mode of musical thought" and that musical creativity may involve "finding new ways to move on the instrument." (Baily & Driver 1992: 59) Especially in instances of American rock guitar, they observed, "musical patterns are remembered and executed not solely as aural patterns but as sequences of movements, and that the music is therefore represented cognitively in terms of movement patterns which have visual, kinaesthetic, tactile, as well as auditory repercussions." (ibid. : 62) From this it was concluded that "the spatial layout of notes and the physical structure of the guitar provides a framework for musical conceptualization, a compositional tool used for the development of musical ideas, an interface to be manipulated and acted upon in certain specific ways." (ibid. : 70)

From my experience with jazz improvisation on the piano, I have found that the kinesthetic/spatiomotor approach and the melodic approach form opposite extremes of a continuum. One augments one's aural imagination by exploring the possibilities suggested by the relationship between the body and the instrument, and one judges the result of such experimentation by appealing to one's abstract musical processing capacities and aesthetics. Among pianists who have exploited this relationship in jazz, Thelonious Monk has been the most influential. His compositions and improvisations provide an exemplary nexus of kinesthetics and formalism. Often his pieces contained explicitly pianistic peculiarities, including the repeated use of pendular fourths, fifths, sixths, and sevenths (as in "Misterioso" [CD-3] and "Let's Call This" [CD-4]), whole-tone runs and patterns ("Four in One" [CD-5], bridge to "52nd Street Theme" [CD-6]), major- and minor-second dyads ("Monk's Point" [CD-7], "Light Blue" [CD-8]), and rapid figurations and ornamental filigrees ("Gallop's Gallop" [CD-9], "Trinkle, Tinkle" [CD-10] - see below). All of these idiosyncrasies fit, so to speak, in the palm of the pianist's hand, while often wreaking havoc for horn players (or, even worse, vocalists). Frequently, Monk's use of such kinesthetically derived material juxtaposed its relative ease of delivery on the piano with its melodic or harmonic ambiguity. He incorporated these elements as fundamental pieces of his improvising style. Also, when transferred to other instruments such as the saxophone, many of his piano-inspired compositions had revolutionary implications for the improvising soloist. Among the most exemplary of Monk's recorded work of this nature are his versions of "Trinkle, Tinkle" with saxophonist John Coltrane. (Recordings of the compositions mentioned above can be found on Monk 1986, 1994.)

Opening bars of Thelonious Monk's composition "Trinkle, Tinkle"

(author's transcription, piano right hand only)

Upon examination of the first two measures of this passage, one notices that amidst all its rhythmic complexity, it repeatedly employs consecutive fingerings. Such physical patterns are simpler and apparently more primal for finger coordination than any nonconsecutive pattern. Monk was able to place these simple patterns in unconventional rhythmic and melodic relationships to yield new compositional and improvisational possibilities.

For non-polyphonic instruments (winds, brass, and many bowed and plucked strings) the role of kinesthetics may be different. Playing involves less "split" consciousness among limbs; for the most part, the two hands act together. But in groove-based music, there is an implicit challenge of relating what one is playing to an internally generated pulse -- the metronome sense (Waterman 1952) mentioned below in the chapter on meter perception, or the imagined movement (Todd 1997) described above. The legendary trumpeter Doc Cheatham spoke of this relationship: "[Playing]'s like dancing; it's the movement of the body that inspires you to play. You have to pat your foot; you get a different feeling altogether than when you play not patting your foot." (quoted in Berliner 1994: 152) Here he is speaking not of tapping the rhythm he is playing, but of tapping the underlying pulse in contrast to what he is playing. Similarly, a colleague who plays bass in numerous dance-oriented salsa bands noted the new dimension of rhythmic awareness that he experienced once he had learned the dance steps associated with the music he was playing (J. Bilmes, private communication, 1996). All musicians in the group perform these rudimentary salsa dance steps while playing onstage; this elicits a compound rhythmic consciousness along the lines of Cheatham's playing-while-tapping. Evidently, part of what it means to groove or to swing involves the continual, embodied awareness of the relationship of the pulse to the generated musical material. Different musicians have different opinions about the necessity of physically generating the pulse; for example, in my collaborations with saxophonist Steve Coleman, he has often asked me to stop tapping my foot, apparently because it distracts him. But most musicians, including Coleman, seem to agree about the importance of feeling the pulse in one's body. Hence we can regard the sense of groove as at least partially kinesthetic; it involves relating actions and sounds to the sensation of pulse, which we treat as a virtual movement.

Indeed, it should be seen as no coincidence that one's sense of rhythm is referred to as "feel." A certain kind of awareness is required to be able to tap into this embodied sense of groove. Often musicians believe that this sense can vanish if one attempts to scrutinize it. Musicians often instruct each other not to "think too much" about rhythm, apparently meaning not to analyze it symbolically with numbers or words. Instead, acute rhythmic prowess tends to be a skill that is developed over time, generally in a mindful but undeliberate fashion. Overall, a fair amount of mystique is attached to rhythm perception and performance; there is a relative poverty of terminology or pedagogy associated with these finer points of rhythm.

No treatment of the kinesthetics of musical performance would be complete without at least a mention of dance. Dancing to music is found in all cultures in a vast diversity of manifestations, in secular, religious, and ritual contexts. In many societies music and dance are bound up in one general practice, such that it may be immaterial to suppose one or the other to be the primary activity (Gregory 1997: 127). Arom (1991) describes music in sub-Saharan Africa as a motor activity, almost inseparable from dance, and comments that hearing music often instantly induces body motion among many inhabitants of this region. Many Central and West African vernacular languages have no word for music alone, and few dissociate the concept of rhythm as an abstract component of music. Rhythm is thought of as the stimulus for the bodily movement to which it gives rise, and is given the name of the dance (Gregory 1997: 127). In the Anlo-Ewe culture of southern Ghana, the term that most closely approximates our usage of "music" has been translated by Ladzekpo (1995) as "dance-drumming."

Our understanding of music is enhanced if we interpret our common practices of foot-tapping, head-bobbing, and finger-snapping as a generalized kind of proto-dance, one that arises from the imagined movement associated with beat perception. If we frame groove-based music as meant to be danced to, even in these minute ways, then a possible explanation of the elusive sense of groove begins to reveal itself. The physical sensation of groove, either in performance as a musician or in co-performance as a listener, involves both the real bodily movement and the imagined movement supplied by the perception of the isochronous pulse. The former entrains to the latter via both auditory and kinesthetic feedback. Over time, the physical motion is strongly reinforced through repetition; cognitive structure emerges from this reinforced, cross-modal sensorimotor coupling. Hence, groove as performed may stem from this kind of overlearned kinesthetic pattern or sensorimotor program.

Musical Bodies in Culture. The embodied-cognition viewpoint suggests that a musician's internal representations are intimately tied to his or her connection with the instrument, which forms part of the music-making environment. Musical abstractions certainly exist, but I claim that how an individual musician chooses over time to interact with that instrument gives rise to the majority of the musician's cognitive apparatus. The musician's relationship with the instrument can leave its trace on the music itself -- that is, it can be communicated musically. Barthes made such an observation from a listener's point of view:

I can hear with certainty -- the certainty of the body, of thrill -- that the harpsichord playing of Wanda Landowska comes from her inner body and not from the petty digital scramble of so many harpsichordists (so much so that it is a different instrument). As for piano music, I know at once which part of the body is playing -- if it is the arm, too often, alas, muscled like a dancer's calves, the clutch of the finger-tips (despite the sweeping flourishes of the wrists), or if on the contrary it is the only erotic part of a pianist's body, the pad of the fingers whose 'grain' is so rarely heard... (Barthes 1977: 189)

Barthes believed that he could hear an essential bodily approach in a given musician's music -- that he was able to "know the dancer from the dance," as in Yeats's famous line. Such interpretations are only possible given a substantial amount of detailed background knowledge about the specific music and the technique of playing the instrument. One would require an understanding of how the gross bodily traits to which he refers could be encoded in music.

The emergent structure communicated therein is informed largely by cultural norms. Again, when we speak of cognition via the body and its interaction with its physical environment, we must also discuss the social and cultural forces that construct the concept of the body. An important culturally based conceptual distinction between European and African musics involves precisely this status of the body -- the degree to which the physical situatedness of the music-making or listening body is acknowledged. A journeyman jazz pianist might observe and employ different aspects of the piano from those that a journeyman classical pianist might exploit. Thelonious Monk and Cecil Taylor have treated pianos quite differently from Glenn Gould and Vladimir Horowitz, in part because of differences among these artists' respective cultural sensibilities.

The difference between this African-American dance-music model and the European concert-music model relates to the role of the body in these respective cultures and genres. In a witheringly sharp description of this contrast, McClary (1991) writes,

In many cultures, music and movement are inseparable activities, and the physical engagement of the musician in performance is desired and expected. By contrast, Western culture -- with its puritanical, idealist suspicion of the body -- has tried throughout much of its history to mask the fact that actual people usually produce the sounds that constitute music. As far back as Plato, music's mysterious ability to inspire bodily motion has aroused consternation, and a very strong tradition of Western musical thought has been devoted to defining music as the sound itself, to erasing the physicality involved in both the making and the reception of music... (McClary 1991: 136)

(It was precisely this puritanical tendency that inspired Barthes's writing on the grain of the voice (1977), discussed below.) This description recalls the aforementioned tendency of theorists, cognitive psychologists, and Western composers to prevent the reality of embodied performance movement from entering into the abstract concept of musical motion. By contrast, many musics of the world that are not associated with a socially strict high-art tradition, and especially West African and African-American music, feature a body-based approach to music-making. By this I mean that they do not regard the body as an impediment to ideal musical activity, and that instead, many musical concepts develop as an extension of physical activities such as walking or repetitive tasks. The above discussion of Thelonious Monk suggests that his highly experimental musical techniques emerged in an environment where he felt perfectly at ease exploring the relationship between his body and the piano, even allowing his musical ideas to be subject to this relationship.

This distinction between cultural models pertains particularly to the respective approaches to rhythm. In my experiences both as a European-style orchestral violinist and as a keyboardist in jazz and hip-hop/funk contexts, I have noticed a strong cultural disparity in the respective roles of the body in rhythmic activity. As youths in violin sections of school and community orchestras, my peers and I were often discouraged from tapping our feet or swaying rhythmically. Such behavior was made to seem gauche and inappropriate, and moreover it threatened to draw attention away from the conductor's visual pulse. But in many contemporary dance-oriented bands in which I work, we often quite purposefully employ a kind of rhythmic bodily entrainment. This serves not only to allow interpersonal visual-rhythmic interaction to facilitate a collective groove, but also to help each musician to feel the relationship of his part to his own internally generated physical pulse.

Note that in both models, the sense of pulse is continually reinforced; all participants are microadjusting constantly. However, in the conducted case, as is necessary with an unamplified group of such large physical extent, the visual dimension is primary, whereas in the collective case, the groove is maintained chiefly through the sonic dimension and supplemented with visual input. In reviewing preliminary studies of asynchrony among ensemble musicians, Rasch (1988) observed in a survey of recorded classical chamber works that, on average, ensembles of 10 or more players require a conductor, whereas ensembles of 9 or fewer do not. But while classic baroque ensembles, jazz big bands, and modern salsa bands can have 15 to 20 members, such groups rarely have true conductors; rather, they chug along rhythmically via a collective sense of pulse, deferring to a single designated musician in passages with fermatas or rubati. In the great swing-era jazz bands, the role of a conductor was often merely ornamental, fulfilling no crucial function for the execution of the music aside from the possible cueing of entrances. This suggests that other musical elements contribute to rhythmic precision -- notably the guiding role of drums, percussion, piano, and harpsichord, whose sharp attacks present unambiguous cues for collective rhythmic synchrony. But in addition to these percussive instruments, one must not underestimate the role of proprioceptive and visual feedback among the musicians and, in many cases, from the dancers.

Embodiment and Metamusical Language. Thought has gestalt properties and other overall structures that go beyond mere handling of symbolic building blocks by logical rules. The efficiency of cognitive processing, as in learning and memory, depends on this overall structure of the conceptual system (Lakoff 1987: xiv-xv). With these realities in mind, Lakoff and Johnson (1980) set forth the claim that metaphorical structures underlie the way we understand aspects of everyday life. For example, common statements such as "You're wasting my time!" and "How do you spend your days?" suggest an underlying conceptual equation of time with money, resources, and commodity (Lakoff & Johnson 1980: 7-9). Such elementary metaphorical structures are often reflective of aspects of embodiment, be they bodily or environmental. As an example, note our tendency to treat conceptual abstractions as visual entities. We say "I see" to mean "I understand," as things become "clear" or "transparent." This would appear to stem from a primal sense of understanding that is visual in nature -- a privileging of the sense of vision among the modalities. This seems ecologically valid, as vision remains our most relied-upon and most continually varying modality.

Linguists have studied the role of physical and spatial metaphors in language (Lakoff 1987, Lakoff & Johnson 1980). Lakoff (1987) developed the concept of image schemas, which are rough conceptual pictures that we use to organize our understandings of abstract concepts. For example, the "container" schema makes it possible for people to be in love, out of trouble, and so forth. Lakoff argued that many of these schemas (such as the up-down, center-periphery, container, and movement schemas) are indeed kinesthetic, that is, derived from somatic experiences that "preconceptually structure our experience of functioning in space." (Lakoff 1987: 372) Such theories suggest both bodily and environmental bases for cognition.

We can cast a similar glance at the "metaphors we play by" -- the language and schemas we use to conceptualize and talk about music. Surveying some common tropes of metamusical language ought to lead to larger, underlying ur-metaphors, which may shed light on music cognition. Some commonly cited examples are aids for visualization of abstraction: pitch as height, timbre as color. (In my experience teaching music to children, I have observed that they tended to have no a priori sense that certain pitches were "higher" than others, even when they were well aware of gradations of pitch. This seemed to suggest that such an abstraction is merely an arbitrary convention.) Others may have ecological significance, such as the connection between loudness and size; a "big" sound usually comes from a physically large source. Here I point out a few other common musical metaphors.

Time as space. It is unsurprising that one of the most pervasive tropes in metamusical language is a sense of spatial dimension and extent. As with time in general human experience, musical time can appear to have forward momentum or to stand still; time is spatialized in an overall horizontal sense that seems to grow out of our experience walking around in the world. However, rhythmic time also carries a vertical implication, akin to a sense of gravity. As mentioned above, verticality is commonly understood in the realms of pitch and harmony; we have high notes and low notes, stacked voicings and root movement. Less often acknowledged is the way we verticalize time in the presence of a pulse. We play upbeats and downbeats. Rhythms can be grounded or floating; time can be suspended; a bassist can walk a steady pulse; a drummer can play four-on-the-floor. This common underlying trope amounts to a verticalization of rhythmic phase, i.e. of "circular" time. This provides a compelling connection between rhythmic pulse and the act of walking, in which feet are raised and lowered in a repetitive manner. These two notions of time demonstrate a grounding in physical embodiment; an extended, rhythmic piece of music can carry a metaphorical suggestion of a walking journey, characterized by regular rhythmic pacing coupled with the gradual visual flow of one's surroundings. Gibson (1975) has argued that this experience of walking through a stationary environment underlies our ecological (and, he argues, illusory) understanding of time as a continuously flowing quantity.

Music as speech. Often, music bears metaphorical attributes of speech and conversation. Monson (1996) has given an elaborate treatment of this metaphor in the context of jazz improvisation. One often hears instances of this metaphor in African-American musical pedagogy, where "'to say' or 'to talk' often substitutes for 'to play.'" (Monson 1996: 84) Such usage underscores what musical performance does have in common with speech as an activity or behavior, as well as what music has in common with language as a symbolic system. Among the traits that link musical performance to speech, we see that:


Note that these aspects of speech and performed music are not restricted to the domain of semantics; that is, they are not solely concerned with the "intrinsic" meanings of words or notes. Rather, these specific aspects depend upon the act of performance.

Music as life. A final way of framing music metaphorically is as life itself. Among many jazz musicians, a most valued characterization is that a certain musician has his or her own, instantly recognizable sound, where "sound" means not only timbre, but also articulation, phrasing, rhythm, melodic vocabulary, and even analytical skills. Generally it has come to mean a sort of "personality" or "character" that distinguishes different improvisors. Though it is a compliment to be told that you "sound like Coleman Hawkins," it is even higher praise to be described as "having your own sound." Trombonist and improvisor George Lewis writes,

"[S]ound", sensibility, personality and intelligence cannot be separated from an improvisor's phenomenal (as distinct from formal) definition of music. Notions of personhood are transmitted via sounds, and sounds become signs for deeper levels of meaning beyond pitches and intervals. (Lewis 1996: 117)

This view supports the widespread interpretation of improvisation as personal narrative (Lewis 1996: 117), as that which gives voice to the meaningful experiences of the individual. Ground-breaking pianist Cecil Taylor wrote of equally ground-breaking saxophonist John Coltrane [CD-11],

In short, his tone is beautiful because it is functional. In other words, it is always involved in saying something. You can't separate the means that a man uses to say something from what he ultimately says. Technique is not separated from its content in a great artist. (Taylor 1959)

Often, then, an improvisor's original playing style is bound up with his or her (possibly idiosyncratic or self-styled) technique. Usually the autodidactic approach plays a large role for improvisors, for whom the creation of music is embodied in one's relationship to one's instrument; hence the inseparability of "sound," or pure musical approach, from a "phenomenal definition of music" -- a personal sense of what music is and what it is for.

The notion of personal sound functions as an analytical paradigm, a kind of down-home biographical criticism. An individual's sound, rhythmic feel, and overall musical approach are seen as an indicator of who he or she "is" as a person. Musicians' interactive strategies in music might be seen as an indicator of their interpersonal behavior; their rhythmic placement with respect to the pulse may reflect how "fiery" or "cool" their temperaments run; their melodic inventiveness and harmonic sophistication might parallel their offstage urbanity and wit. Admittedly, such stereotypical characterizations beg to be broken down; rarely does a musician's offstage personality fit such conventional wisdom. Indeed, one could also view "musical personality" as a kind of mask that the performer wears onstage, Signifyin(g) on his or her offstage identity as well as on performance itself. But in either case, the notion of personal sound, relating musical characteristics to personality traits, reveals much about how music and life can be conceptualized together. It is best seen as a manifestation of a cultural model in which music is a way of life, and vice-versa.

We can also examine this metaphor of music as life in a different light as it appears in West African music. Ladzekpo (1995) counts among the most important aspects of his musical education the understanding of different musical elements in terms of their worldly counterparts. A particular rhythm might be seen as the "artistic animation" of a real-life character. As a concrete example, he has described a rhythmic ostinato that occurs on the off-beats as the "party animal," because of the way it seems to jump off the ground repeatedly and draw attention to itself:

(In the audio example [CD-12], a click track is heard alongside this drum pattern, with the high click on the downbeat of the written measure.) Also, steady rhythmic pulse is seen as "purpose in life," and rhythmic obstacles such as cross-rhythms as "challenges" to that sense of purpose. Mastery of the music's rhythmic complexity is seen as a kind of strength, an ability to keep life balanced. Ladzekpo describes the music in the Anlo-Ewe culture as representative of the "complex fundamental disposition of mankind" (Ladzekpo 1995).

Metaphors in motion: the music of Cecil Taylor. An extended example may provide more illumination. The above metaphors of music as life and as speech were made quite concrete in my experience as a performer with pianist Cecil Taylor's "creative orchestra" in 1995. This was his music for large ensemble, which forty very fortunate Bay Area musicians had the opportunity to study and interpret under his guidance. Taylor's approach spoke volumes about improvised music as a collective activity. Early on, when we were repeatedly questioning him about the role of the written material, he said, "This [written material] is the formal content of the piece; what I want is for all the players to bring their individual languages to the interpretation and execution of the piece." Taylor desired that we create a collective embodiment of his material by filtering it through our individual "languages," framing the music as speech, individual sound as personal narrative.

In such a context the emphasis is rarely on being faithful to set compositions in terms of pitch content, melody or harmony. Compositions in an improvising context mean something else entirely -- perhaps a jumping-off point, or music-generative methods that all can agree upon to some extent. In our week of daily sessions with Mr. Taylor, the earlier sessions led us to believe that he was a stickler for detail. I recall that we spent the first 3-hour rehearsal on one postage-stamp-sized corner of one of his scores; he would continually repeat and rework the material bit by bit, singing or dancing a certain phrase for us, or asking us to permute the written pitches in a certain way. But towards the end of the week, his requirements grew less stringent, his guidance less direct; he would simply set us in motion and leave the room for a while. I realized that somehow he had taught us his language -- his sense of phrasing and repetition, his attention to detail, the way he rigorously reworks and dissects a turn of phrase. Once this had happened, we were free to bring our own ideas to this context -- to embody his language. When he returned to the rehearsal room, he would find that we had made something out of his "hieroglyphics." Evidently, Taylor's aesthetic privileges the sound of personalities interacting over conventional concepts of form. Because of the heightened role that group interactivity played, it felt at times as though we had formed a small musical civilization, rather than an orchestra.

Indeed, our group experienced in microcosm the conflict, strife, and tension that a society experiences in macrocosm. Much of this was enacted on a musical level in the performance on October 26, 1995. For example, when some musicians reached the stage, they abandoned their allegiance to the unwritten, brittle orchestral aesthetic that had been developed over the course of rehearsals [CD-13], choosing instead to yield to the temptation to play nonstop with furious intensity. This behavior raised the issue of the distribution of (physical) power -- for clearly, a tenor saxophonist can honk and shriek with enough force to drown out a section of six violinists, and a drummer can bury a pianist's efforts with ease. It was found that the louder instrumentalists possessed the privilege to control the intensity level directly, while the softer instrumentalists were forced to defer to such control. Fellow musician Matthew Goodheart (1996) has observed the added role played by the self-serving musical choices made by certain individuals who wanted to "Play With Cecil" and get noticed by the legendary pianist for possible career advancement. Also, in the absence of a more dictatorial leader figure or a hard and fast text to which to adhere, we found ourselves in frequent disagreement as to what was "supposed" to be happening or what to do next. Different factions formed to conduct their own unified small-group activities, allowing for the emergence of pockets of apparent order in the sonic chaos. The resultant performances featured truly sublime flashes of fortuitous beauty and moments of brilliantly focused small-group improvisation, amidst often inscrutable orchestral noise. The metaphor of music-as-life was borne out in our experience of ensemble-as-social-group.

What do these underlying metaphors teach us about music? It becomes clear from the above discussion that especially in the realm of jazz, an understanding of music grows out of one's relationship to one's body, instrument, peers, and broader culture. Such conclusions are also drawn by Berliner (1994). His overall claim is that one acquires the knowledge and skills called for in jazz improvisation chiefly through the combination of immersion in an acculturated community of practitioners and hours and hours of self-directed experimentation on one's instrument -- that is, through a confluence of situated and embodied learning.

Perceptual invariants. We can view the notion of sound as a carrier of identity from a perceptual standpoint, in the same way that one might describe the recognition of a specific person's speaking voice, using the notion of invariants. As Gibson (1979) described it, the perception of an environment that both changes and persists involves extracting invariants of structure from a continually varying bath of stimulation, and noticing the variation relative to these underlying invariants. In a similar vein, Shaw and Pittenger (1978) suggested the possibility of invariants that are functions of time, such as a repetitive motor. They distinguished between transformational and structural invariants. Transformational invariants are relational aspects of the information that specifies the identity of a particular pattern of change; hence one might hear speaking as a certain kind of use of the human voice, as opposed to singing, for example. By contrast, structural invariants are relational properties specific to the source object undergoing a particular style of change. These properties might include the invariant structural features of a particular speaker's vocal tract and other parts of her body that might give rise to the production of vocal sound. It seems that they might also include learned usage of those organs in specific ways, such as regional accent and vocabulary. These structural invariants are not confined to the body alone; they also involve memory, history, personal choice -- in short, the person's individuality. In African-American musical contexts, especially in the case of musical improvisation, a personal sound contains the musical trace of the musician's body usage (as with saxophone timbre, for example), as well as of his or her conceptual approach (as it is conveyed in the course of improvisation). A variety of musical attributes, ranging from instrumental timbre to improvised musical choices, may be seen as manifestations of underlying invariants in the musician's embodied worldview. Hence we may align the personal sound construct with these perceivable structural invariants.

Paralinguistics, Performativity, Signifyin(g). Some recent results from psycholinguistics illuminate the role of paralinguistic phenomena such as hand gestures in conversational speech. McNeill (1998) examined footage of individuals explaining cartoon scenarios, and found that hand gestures served not just to illustrate but to augment what was described verbally. In this way, language and gesture are seen to be coexpressive, meaning that the sonic and visual dimensions sum to form a larger meaning than either one independently would convey. McNeill's aim is to challenge the modular information-processing model of cognition, which in its strictest sense does not treat different processes as interacting. In the modular view, complex cognition is broken down into something resembling subroutines or modules in a computer program. Rather, claims McNeill, visuals and sound connect to produce a whole; they need to be treated as interacting, not modular, so that context can be incorporated.

Similarly one might imagine that visual and other contextual factors in a musical performance co-articulate musical meaning along with the sonic trace. In keeping with some post-structuralist scholarly work, we may call these elements performatives. The term "performative" has grown to encompass a wide range of phenomena, but it was first coined by J. L. Austin (1962) in reference to a certain class of speech act, namely a verbal utterance that fulfills a function by virtue of its being spoken. A commonly cited example is a wedding officiant's statement, "I now pronounce you husband and wife." In speaking that declarative sentence, the officiant also executes an action -- one that can only be done by enunciating that statement. Hence the utterance accomplishes more than conveying a true or false statement; it is also an act with non-linguistic, real-world consequences. Later, Austin and others (Forguson 1969; see also Parker & Sedgwick 1995) pointed out that "there really is no good reason to distinguish between performative and other sorts of utterances at all. All utterances have their 'performative' role to play in discourse..." (Forguson 1969: 419) Hence it was acknowledged that a variety of additional meaning arises via inflection, stress, and most importantly, situational circumstance. When the Pope says, "Bless you," as an Englishman kneels at his feet, it means something quite different from when a Polish immigrant says, "Bless you," to a sneezing American passerby, even if the intonations, inflections, and intensities are identical. The dimension along which these two cases differ may be called the performative dimension.

In a related essay, Barthes (1977) pointed out that performance of composed music also carries this "extra" dimension. In addition to the meaningful intramusical dynamics, supplemental meaning is generated by the presence of a music-making body, and the sonic traces it leaves behind. Hence the "grain" of the voice, by announcing the vocalist's physical presence, signifies a rupturing of the disembodied, self-contained world of the classical work. The personhood of the performer insinuates its way into (classical) music performance through its roughness, its resistance, its departure from the ideal. The physicality and resistance of the voice point to its producer, the performer, and to the act of it being produced. The grain of a musical performance reminds the listener of the physical sensation of using the voice, or other parts of the body: "The 'grain' is the body in the voice as it sings, the hand as it writes, the limb as it performs." (Barthes 1977: 188)

These variable features of performance give music no small part of its expressive powers. Dunn and Jones (1994) provide a multiplicity of perspectives on embodiment in Western female vocal music. It is seen that the meaning of a vocal utterance is constituted not simply by its semantic content but also by its sonorous content. By focusing on the essential role played by the 'purely sonorous' (i.e. musical, non-verbal) features of the female voice in 'the construction of its non-verbal meanings,' by making explicit the fact that sonorous features must be conceptually linked to the production of vocal sound through a person's body, and by studying the various factors of acculturation that affect the reception of vocal sound, the authors provide a complex account of the status of sound as "performed." Vocal meaning derives from 'an intersubjective acoustic space,' and any attempt to articulate that meaning must necessarily 'reconstruct ... the contexts of ... hearing.' We thereby "recognize the roles played by 1) the person or people producing the sound, 2) the person or people hearing the produced sound, and 3) the acoustical and social contexts in which production and hearing occur. The 'meaning' of any vocal sound, then, must be understood as co-constituted by performative as well as semantic/structural features." (Dunn and Jones 1994: 2-3) The performative dimension in music and speech, as described above, overlaps many other salient musical/linguistic dimensions (the interactive, the processual, the semiotic) to some degree. In fact it could be said that all socially situated human acts -- music, speech, writing, sport, political acts, worship, etc. -- contain a performative element, for they all involve non-verbal meanings generated by situated bodies in intersubjective cultural spaces.

How might performativity operate in music that has no score? Indeed one might ask, how might performativity not operate in music that only exists when performed, when in motion? To be sure, improvised music must project a great amount of meaning along the performative dimension, since so much of how one listens to it depends on one's understanding of its contextual factors. These aspects might include the role of improvisation as a trope for the present, interactivity as the conveyor of a shared sense of time, and the attention to the role of the body and the specific surroundings in music-making activity. In my experience, I have noticed one of the most common questions asked by novice listeners after a jazz performance to be, "What percentage of that was improvised?" One requires some understanding of the conditions and assumptions that give rise to the sounds one hears; the notes and tones do not explain themselves to an outsider.

In African-American musics, it could be said that a large part of this performative dimension coincides with the dimension of Signifyin(g) (Gates 1988). The stylized term is given dozens of meanings that play off of each other; Signifyin(g) can refer to "a way of encoding messages or meanings which involves, in most cases, an element of indirection... [i.e.] an alternative message form [which] may occur in a variety of discourse ... Signifyin(g) is troping." (Gates 1988: 80-81) The governing idea behind Signifyin(g) is verticality -- that is, the free play of rhetorical associations to conjure up multiplicities of meaning beyond the literal. In theorizing about African-American literary discourses, Gates identifies the importance of Signifyin(g) -- i.e. of verticality, of intertextuality, of history, of multiplicity, of reference to shared knowledge -- in the production and communication of meaning.

It seems fair to align the concept of Signifyin(g) with the notion of the performative, which also denotes the nonliteral meanings conjured up through nonverbal channels. Hence, just as the very performative activity of Signifyin(g) accounts for a significant portion of the generation of meaning in spoken and written language, one might expect a similarly large amount of information to be conveyed via Signifyin(g) in the performative aspects of music. For example, much meaning in African-American music is generated through continual referencing, be it explicit or implicit, of a background wealth of cultural information. In jazz, this might happen on a surface level by quoting a well-known melody in the course of an improvised solo, or by paraphrasing a melodic or rhythmic fragment that somebody else just played. Or, it might happen at a subtler, more deeply coded level, through a construct such as timbre (e.g. does Eric Dolphy's alto saxophone sound reference that of Charlie Parker? [CD-14]), or in the way that a piece is constructed (do Ornette Coleman's compositions Signify on the melodic gestures of bebop? [CD-15]), or most abstractly, in a musician's "sound" or "attitude" (does Miles Davis's sense of space, timing, and melody convey a sense of the blues? [CD-16]). In hip-hop it may occur in the choice of musical material, as with the widespread, often blatant use of samples of classic funk and soul tunes [CD-17, 18], or in lyrics, as when MCs (rappers) represent, addressing their origins or their home turf [CD-19]. Also, an MC is often characterized by his or her flow, a flexible concept (analogous to sound in jazz) that can refer in different contexts to rhythmic acuity, lyrical prowess, or general persona [CD-20]. The heightened role of such performative parameters in these various African-American musics provides a case for the conceptual role of sociocultural situatedness in their reception, perception, cognition, and production.

In light of this discussion of performativity, it seems that the application of Meyer's (1953) concepts of musical meaning to jazz leaves something to be desired. Meyer theorized that emotion and meaning in music boil down to the deferral of expectations implied by intramusical dynamics. In developing this theory, Meyer adhered to a rather arbitrary distinction between "designative meaning," in which a stimulus may become meaningful by referring to something that is different from itself in kind, and "embodied meaning," in which the reference is to something like the stimulus itself. Roughly, this distinction corresponds to the difference between the ethnomusicological understanding of music as a body of references to certain cultural practices on the one hand, and the objectivist understanding of the "intrinsic" dynamics of the music itself, somehow disembodied from its cultural context. Meyer chooses (and thus believes it possible) to focus on the latter, which, ironically, he labels "embodied meaning." Whatever its applicability or inapplicability to European concert music, I believe that this distinction collapses in much African American music, which is often arguably as much a definition of the oral culture that produces it as it is an outgrowth of it. For a music that is so conscious of its own origins, one cannot neglect the dimension of meaning made possible by Signifyin(g) -- namely the possibilities of multiplicities of meaning set forth by the metonymies of oral culture itself. To deny the dimension of history, of the Signifyin(g) possibilities inherent in the play of one piece of music against the memory of its predecessors, is to rob the music of a greater part of its meaning.

Time and Temporal Situatedness. Yet another fundamental consequence of physical embodiment and environmental situatedness is the fact that things take time. The concept of time must structure our conception of physically embodied cognition from the start. Smithers (1996) draws a useful distinction between processes that occur "in-time" and those that exist "over-time." The distinction is similar to that between process-oriented activity, such as speech or walking, and product-oriented activity, such as writing a novel or composing a symphony. In-time processes are embedded in time; not only does the time taken matter, but in fact it contributes to the overall structure. The speed of a typical walking gait relates to physical attributes like leg mass and size, and shoulder-hip torsional moment; this is why we cannot walk one-tenth or ten times as fast as we do. Similarly, the rate at which we speak exploits the natural timescales of lingual and mandibular motion as well as respiration. Accordingly, we learn (or more likely we are hardwired) to process speech at precisely such a rate. Recorded speech played at slower or faster speeds rapidly becomes unintelligible, even if the pitch is held constant. The perceived flow of conversation, while quite flexible, is sensitive to the slowdown caused by an extra few seconds taken to think of a word or recall a name.

Over-time processes, by contrast, are merely contained in time; the fact that they take time is of no fundamental consequence to the result. Most of what we call computation occurs over time. The fact that all machines are considered computationally equivalent regardless of speed suggests that time was not a concern in the theory of computation, and that the temporality of a computational process is theoretically immaterial. In so-called "real-time" systems, typically one exploits the blinding speed of modern microprocessors to allow computation so fast that one doesn't notice how much time is taken. However, this is not what the mind does when immersed in a dynamic, real-time environment; rather, it exploits both the constraints and the allowances of the natural timescales of the body and the brain as a total physical system. In other words, Smithers (1996) claims, cognition chiefly involves in-time processes. Furthermore, this claim is not limited simply to cognitive processes that require interpersonal interaction; it pertains to all thought, perception, and action.

In intersubjective activities, such as speech, musical performance, or rehearsal, one remains aware of a sense of mutual embodiment. This sense brings about the presupposition of "shared time" between the listener and the performer. This sense is a crucial aspect of temporality of performance, especially from a communication point of view. The experience of listening to music is qualitatively different from that of reading a book. The experience of music requires a "co-performance" that must occur within a shared temporal domain (Schutz 1964). This sense of co-performance is made literal in musical contexts primarily meant for dance; the participatory act of marking time with rhythmic bodily activity physicalizes the sense of shared time. The meaning derived from such physical participation contrasts with the "contemplative" mode of music listening as practiced in the European concert hall, in which any kind of body motion by an audience member is typically met with negative social feedback.

The Temporality of Musical Performance. We may consider how various cultural models may affect cognition by framing, though not entirely dictating, notions of time. Shore (1996: 62) gives examples of temporal models that can orchestrate culturally specific time frames. These include incremental, decremental, cyclical, rhythmic, and biographical models, as well as context-framing devices. A jazz performance might contain many of these models simultaneously. It might utilize a cyclic form (such as a song form or "chorus"), in which time is broken into rhythmic segments (beats, subdivisions, meter), and of which chunks may form complete episodes (such as an individual solo, which might last a number of choruses). The entire performance might be governed by a sense of appropriate length, thereby involving an overall incremental model.

Furthermore, the performance situation might be understood as a context-framing device. In his study of music of a certain community in South Africa, ethnomusicologist J. Blacking wrote, "...Venda music is distinguished from nonmusic by the creation of a special world of time. The chief function of music is to involve people in shared experiences within the framework of their cultural experience." (Blacking 1973: 48) There is no doubt that this is true to some degree in all musical performance. We can take this concept further in the case of improvised music. The process of musical improvisation in a jazz context can be seen as one specific way of framing the shared time between performer and audience. The experience of listening to music that is understood to be improvised differs significantly from listening knowingly to composed music. The main source of drama in improvised music is the sheer fact of the shared sense of time: the sense that the improvisor is working, creating, generating musical material, in the same time in which we are co-performing as listeners. Part of what we seem to experience as listeners to any music is an awareness of the physicality of the "grain," and a kind of empathy for the performer, an understanding of the effort required to create music. In improvised music, this empathy extends beyond the concept of the physical body to an awareness of the performers' coincident physical and mental exertion, of their "in-the-moment" (i.e. in-time) process of creative activity and interactivity. Thus improvisation heightens the role of embodiment in musical performance.

Time framed by improvisation is a special kind of time that is flexible in extent, and in fact carries the inherent possibility of endlessness, similar to that pointed out by Shore (1996) in the case of baseball games. Instances like Paul Gonsalves's 27 choruses (over 6 minutes) of blues on Ellington's "Diminuendo and Crescendo in Blue" [CD-21] and Coltrane's sixteen-minute take on "Chasin' the Trane" [CD-22] -- significantly, both live recordings -- attest to the power that the improvisor wields as framer of time, deciding both the extent and the content of the shared epoch.

Temporal situatedness & musical form. Accordingly, music that privileges improvisation requires a different concept of musical form from music that is through-composed. In the former case, musical form can be described in terms of temporal situatedness. It is enlightening to consider the concept of form in the classical improvised music of India [CD-23]:

Syntactical forms are virtually unknown in the music of India. Instead we hear long, cyclical, chain structures and a general progression of organic growth that reveals the guidance of quite different formal models and metaphor. The tactics of form go hand-in-hand with the prevailing models of structure: hierarchical and syntactical forms are naturally implemented by such tactics as contrast, parallelism, preparation, rise, transition, and the like; serial forms [as in Indian music], however, tend to be modular, decorative, incremental, progressive, and open-ended. The Indian version of musical structure tends to emphasize variation of the module: by permutation of its elements, by inflation and deflation of patterns, by pattern superimpositions, and by progressive organic development. (Rowell 1988)

Improvised African and African-American music can share many of these traits, particularly in the long-term organization of material. The major role of improvisation in many oral musical traditions, combined with the important function of groove, make possible alternative notions of musical form that do not conform to the recursive hierarchies of tonal-music grammars. A teleological concept of form, in which the meaning of music is taken to be its large-scale structure, may be replaced with an alternative, modular approach, in which the meaning of music is located in the free play of smaller constituent units. Such notions of musical structure appear in many African and African-American musics. Instead of long-range hierarchical form, the focus is on fine-grained rhythmic detail and superpositional rhythmic hierarchy. Thus, large-scale musical form emerges from an improvisatory treatment of these short-range musical ingredients -- that is, from the in-time manipulation of simple components in a modular conceptual organization.

A prime example is James Brown's frequent practice of "takin(g) it to the bridge" [CD-24]. A given piece might consist of two different musical spaces or grooves, the transitions between which are cued musically by the vocalist. Hence each section may be arbitrarily long, since the only thing to delineate it is an improvised cue to the next section. Before the performance of the piece, Brown and his band may not know exactly what will happen when; rather, they know what the raw materials are and how to manipulate them during performed time. As another example, jazz drummer E. W. Wainwright (private communication, 1997) described to me a practice of creating large-scale temporal form out of an open-ended though metrically distinct musical environment, as it was done by John Coltrane's legendary quartet in the early 1960s (cf. the title track to Transition [CD-25]). In such pieces, the group would be improvising in 4/4 time, using a certain collection of pitches as a loose framework, such as a mode over a D pedal point. Eventually, formal small-section boundaries would emerge by the systematic doubling of the musical period. As was told to Wainwright by Elvin Jones (the quartet's drummer and Wainwright's teacher), the group would initially accent the beginning of every four bars, using intensity as well as rhythmic, melodic, and harmonic parameters. As the piece unfolded, they would expand the period to eight bars, then sixteen, and so on. The larger the period became, the greater heights the intensity and dissonant tension could reach, and the more effective the unified release at the beginning of the next period. As Jones told Wainwright, this practice emerged organically over the course of hundreds of improvised performances, never having been discussed verbally by any band members. These two examples suggest that aspects of musical form can stem from the sense of shared, lived time, and the way variations are carried out while embedded in time.
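
The period-doubling practice described above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a model of what the quartet actually played: the function name accent_bars and the parameter repeats_per_period (how many periods elapse before the period doubles) are hypothetical, since the account given here says only that the period grew from four bars to eight, then sixteen, and so on.

```python
def accent_bars(total_bars, initial_period=4, repeats_per_period=4):
    """Return the 0-based bar numbers that would receive a unified accent.

    Assumption (not from the source): the group stays with each period
    length for `repeats_per_period` cycles before doubling it.
    """
    accents = []
    bar = 0
    period = initial_period
    while bar < total_bars:
        for _ in range(repeats_per_period):
            if bar >= total_bars:
                break
            accents.append(bar)  # accent the first bar of this period
            bar += period        # move to the start of the next period
        period *= 2              # four bars, then eight, then sixteen, ...
    return accents


if __name__ == "__main__":
    # Accent points over a 64-bar stretch, doubling after every two periods.
    print(accent_bars(64, initial_period=4, repeats_per_period=2))
```

The gaps between successive accents widen as the period grows, which is one way to picture the longer spans over which intensity and tension could build before each unified release.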

In addition, in jazz and other musics, intramusical hierarchical organization may very well be decentered in favor of referential, associative, or functional relationships (Honing 1993). Formal emphasis might be more on repetition, on reference to a shared body of knowledge, or on maintaining a relationship to a composite rhythmic pattern, and less on the recursive derivation of a background meaning by grouping sections into chunks. In other words, the emphasis on "the moment" as a consequence of embodiment allows for different kinds of formal derivation.

As an example, consider again the saxophonist John Coltrane. He was known early in his career for playing long, impressive, exploratory solos that nonetheless projected excitement and forward motion, full of blisteringly fast runs, filigrees, and arpeggios, as on Monk's "Trinkle, Tinkle" [CD-26]. Coltrane's improvisations were less hierarchically unified than was typical for the idiom, and more serial or sequential; sometimes it was said that his solos lacked direction or went on too long. Many have tried to establish "motific development" in Coltrane's individual improvisations as that which creates structure (Dean 1992, Jost 1981), but it seems to me that this is merely a consequence of a greater formation -- Coltrane's "sound," his holistic approach to music, which yields these elements. I do not wish to imply that Coltrane had no mind for "structuring" an individual solo; but these sorts of analyses stem from the critical tools of the listener rather than the improvisor. As a musician, I personally believe that the improvisor is concerned more with making individual improvisations relate to each other, and to his or her conception of personal sound, than he or she might be with obeying some standard of coherence on the scale of the single improvisation.

In this way, the temporally situated view of music cognition facilitates a nonlinear approach to musical narrative. Musical meaning is not conveyed only through formal hierarchies, motific development, contour, and temporal deferral of expectations; it is also embodied in improvisatory techniques. Musicians tell their stories, but not merely in the traditional linear narrative sense. An exploded narrative is conveyed through a holistic musical personality or attitude. That attitude is conveyed both musically, through the skillful, individualistic, improvisatory manipulation of expressive parameters in combination, and extramusically, in the sense that these sonic symbols "point" to a certain physical comportment, a certain way-of-being-embodied. In improvised music practices in general, the emphasis tends to be not on the single isolated performance but on the developing body of concepts or expressions as it exists over long periods of one's life. The only invariable guideline for a solo or a group improvisation is to feel in the end as though you have "said something" (Monson 1996). The details of how this is accomplished are as variable in music as they are in speech.

Embodiment as a complement to cognitivism. In concluding this lengthy chapter, it should be pointed out that the claims of embodied and situated cognition operate alongside the symbolic cognitive methods that traditional knowledge-based systems might exploit. There is a clichéd distinction between literate and oral cultures that runs, "The literate man stores information through writing; the oral man stores information through physical assimilation: he becomes the information." (Sidran 1971: 9) Actually, both kinds of "storage" occur in everyday life in the literate world; information is distributed among various embodied and situated dimensions, including learned sensorimotor patterns, written or memorized symbolic information, and social customs. These delineations correspond to three "vantage points" that we can consider to be part and parcel of cognition: "the development of the individual, the local support conditions leading to the mastery of the symbol system and materials of the domain, and the cultural setting which gives meaning and structure to the entire expression." (Davidson & Torff 1992: 120-121)

As an example, consider a concert violinist's performance of a composed piece from the standard repertoire. We have now discussed many ways of studying the violinist's performance. In working on the piece, she develops a personal, non-transferrable interpretation, which emerges as a highly specialized behavior from her hours of physical practice. A performance of this piece would represent both the retrieval of memorized, symbolic information and the enaction of physically assimilated behavior. The piece itself may possess a deeply hierarchical intramusical logic full of formal interplay, and the violinist may highlight these formal elements via certain expressive performance choices. Other performance variations in timing, dynamics, and intonation might stem from nervousness, fatigue, caprice, or the soloist's attempts to be audible over the polyphonic backdrop of a hundred accompanying musicians. One may also study how the performance may be framed socially in a concert hall as elite high-culture spectacle, rich in performative elements, from the "alienating social ritual of the concert itself," to the enhanced social distance between the audience and "the 'artist' in evening dress or tails," to the "listener's poignant speechlessness as he/she faces an onslaught of such refinement, articulation, and technique as almost to constitute a sadomasochistic experience." (Said 1991: 3) All of these dynamics inform our reception of the performed music.

Improvised music provides us with another example. A jazz pianist improvising over a standard tune would certainly require a working understanding of functional harmony, meter, and form, both in general and specifically applied to the song in question. This knowledge would fall mostly in the abstract, symbolic domain. However, a variety of other requirements draw on the situatedness of the pianist's body vis-a-vis the piano. These elements include sensorimotor functions like the placement and control of hands on the keyboard and foot on the pedal, the coordination of the digits, and the harnessing of these activities to an internally entrained pulse. In addition, cultural and learned factors such as the musician's relationship to the instrument, genre, and associated lifestyle may find their way into the improvising process in the form of personal sound, choice of musical material, or adherence to preestablished norms. The artist might Signify on established versions of the piece (including his own, as when pianist Ahmad Jamal performs "Poinciana" today, knowingly quoting and modifying his distinctive, extremely popular 1958 version [CD-27]). Or, the pianist might highlight his or her version by sheer contextualization (as with the quintessentially modern Thelonious Monk's old-fashioned, buoyant stride-piano version of the then-40-year-old tune popularized by Louis Armstrong, "I'm Confessin'" in 1963 [CD-49]). The pianist might deploy expressive timing and accent manipulations that highlight the relationships between the performed rhythm and the pulse, the melody and the chord changes, the melody and the (unheard) lyric, the right hand and the left hand, the left-hand pattern and a bassline from another piece, or any number of other kinds of rhythmic interplay.

The often implied characterization of the social and symbolic as high-level and the embodied as low-level is misleading, for these functions may interact with each other bilaterally. In particular, one should not claim that the high-level processes "direct" the low-level, for in some cases it is not clear that there is any such hierarchical organization. Indeed, the tendency to posit such a hierarchy stems from our prejudice that mental processes are more "elevated" than physical ones. Further consideration of the example at hand suggests that the cognitive organization between "body" and "mind" can be heterarchical, or non-hierarchically distributed. For instance, in the midst of an improvisation, the temporally situated pianist is always making choices. These choices are informed not simply by which note, phrase, or gesture is "correct," but rather by which activities are executable at the time that a given choice is made. (Similar observations have been made by Sudnow 1978.) That is, a skilled improvisor is always attuned to the constraints imposed by the musical moment. This requires an awareness of the palette of musical acts available in general, and particularly of the dynamically evolving subset of this palette that is physically possible at any given moment. In this way, for example, the improvising pianist is more likely to choose piano keys that lie under her current hand position than keys that do not. Such weak constraints (which may be overridden, with physical and melodic repercussions) combine holistically with formal directives such as melody and harmony (which may also be overridden). Indeed, improvisation -- musical and otherwise -- may be understood partially as a dialectic between formal/symbolic and situational/embodied constraints.

Hence the functions of situated or embodied cognition neither replace wholesale, nor obey blindly, but rather supplement and complement the abstract, symbolic cognitive processes that we usually associate with "thinking." In this chapter, I have attempted to show how the theoretical concepts of embodied and situated cognition can similarly enhance the study of music perception and cognition.

