
=__The Perception of Vowel Sounds__=

05/12/09
From the J. E. Cutting article we discussed in class ("The Magical Number Two in Speech Sounds"), we learned that humans have a limited ability to discriminate between speech sounds that differ along only one dimension. When one sound was gradually altered to sound like another, listeners did not perceive the change along a continuum; instead, each sound was heard as clearly belonging to one of the //two// given categories. Even when a sound shared qualities with //both// categories, listeners did not assign it to an intermediate (and separate) category lying between the two "extremes." They heard it as one category or the other, depending on which side of a categorical boundary it fell. The speech pairings Cutting used in his tests consisted mostly of syllables containing a stop consonant: /pa/ vs. /ba/, /ba/ vs. /da/, /ba/ vs. /ma/, and /cha/ vs. /sha/. Only one test pair consisted of vowel sounds (/i/ vs. /ɪ/). After analyzing the results, Cutting concluded that "the perception of stop //consonants// adheres more to the magical number two" than the perception of vowel sounds seemingly does.
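Two-category results like Cutting's are often described with an identification function: the probability of labeling a stimulus as one category jumps from near 0 to near 1 over a narrow stretch of the continuum. The sketch below is a toy model, not Cutting's analysis. It assumes a logistic identification function whose `boundary` and `slope` are hypothetical values, applied to a hypothetical 7-step /ba/ to /pa/ continuum, and labels each step with whichever category is more likely.

```python
import math


def p_category_b(step: float, boundary: float = 4.0, slope: float = 3.0) -> float:
    """Probability of labeling a continuum step as category B.

    Toy logistic identification function; `boundary` and `slope` are
    hypothetical values, not parameters fitted to Cutting's data.
    """
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))


def label(step: float, a: str = "ba", b: str = "pa") -> str:
    """Two-category labeling: every step is heard as A or B, never 'in between'."""
    return b if p_category_b(step) >= 0.5 else a


if __name__ == "__main__":
    for step in range(1, 8):
        print(step, round(p_category_b(step), 3), label(step))
```

With a steep slope, the steps far from the boundary are labeled almost unanimously, and the switch from one label to the other happens abruptly at step 4.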

I thought it would be interesting to learn more about our ability to perceive the differences between English vowel sounds, since our ability to recognize many of our other natural speech sounds, including consonants, relies on first hearing such phonemes. What distinguishes one vowel sound from another? Do the distinctions between vowel phonemes exist in a clear-cut fashion, as the consonant sounds did in Cutting's study, or do they exist more on a continuum (where their perceived identities can overlap)? Also, what kind of effect do consonant sounds have on our ability to identify vowel phonemes? These were a few of the questions I tried to explore in my quest to learn more about vowel perception.

**__How Do We Differentiate Vowel Sounds from Other Phonemes?__** According to Dr. Lloyd Hanson at Northern Arizona University, five resonant frequencies (a.k.a. "formants") are created in our vocal tract when we talk, and we manipulate them to create recognizable patterns of speech. The first two resonant frequencies (F1 and F2) are the most important in creating vowel sounds, and their presence or absence determines whether or not a vowel sound is produced. In essence, particular variations in these first two formants create the "fingerprint" of each vowel phoneme, whereas F3 helps more with determining other (//non-vowel//) phonemic sounds, and F4 and F5 contribute to a person's vocal timbre. (1) (2) I present the importance of F1 and F2 here because it helps to explain why the researchers in the studies I read about focused mainly on the first two formants when studying vowel sounds.
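A common textbook approximation (not taken from Hanson's page) helps show why the vocal tract has several formants at all. It treats the tract during a neutral vowel as a uniform tube closed at the glottis and open at the lips. Such a tube resonates at odd multiples of a quarter wavelength, F_n = (2n - 1)c / 4L, so an assumed tract length L of 17.5 cm and speed of sound c of 350 m/s give formants near 500, 1500, 2500, 3500 and 4500 Hz. Both numbers are assumed round values. Real vowels move these resonances away from the neutral positions by changing the tube's shape with the tongue, jaw and lips.

```python
def uniform_tube_formants(length_m: float = 0.175,
                          speed_of_sound: float = 350.0,
                          count: int = 5) -> list[float]:
    """Resonances of a uniform tube closed at one end (quarter-wave resonator).

    F_n = (2n - 1) * c / (4 * L). The default length and speed of sound are
    assumed round numbers for an adult vocal tract, not measured values.
    """
    return [(2 * n - 1) * speed_of_sound / (4 * length_m) for n in range(1, count + 1)]


if __name__ == "__main__":
    for n, f in enumerate(uniform_tube_formants(), start=1):
        print(f"F{n}: {f:.0f} Hz")
```

A shorter tube raises every resonance proportionally, which is one reason the same vowel has higher formants in a child's voice than in an adult's.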

**__How Do We Differentiate One Vowel Sound from Another?__** To demonstrate the importance of the F1 and F2 //frequencies// in determining a vowel sound's identity, R. L. Miller performed auditory tests in 1952 using synthetically produced vowel sounds whose F1 and F2 parameters were close to those created naturally in speech. Miller and his research team used a tone synthesizer to create waves with frequencies lying in each formant's "normal" speaking range: 400-1000 Hz for F1 and 500-1800 Hz for F2. Test subjects listened to various combinations of F1 and F2 values and judged which vowel phoneme (if any) each sound was closest to. The results were then plotted on a graph with the F1 frequency values on the x-axis and the F2 frequencies on the y-axis, as pictured //below//. Miller found that even when the fundamental frequency was changed (and the resonant frequencies shifted accordingly), the general positions and shapes of each vowel's region on the graph remained relatively constant, suggesting that the frequency proportions of the formants used in identifying (and creating) vowel sounds are somewhat fixed. (3) Since the phoneme regions in Miller's findings are distinct from each other (no "new" types of sounds are perceived along the boundaries of each region), it seems that the perception of vowel sounds //does// abide by Cutting's "magical number two," after all.
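Miller's F1/F2 graph can be read as a map in which every point belongs to exactly one vowel region. One simple way to draw such a map is a nearest-prototype classifier: each vowel gets a representative (F1, F2) point, and a synthetic sound is labeled with the vowel whose point is closest. This is an illustrative stand-in, not Miller's method. The prototype values in `PROTOTYPES` are assumed, illustrative numbers rather than his data, and measuring distance on logarithmic frequency axes is also an assumption, chosen so that a given frequency ratio counts the same at low and high frequencies.

```python
import math

# Hypothetical prototype (F1, F2) values in Hz, chosen for illustration only;
# they are not Miller's data.
PROTOTYPES: dict[str, tuple[float, float]] = {
    "i": (270.0, 2290.0),
    "ɪ": (390.0, 1990.0),
    "ɛ": (530.0, 1840.0),
    "æ": (660.0, 1720.0),
    "ɑ": (730.0, 1090.0),
    "u": (300.0, 870.0),
}


def log_distance(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Euclidean distance between two (F1, F2) points on log-frequency axes."""
    return math.hypot(math.log(a[0] / b[0]), math.log(a[1] / b[1]))


def classify_vowel(f1: float, f2: float) -> str:
    """Label a sound with the vowel whose prototype is nearest.

    Every (F1, F2) point receives exactly one label, so the plane splits
    into non-overlapping regions with sharp boundaries between them.
    """
    return min(PROTOTYPES, key=lambda v: log_distance((f1, f2), PROTOTYPES[v]))


if __name__ == "__main__":
    for f1, f2 in [(280, 2250), (700, 1150), (550, 1800), (320, 900)]:
        print(f1, f2, "->", classify_vowel(f1, f2))
```

In this model a boundary is simply the set of points equidistant from two prototypes, so a sound near a boundary is still assigned to one neighboring vowel rather than to a new category, which mirrors the "no new sounds along the boundaries" observation.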



L. A. Chistovich and T. G. Malinnikova found that the formants' //loudness// and //pulse distance and/or amount of pulse overlap// also play a major part in determining a vowel phoneme's identity: the difference in decibel level between the F1 and F2 pulses, together with the distance (in time) or amount of overlap between them, determines which vowel sound is created and perceived. A graphical representation of their data is pictured //below//; the y-axes give the percentage of responses identifying a stimulus as each labeled line's phoneme, and the x-axes give either the pulse distance (K) or ΔL (the difference in loudness, in dB). (4) There are large overlapping regions in Chistovich and Malinnikova's graphs, especially where the /ɛ/ peak stretches across the /i/ and /ɑ/ lines. However, rather than concluding that such an overlap means vowel identification lies on a continuum, note that the /ɛ/ line rises as the /ɑ/ line falls, and the /ɛ/ line falls as the /i/ line rises. This suggests that perceptual boundaries exist at these crossing points (further supporting the idea that vowel perception abides by the "magical number two"). On the other hand, if you look only at the /ɑ/ and /i/ lines as symbolizing two clearly different phonemes, the region covered by the /ɛ/ line suggests a third, intermediate category that shares features with both "extremes." This middle category is perceived as a "distinct" vowel sound, yet clearly "belongs" between the other two, suggesting that rather than following the "magical number two," these vowel sounds follow a "rule of three."
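The idea that perceptual boundaries sit where two identification curves cross can be made concrete. Given identification percentages sampled along an axis such as ΔL, a boundary can be estimated wherever the difference between two curves changes sign, using linear interpolation between neighboring samples. The percentages below (`PCT_I`, `PCT_E`, `PCT_A`) are invented for illustration and are not Chistovich and Malinnikova's data; only the procedure matters here.

```python
def crossings(xs: list[float], ys_a: list[float], ys_b: list[float]) -> list[float]:
    """Return the x positions where curve A and curve B cross.

    Uses linear interpolation between adjacent samples wherever (A - B)
    changes sign; a sample where the curves are exactly equal is reported as-is.
    """
    diffs = [a - b for a, b in zip(ys_a, ys_b)]
    points = []
    for i in range(len(xs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            points.append(xs[i])
        elif d0 * d1 < 0:
            # Fraction of the interval at which the difference reaches zero.
            t = d0 / (d0 - d1)
            points.append(xs[i] + t * (xs[i + 1] - xs[i]))
    if diffs and diffs[-1] == 0:
        points.append(xs[-1])
    return points


# Hypothetical identification percentages along a loudness-difference axis (dB).
DELTA_L = [-20.0, -10.0, 0.0, 10.0, 20.0]
PCT_I = [90.0, 60.0, 20.0, 5.0, 0.0]
PCT_E = [10.0, 35.0, 60.0, 30.0, 10.0]
PCT_A = [0.0, 5.0, 20.0, 65.0, 90.0]

if __name__ == "__main__":
    print("/i/-/ɛ/ boundary:", crossings(DELTA_L, PCT_I, PCT_E))
    print("/ɛ/-/ɑ/ boundary:", crossings(DELTA_L, PCT_E, PCT_A))
    print("/i/-/ɑ/ crossing:", crossings(DELTA_L, PCT_I, PCT_A))
```

With these made-up curves, the /i/ and /ɑ/ lines cross at 0 dB, exactly where /ɛ/ is the most frequent response. That is the "rule of three" reading: the point that would be the boundary between the two extremes is occupied by an intermediate category of its own.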

Although the studies discussed above provide contradictory results about the way we perceive vowel phonemes, I believe that our perception of vowel sounds //is// slightly different from our perception of consonant sounds. It seems that a clear, intermediate phoneme type can and //does// exist within the ranges between very distinct vowel sounds, which was not seen in Cutting's tests of consonant phonemes.

**__The Accompaniment of a Vowel with a Consonant__** A study conducted by James E. Cutting and Alice F. Healy found that our ability to identify certain vowel phonemes is enhanced when the vowels are paired with a consonant. The experimenters had their subjects listen to a set of five vowel stimuli ["phonemes": /i, æ, ɪ, aɪ (as in b__ye__), eɪ (as in b__ay__)/] and 10 vowel-consonant stimuli ["syllables": /it, iv, æn, ɪn, ɪd, aɪv, aɪm, eɪd, eɪm/]. Both the accuracy and the time it took the subjects to recognize and identify the vowel sound in each case were measured. Pairing the /aɪ/ and /eɪ/ vowel sounds with a //consonant// greatly decreased the percent error in identifying those vowel phonemes; however, there was no significant difference in //vowel-detection time// between the two types of stimuli. (5) This last finding was interesting to me because I had originally thought that a person's ability to correctly identify an input would have some relation to how quickly it is identified.
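Accuracy and detection time are computed separately for each stimulus type, which is why one measure can differ between conditions while the other does not. The sketch below summarizes a list of trials by condition. The `Trial` record, its field names, and all of the numbers are hypothetical, invented only to show the arithmetic, and are not Healy and Cutting's data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One hypothetical trial record."""
    condition: str  # e.g. "phoneme" (vowel alone) or "syllable" (vowel + consonant)
    correct: bool   # was the vowel identified correctly?
    rt_ms: float    # response (detection) time in milliseconds


def summarize(trials: list[Trial]) -> dict[str, tuple[float, float]]:
    """Map each condition to (percent error, mean response time in ms)."""
    summary = {}
    for cond in sorted({t.condition for t in trials}):
        group = [t for t in trials if t.condition == cond]
        errors = sum(1 for t in group if not t.correct)
        summary[cond] = (100.0 * errors / len(group), mean(t.rt_ms for t in group))
    return summary


if __name__ == "__main__":
    # Invented numbers: more errors without a consonant, similar times.
    trials = [
        Trial("phoneme", False, 520.0), Trial("phoneme", True, 500.0),
        Trial("phoneme", False, 510.0), Trial("phoneme", True, 490.0),
        Trial("syllable", True, 505.0), Trial("syllable", True, 495.0),
        Trial("syllable", False, 515.0), Trial("syllable", True, 485.0),
    ]
    for cond, (pct_err, rt) in summarize(trials).items():
        print(f"{cond}: {pct_err:.0f}% error, mean RT {rt:.0f} ms")
```

With these invented trials the vowel-alone condition has twice the error rate of the syllable condition, while the mean response times are nearly identical. Whether such a difference is "significant" would require a statistical test, which this sketch does not attempt.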


**//Sources://**
* 1) L. W. Hanson: "Vowels and Formants"
* 2) "Spectral Cues for Broad Categories of Speech Sounds"
* 3) R. L. Miller: "Auditory Tests with Synthetic Vowels"
* 4) L. A. Chistovich and T. G. Malinnikova: "Processing and Accumulation of Spectrum Shape Information over Vowel Duration"
* 5) A. F. Healy and J. E. Cutting: "Units of Speech Perception: Phoneme and Syllable"

**//Additional Readings to Check Out://**
* C. Dietrich, //et al.//: "Native Language Governs Interpretation of Salient Speech Sound Differences at 18 Months"
> According to this article, changes in vowel duration can be perceived by children as young as 18 months, depending on the child's native language. I didn't include this in my page discussion because it's more about the //interpretation// of what has been perceived (which falls outside the scope of what we've been doing in class) than about the actual process of perceiving and identifying what is heard.
* F. Hirsch, //et al.//: "Formant Structures of Vowels Produced by Stutterers in Normal and Fast Speech Rates"
> The authors of this article found that stutterers tend to form the F2 formant of the /i/ phoneme differently from non-stutterers (through tongue movement and elevation in the oral cavity); also, F1 in the /i/ phoneme depends on speech rate in stutterers. (Such effects were not observed with the /u/ phoneme.)