Dissertations, Theses, and Capstone Projects


Program

Speech-Language-Hearing Sciences

Advisor

Brett A. Martin

Committee Members

Valerie L. Shafer

Susan Behrens

Subject Categories

Speech and Hearing Science | Speech Pathology and Audiology


Keywords

perception, encoding, speech perception, neurophysiologic processing, auditory evoked potentials, T-complex, speech perception in noise, non-native speech perception, speech identification, speech discrimination, Tamil, Hindi, American English, stop consonants, acoustic-phonetic representation, phonemic representation


The perception and encoding of voicing cues in consonants have been well studied, whereas aspiration has received relatively little attention. The current study examined the encoding and perception of aspiration and voicing by Hindi, American English, and Tamil listeners when the relevant cues were and were not degraded by noise. The study is novel in its inclusion of aspiration, its choice of language groups, its use of noise masking, and its use of auditory evoked potentials in addition to behavioral testing.

The first aim was to determine whether language groups for whom aspiration and/or voicing is phonemically contrastive show better perception, and differences in encoding, of these features in noise relative to groups who do not use these features contrastively. The second aim was to determine how the perception and encoding of English aspiration and voicing were similar to or different from the responses to Hindi aspiration and voicing as a function of linguistic background, in quiet and in noise.

Sixteen participants between 20 and 45 years of age were included in each language group. Natural digitized speech sounds served as stimuli. Allophonic variation and acoustic-phonetic representations are given in brackets ‘[ ]’, whereas phonemic representations are given in slashes ‘/ /’. The stimuli were Hindi /ba/ [ba], /pa/ [pa], and /pʰa/ [pʰa] and American English /ba/ [pa] and /pa/ [pʰa]. These stimuli differed in voice-onset time (VOT) and aspiration. The speech sounds were presented in random order at 70 dB SPL through insert earphones, in quiet and in background noise at a signal-to-noise ratio of 0 dB. Each stimulus was presented in consonant-vowel (CV) and vowel-only (V-only) contexts.
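A 0 dB signal-to-noise ratio means the masking noise is scaled so its RMS level equals that of the speech signal before the two are mixed. The dissertation does not describe its mixing procedure, so the following is only a minimal sketch of the standard approach; the tone and random-noise stimuli here are illustrative placeholders, not the study's actual recordings.

```python
import numpy as np

def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested SNR, then add it to `signal`."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    # Gain that brings the noise RMS to signal_rms / 10**(snr_db / 20);
    # at snr_db = 0 the scaled noise RMS equals the signal RMS.
    gain = rms(signal) / (rms(noise) * 10 ** (snr_db / 20))
    return signal + gain * noise

# Illustrative use with placeholder stimuli (1 s of a 440 Hz tone at 22.05 kHz).
rng = np.random.default_rng(0)
sig = np.sin(2 * np.pi * 440 * np.arange(22050) / 22050)
noise = rng.standard_normal(22050)
mixed = mix_at_snr(sig, noise, snr_db=0.0)
```

In practice the level of the mixture would then be calibrated to the presentation level (70 dB SPL here) at the earphone, which this sketch does not attempt.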

Behavioral testing included two speech-identification tests (one encouraging phonemic-level processing and the other phonetic-level processing) and a speech-discrimination task. Analyses of the behavioral data included categorization responses (in percent), A’ scores, goodness ratings, and reaction times. Auditory evoked potentials (AEPs) to the speech sounds were recorded using a NeuroScan system and a 32-channel cap. Averaged AEP waveforms for each stimulus and condition were computed for each participant. AEP components of interest included P1, N1, P2, and N2, measured at the fronto-central electrode site (FCz), and Na, Ta, Tb, and P350, measured at temporal-lateral electrode sites (T7 and T8). Peak amplitudes and latencies were measured for each participant, stimulus, and condition, and means and standard deviations were computed across language groups as a function of stimulus and condition.
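The A’ score used for the discrimination data is a nonparametric sensitivity index from signal detection theory, computed from hit and false-alarm rates and ranging from 0.5 (chance) to 1.0 (perfect discrimination). The abstract does not state which formulation was used; a common one (Grier's) can be sketched as follows.

```python
def a_prime(hit_rate: float, fa_rate: float) -> float:
    """Nonparametric sensitivity index A' (Grier's formulation, assumed here)."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5  # chance performance (also avoids 0/0 at h = f = 0 or 1)
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance responding uses the symmetric form.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))

# Illustrative values: perfect, chance, and a typical above-chance listener.
a_prime(1.0, 0.0)  # 1.0
a_prime(0.5, 0.5)  # 0.5
a_prime(0.9, 0.2)  # ≈ 0.913
```

Unlike d′, A’ requires no normality assumptions about the underlying signal and noise distributions, which is one reason it is often preferred for perception data with extreme hit or false-alarm rates.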

The behavioral and AEP responses were analyzed using descriptive statistics and parametric tests, including mixed-model analyses of variance. Native listeners had significantly higher percent-correct categorization scores on the identification tasks and higher percent-correct discrimination scores than non-native listeners. Further, percent-correct scores were significantly higher, and reaction times shorter, in quiet than in noise. As predicted by perceptual assimilation, the American English listeners assimilated Hindi /pʰa/ [pʰa] to English /pa/ [pʰa] and Hindi /pa/ [pa] to English /ba/ [pa]. The Hindi listeners categorized American English /pa/ [pʰa] as Hindi /pʰa/ [pʰa] and English /ba/ [pa] as Hindi /pa/ [pa]. The Tamil participants categorized all five stimuli, including Hindi /ba/ [ba], /pa/ [pa], and /pʰa/ [pʰa] and American English /ba/ [pa] and /pa/ [pʰa], as Tamil /pa/ [pa].

The general pattern of AEP results was consistent with the behavioral findings. A significant effect of group was present for the P1, P2, and N2 peaks. A significant group × condition × stimulus interaction was present, with larger P1 peak amplitudes at FCz in the Hindi participants in quiet, specifically for Hindi /pʰa/ [pʰa], relative to the other language groups and stimuli. Significant group × laterality interactions were present for the American English stimuli, with larger amplitudes at T7 in response to native-language speech. The AEP responses were significantly larger in amplitude and shorter in latency in quiet than in noise, as predicted. Further, Na peak amplitudes were larger at T8 in noise. AEP latencies were significantly longer for aspirated speech sounds and shorter for Hindi /ba/ [ba] and English /ba/ [pa].

Significant group differences were present in the AEPs at both the fronto-central and temporal-lateral electrode sites. The morphology of the waveforms was similar across language groups, irrespective of whether a speech sound was present in the listeners' language, suggesting that non-native speech sounds are encoded acoustic-phonetically. Further, although native listeners showed a perceptual advantage for speech in noise in the behavioral tasks, the AEP findings indicate that all language groups showed similar encoding in noise. This finding also sheds light on the level of processing indexed by the AEPs, revealing that it differs from the level of perception. The results provide neural evidence for how language experience modulates speech encoding and contribute to a better understanding of cross-linguistic speech processing in noise.