Date of Degree


Document Type


Degree Name





Ricardo Otheguy


Gita Martohardjono

Committee Members

Martin Chodorow

Subject Categories

Anthropological Linguistics and Sociolinguistics | First and Second Language Acquisition | Linguistics | Social and Behavioral Sciences


collocations, bilinguals, language contact, cross-linguistic influence, formulaic language, convergence


This study compares monolinguals and different kinds of bilinguals with respect to their knowledge of the type of lexical phenomenon known as collocation. Collocations are word combinations that speakers use recurrently, forming the basis of conventionalized lexical patterns that are shared by a linguistic community. Examples of collocations typically used by speakers of English in the United States are make a decision, take a step, and have a coffee. Examples of collocations typically used by speakers of Spanish in Latin America and Spain are tomar una decisión (‘make a decision’, lit.: take a decision), dar un paso (‘take a step’, lit.: give a step), and tomar un café (‘have a coffee’, lit.: take a coffee). While these examples in English and their translation counterparts in Spanish have roughly the same denotation, different verbs are used to express them.

Research on collocational knowledge has focused almost exclusively on cross-linguistic effects observed in bilinguals, in direct comparison to English monolinguals (e.g., Siyanova & Schmitt, 2008; Wolter & Gyllstad, 2013; Sonbul, 2015). Differences between bilinguals and monolinguals have typically been interpreted as indicating a deficit in bilinguals’ collocational knowledge, revealing an underlying assumption on the part of researchers that collocational knowledge is categorical, i.e., collocations are either ‘correct’ or ‘incorrect’, as attested in monolingual usage, and bilinguals have or have not managed to attain the knowledge of monolinguals.

We asked whether examining the linguistic input – the language speakers hear in their daily lives – in a contact setting like New York City would reveal more about collocational knowledge overall, and specifically about collocational knowledge in bilinguals, as well as about cross-linguistic effects in bilingual collocational knowledge. Linguistic input with regard to collocations can be broken down into its different properties, including (1) the frequency of the collocation and (2) the collocation’s Mutual Information score (MI), which quantifies the degree to which the statistical association between the component words of the collocation is greater than chance. For bilinguals, an additional property of a collocation is the extent to which it overlaps with its translation counterpart in the other language in terms of meaning, context, and form. Sociolinguistic studies in contact settings like New York City (e.g., Ortigosa & Otheguy, 2007) and the Netherlands (e.g., Doğruöz & Backus, 2009) have shown that the property of overlap is related to the influence that collocational knowledge in the majority language can have on that of the minority language.
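As an illustration of the MI property described above, the following minimal sketch computes a score from raw corpus counts. It assumes the common pointwise formulation, MI = log2(observed co-occurrence probability / probability expected by chance); the abstract does not state which variant the study used, and the function name, parameter names, and counts below are hypothetical and are not drawn from COCA or CDE.

```python
import math


def mutual_information(pair_count, word1_count, word2_count, corpus_size):
    """Pointwise Mutual Information (base 2) of a two-word combination.

    Assumed formulation: log2 of the observed co-occurrence probability
    divided by the probability expected if the two words were independent.
    """
    if min(pair_count, word1_count, word2_count, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    observed = pair_count / corpus_size
    expected = (word1_count / corpus_size) * (word2_count / corpus_size)
    return math.log2(observed / expected)


# Hypothetical counts for a verb + direct object pair in a
# one-million-word corpus (invented for illustration only).
score = mutual_information(
    pair_count=50, word1_count=1000, word2_count=2000, corpus_size=1_000_000
)
print(f"MI = {score:.2f}")
```

Under this formulation, a score near zero means the two words co-occur about as often as chance predicts, and larger positive scores indicate a stronger association. Frequency and MI are separate properties: a combination can be frequent while having a low MI if both component words are themselves very common.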

The stimuli were based on widely attested conventional collocations consisting of combinations of verb plus direct object that are found in the Corpus of Contemporary American English (COCA) (Davies, 2008) and the Corpus del Español (CDE) (Davies, 2002), and also on less commonly documented equivalent alternatives, e.g., The student made a question in class about the reading (cf. The student asked a question in class about the reading). The data in this study consist of participants’ experimental behavior in acceptability judgment tasks. Three groups of English-Spanish bilinguals, one group of English monolinguals, and one group of Spanish monolinguals were tested on site in Mexico City, New York City (NYC), and Puerto Rico. The three bilingual groups were: first generation bilinguals (tested in NYC), who were born in Latin America or Spain and acquired English as adult newcomers to the United States; second generation bilinguals (also tested in NYC), who were born in the U.S. to first generation parents; and Latin American bilinguals residing in Puerto Rico (tested in Puerto Rico). For all three bilingual groups, we selected participants who were highly proficient in both English and Spanish. The English monolinguals were tested in NYC and the Spanish monolinguals in Mexico City.

The results showed the following: (1) Both monolinguals and bilinguals similarly preferred collocations with higher levels of frequency and MI, challenging widely held assumptions that bilingual collocational knowledge is deficient even in highly proficient bilinguals or that it deviates significantly from that of monolinguals; (2) All speakers, including members of all bilingual and monolingual groups, and irrespective of whether they were tested in NYC, Mexico City, or Puerto Rico, exhibited variability in their judgments of acceptability, showing that collocational knowledge is not categorical in either bilinguals or monolinguals; collocations are not simply judged as correct or incorrect, but induce gradient reactions; (3) While cross-linguistic effects were observable among all bilingual groups in both languages, second generation speakers exhibited the most significant effects in their acceptance of Spanish collocations that were direct translations from English, e.g., tomar un paso (instead of the conventional dar un paso) and hacer una decisión (instead of the conventional tomar una decisión).

The results are for the most part in line with existing findings, and tend to lend support to usage-based theories (e.g., Goldberg, 1995, 2006; Bybee, 2006, 2013) that view language as form-meaning pairings, or “constructions”, which are acquired through exposure to the linguistic input. Furthermore, the results show that bilinguals’ knowledge of collocations, even at high levels of proficiency, is affected by cross-linguistic influence from the language in which they receive more input. This suggests that, especially in contact situations, bilinguals’ knowledge of their two languages tends to converge, or that bilinguals employ optimization strategies (Bullock & Toribio, 2004; Otheguy, 2011; Muysken, 2013), whereby one of two existing linguistic forms expressing the same meaning in the two languages is chosen over the other.