Dissertations, Theses, and Capstone Projects




Hispanic & Luso-Brazilian Literatures & Languages


Ricardo Otheguy

Subject Categories



Density; Journalism; Lexicon; Spanish; Vocabulary


This dissertation presents a quantitative, comparative analysis of lexical density in Spanish-language print news in the United States and Latin America. Lexical density is a statistical measure that expresses the number of terms as a percentage of the total number of running words in a text. For Spanish in the United States, questions arise about the degree of lexical compatibility and variability in language contact situations, as well as in situations where different varieties of Spanish converge and coexist. Because most existing studies of lexical density in Spanish media are descriptive, comparative studies are needed to better understand how written journalistic Spanish is used in both monolingual and language contact situations.
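As a rough illustration of the measure defined above, the following Python sketch computes lexical density under one reading of the definition: "terms" are distinct word forms (types) and "running words" are all word tokens. This reading is an assumption; the abstract does not say exactly how terms are counted, and other definitions of lexical density count content words instead. The letters-only tokenizer and the function names `tokenize` and `lexical_density` are illustrative choices, not the dissertation's procedure.

```python
import re

# Illustrative tokenizer (an assumption, not the author's method): runs of
# Unicode letters, so accented Spanish words such as "día" stay whole, while
# digits and punctuation are dropped.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text: str) -> list[str]:
    """Return the running words (tokens) of a text, lowercased."""
    return WORD_RE.findall(text.lower())


def lexical_density(text: str) -> float:
    """Distinct terms as a percentage of running words.

    Assumes "terms" means distinct lowercased word forms (types).
    Returns 0.0 for a text with no words.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    terms = set(tokens)
    return 100.0 * len(terms) / len(tokens)


if __name__ == "__main__":
    # "el" occurs twice, so 5 running words yield 4 distinct terms.
    print(lexical_density("El perro y el gato"))  # 80.0
```

Type-based ratios of this kind tend to fall as a text gets longer, so they are most comparable across samples of similar size.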

This research quantifies, analyzes, and compares the lexical inventories of the six most widely circulated Spanish-language newspapers published in the United States and Latin America, focusing on three content types: information, editorials, and sports. Our corpus comprises at least 1,200 running words per day for each content type in each newspaper, collected over six consecutive days. We also propose a new classification of terms according to whether or not they appear in the Diccionario de la Lengua Española de la Real Academia Española (DRAE) [Dictionary of the Spanish Language of the Royal Spanish Academy]. All terms are included in the initial calculation of lexical density, but separate percentages are also computed to determine the density of lexical items coded as unregistered terms in the sample.
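The corpus design and the DRAE-based split can be sketched in Python as follows. The size arithmetic uses the figures given above (at least 1,200 running words per day, six days, three content types) plus one assumption: one newspaper per zone, so six newspapers in total. The `registered` set is a hypothetical stand-in for DRAE entries; no DRAE lookup API is used, and because the dictionary lists lemmas rather than inflected forms, a real check would need lemmatization first. Using distinct terms as the denominator for the unregistered percentage is also an assumption.

```python
import re

# Illustrative tokenizer (an assumption): runs of Unicode letters, lowercased.
WORD_RE = re.compile(r"[^\W\d_]+")


def minimum_corpus_size(words_per_day: int = 1200, days: int = 6,
                        contents: int = 3, newspapers: int = 6) -> int:
    """Lower bound on running words across the whole sample."""
    return words_per_day * days * contents * newspapers


def unregistered_terms(text: str, registered: set[str]) -> list[str]:
    """Distinct word forms in `text` that are absent from `registered`."""
    terms = set(WORD_RE.findall(text.lower()))
    return sorted(terms - registered)


def unregistered_share(text: str, registered: set[str]) -> float:
    """Unregistered terms as a percentage of distinct terms (an assumption)."""
    terms = set(WORD_RE.findall(text.lower()))
    if not terms:
        return 0.0
    return 100.0 * len(terms - registered) / len(terms)


if __name__ == "__main__":
    # 1,200 x 6 days = 7,200 words per content per newspaper; x 3 x 6 overall.
    print(minimum_corpus_size())  # 129600
    # Toy registry standing in for DRAE lookups; it is NOT real DRAE coverage.
    toy_registry = {"el", "partido", "y"}
    text = "El partido y el jonrón"
    print(unregistered_terms(text, toy_registry))  # ['jonrón']
    print(unregistered_share(text, toy_registry))  # 25.0
```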

Within the broader discussion of the Spanish used in current print journalism, we quantify and classify unregistered terms to determine whether they are terms of localized use within particular varieties (what we call semigeneral terms), terms belonging to a specialized lexicon (technical terms), or unregistered Anglicisms, that is, Anglicisms that have not been officially integrated into the Spanish language. The purpose of this classification is to determine which newspaper in each continental area (United States or Latin America), zone (Argentina, Colombia, Mexico, Miami, Los Angeles, New York), and content type (information, editorials, and sports) incorporates the most unregistered terms.
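The tallying step described above can be sketched in Python. The sketch assumes each unregistered term has already been assigned by hand to one of the three categories named in the text (semigeneral, technical, or unregistered Anglicism) and counts distinct terms per group; counting every occurrence instead would be an equally plausible reading. The record layout, the newspaper labels, and the placeholder terms are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import NamedTuple, Optional

# The three categories of unregistered terms named in the text.
CATEGORIES = {"semigeneral", "technical", "anglicism"}


class UnregisteredTerm(NamedTuple):
    newspaper: str  # hypothetical label, e.g. "paper_1"
    area: str       # "United States" or "Latin America"
    zone: str       # e.g. "Miami", "Mexico"
    content: str    # "information", "editorials" or "sports"
    term: str
    category: str   # one of CATEGORIES, assigned by an annotator


def tally(records: list[UnregisteredTerm], field: str) -> Counter:
    """Count distinct unregistered terms per value of `field`."""
    seen = set()
    counts = Counter()
    for r in records:
        if r.category not in CATEGORIES:
            raise ValueError(f"unknown category: {r.category!r}")
        group = getattr(r, field)
        if (group, r.term) not in seen:  # count each term once per group
            seen.add((group, r.term))
            counts[group] += 1
    return counts


def most_unregistered(records: list[UnregisteredTerm],
                      field: str) -> Optional[tuple[str, int]]:
    """Group with the most distinct unregistered terms (ties: first seen)."""
    ranked = tally(records, field).most_common(1)
    return ranked[0] if ranked else None


# Invented placeholder records for illustration only (not data from the study).
SAMPLE = [
    UnregisteredTerm("paper_1", "United States", "Miami", "sports", "term_a", "anglicism"),
    UnregisteredTerm("paper_1", "United States", "Miami", "sports", "term_a", "anglicism"),
    UnregisteredTerm("paper_1", "United States", "Miami", "information", "term_b", "technical"),
    UnregisteredTerm("paper_2", "Latin America", "Mexico", "sports", "term_c", "semigeneral"),
]

if __name__ == "__main__":
    print(most_unregistered(SAMPLE, "zone"))     # ('Miami', 2)
    print(most_unregistered(SAMPLE, "content"))  # ('sports', 2)
    print(tally(SAMPLE, "category"))
```

Passing a different field name ("newspaper", "area", "zone", "content", or "category") reuses the same tally for each level of comparison the text lists.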

We demonstrate in this dissertation that, even in a Spanish-English contact situation, the lexical density of Spanish-language written journalism in the United States is comparable to that of Spanish-language journalism in Latin America, with all zones showing similarly high lexical density indices according to the ranges proposed by Raul Avila (1989). Given the questions surrounding the status of Spanish in the United States, our results show that most of the vocabulary in Spanish-language journalism is found in the DRAE, which contributes to mutual compatibility and comprehension across all zones and content types.

Included in

Linguistics Commons