Date of Degree

2-2021

Document Type

Thesis

Degree Name

M.A.

Program

Linguistics

Advisor

Kyle Gorman

Subject Categories

Computational Linguistics | Other Linguistics | Spanish Linguistics

Keywords

translanguaging, computational linguistics, computer mediated communication, Doc2Vec, Word2Vec, KNN

Abstract

Code-switching is the linguistic phenomenon where a multilingual person alternates between two or more languages in a conversation, whether that be spoken or written. This thesis studies the automatic detection of code-switching occurring specifically between English and Spanish in two corpora.

Twitter and other social media sites have provided an abundance of linguistic data that is available to researchers to perform countless experiments. Collecting the data is fairly easy if a study is on monolingual text, but if a study requires code-switched data, this becomes a complication as APIs only accept one language as a parameter. This thesis focuses on identifying code-switching in both Twitter data and the Miami-Bangor corpus. This is done by conducting three different experiments. Our first experiment is a logistic regression model where we attempt to distinguish code-switched data from monolingual data. The second experiment is using a novel Word2Vec average nearest neighbor (WANN) classifier based on word embeddings to detect code-switching. The third experiment uses Doc2Vec, where the model uses the mean vector of each document to learn and distinguish between code-switched and monolingual data. Each of these experiments are performed twice, once with tweets and once with the Miami Bangor corpus. The results show that the WANN model performs best on Twitter data. The Doc2Vec model performs best on the Miami Bangor corpus. However, both approaches did well and the performances are comparable.

Recommended Citation

Polanco, Yohamy C., "A Computational Study in the Detection of English–Spanish Code-Switches" (2021). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/4195

Download

Included in

Computational Linguistics Commons, Other Linguistics Commons, Spanish Linguistics Commons

COinS

CUNY Academic Works

Dissertations, Theses, and Capstone Projects

A Computational Study in the Detection of English–Spanish Code-Switches

Date of Degree

Document Type

Degree Name

Program

Advisor

Subject Categories

Keywords

Abstract

Recommended Citation

Included in

Browse

Search

Author Corner

Links

CUNY Academic Works

Dissertations, Theses, and Capstone Projects

A Computational Study in the Detection of English–Spanish Code-Switches

Author

Date of Degree

Document Type

Degree Name

Program

Advisor

Subject Categories

Keywords

Abstract

Recommended Citation

Included in

Share

Browse

Search

Author Corner

Links