Date of Degree

9-2019

Document Type

Master's Thesis

Degree Name

Master of Arts

Program

Linguistics

Advisor

Kyle Gorman

Subject Categories

Computational Linguistics | Linguistics

Abstract

Classic natural language processing resources such as the Penn Treebank (Marcus et al. 1993) have long been used both as evaluation data for many linguistic tasks and as training data for a variety of off-the-shelf language processing tools. Recent work has highlighted a gender imbalance among the authors of this text data (Garimella et al. 2019) and hypothesized that tools built on such resources will privilege users from particular demographic groups (Hovy and Søgaard 2015). Domain adaptation is typically employed in machine learning to adjust models when the training and evaluation data come from different genres. The present work instead evaluates whether domain adaptation to demographic factors such as age or gender may be an effective strategy to ameliorate the effects of biased or outdated training corpora in linguistic preprocessing tasks. We find that adaptation to demographic groups is an effective strategy for improving preprocessing performance across all demographic groups.

Recommended Citation

Morini, Sara, "Demographic Factors as Domains for Adaptation in Linguistic Preprocessing" (2019). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/3398

Included in

Computational Linguistics Commons

Dissertations, Theses, and Capstone Projects

Demographic Factors as Domains for Adaptation in Linguistic Preprocessing
