Dissertations, Theses, and Capstone Projects

Date of Degree

2-2021

Document Type

Thesis

Degree Name

M.A.

Program

Linguistics

Advisor

Kyle Gorman

Subject Categories

Computational Linguistics | Discourse and Text Linguistics | Lesbian, Gay, Bisexual, and Transgender Studies

Keywords

queer, lgbt, transgender, artificial intelligence, ethical AI

Abstract

As a subdomain of author profiling, gender prediction (sometimes called gender inference) has received a substantial amount of attention, both as a task in itself and for other downstream analyses. Throughout the existing literature, various statistical and machine learning methods have been applied to extract features, either to characterize and differentiate female and male writing styles or simply to achieve maximum accuracy on gender prediction as a binary classification task. However, researchers often do not disclose how they conceptualize gender, nor do they consider the implications that gender prediction has for non-binary and trans individuals. Along with an overview of the previous research, I apply pre-existing, well-known statistical and machine learning methods to data from trans individuals in order to extract linguistic features and characterize their writing styles. I find that several of the features pattern with features found in previous research but contradict the gender-marked writing styles they have been shown to characterize, suggesting that trans individuals are likely to be misclassified by standard state-of-the-art methods of gender prediction. Misclassification in gender prediction is indistinguishable from misgendering, and therefore has great capacity for harm to individuals of trans experience.
