Dissertations, Theses, and Capstone Projects

Date of Degree


Document Type


Degree Name



Program

Digital Humanities


Advisor

Matthew Gold

Subject Categories

Digital Humanities | English Language and Literature


Keywords

stylometry, transgender linguistics, sociolinguistics


Project MapLemon is a corpus for stylometric demographic identification, comprising 54,000+ words from 345 participants. It was originally created to provide a baseline corpus of linguistic variation among North American English speakers. The corpus contains responses representing 30 linguistic backgrounds, 40 US states, and 6+ Canadian provinces. Project MapLemon introduces a new method for collecting data on linguistic variants in natural, digital written language: participants are asked to give directions using a hand-drawn map and to write a recipe for lemonade. In addition to its novel collection methods, MapLemon contains responses from 212 transgender and non-binary people. Analysis of these responses has shown that transgender people write most similarly (based on parts of speech) to their sex assigned at birth, then to their gender, and that their writing is dissimilar to that of opposite-sex transgender people. Furthermore, the analysis suggests that non-binary people form their own gender category and cannot be classed with any other gender.
