Dissertations, Theses, and Capstone Projects
Date of Degree
6-2016
Document Type
Thesis
Degree Name
M.A.
Program
Linguistics
Advisor
William Sakas
Subject Categories
Computational Linguistics
Keywords
authorship attribution, classification, privacy
Abstract
A person’s writing style is uniquely quantifiable and can serve reliably as a biometric. A writer who wishes to remain anonymous can use a number of privacy technologies but can still be identified simply by the words they choose to use, for instance by how frequently they use common words like “of.” Nondescript is a web tool designed first to identify the user’s writing style in terms of word frequency from a given writing sample and document, then to suggest how the author can change their document to lessen its probability of being attributed to them. While Nondescript does not guarantee anonymity, it provides the user with an iterative interface for revising their writing and seeing the results of a simulated authorship attribution scenario. Nondescript also provides a synonym-replacement feature, which significantly lowers the probability that a document will be attributed to the original author. (Code repository: https://github.com/robincamille/nondescript)
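To illustrate the idea of a word-frequency style profile described in the abstract, here is a minimal Python sketch. It is not taken from the Nondescript code base: the word list `FUNCTION_WORDS` and the helpers `function_word_profile` and `cosine_similarity` are hypothetical names chosen for this example, and cosine similarity stands in for whatever classifier a real attribution system would use.

```python
import math
import re
from collections import Counter

# Assumption: a tiny, hand-picked list of common function words.
# A real attribution system would likely use a much larger feature set.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is"]


def function_word_profile(text):
    """Return the relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical inputs: a known writing sample and a document to compare.
    sample = "The history of the town is a story that is told in the archives."
    document = "A map of the region and a list of the roads is kept in the office."
    similarity = cosine_similarity(
        function_word_profile(sample), function_word_profile(document)
    )
    print(f"Function-word similarity: {similarity:.3f}")
```

Under these assumptions, a higher similarity between the sample and the document suggests the two share a function-word profile. Revising the document and recomputing the score loosely mirrors the iterative revision loop the abstract describes.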
Recommended Citation
Davis, Robin, "Nondescript: A Web Tool to Aid Subversion of Authorship Attribution" (2016). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/1343
Code for Nondescript, with Git files and documentation