Date of Degree

6-2016

Document Type

Thesis

Degree Name

M.A.

Program

Linguistics

Advisor(s)

William Sakas

Subject Categories

Computational Linguistics

Keywords

authorship attribution, classification, privacy

Abstract

A person’s writing style is uniquely quantifiable and can serve reliably as a biometric. A writer who wishes to remain anonymous can use a number of privacy technologies but can still be identified simply by the words they choose to use: how frequently they use common words like “of,” for instance. Nondescript is a web tool designed first to identify the user’s writing style, in terms of word frequency, from a given writing sample and a document, and then to suggest how the author can change the document to reduce the probability that it will be attributed to them. While Nondescript does not guarantee anonymity, it provides the user with an iterative interface for revising their writing and seeing the results of a simulated authorship attribution scenario. Nondescript also provides a synonym-replacement feature, which significantly lowers the probability that a document will be attributed to the original author. (Code repository: https://github.com/robincamille/nondescript)
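To make the idea of word frequency as a style signal concrete, the following is a minimal sketch, not Nondescript's actual implementation. It assumes a tiny hand-picked list of common words (`FUNCTION_WORDS`) and a simple sum-of-differences comparison. Both are hypothetical choices made for illustration, as are the function names.

```python
import re
from collections import Counter

# Hypothetical short list of common words, chosen for illustration only;
# this is not the feature set used by Nondescript.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it"]


def function_word_profile(text):
    """Return the relative frequency of each listed word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def profile_distance(profile_a, profile_b):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(x - y) for x, y in zip(profile_a, profile_b))


if __name__ == "__main__":
    sample = "The history of the city is the history of its rivers."
    document = "It rained in the morning, and it rained again that night."
    distance = profile_distance(
        function_word_profile(sample), function_word_profile(document)
    )
    print(f"Profile distance: {distance:.3f}")
```

A smaller distance means the two texts use these common words at more similar rates. An attribution system would typically feed such frequencies into a trained classifier rather than compare them directly as done here.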

nondescript_code_2016-06-04.zip (3054 kB)
Code for Nondescript, with Git files and documentation

 
 
