Dissertations, Theses, and Capstone Projects

Date of Degree

6-2016

Document Type

Thesis

Degree Name

M.A.

Program

Linguistics

Advisor

William Sakas

Subject Categories

Computational Linguistics

Keywords

authorship attribution, classification, privacy

Abstract

A person’s writing style is uniquely quantifiable and can serve reliably as a biometric. A writer who wishes to remain anonymous can use a number of privacy technologies but can still be identified simply by the words they choose to use, for instance by how frequently they use common words like “of.” Nondescript is a web tool designed first to identify the user’s writing style, in terms of word frequency, from a given writing sample and document, and then to suggest how the author can change the document to lessen the probability of its being attributed to them. While Nondescript does not guarantee anonymity, it provides the user with an iterative interface for revising their writing and viewing the results of a simulated authorship attribution scenario. Nondescript also provides a synonym-replacement feature, which significantly lowers the probability that a document will be attributed to its original author. (Code repository: https://github.com/robincamille/nondescript)
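The abstract's central idea is that an author can be identified by how often they use common function words. The short sketch below illustrates that idea. It is not Nondescript's code. The ten-word FUNCTION_WORDS list, the sample texts, and the use of cosine similarity to the closest writing sample are all illustrative assumptions, standing in for whatever classifier the tool actually uses. The sketch builds a relative-frequency profile for each text, attributes a document to the most similar sample, and shows that a revision avoiding repeated "of the" constructions lowers the similarity to the original author.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words; real stylometric
# feature sets are much larger. Not taken from Nondescript.
FUNCTION_WORDS = ["of", "the", "and", "to", "a", "in", "that", "it", "is", "was"]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text):
    """Relative frequency of each function word among all tokens in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(document, samples):
    """Return (best_author, scores), where best_author is the candidate whose
    writing sample's function-word profile is most similar to the document's.
    Nearest-sample cosine similarity is an assumed, illustrative method."""
    doc_vec = profile(document)
    scores = {author: cosine(doc_vec, profile(text)) for author, text in samples.items()}
    return max(scores, key=scores.get), scores


# Invented writing samples for two hypothetical authors.
samples = {
    "alice": "The history of the city is the story of the river and of the people of the valley.",
    "bob": "It was late and it was cold, so we went in to get warm and to eat.",
}
draft = "The map of the region shows the course of the river."
revised = "A regional map shows where a river runs to a lake."

print(attribute(draft, samples)[0])  # prints: alice
# Dropping the "of the" constructions lowers the similarity to alice's sample.
print(attribute(revised, samples)[1]["alice"] < attribute(draft, samples)[1]["alice"])  # prints: True
```

Cosine similarity is used here because it compares the shape of two frequency profiles regardless of text length. A tool that reports the "probability" of attribution, as the abstract describes, would more likely use a trained probabilistic classifier instead.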

nondescript_code_2016-06-04.zip (3054 kB)
Code for Nondescript, with Git files and documentation
