Date of Degree

9-2016

Document Type

Thesis

Degree Name

M.A.

Program

Linguistics

Advisor

Martin Chodorow

Subject Categories

Computational Linguistics

Keywords

Computational Linguistics, Natural Language Processing, Authorship Attribution, Twitter

Abstract

In recent years, Twitter has become a popular testing ground for techniques in authorship attribution. This is due both to the ease of building large corpora and to the challenges posed by the service's character limit and the writing styles that have developed as a result. As both false and genuine claims of hacked Twitter accounts have made international news, the need for this type of work is growing. For newer Twitter accounts, however, little training data is available. This study therefore aims to lay the groundwork for cross-domain authorship attribution: training on one source of writing and testing on another. It examines three types of feature sets (word n-grams, character n-grams, and stop words) and three machine learning algorithms (Naïve Bayes, Logistic Regression, and Linear Support Vector Classification).
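As a rough illustration of the three feature sets named in the abstract, the following minimal sketch counts word n-grams, character n-grams, and stop words in a short text. It is not the thesis's implementation: the function names, the default n-gram sizes, the whitespace tokenization, and the small `STOP_WORDS` list are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only; the list used in the
# thesis is not given in this record.
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"})


def word_ngrams(text, n=2):
    """Count contiguous n-word sequences in lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text, n=3):
    """Count contiguous n-character sequences, keeping spaces and punctuation."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def stop_word_counts(text, stop_words=STOP_WORDS):
    """Count only the tokens that appear in the stop-word list."""
    tokens = text.lower().split()
    return Counter(t for t in tokens if t in stop_words)


if __name__ == "__main__":
    sample = "the cat sat in the hat"
    print(word_ngrams(sample))
    print(char_ngrams(sample))
    print(stop_word_counts(sample))
```

In a cross-domain setting, counts like these would be computed on one source of writing to train a classifier and on a different source to test it; the classifier step is omitted here.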

Recommended Citation

Schwartz, Maxwell B., "An Examination of Cross-Domain Authorship Attribution Techniques" (2016). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/1573


Included in

Computational Linguistics Commons


An Examination of Cross-Domain Authorship Attribution Techniques
