Date of Degree

9-2021

Document Type

Thesis

Degree Name

M.A.

Program

Linguistics

Advisor

Kyle Gorman

Subject Categories

Computational Linguistics

Keywords

authorship identification, forensic linguistics, computational linguistics

Abstract

Nearly thirty years ago, the United States Supreme Court reevaluated the criteria for accepting forensic science and expert testimony, challenging forensic linguistics to establish itself as a reputable science. Much work has been produced since then to that end, but much remains to be done to satisfy judicial standards. Computational linguistics has the potential to provide the necessary analytical framework. This paper’s aim is twofold. First, it compares two competing theories about which features are needed to identify an unknown author: four features were drawn from the syntactic computational linguistics tradition and four from computational stylometry, and their predictive ability was measured. Second, it compares two classification models: linear discriminant analysis and logistic regression. A combination of syntactic leaf node frequency, stylometric punctuation characters, and NOT contraction variation, assessed with logistic regression, provided the most accurate predictions.
