Dissertations, Theses, and Capstone Projects
Date of Degree
9-2021
Document Type
Thesis
Degree Name
M.A.
Program
Linguistics
Advisor
Kyle Gorman
Subject Categories
Computational Linguistics
Keywords
authorship identification, forensic linguistics, computational linguistics
Abstract
Nearly thirty years ago, the United States Supreme Court reevaluated the criteria for accepting forensic science and expert testimony, challenging forensic linguistics to assert itself as a reputable science. Much work has been produced since then to that end, but much remains to be accomplished to satisfy judicial standards. Computational linguistics has the potential to provide the necessary analytical framework. This paper has a two-fold aim. First, it tests two competing theories about which features are needed to identify an unknown author: four features were drawn from the syntactic computational linguistics tradition and four from computational stylometry, and their predictive ability was measured. Second, two classification models were compared: linear discriminant analysis and logistic regression. A combination of syntactic leaf node frequency with the stylometric features of punctuation characters and NOT contraction variation, assessed with logistic regression, provided the most accurate predictions.
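As a rough illustration of the two stylometric features named in the abstract, the sketch below computes per-character punctuation frequencies and a NOT contraction rate for a text. The abstract does not give the thesis's actual feature definitions, so the function names (`punctuation_frequencies`, `not_contraction_rate`), the normalization by text length, and the regular expressions are all assumptions made for this example.

```python
import re
import string
from collections import Counter

# Assumption: a contracted NOT is any word ending in "n't"; an uncontracted
# NOT is the standalone word "not". The thesis may define these differently.
CONTRACTED_NOT = re.compile(r"\b\w+n't\b", re.IGNORECASE)
FULL_NOT = re.compile(r"\bnot\b", re.IGNORECASE)


def punctuation_frequencies(text):
    """Return each ASCII punctuation character's count divided by text length."""
    if not text:
        return {}
    counts = Counter(ch for ch in text if ch in string.punctuation)
    return {ch: n / len(text) for ch, n in counts.items()}


def not_contraction_rate(text):
    """Return the share of NOT occurrences that are contracted, or None if absent."""
    text = text.replace("\u2019", "'")  # treat typographic apostrophes as ASCII
    contracted = len(CONTRACTED_NOT.findall(text))
    full = len(FULL_NOT.findall(text))
    total = contracted + full
    return contracted / total if total else None


sample = "I don't know. I do not know, and I can't say!"
print(punctuation_frequencies(sample))
print(not_contraction_rate(sample))
```

Values like these, computed per document, could serve as input columns for a classifier such as logistic regression or linear discriminant analysis; that pipeline is not shown here.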
Recommended Citation
Manczur, Jonathan I., "From an Art to a Science: Features and Methodology in Computational Authorship Identification" (2021). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/4597