authorship identification, forensic linguistics, computational linguistics
Nearly thirty years ago, the United States Supreme Court reevaluated the criteria for accepting forensic science and expert testimony, challenging forensic linguistics to assert itself as a reputable science. Much work has been produced in the interim to that end, but more is needed to satisfy the judicial standards. Computational linguistics has the potential to provide the necessary analytical framework. This paper's intent is twofold. First, it compares two competing theories about which features are needed to identify an unknown author: four features were drawn from the syntactic computational linguistics tradition and four from computational stylometry, and their predictive ability was measured. Second, it compares two classification models: linear discriminant analysis and logistic regression. A combination of one syntactic feature (leaf node frequency) and two stylometric features (punctuation characters and NOT contraction variation), assessed with logistic regression, provided the most accurate predictions.
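As a rough illustration of the two stylometric features named above, the following Python sketch computes per-character punctuation frequencies and the share of NOT forms that are contracted. The abstract does not give the feature definitions, so everything here is an assumption: the function names `punctuation_frequencies` and `not_contraction_rate` are hypothetical, punctuation is counted as a relative frequency over ASCII punctuation only, and "NOT contraction variation" is approximated as the ratio of forms ending in "n't" to all NOT forms.

```python
import re
import string
from collections import Counter

# Assumed patterns (not taken from the paper): contracted forms end in "n't"
# (straight or curly apostrophe); uncontracted forms are the standalone word "not".
CONTRACTED_NOT = re.compile(r"\b\w+n['’]t\b", re.IGNORECASE)
FULL_NOT = re.compile(r"\bnot\b", re.IGNORECASE)


def punctuation_frequencies(text):
    """Return each ASCII punctuation character's count divided by text length."""
    counts = Counter(ch for ch in text if ch in string.punctuation)
    total = len(text) or 1  # avoid division by zero on empty input
    return {ch: n / total for ch, n in counts.items()}


def not_contraction_rate(text):
    """Return the fraction of NOT forms that are contracted (0.0 if none occur)."""
    contracted = len(CONTRACTED_NOT.findall(text))
    full = len(FULL_NOT.findall(text))
    total = contracted + full
    return contracted / total if total else 0.0


if __name__ == "__main__":
    sample = "I don't know, and I do not care. It isn't mine!"
    print(punctuation_frequencies(sample))
    print(not_contraction_rate(sample))
```

Feature values of this kind, computed per document, could then serve as inputs to a classifier such as linear discriminant analysis or logistic regression; the syntactic leaf node feature would additionally require a parser and is not sketched here.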
Manczur, Jonathan I., "From an Art to a Science: Features and Methodology in Computational Authorship Identification" (2021). CUNY Academic Works.