Dissertations, Theses, and Capstone Projects


Martin Chodorow

Subject Categories

Artificial Intelligence and Robotics | Linguistics


automated summarization, automated text classification, semantic similarity metrics, sentiment analysis, stance classification, stancetaking


This study describes a set of document- and sentence-level classification models designed to automate two tasks: determining the argument stance (for or against) of a student argumentative essay, and identifying any arguments in the essay that provide reasons in support of that stance. A suggested application of these models is also presented: the automated extraction of a single-sentence summary of an argumentative essay. This summary sentence indicates the overall argument stance of the essay from which it was extracted and provides a representative argument in support of that stance.

A novel set of document-level stance classification features motivated by linguistic research involving stancetaking language is described. Several document-level classification models incorporating these features are trained and tested on a corpus of student essays annotated for stance. These models achieve accuracies significantly above those of two baseline models. High-accuracy features used by these models include a dependency subtree feature incorporating information about the targets of any stancetaking language in the essay text and a feature capturing the semantic relationship between the essay prompt text and stancetaking language in the essay text.

We also describe the construction of a corpus of essay sentences annotated for supporting argument stance. The resulting corpus is used to train and test two sentence-level classification models. The first model is designed to classify a given sentence as a supporting argument or as not a supporting argument, while the second model is designed to classify a supporting argument as holding a for or against stance. Features motivated by influential linguistic analyses of the lexical, discourse, and rhetorical features of supporting arguments are used to build these two models, both of which achieve accuracies above their respective baseline models.

Finally, we describe an application that illustrates a use case for the models presented in this dissertation. This application combines all three classification models to extract a single sentence that conveys both the overall stance of a given text and a convincing reason in support of that stance.
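The way the three models could fit together can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function name, the three classifier callables, the `threshold` parameter, and the rule of picking the highest-scoring supporting sentence are all assumptions made for illustration, since the abstract does not say how the "convincing" sentence is selected.

```python
from typing import Callable, List, Optional, Tuple


def extract_summary_sentence(
    sentences: List[str],
    essay_stance: Callable[[List[str]], str],   # hypothetical document-level model: "for" / "against"
    support_score: Callable[[str], float],      # hypothetical sentence-level model: support confidence in [0, 1]
    argument_stance: Callable[[str], str],      # hypothetical sentence-level model: "for" / "against"
    threshold: float = 0.5,                     # assumed cutoff, not taken from the source
) -> Optional[Tuple[str, str]]:
    """Return (essay stance, summary sentence), or None if no sentence qualifies."""
    # Step 1: classify the stance of the essay as a whole.
    stance = essay_stance(sentences)

    # Step 2: keep sentences judged to be supporting arguments
    # whose own stance agrees with the essay-level stance.
    candidates = []
    for sentence in sentences:
        score = support_score(sentence)
        if score >= threshold and argument_stance(sentence) == stance:
            candidates.append((score, sentence))

    if not candidates:
        return None

    # Step 3 (assumed selection rule): take the candidate with the highest score.
    _, best_sentence = max(candidates, key=lambda pair: pair[0])
    return stance, best_sentence
```

Any trained classifiers exposing these call signatures could be passed in. Returning `None` when no sentence qualifies keeps the caller from presenting an unsupported sentence as a summary.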