Date of Degree


Document Type


Degree Name





William Gregory Sakas

Subject Categories

Computational Linguistics | First and Second Language Acquisition


Part-of-speech, Tagger, Tagging, MLU, Tagging Accuracy


This project evaluates four mainstream taggers on a representative collection of child-adult dialogues from the Child Language Data Exchange System. The nine children's files from the Valian corpus and part of the Eve corpus were manually labeled and rewritten using the LARC tagset. They served as gold-standard corpora for training and testing. Four taggers were tested by 10-fold cross-validation: the CLAN MOR tagger, the ACOPOST trigram tagger, the Stanford parser, and version 1.14 of the Brill tagger. By analyzing which of the taggers' assumptions about category assignment lead to errors, we identify several problematic tagging cases. By comparing the average error rate of each tagger, we found that the size of the training data set and the length of the utterance both affect tagging accuracy.
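As a rough illustration of the evaluation procedure named in the abstract, the sketch below runs 10-fold cross-validation over a gold-standard corpus and averages the token-level error rate. It is a minimal sketch under stated assumptions, not the project's actual setup: the most-frequent-tag baseline stands in for the four taggers, and the function names, the data layout (each utterance is a list of (word, tag) pairs), and the way the folds are split are all hypothetical.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_utterances):
    """Train a most-frequent-tag baseline (a stand-in, not one of the four taggers)."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for utterance in tagged_utterances:
        for word, tag in utterance:
            counts[word][tag] += 1
            all_tags[tag] += 1
    # Unknown words receive the most frequent tag seen in training, if any.
    default = all_tags.most_common(1)[0][0] if all_tags else None
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lambda word: lexicon.get(word, default)


def error_rate(tagger, tagged_utterances):
    """Fraction of tokens whose predicted tag differs from the gold tag."""
    total = wrong = 0
    for utterance in tagged_utterances:
        for word, gold_tag in utterance:
            total += 1
            wrong += tagger(word) != gold_tag
    return wrong / total if total else 0.0


def cross_validate(corpus, k=10):
    """Hold out each of k folds once; return the mean error rate and per-fold rates."""
    # Assumed split: utterance i goes to fold i mod k.
    folds = [corpus[i::k] for i in range(k)]
    rates = []
    for i in range(k):
        train = [u for j, fold in enumerate(folds) if j != i for u in fold]
        rates.append(error_rate(train_baseline(train), folds[i]))
    return sum(rates) / k, rates
```

Calling `cross_validate(corpus)` on a list of gold-tagged utterances returns the average error rate together with the ten per-fold rates. Grouping the test utterances by length before calling `error_rate` would give the kind of per-length comparison the abstract describes.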


