Date of Degree

9-2016

Document Type

Thesis

Degree Name

M.A.

Program

Linguistics

Advisor

William Gregory Sakas

Subject Categories

Computational Linguistics | First and Second Language Acquisition

Keywords

Part-of-speech, Tagger, Tagging, MLU, Tagging Accuracy

Abstract

This project evaluates four mainstream taggers on a representative collection of child-adult dialogues from the Child Language Data Exchange System (CHILDES). The nine children's files from the Valian corpora and part of the Eve corpora were manually labeled and rewritten with the LARC tagset; they served as gold-standard corpora for training and testing. Four taggers (the CLAN MOR tagger, the ACOPOST trigram tagger, the Stanford parser, and version 1.14 of the Brill tagger) were tested with 10-fold cross-validation. By analyzing which of each tagger's assumptions about category assignment led to failures, we identify several problematic tagging cases. By comparing the average error rates of the taggers, we find that both the size of the training data set and the length of the utterance affect tagging accuracy.

Recommended Citation

Huang, Rui, "An Evaluation of POS Taggers for the CHILDES Corpus" (2016). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/1577

Included in

Computational Linguistics Commons, First and Second Language Acquisition Commons
