Dissertations, Theses, and Capstone Projects

Date of Degree

9-2019

Document Type

Dissertation

Degree Name

Ph.D.

Program

Linguistics

Advisor

William Sakas

Committee Members

Martin Chodorow

Michael Mandel

Subject Categories

Artificial Intelligence and Robotics | Computational Linguistics | Syntax

Keywords

Neural Networks, Grammaticality Judgments, Long Short-Term Memory, Sentence Encoding

Abstract

The binary nature of grammaticality judgments and their use to probe the structure of syntax are a staple of modern linguistics. However, computational models of natural language rarely make use of grammaticality in their training or application. Furthermore, developments in modern neural NLP have produced a myriad of methods that push the baselines on many complex tasks, but those methods are typically not evaluated from a linguistic perspective. In this dissertation I use grammaticality judgments on artificially generated ungrammatical sentences to assess the performance of several neural encoders, and I propose such judgments as a suitable training target for teaching models specific syntactic rules. I generate artificial ungrammatical sentences in two ways. First, by randomly drawing words according to the n-gram distribution of a corpus of real sentences (I call these word salads). Second, by corrupting sentences from a real corpus (changing verbal or adjectival agreement or removing the main verb). I then train models whose encoder uses word embeddings and long short-term memory networks (LSTMs) to discriminate between real sentences and ungrammatical sentences. I show that the model distinguishes word salads well for low-order n-grams but does not generalize well to higher orders. Furthermore, training on word salads does not help the model recognize corrupted sentences. I then test the contributions of pre-trained word embeddings, deep LSTMs, and bidirectional LSTMs. I find that the largest contribution comes from adding pre-trained word embeddings. I also find that additional layers contribute differently to the performance of unidirectional and bidirectional models, and that deeper models show more performance variability across training runs.
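Two components of the abstract can be sketched concretely. The first is the word-salad generation step: sampling words according to the n-gram distribution of a real corpus. The following is a minimal, illustrative sketch (not the dissertation's actual code), assuming a bigram model over a tokenized corpus; the function names and the toy corpus are hypothetical.

    # Illustrative sketch: generate "word salad" pseudo-sentences by sampling
    # from the bigram distribution of a real corpus.
    import random
    from collections import defaultdict, Counter

    def build_bigram_model(sentences):
        """Count bigram continuations, with <s> and </s> as sentence boundaries."""
        model = defaultdict(Counter)
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                model[prev][cur] += 1
        return model

    def sample_word_salad(model, max_len=30):
        """Walk the bigram distribution to produce a (typically ungrammatical) string."""
        word = "<s>"
        salad = []
        for _ in range(max_len):
            words, counts = zip(*model[word].items())
            word = random.choices(words, weights=counts, k=1)[0]
            if word == "</s>":
                break
            salad.append(word)
        return salad

    corpus = [["the", "cat", "sat", "on", "the", "mat"],
              ["the", "dog", "chased", "the", "cat"]]
    bigrams = build_bigram_model(corpus)
    print(" ".join(sample_word_salad(bigrams)))

The second component is the discriminator itself: an encoder built from word embeddings and (optionally deep, optionally bidirectional) LSTMs with a binary grammaticality head. The sketch below uses PyTorch purely for illustration; the class name, hyperparameters, and readout are assumptions, not the dissertation's implementation.

    # Illustrative sketch: embeddings -> (bi)LSTM encoder -> grammaticality logit.
    import torch
    import torch.nn as nn

    class GrammaticalityClassifier(nn.Module):
        def __init__(self, vocab_size, emb_dim=300, hidden_dim=256,
                     num_layers=2, bidirectional=True, pretrained=None):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            if pretrained is not None:  # e.g. a pre-trained embedding matrix
                self.embedding.weight.data.copy_(pretrained)
            self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers,
                                batch_first=True, bidirectional=bidirectional)
            self.num_dirs = 2 if bidirectional else 1
            self.classifier = nn.Linear(hidden_dim * self.num_dirs, 1)

        def forward(self, token_ids):
            embedded = self.embedding(token_ids)       # (batch, seq_len, emb_dim)
            _, (h_n, _) = self.lstm(embedded)          # h_n: (layers * dirs, batch, hidden)
            # Concatenate the top layer's final state(s) as the sentence encoding.
            final = torch.cat([h_n[-i] for i in range(1, self.num_dirs + 1)], dim=-1)
            return self.classifier(final).squeeze(-1)  # one logit per sentence

Training such a model would then minimize a binary cross-entropy loss (e.g. nn.BCEWithLogitsLoss) over batches of real sentences labeled 1 and word salads or corrupted sentences labeled 0.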
