Date of Award

2016

Document Type

Thesis

First Advisor

Jie Wei

Keywords

Natural language processing, distributed representation, sentence completion

Abstract

In recent years, distributed representations of words in vector space, or word embeddings, have become very popular because they have yielded significant improvements in many statistical natural language processing (NLP) tasks compared with traditional language models such as n-gram models. In this thesis, we explored several state-of-the-art methods, including Latent Semantic Analysis, word2vec, and GloVe, for learning distributed representations of words. We compared their performance by the accuracy each achieved when selecting the correct missing word in a sentence from five candidate options. For this task, we trained each method on a corpus containing the texts of around five hundred 19th-century novels from Project Gutenberg. The test set contained 1040 sentences, each with one word missing. Both the training and test sets were part of the Microsoft Research Sentence Completion Challenge data set. In this work, word vectors obtained by training the skip-gram model of word2vec achieved the highest accuracy in finding the missing word among all the methods tested. We also found that tuning the models' hyperparameters helped capture more syntactic and semantic regularities among words.

Recommended Citation

Saifee, Saniya, "EVALUATING DISTRIBUTED WORD REPRESENTATIONS FOR PREDICTING MISSING WORDS IN SENTENCES" (2016). CUNY Academic Works.
https://academicworks.cuny.edu/cc_etds_theses/623

Included in

Databases and Information Systems Commons
