Date of Degree

2-2017

Document Type

Thesis

Degree Name

M.A.

Program

Linguistics

Advisor

William Sakas

Subject Categories

Computational Linguistics

Keywords

semantic search, information retrieval

Abstract

Many modern information retrieval systems rely on keyword search: they locate documents by matching the terms in a user’s query against an inverted index. While highly effective for many use cases, simple keyword-based search has a notable drawback: the context surrounding the user’s underlying information need may be lost, particularly when the query terms are ambiguous or have multiple meanings. Research in semantic search aims to address this problem. One methodology in particular, explicit semantic analysis, models a document not only as the set of unique terms it contains but also as a set of concepts that describe it; these concepts are derived from an authoritative or curated source and assigned to each document in a collection. This paper presents a prototype information retrieval system called “ES-ESA”, which borrows from the principles of explicit semantic analysis and implements them using the Elasticsearch framework. The ES-ESA system is qualitatively evaluated using a corpus of academic research abstracts.
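
As an illustration of the idea described in the abstract, the following minimal Python sketch indexes each document both by its terms and by concepts assigned from a curated source, then uses the concepts to disambiguate a keyword query. It is not the ES-ESA implementation and makes no Elasticsearch calls. The documents, the concept vocabulary, and the function names (`assign_concepts`, `build_index`, `search`) are all hypothetical. Plain set overlap stands in for the weighted concept vectors normally used in explicit semantic analysis.

```python
from collections import defaultdict

# Hypothetical toy collection (not taken from the thesis corpus).
DOCS = {
    "d1": "jaguar speed in the rainforest",
    "d2": "jaguar engine speed and handling",
}

# Hypothetical curated concept source: concept -> indicative terms.
CONCEPTS = {
    "Animals": {"rainforest", "predator", "habitat"},
    "Automobiles": {"engine", "handling", "sedan"},
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def assign_concepts(text):
    """Return every concept whose vocabulary overlaps the text's terms."""
    terms = set(tokenize(text))
    return {name for name, vocab in CONCEPTS.items() if terms & vocab}


def build_index(docs):
    """Build two inverted indexes: term -> doc ids and concept -> doc ids."""
    term_index = defaultdict(set)
    concept_index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            term_index[term].add(doc_id)
        for concept in assign_concepts(text):
            concept_index[concept].add(doc_id)
    return term_index, concept_index


def search(query, term_index, concept_index):
    """Keyword match, narrowed to documents sharing a concept with the query."""
    keyword_hits = set()
    for term in tokenize(query):
        keyword_hits |= term_index.get(term, set())
    concept_hits = set()
    for concept in assign_concepts(query):
        concept_hits |= concept_index.get(concept, set())
    # Fall back to plain keyword hits when the concepts narrow nothing.
    narrowed = keyword_hits & concept_hits
    return sorted(narrowed or keyword_hits)


term_index, concept_index = build_index(DOCS)
```

In this toy setup the ambiguous query `jaguar` matches both documents by keyword alone, whereas `jaguar habitat` maps to the hypothetical "Animals" concept and is narrowed to `d1`, even though the word "habitat" appears in neither document.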

This work is embargoed and will be available for download on Wednesday, January 16, 2019
