Dissertations, Theses, and Capstone Projects
Date of Degree
6-2020
Document Type
Thesis
Degree Name
M.A.
Program
Linguistics
Advisor
Rebecca Levitan
Subject Categories
Computational Linguistics | Public Policy
Keywords
computational linguistics, grants, text data, public policy, social science
Abstract
The UMETRICS database (Universities: Measuring the Effects of Research on Innovation, Competitiveness, and Science) contains detailed information on grants for federally and non-federally sponsored research at 32 universities over a 15-year period. It is hosted at IRIS (Institute for Research on Innovation and Science, University of Michigan) and serves as a rich source of university administrative data; however, it does not record the research field of each grant. Categorizing grants by research field can help measure the results of investment in research and science and provide evidence for data-driven policy-making, yet administrative data often lacks this type of categorization. In the UMETRICS database, the name of the funding source is recorded for each grant, but funders sponsor research across a variety of fields, so the funder alone does not identify the field. The grant title, however, is known for all or most grants. Our goal is to find a simple and interpretable method of assigning research field categories using the text of grant titles. We propose a straightforward and computationally inexpensive approach that uses keywords from a corpus of Wikipedia articles on research fields and assigns “probability” scores indicating how likely a grant title, through its keywords, is to belong to a given research field.
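To make the idea of keyword-based “probability” scores concrete, here is a minimal sketch under stated assumptions. It is not the thesis's actual method. The per-field texts are a hypothetical toy stand-in for Wikipedia articles. Keywords are simply the most frequent non-stopword tokens of each field's text. A title's score for a field is that field's share of all keyword hits in the title. All names (`FIELD_TEXTS`, `build_keywords`, `score_title`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy "corpus": one sentence per research field, standing in for
# Wikipedia article text. This is illustrative data, not data from the thesis.
FIELD_TEXTS = {
    "Biology": "Biology studies living organisms, cells, genes, and evolution of species.",
    "Computer science": "Computer science studies algorithms, computation, software, and data structures.",
    "Economics": "Economics studies markets, labor, prices, and the allocation of resources.",
}

# Small assumed stopword list; a real pipeline would use a fuller list.
STOPWORDS = {"the", "and", "of", "a", "an", "in", "on", "for", "to", "studies"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_keywords(field_texts, top_n=20):
    """Keep the top_n most frequent tokens of each field's text as its keyword set."""
    return {
        field: {word for word, _ in Counter(tokenize(text)).most_common(top_n)}
        for field, text in field_texts.items()
    }


def score_title(title, keywords):
    """Count keyword hits per field and normalize so the scores sum to 1."""
    tokens = tokenize(title)
    hits = {field: sum(t in kws for t in tokens) for field, kws in keywords.items()}
    total = sum(hits.values())
    if total == 0:
        # No keyword evidence for any field: return all-zero scores.
        return {field: 0.0 for field in keywords}
    return {field: count / total for field, count in hits.items()}


if __name__ == "__main__":
    kw = build_keywords(FIELD_TEXTS)
    scores = score_title("Software for modeling cells and genes", kw)
    print({f: round(p, 2) for f, p in scores.items()})
    # prints {'Biology': 0.67, 'Computer science': 0.33, 'Economics': 0.0}
```

The example title gets two Biology hits ("cells", "genes") and one Computer science hit ("software"), so the scores are 2/3 and 1/3. Exact token matching is the simplest and most interpretable choice, but it misses morphological variants: "gene" would not match "genes". A practical version would probably add stemming or lemmatization.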
Recommended Citation
Levitskaya, Ekaterina, "Inferring Research Fields in Administrative Records Using Text Data" (2020). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/3807