Date of Degree
Computational Linguistics | Public Policy
computational linguistics, grants, text data, public policy, social science
The UMETRICS database (Universities: Measuring the Effects of Research on Innovation, Competitiveness, and Science) contains detailed information on federally and non-federally sponsored research grants at 32 universities over a 15-year period. It is hosted at IRIS (Institute for Research on Innovation and Science, University of Michigan) and serves as a rich source of university administrative data; however, it does not contain information on research fields. Categorizing grant data by research field can help measure the results of investment in research and science and provide evidence for data-driven policymaking, yet administrative data often lacks this type of categorization. Each grant in the UMETRICS database records the name of its funding source, but funders sponsor research across a variety of fields, so the funder alone does not identify the field. We found, however, that a grant title is available for all or most grants. Our goal is to find a simple and interpretable method of assigning research field categories using grant title text. We propose a straightforward and computationally inexpensive approach that uses keywords drawn from a Wikipedia corpus of research fields and assigns “probability” scores to a grant title and its keywords as belonging to a given research field.
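The keyword-scoring idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the scores are computed, so everything below is an assumption made for illustration: the field names, the keyword weights (made up here rather than derived from Wikipedia), the helper names `tokenize` and `score_title`, and the choice to normalize matched keyword weights so that the scores sum to 1.

```python
import re

# Hypothetical keyword weights per research field. In the approach described
# above these would be derived from Wikipedia text about each field; the
# fields, words, and weights here are invented for illustration only.
FIELD_KEYWORDS = {
    "biology": {"protein": 3, "cell": 2, "gene": 3},
    "computer science": {"algorithm": 3, "network": 1, "learning": 2},
    "physics": {"quantum": 3, "particle": 2, "network": 1},
}


def tokenize(title):
    """Lowercase a grant title and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", title.lower())


def score_title(title, field_keywords=FIELD_KEYWORDS):
    """Return a score per field for one grant title.

    Assumed scoring rule: sum the weights of the field's keywords that
    appear in the title, then divide by the total over all fields so the
    scores sum to 1. Titles with no matching keyword get 0.0 everywhere.
    """
    tokens = tokenize(title)
    raw = {
        field: sum(keywords.get(token, 0) for token in tokens)
        for field, keywords in field_keywords.items()
    }
    total = sum(raw.values())
    if total == 0:
        return {field: 0.0 for field in raw}
    return {field: value / total for field, value in raw.items()}


scores = score_title("Deep learning algorithm for protein structure")
best_field = max(scores, key=scores.get)
```

For this toy title, the matched keywords are "learning", "algorithm", and "protein", so the highest score goes to "computer science". Because each score is a ratio of matched keyword weights, it is easy to trace back to the specific title words that produced it, which is the kind of interpretability the abstract asks for.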
Levitskaya, Ekaterina, "Inferring Research Fields in Administrative Records Using Text Data" (2020). CUNY Academic Works.
This work is embargoed and will be available for download on Thursday, June 02, 2022