Rebecca Levitan

Subject Categories

Computational Linguistics | Public Policy


computational linguistics, grants, text data, public policy, social science


The UMETRICS database (Universities: Measuring the Effects of Research on Innovation, Competitiveness, and Science) contains detailed information on federally and non-federally sponsored research grants at 32 universities over a 15-year period. It is hosted at IRIS (Institute for Research on Innovation and Science, University of Michigan) and serves as a rich source of university administrative data; however, it does not contain information on research fields. Categorizing grant data by research field can help measure the results of investment in research and science and provide evidence for data-driven policy-making, yet administrative data often lacks this type of categorization. In the UMETRICS database, each grant records the name of its funding source, but funders sponsor research in a variety of fields, so the funder alone does not identify the field. We found that a grant title is available for most, if not all, grants. Our goal is to find a simple and interpretable method of assigning research field categories using grant title text. We propose a straightforward and computationally inexpensive approach that uses keywords drawn from a Wikipedia research fields corpus and assigns “probability” scores to a grant title and its keywords, indicating how likely each is to belong to a given research field.
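To make the idea of keyword-based scoring concrete, the following is a minimal sketch in Python and not the method used in the thesis. It assumes each research field has a small keyword set and scores a title by the share of its keyword matches that fall in each field. The keyword lists, the field names, and the functions `tokenize` and `score_title` are hypothetical and chosen only for illustration; the abstract states that the actual keywords come from Wikipedia text and does not specify the scoring formula.

```python
import re
from collections import Counter

# Hypothetical keyword sets for illustration only. The thesis derives its
# keywords from a Wikipedia research fields corpus, which is not reproduced here.
FIELD_KEYWORDS = {
    "biology": {"cell", "protein", "gene", "genome", "species"},
    "physics": {"quantum", "particle", "laser", "plasma", "optics"},
    "economics": {"market", "labor", "trade", "price", "income"},
}


def tokenize(title):
    """Lowercase the title and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", title.lower())


def score_title(title, field_keywords=FIELD_KEYWORDS):
    """Return a score per field: that field's share of all keyword matches.

    Scores sum to 1 when at least one keyword matches. If nothing matches,
    every field gets 0.0, which signals that the title cannot be categorized.
    This normalization is an assumption, not the thesis's scoring rule.
    """
    tokens = tokenize(title)
    hits = Counter()
    for field, keywords in field_keywords.items():
        hits[field] = sum(1 for token in tokens if token in keywords)
    total = sum(hits.values())
    if total == 0:
        return {field: 0.0 for field in field_keywords}
    return {field: hits[field] / total for field in field_keywords}


scores = score_title("Gene regulation and protein folding in quantum dots")
best_field = max(scores, key=scores.get)
print(best_field, scores)
```

A score of this kind stays interpretable because each value can be traced back to the specific title words that matched a field's keywords.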

This work is embargoed and will be available for download on Thursday, June 02, 2022
