Dissertations, Theses, and Capstone Projects

Date of Degree

6-2020

Document Type

Thesis

Degree Name

M.A.

Program

Linguistics

Advisor

Rebecca Levitan

Subject Categories

Computational Linguistics | Public Policy

Keywords

computational linguistics, grants, text data, public policy, social science

Abstract

The UMETRICS database (Universities: Measuring the Effects of Research on Innovation, Competitiveness, and Science) contains detailed information on grants for federally and non-federally sponsored research at 32 universities over a 15-year period. It is hosted at IRIS (Institute for Research on Innovation and Science, University of Michigan) and is a valuable source of university administrative data, but it contains no information on research fields. Categorizing grants by research field can help measure the results of investment in research and science and provide evidence for data-driven policy-making, yet administrative data often lacks this kind of categorization. In the UMETRICS database, each grant records the name of its funding source, but funders sponsor research across many fields, so the funder alone does not identify a grant's field. For most, if not all, grants, however, the title is known. Our goal is to find a simple, interpretable method for assigning research-field categories based on grant title text. We propose a straightforward, computationally inexpensive approach that uses keywords from the Wikipedia research-fields corpus and assigns each grant title, through its keywords, a “probability” score of belonging to a given research field.
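The keyword-scoring idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the field names, the toy keyword sets in `FIELD_KEYWORDS`, the stopword list, and the helpers `tokenize` and `field_scores` are all hypothetical stand-ins for keywords that would be drawn from the Wikipedia research-fields corpus. Here each field's raw score is the number of title tokens found in its keyword set, and the raw scores are normalized to sum to 1, which is one plausible reading of a “probability” score.

```python
import re

# Hypothetical toy keyword sets (assumption): the thesis derives its keywords
# from a Wikipedia research-fields corpus, which is not reproduced here.
FIELD_KEYWORDS = {
    "Biology": {"cell", "protein", "gene", "genome", "enzyme"},
    "Computer Science": {"algorithm", "network", "software", "learning", "data"},
    "Physics": {"quantum", "particle", "laser", "optics", "plasma"},
}

# Hypothetical minimal stopword list (assumption).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(title):
    """Lowercase the title, keep runs of letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", title.lower()) if t not in STOPWORDS]


def field_scores(title, field_keywords=FIELD_KEYWORDS):
    """Return a normalized keyword-match score for each field.

    A field's raw score counts the title tokens found in its keyword set.
    Raw scores are divided by their total so they sum to 1; if no token
    matches any field, every score is 0.0.
    """
    tokens = tokenize(title)
    raw = {field: sum(1 for t in tokens if t in kws)
           for field, kws in field_keywords.items()}
    total = sum(raw.values())
    if total == 0:
        return {field: 0.0 for field in raw}
    return {field: count / total for field, count in raw.items()}


if __name__ == "__main__":
    scores = field_scores("Machine learning algorithm for gene network data")
    print(scores)  # {'Biology': 0.2, 'Computer Science': 0.8, 'Physics': 0.0}
    print(max(scores, key=scores.get))  # Computer Science
```

In this sketch, the title above yields four Computer Science matches and one Biology match, so the highest-scoring field is Computer Science. Because the sketch does no stemming, "algorithms" would not match "algorithm". A fuller version might lemmatize tokens or weight keywords by how specific they are to a field.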
