Publications and Research
Document Type
Article
Publication Date
7-21-2014
Abstract
While a vast amount of useful US government data is available on the web, some of it is in a raw state that is not readily accessible to the average user. Data librarians can improve accessibility and usability for their patrons by processing data to create subsets of local interest and by appending geographic identifiers that help users select and aggregate data. This case study shows how census geography crosswalks, Python, and OpenRefine were used to create spreadsheets of non-profit organizations in New York City from the IRS Tax-Exempt Organization Masterfile. It illustrates the utility of Python for data librarians and should be particularly useful for those who work with address-based data.
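To make the idea of a geographic crosswalk concrete, the following is a minimal sketch and not the scripts from the article. It assumes a hypothetical crosswalk file that maps each ZIP code to a single county FIPS code, and a masterfile extract with hypothetical column names `EIN`, `NAME`, and `ZIP`; the sample rows are invented for illustration. In practice a ZIP code can cross county lines, so a real crosswalk may need more careful handling than this one-to-one lookup.

```python
import csv
import io

# FIPS codes for the five counties (boroughs) of New York City:
# Bronx, Kings, New York, Queens, Richmond.
NYC_COUNTIES = {"36005", "36047", "36061", "36081", "36085"}


def load_crosswalk(rows):
    """Build a ZIP -> county FIPS lookup (assumes one county per ZIP)."""
    return {row["zip"].strip()[:5]: row["county_fips"].strip() for row in rows}


def nyc_subset(records, crosswalk):
    """Yield records whose ZIP maps to an NYC county, appending the county code."""
    for rec in records:
        zip5 = rec["ZIP"].strip()[:5]  # drop any ZIP+4 suffix
        county = crosswalk.get(zip5)
        if county in NYC_COUNTIES:
            yield {**rec, "county_fips": county}


# Invented sample data; column names are assumptions, not the real file layout.
CROSSWALK_CSV = """zip,county_fips
10001,36061
11201,36047
12207,36001
"""
MASTERFILE_CSV = """EIN,NAME,ZIP
000000001,EXAMPLE ORG A,10001-1234
000000002,EXAMPLE ORG B,12207-0000
000000003,EXAMPLE ORG C,11201
"""

crosswalk = load_crosswalk(csv.DictReader(io.StringIO(CROSSWALK_CSV)))
subset = list(nyc_subset(csv.DictReader(io.StringIO(MASTERFILE_CSV)), crosswalk))

for row in subset:
    print(row["NAME"], row["county_fips"])
```

Appending the county code to each retained record is what lets a user later select or aggregate the subset by borough without repeating the lookup.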
Comments
This work was originally published by the Code4Lib Journal. The full text, with sample files and scripts for download, is available at: http://journal.code4lib.org/articles/9652