While there is a vast amount of useful US government data on the web, some of it is in a raw state that is not readily accessible to the average user. Data librarians can improve accessibility and usability for their patrons by processing data to create subsets of local interest and by appending geographic identifiers to help users select and aggregate data. This case study illustrates how census geography crosswalks, Python, and OpenRefine were used to create spreadsheets of non-profit organizations in New York City from the IRS Tax-Exempt Organization Masterfile. This paper illustrates the utility of Python for data librarians and should be particularly insightful for those who work with address-based data.
Donnelly, Frank. "Processing Government Data: ZIP Codes, Python, and OpenRefine". Code4lib Journal, Issue 25, 2014-07-21. http://journal.code4lib.org/articles/9652