Dissertations, Theses, and Capstone Projects

Date of Degree

2-2017

Document Type

Capstone Project

Degree Name

M.A.

Program

Liberal Studies

Advisor

Matthew K. Gold

Subject Categories

Digital Humanities | Interdisciplinary Arts and Media | Other Computer Sciences | Visual Studies

Keywords

digital humanities, natural language processing, computer vision, multimodal, multimodal data generation, meteor web application, meteor.js, javascript web application

Abstract

First created as part of the Digital Humanities Praxis course in the spring of 2012 at the CUNY Graduate Center, Tandem explores the generation of datasets composed of text and image data by leveraging Optical Character Recognition (OCR), Natural Language Processing (NLP), and Computer Vision (CV). This project builds upon that earlier work in a new programming framework. While other developers and digital humanities scholars have created similar tools specifically geared toward NLP (e.g., Voyant-Tools), as well as algorithms for image processing and feature extraction on the CV side, Tandem explores the process of developing a more robust and user-friendly web-based multimodal data generator using modern development processes, with the intention of expanding the use of the tool among interested academics. Tandem functions as a full-stack JavaScript in-browser web application that allows a user to log in and upload a corpus of image files for OCR-, NLP-, and CV-based processing to generate data. The corpora intended for this tool include picture books, comics, and other types of image- and text-based manuscripts, and are discussed in detail. Once images are processed, the application presents key initial insights and lightly visualized data in a dashboard view for the user. As a research question, this project explores the viability of full-stack JavaScript application development for academic end products by examining the courses and literature that inspired the work, alongside the documented development process of the application and proposed future enhancements for the tool. For those interested in further research or development, the full codebase for this project is available for download.

Tandem2-master (1).zip (499 kB)
TANDEM 2.0 - Source Code, Read Me, etc.

tandem-20-20170118224919.warc.gz (3025 kB)
TANDEM 2.0 - Web Archive File. Archived website as a WARC file, created using webrecorder.io. A web archive player is available at http://github.com/ikreymer/webarchiveplayer

Tandem - Christopher Vitale - Screencapture.mp4 (4256 kB)
TANDEM 2.0 - Screencapture
