David A. Jaeger

Committee Members

Francesc Ortega

Lev Manovich

Subject Categories



national accounts, big data


This dissertation consists of two chapters that apply distinct econometric methods to novel datasets.

In the first chapter, “From Twitter to GDP: Estimating Economic Activity From Social Media”, I collect all geo-located image tweets shared on Twitter in 2012-2013 to study whether the volume of tweets is a valid proxy for current GDP in USD at the country level. My preferred model explains 94 percent of the cross-country variation, and the residuals from the model are negatively correlated with a data quality index, indicating that my estimates of GDP are more accurate for countries with more reliable GDP data. I then compare Twitter with the more commonly used proxy of night-light data, and find that the variation in Twitter activity explains slightly more of the cross-country variance in GDP. Twitter data are also valuable for estimating within-country changes in economic activity from one period to the next. Furthermore, the results indicate that combining tweets and night-light luminosity produces a more accurate estimate of annual variations in economic activity. I then study the underlying relationship between the volume of image tweets and economic activity, and present the hypothesis that social media applications serve as a medium for showcasing consumption of goods and services to a user's network. I split tweets into those posted during working hours and those posted outside working hours, and regress them on personal consumption expenditure. The results support the idea that tweets are a byproduct of consumption. The last part of the chapter exploits the continuous time and geographic granularity of social media posts to estimate the local economic effects of natural resource extraction. Overall, my findings suggest that Twitter can be used to measure economic activity in a more timely and more spatially disaggregated way than conventional data, and that governments’ statistical agencies could incorporate social media data to complement their official GDP estimates and further reduce measurement error in them.
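As a rough illustration of the kind of cross-country regression described above, the following sketch fits log GDP on log tweet volume by ordinary least squares and reports the share of variation explained. It is not the dissertation's specification, which this abstract does not give: the function name `fit_log_log`, the log-log functional form, and the synthetic data (including the intercept 10.0, the slope 0.8, and the noise level) are all assumptions made for illustration only.

```python
import math
import random


def fit_log_log(tweets, gdp):
    """Fit log(gdp) = a + b * log(tweets) by OLS; return (a, b, r_squared)."""
    x = [math.log(t) for t in tweets]
    y = [math.log(g) for g in gdp]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope: elasticity of GDP w.r.t. tweet volume
    a = mean_y - b * mean_x       # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic "countries" (made-up numbers, not data from the dissertation).
rng = random.Random(0)
tweets = [math.exp(rng.uniform(5, 15)) for _ in range(100)]
gdp = [math.exp(10.0 + 0.8 * math.log(t) + rng.gauss(0, 0.5)) for t in tweets]

a, b, r2 = fit_log_log(tweets, gdp)
print(f"elasticity = {b:.2f}, R^2 = {r2:.2f}")
```

In a sketch like this, the residuals `log(gdp) - (a + b * log(tweets))` are what one would then compare against a data quality index, as the chapter does with its own model.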

In the second chapter, “What’s in a Tweet? Estimating Poverty Rates from Social Media Data”, I collect all geo-located image tweets posted from the US in 2012 to study whether they can be used to estimate poverty rates in urban areas. To exploit the full potential of social media data, I use not only the volume of tweets from each location but also several features extracted from the content of tweets using natural language processing techniques. The results suggest that data from tweets are not informative enough to replace survey data, as the estimates still carry substantial error. The chapter does, however, present two scenarios in which data from tweets can be valuable when estimating poverty rates. On the one hand, social media data can be combined with alternative economic indicators to obtain reasonably accurate poverty rate estimates when such indicators are not officially available. On the other hand, social media data can be useful for studying the relationship between poverty and an unobservable variable that can be proxied with social media data, such as human capital formation.
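To make the idea of combining tweet volume with content-based features concrete, here is a minimal sketch that aggregates tweets by location into a few simple features. The abstract does not say which features the chapter extracts, so everything here is hypothetical: the function `location_features`, the regex tokenizer, the three features, the example lexicon, and the sample tweets are illustrative assumptions, not the author's method.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z']+")  # crude tokenizer; an assumption for this sketch


def location_features(tweets, lexicon):
    """Aggregate (location_id, text) pairs into per-location features.

    Returns {location_id: {"volume", "mean_tokens", "lexicon_share"}}, where
    lexicon_share is the fraction of tweets containing any lexicon word.
    """
    volume = defaultdict(int)
    token_total = defaultdict(int)
    hits = defaultdict(int)
    for loc, text in tweets:
        tokens = TOKEN_RE.findall(text.lower())
        volume[loc] += 1
        token_total[loc] += len(tokens)
        if any(tok in lexicon for tok in tokens):
            hits[loc] += 1
    return {
        loc: {
            "volume": volume[loc],
            "mean_tokens": token_total[loc] / volume[loc],
            "lexicon_share": hits[loc] / volume[loc],
        }
        for loc in volume
    }


# Made-up example tweets and a hypothetical consumption-related lexicon.
sample = [
    ("A", "New phone day!"),
    ("A", "Stuck in traffic"),
    ("B", "Dinner at the new place"),
]
features = location_features(sample, lexicon={"phone", "dinner"})
print(features)
```

Features of this kind could then serve as regressors alongside other indicators in a model of local poverty rates, which is the general setup the chapter describes.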

This work is embargoed and will be available for download on Thursday, September 30, 2021


Included in

Econometrics Commons