Predicting and Interpolating State-level Polling using Twitter Textual Data

This paper sets out to interpolate and predict state-level polling at the daily level using a dataset of over 500GB of political tweets from the final months of the 2012 presidential campaign. The author contends that continuous real-time polling of states during presidential, gubernatorial, and senatorial elections is prohibitively expensive and, in many cases, neglected for less competitive states. By modeling the correlations between state-level polls and the textual content of state-located Twitter data (using a combination of time-series cross-sectional methods, Bayesian shrinkage, and model averaging), one can predict changes in opinion polls with a precision that is currently infeasible with existing polling data. According to Beauchamp, this could allow researchers and others to estimate polling at the state level and at finer geographic levels, as well as over time periods shorter than 24 hours.
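To make the idea of shrinkage concrete, the following is a minimal sketch and not the paper's actual model. It uses ridge regression (the posterior mean of the weights under a zero-mean Gaussian prior) as a simple stand-in for Bayesian shrinkage, and it runs on synthetic data. The function name `ridge_fit`, the feature counts, and the penalty values are all assumptions chosen for illustration; the paper's time-series cross-sectional structure and model averaging are not reproduced here.

```python
import numpy as np

def ridge_fit(X, y, penalty):
    """Closed-form ridge regression (no intercept).

    Equivalent to the posterior mean of the weights under a zero-mean
    Gaussian prior, i.e. a simple form of shrinkage toward zero.
    """
    n_features = X.shape[1]
    A = X.T @ X + penalty * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic stand-in data (assumption): 60 state-day observations and
# 200 word-frequency features, of which only 5 truly relate to the poll.
rng = np.random.default_rng(0)
n_obs, n_words = 60, 200
X = rng.normal(size=(n_obs, n_words))
true_w = np.zeros(n_words)
true_w[:5] = 1.0
y = X @ true_w + rng.normal(scale=0.5, size=n_obs)

# Hold out the last 20 observations to mimic out-of-sample prediction.
X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]

w_weak = ridge_fit(X_train, y_train, penalty=1e-3)    # almost no shrinkage
w_strong = ridge_fit(X_train, y_train, penalty=50.0)  # heavy shrinkage
pred = X_test @ w_strong                              # held-out predictions

print("weight norm, weak penalty:  ", np.linalg.norm(w_weak))
print("weight norm, strong penalty:", np.linalg.norm(w_strong))
```

With many more word features than poll observations, some form of shrinkage is what keeps the fit from simply memorizing the training polls; a larger penalty pulls the weights closer to zero.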


These findings should prove useful to researchers seeking to understand Bayesian shrinkage, Bayesian model averaging, and time-series cross-sectional methods for in time prediction testing. The methods in this paper are useful not only for generating poll-like data; they also offer a methodology for investigating how what people say may help predict their intentions.

Nick Beauchamp

September 2013