Text-Based Twitter User Geolocation Prediction

This article starts from the premise that geographical location is vital to geospatial applications such as local search and event detection. In the paper, the research team investigates and seeks to improve on the task of text-based geolocation prediction of Twitter users. As the researchers note, previous studies on this topic have typically assumed that geographical references (e.g., gazetteer terms, dialectal words) in a text are indicative of its author’s location. However, they contend, these references are often buried in informal, ungrammatical, and multilingual data, and are therefore non-trivial to identify and exploit. The study's key tool is an integrated geolocation prediction framework, which the researchers use to investigate which factors affect prediction accuracy. They find that user-declared metadata can play an important role in geolocation, that modelling and inference on multilingual data is easier than on solely English data, and that including non-geotagged posts generally boosts accuracy.

Researchers interested in the topic can build on these findings in a number of ways, including by incorporating hierarchical classification models and by evaluating message-level geolocation.


Han Bo, Paul Cook, and Timothy Baldwin

March 2014