Geolocation Prediction in Social Media Data by Finding Location Indicative Words


This article addresses geolocation prediction, which is vital to geospatial applications such as localised search and local event detection. Social media geolocation models are predominantly based on full text data, including common words with no geospatial dimension and noisy strings, which can hamper prediction and lead to slower, more memory-intensive models. The paper tackles this problem by finding location indicative words via feature selection and by establishing whether the reduced feature set boosts geolocation accuracy. The results show that an information gain ratio-based approach surpasses other methods at term selection, outperforms state-of-the-art geolocation prediction methods in accuracy, and reduces both the mean and the median prediction error distance on a public dataset.
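
As a rough illustration of term selection by information gain ratio, the sketch below scores each word by how much its presence or absence reduces uncertainty about a document's city, normalised by the entropy of the word's own presence/absence split. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the binary presence/absence treatment of words, and the four toy documents are all illustrative choices.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain_ratio(docs):
    """Score each term by information gain ratio with respect to the city label.

    docs: list of (set_of_terms, city) pairs. Hypothetical input format.
    Returns a dict mapping term -> IGR score.
    """
    n = len(docs)
    city_counts = Counter(city for _, city in docs)
    h_city = entropy(city_counts.values())

    # For each term, count the cities of the documents that contain it.
    present_by_term = defaultdict(Counter)
    for terms, city in docs:
        for term in set(terms):
            present_by_term[term][city] += 1

    scores = {}
    for term, present in present_by_term.items():
        n_present = sum(present.values())
        n_absent = n - n_present
        absent = [city_counts[c] - present[c] for c in city_counts]
        # Conditional entropy of the city given term presence/absence.
        h_cond = (n_present / n) * entropy(present.values()) \
            + (n_absent / n) * entropy(absent)
        info_gain = h_city - h_cond
        # Intrinsic value: entropy of the presence/absence split itself.
        intrinsic = entropy([n_present, n_absent])
        scores[term] = info_gain / intrinsic if intrinsic > 0 else 0.0
    return scores


# Toy corpus (invented for illustration only).
docs = [
    ({"the", "yinz", "game"}, "pittsburgh"),
    ({"the", "yinz"}, "pittsburgh"),
    ({"the", "tram"}, "melbourne"),
    ({"the", "tram", "game"}, "melbourne"),
]
scores = information_gain_ratio(docs)
# Keep the highest-scoring terms as the reduced feature set.
top_terms = sorted(scores, key=scores.get, reverse=True)[:2]
```

On this toy corpus, words that occur in only one city ("yinz", "tram") receive the highest scores, while a word that appears everywhere ("the") or equally in both cities ("game") scores zero and would be dropped from the feature set.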

This article is useful to lexicographers looking to formulate notions of prediction confidence, as it demonstrates that performance is even higher in cases where the authors’ model is more confident, which allows accuracy to be traded off against coverage.
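
The trade-off between accuracy and coverage can be pictured as follows: predictions below a confidence threshold are discarded, so fewer users receive a prediction (lower coverage), but the remaining predictions tend to be more reliable. The snippet is only a sketch; the function name, the confidence scores, and the toy predictions are assumptions made for illustration and are not taken from the paper.

```python
def accuracy_and_coverage(predictions, threshold):
    """Return (accuracy, coverage) when keeping predictions at or above threshold.

    predictions: list of (confidence, is_correct) pairs. Hypothetical format.
    """
    kept = [is_correct for confidence, is_correct in predictions
            if confidence >= threshold]
    coverage = len(kept) / len(predictions)
    accuracy = sum(kept) / len(kept) if kept else 0.0
    return accuracy, coverage


# Toy predictions (invented values, not results from the paper).
toy = [(0.95, True), (0.90, True), (0.70, True),
       (0.60, False), (0.40, False), (0.30, True)]

all_acc, all_cov = accuracy_and_coverage(toy, 0.0)      # keep everything
strict_acc, strict_cov = accuracy_and_coverage(toy, 0.65)  # keep confident ones
```

With these toy values, raising the threshold halves the coverage while accuracy on the retained predictions rises.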


Bo Han, Paul Cook, and Timothy Baldwin