Learning Similarity Metrics for Event Identification in Social Media


The authors of this article propose a methodology for automatically identifying events and their associated user-contributed social media documents (such as those posted on Flickr, YouTube, and Facebook) to enable event browsing and search in state-of-the-art search engines. They seek to exploit the rich “context” associated with social media content, including user-provided annotations, titles, and tags, as well as automatically generated information such as content creation time. Using this context, which includes both textual and non-textual features, the authors argue that it is possible to define appropriate similarity metrics that enable clustering of social media content by the event it relates to.
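To make the idea of a multi-feature similarity metric concrete, the sketch below combines one textual feature (tag overlap) with one non-textual feature (closeness of creation times) and uses the result to group documents. It is an illustration only, not the authors' method: the `Doc` type, the Jaccard tag measure, the 48-hour time window, the fixed weights, and the 0.5 threshold are all assumptions chosen for this example, whereas the article is concerned with learning how to combine such features.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Doc:
    # Hypothetical document record: a set of tags plus a creation time.
    tags: frozenset
    created: datetime


def tag_similarity(a, b):
    """Textual feature: Jaccard overlap of the two tag sets."""
    union = a.tags | b.tags
    return len(a.tags & b.tags) / len(union) if union else 0.0


def time_similarity(a, b, window_hours=48.0):
    """Non-textual feature: decays linearly from 1 (same instant)
    to 0 (window_hours or more apart). The window is an assumption."""
    gap_hours = abs((a.created - b.created).total_seconds()) / 3600.0
    return max(0.0, 1.0 - gap_hours / window_hours)


def combined_similarity(a, b, w_tags=0.6, w_time=0.4):
    """Weighted sum of the per-feature similarities.
    The weights are hand-picked here; they are not learned."""
    return w_tags * tag_similarity(a, b) + w_time * time_similarity(a, b)


def cluster(docs, threshold=0.5):
    """Single-pass clustering: each document joins the cluster with the
    highest average similarity if it reaches the threshold, and
    otherwise starts a new cluster."""
    clusters = []
    for doc in docs:
        best, best_score = None, threshold
        for members in clusters:
            score = sum(combined_similarity(doc, m) for m in members) / len(members)
            if score >= best_score:
                best, best_score = members, score
        if best is None:
            clusters.append([doc])
        else:
            best.append(doc)
    return clusters
```

With these assumed settings, two photos tagged with overlapping concert tags and taken an hour apart would fall into one cluster, while a photo with unrelated tags taken months later would start its own. Replacing the fixed weights with values fitted on labeled data is the part this sketch leaves out.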

This article will be of particular use to practitioners and researchers interested in analyzing social media content as it relates to real-world events. A key contribution of the article is that it explores a variety of techniques for learning multi-feature similarity metrics for social media content in a principled manner and tests these techniques on large-scale, real-world datasets of event images from Flickr. Social media sites have become one of the most popular outlets for users to share their experiences and interests online. These sites therefore host substantial amounts of user-contributed material (including photographs, videos, and textual content) covering a wide variety of real-world events.

Full article is available here >>

Hila Becker, Mor Naaman, and Luis Gravano

February 2010