Machine Learning and Natural Language Processing (ML/NLP)

Machine Learning and Natural Language Processing

"Machine Learning" (ML) is a sub-field of computer science related to artificial intelligence that is concerned with the construction and study of systems that can “learn” from data. Essentially, ML involves programming a “learning machine” to extract meaning from data that is processed automatically (i.e. data not seen by a human analyst) after the machine has been “trained” through exposure to a learning data set.
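The idea of “training” a machine on a learning data set and then applying it to unseen data can be illustrated with a minimal sketch. The example below assumes a small Naive Bayes text classifier written with the Python standard library only; the labels ("benign", "flag"), the toy training sentences and the function names are all hypothetical and chosen purely for illustration, not drawn from any system described on this page.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Very crude tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    total = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n in model["label_counts"].items():
        score = math.log(n / total)  # prior for this label
        denom = sum(model["word_counts"][label].values()) + vocab_size
        for tok in tokenize(text):
            if tok in model["vocab"]:  # ignore words never seen in training
                score += math.log((model["word_counts"][label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-labelled "learning data set".
TRAINING_SET = [
    ("the match was great fun", "benign"),
    ("lovely weather for a walk today", "benign"),
    ("we will attack them tonight", "flag"),
    ("they deserve to be hurt", "flag"),
]

model = train(TRAINING_SET)
```

Once trained, `predict(model, some_text)` labels text the model has never seen. A real system would need a far larger labelled data set and careful evaluation before its output could be relied on.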

"Natural Language Processing" (NLP) is an associated sub-field of computer science which applies the principles of ML to detect the meaning of “natural” language in a given data set.  

The field of ML, and the associated application of NLP methods, holds great potential for counterterrorism. Because these methods apply artificial intelligence principles, they can be programmed to work through massive amounts of open source data (including social media) and look for signals of interest to the counterterrorism community. Some examples:

“Sentiment Analysis”: training a learning machine to recognize sentiments expressed in the natural language of the data set. In the counterterrorism context this can be used, for example, to recognize “violent” messages in social media data.

“Latent Insight”: training a learning machine to use certain features of a given piece of data (in both the content and the metadata) to probabilistically infer “latent” attributes of the author, such as age, gender, geographic location, ethnicity, religion or political inclination. In the counterterrorism context this can, for example, give officials information with which to profile content of interest.

Triage: using ML and NLP to sort through vast amounts of data so that human analysts can review a more manageable subset. In the counterterrorism context this is a way to spot content of interest to intelligence analysts and/or public safety officials.
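The triage idea can be sketched in a few lines: score every message automatically, then pass only the highest-scoring ones to a human analyst. The example below is a deliberately simple stand-in that uses a hand-written weighted word list rather than a trained model; the lexicon, its weights and the function names are hypothetical and serve only to show the shape of such a pipeline.

```python
import heapq

# Hypothetical lexicon: word -> weight. A real system would learn or curate this.
LEXICON = {"attack": 2.0, "kill": 3.0, "hurt": 1.5, "destroy": 2.0}


def score(message, lexicon=LEXICON):
    """Average lexicon weight per token; 0.0 for empty messages."""
    tokens = [t.strip(".,!?\"'").lower() for t in message.split()]
    if not tokens:
        return 0.0
    return sum(lexicon.get(t, 0.0) for t in tokens) / len(tokens)


def triage(messages, k=2, threshold=0.0):
    """Return up to k (score, message) pairs above the threshold, highest first."""
    scored = [(score(m), m) for m in messages]
    flagged = [(s, m) for s, m in scored if s > threshold]
    return heapq.nlargest(k, flagged, key=lambda pair: pair[0])
```

Calling `triage(messages, k=10)` on a large batch returns at most ten messages for review, so analysts look at a short ranked list instead of the full stream. Messages with no lexicon hits score zero and are never surfaced, which is also the main weakness of a pure word-list approach.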

Related Articles

A Lexicon-Based Approach for Hate Speech Detection

This paper addresses several of the key issues facing creation of a classifier for hate-speech on forums, blogs, or other areas of web discourse.

Njagi Dennis Gitari, Zhang Zuping, Hanyurwimfura Damien, and Jun Long
2015

An Analysis of Interactions Within and Between Extreme Right Communities in Social Media

This article studied a selection of right-wing extremist (RWE) groups on Twitter. The authors looked at particular language-based networks as case studies, collecting Twitter data for groups across eight countries.

Derek O’Callaghan, Derek Greene, Maura Conway, Joe Carthy, Padraig Cunningham
2012

Automatic Crime Prediction Using Events Extracted from Twitter Posts

This 2012 article focuses on predictive analytics and in particular crime prediction using events and data extracted from Twitter posts. The authors present a preliminary investigation of Twitter-based criminal incident prediction.

Xiaofeng Wang, Matthew S. Gerber, and Donald E. Brown
2012

Automatic Detection of Cyber-Recruitment by Violent Extremists

This article presents methods for identifying the recruitment activities of violent groups within extremist social media websites.

Jacob R. Scanlon and Matthew S. Gerber
August 2014

Beyond Trending Topics: Real-World Event Identification on Twitter

This paper examines approaches for analyzing Twitter messages to distinguish between those covering real-world events and non-event messages. To validate this work the authors applied their process to 2.6 million Twitter messages.

Hila Becker, Mor Naaman and Luis Gravano
2011

Combining Social Network Analysis and Sentiment Analysis to Explore the Potential for Online Radicalization

This paper explores the use of crawling global social networking platforms to uncover previously unknown radicalized individuals. To demonstrate the utility of this process, the authors collect a YouTube dataset from a group that potentially has a radicalizing agenda.

Adam Bermingham, Maura Conway, Lisa McInerney, Neil O’Hare, and Alan F. Smeaton
2009

Detecting Linguistic Markers for Radical Violence in Social Media

This article focuses on the use of linguistic “weak signals”—digital traces of intent—in social media as a tool of counterterrorism aimed at preventing lone-wolf attacks.

Katie Cohen, Fredrik Johansson, Lisa Kaati, Jonas Clausen Mork
December 2013

Detecting Social Polarization and Radicalization

This paper presents a theoretical computerized system to detect social polarization and to estimate the related chances of violent radicalization. Existing technologies are analyzed to determine how they can be integrated into the proposed system to fulfil the authors’ objectives.

Pir Abdul Rasool Qureshi, Nasrullah Memon, Uffe Kock Wiil and Panagiotis Karampelas
April 2011

Extreme Dialogue: Social media Target Audience Analysis and Impact Assessments in support of countering violent extremism. An abridged summary report of findings and lessons learned.
SecDev Foundation
2016

Geolocation Prediction in Social Media Data by Finding Location Indicative Words

This article addresses the degree to which geolocation prediction is vital to geospatial applications like localised search and local event detection.

Bo Han, Paul Cook, and Timothy Baldwin
2012

Hate Speech, Machine Classification and Statistical Modelling of Information Flows on Twitter: Interpretation and Communication for Policy Decision Making

This article deals with a supervised machine learning text classifier, trained and tested to distinguish between hateful and/or antagonistic responses (with a focus on race, ethnicity or religion) and more general responses.

Pete Burnap and Matthew L. Williams
2014

How ISIS Uses Twitter: Analyze how ISIS fanboys have been using Twitter since 2015 Paris Attacks

This resource contains the description and data analysis from a research project conducted by Fifth Tribe into ISIS’s use of Twitter in the aftermath of the 2015 Paris Attacks.

Khuram Zaman
May 2016

ISIS Has a Twitter Strategy and It Is Terrifying [Infographic]

This resource is an informative, visual and descriptive infographic by Fifth Tribe focused on ISIS’s use of Twitter. The infographic and descriptions in the article were written following the Paris attacks in November 2015.

Khuram Zaman
November 2015

National Security and Social Media Monitoring: A Presentation of the EMOTIVE and Related Systems

This 2013 article focuses on Natural Language Processing (NLP) and particularly social media monitoring of Twitter for purposes of national security.

Martin D. Sykora, Thomas W. Jackson, Ann O’Brien, Suzanne Elayan
2013

Performance Evaluation of a Natural Language Processing Approach Applied in White Collar Crime Investigation

This article focuses on challenges to law enforcement when dealing with massive amounts of data in criminal investigations.

Maarten van Banerveld, Nhien-An Le-Khac, M-Tahar Kechadi
November 2014

Predicting the Present with Google Trends

The authors of this 2011 article hypothesize that Google Trends results, which report daily and weekly on queries related to various industries, may be correlated with the current level of economic activity in those industries.

Hyunyoung Choi and Hal Varian
December 2011

Raising and Rising Voices in Social Media

This 2012 paper develops a novel methodology for modeling cyber-collective social movements (CSMs) from individual, community, and transnational perspectives. The authors do this by drawing on existing collective action theories and computational approaches for social network analysis.

Nitin Agarwal, Merlyna Lim, Rolf Wigand
2012

Reshaping Terrorist Networks

This article examines the use of Machine Learning (ML) algorithms for predicting key nodes for targeting within terrorist organizational structures.

Francesca Spezzano, V. S. Subrahmanian, Aaron Mannes
August 2014

Semantic Lexicon Expansion for Concept-Based Aspect-Aware Sentiment Analysis

This paper concerns the creation of a prototype for sentiment analysis, capable of discerning key aspects of an entity under review, and the type of polarity in the response associated with it.

Anni Coden, Dan Gruhl, Neal Lewis, Pablo N. Mendes, Meena Nagarajan, Cartic Ramakrishnan, and Steve Welch
2014

State of the Art 2015: a literature review of social media intelligence for counter-terrorism

This is a foundational report and a seminal work in the study of social media intelligence and open source research. The paper reviews 245 papers in a semi-systematic literature review of how information and insight can be drawn from open social media sources.

Jamie Bartlett and Louis Reynolds
September 2015

SUIT: A Supervised Item-Based Topic Model for Sentiment Analysis

This 2014 article focuses on sentiment analysis and proposes a new model for analyzing user sentiments and opinions online.

Fangtao Li, Sheng Wang, Shenghua Liu and Ming Zhang
2014

SUPER: Towards the Use of Social Sensors for Security Assessments and Proactive Management of Emergencies

In this paper, the authors argue that despite the widespread use of social media in various domains (e.g.

Richard McCreadie, Karolin Kappler, Magdalini Kardara, Andreas Kaltenbrunner, Craig Macdonald, John Soldatos, Iadh Ounis
May 2015

Text-Based Twitter User Geolocation Prediction

This article reveals just how vital geographical location is to geospatial applications like local search and event detection. In this paper, the research team investigates and seeks to improve on the task of text-based geolocation prediction of Twitter users.

Bo Han, Paul Cook, and Timothy Baldwin
March 2014

The State of the Art: A Literature Review of Social Media Intelligence Capabilities for Counter-Terrorism

This article is a seminal piece and a foundational resource in the field of social media analytics and open source intelligence by some of the field’s leading authors.

Jamie Bartlett and Carl Miller
November 2013

Towards Detecting Rumours in Social Media

This paper describes the methodology that the authors have developed for the collection and sampling of conversational threads, as well as the tools they have developed to identify rumour-based threads.

Arkaitz Zubiaga, Maria Liakata, Rob Procter, Kalina Bontcheva, Peter Tolmie
April 2015

Twitterrank: Finding Topic-Sensitive Influential Twitterers

This article focuses on the problem of identifying influential Twitter users using Machine Learning (ML) techniques and Natural Language Processing (NLP).

Jianshu Weng, Ee Peng Lim, Jing Jiang, and Qi He
2010

Umati Final Report

This 44-page resource is the comprehensive final report of the Umati Project, which focused on monitoring hate speech online.

iHub Research and Ushahidi
2013

Contact info

+1-613-755-4007 •  info@secdev.foundation

Copyright © 2014 - 2017 • The SecDev Foundation
