WWW 2021 Tutorial - Information extraction from social media: A hands-on tutorial on tasks, data, & open source tools

Tutorial description

In this hands-on tutorial, we introduce participants to working with social media data, which are an example of Digital Social Trace Data (DSTD). The DSTD abstraction lets us model social media data together with the rich information associated with social media text, such as authors, topics, and timestamps. We introduce participants to several Python-based, open-source tools for performing Information Extraction (IE) on social media data. Participants will also become familiar with a catalogue of more than 30 publicly available social media corpora for IE tasks such as named entity recognition (NER), part-of-speech (POS) tagging, chunking, supersense tagging, entity linking, sentiment classification, and hate speech identification. Finally, participants will be introduced to the following applications of extracted information: a) combining network analysis and text-based signals to rank accounts, and b) correlating sentiment with user-level attributes in existing corpora. The tutorial aims to serve the following use cases for social media researchers: a) high-accuracy IE on social media text via multitask and semi-supervised learning, including recent transformer-based tools, b) rapid annotation of new data for text classification via active human-in-the-loop learning, c) temporal visualization of the communication structure in social media corpora via the social communication temporal graph visualization technique, and d) detecting and prioritizing needs during crisis events (e.g., COVID-19).
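
As a rough illustration of the idea of attaching authors, topics, and timestamps to social media text, the sketch below models a post as a small Python record and computes a few simple aggregates. The `Post` class, its field names, and the sample posts are hypothetical and made up for this example; they are not the schema or the API of the tools used in the tutorial.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Post:
    """One piece of social media text plus its metadata (hypothetical schema)."""
    text: str
    author: str
    created_at: datetime
    topics: List[str] = field(default_factory=list)


# Made-up sample records, for illustration only.
posts = [
    Post("Road closed after the storm", "user_a",
         datetime(2021, 4, 19, 9, 30, tzinfo=timezone.utc), ["storm"]),
    Post("Shelter open at the school gym", "user_b",
         datetime(2021, 4, 19, 10, 5, tzinfo=timezone.utc), ["storm", "shelter"]),
    Post("Power is back downtown", "user_a",
         datetime(2021, 4, 20, 8, 0, tzinfo=timezone.utc), ["power"]),
]

# Because metadata travels with the text, simple aggregates are one-liners.
posts_per_author = Counter(p.author for p in posts)
posts_per_day = Counter(p.created_at.date().isoformat() for p in posts)
topic_counts = Counter(t for p in posts for t in p.topics)
```

Keeping the text and its metadata in one record means that labels produced later by an IE tool (for example, a sentiment label per post) could be grouped by author, topic, or day without a separate join step.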

Intended audience: Researchers working with social media datasets, computational social scientists, and machine learning and NLP researchers.

Pre-arrival material

Software setup


This will be a 3-hour tutorial session using Python-based, open-source tools. The tutorial will be structured as follows:

Setup and Introduction (30 mins)

Applications of information extraction (40 mins)

Collecting and distributing social media data (15 mins)

Break (10 mins)

Improving IE on social media data via Machine Learning (1 hr 15 mins)

Conclusion and future directions (10 mins)

Open questions in social media IE, tutorial feedback, and additional questions

Resources for follow-up and questions from participants.