IC2S2 2020 Tutorial

Tutorial description

In this hands-on tutorial, we introduce participants to working with social media data, which are an example of Digital Social Trace Data (DSTD). The DSTD abstraction allows us to model social media data together with the rich information associated with social media text, such as authors, topics, and time. We introduce several Python-based, open source tools for performing information extraction (IE) on social media data. Participants will also be familiarized with a catalogue of more than 30 publicly available social media corpora for various IE tasks, e.g., named entity recognition (NER), part-of-speech (POS) tagging, chunking, supersense tagging, entity linking, sentiment classification, and hate speech identification.

Finally, participants will be introduced to two applications of the extracted information: a) ranking users based on their enthusiasm and support for social causes, and b) correlating sentiment with user-level attributes in existing corpora.

The tutorial aims to serve the following use cases for social media researchers: a) high-accuracy IE on social media text via multi-task and semi-supervised learning, b) rapid annotation of new data for text classification via active human-in-the-loop learning, and c) temporal visualization of the communication structure in social media corpora via the social communication temporal graph visualization technique.
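To make the idea of text plus associated metadata concrete, the following is a minimal sketch of how one post might be represented in Python. It is not the tutorial's actual data model: the `Post` class, its field names, and the helper `posts_per_author_by_day` are hypothetical names chosen for illustration, and the example records are made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Post:
    """One social media post with author, time, and topic metadata (hypothetical schema)."""
    text: str
    author: str
    created_at: datetime
    topics: list = field(default_factory=list)


def posts_per_author_by_day(posts):
    """Count posts for each (author, calendar day) pair."""
    counts = defaultdict(int)
    for post in posts:
        counts[(post.author, post.created_at.date())] += 1
    return dict(counts)


# Made-up example records, for illustration only.
posts = [
    Post("Great turnout at the rally!", "user_a", datetime(2020, 7, 17, 9, 30), ["rally"]),
    Post("Still thinking about yesterday.", "user_a", datetime(2020, 7, 18, 8, 0)),
    Post("Anyone have the slides?", "user_b", datetime(2020, 7, 17, 14, 5), ["slides"]),
]

daily_counts = posts_per_author_by_day(posts)
```

Keeping author and timestamp next to the text is what makes simple temporal or per-user summaries like `daily_counts` possible without re-joining separate files.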

Rationale

Information extraction (IE) is a common sub-area of natural language processing that focuses on identifying structured information in unstructured data. While there are a number of open source tools for performing IE on newswire and academic publication corpora, there is a lack of such tools for social media corpora, which tend to exhibit very different linguistic patterns from those corpora. Publicly available IE tools that are trained on news and academic corpora have also been found to perform poorly on social media corpora.

Topics of interest include:

Benefit to IC2S2 community

Many scholars of computational social science work with social media text in their research. This tutorial will allow them to use state-of-the-art methods for processing social media text, which can strengthen the quality of their analysis.

This will be a 3-hour tutorial session using Python-based, open source tools. The tutorial will be structured as follows:

Pre-arrival material

Software setup

Tutorial day schedule

INTRODUCTION (SHUBHANSHU AND JANA) (30 MINS)

APPLICATIONS OF INFORMATION EXTRACTION (SHUBHANSHU, JANA, AND SHADI) (30 MINS)

COLLECTING AND DISTRIBUTING SOCIAL MEDIA DATA (SHUBHANSHU AND JANA) (20 MINS)

BREAK (10 MINS)

IMPROVING IE ON SOCIAL MEDIA DATA USING MACHINE LEARNING (SHUBHANSHU) (1 HR)

CONCLUSION AND FUTURE DIRECTIONS (SHUBHANSHU, JANA, AND SHADI) (20 MINS)

TARGET AUDIENCE AND PRE-REQUISITES

The tutorial is aimed at scholars who work on social media analysis and often use text-based analysis techniques. We expect participants to have some familiarity with Python programming, or the ability to run Python scripts on their own data. We also expect familiarity with social media platforms such as Twitter and Facebook.
