UIUC Research Park

Tutorial Home

Tutorial description

This will be a 3-hour tutorial session using Python-based, open-source tools. The tutorial will be structured as follows:

Introduction (15 mins)

Familiarize participants with various IE tasks for tweets, e.g.:

  1. Sequence tagging: named entity detection and classification, part-of-speech tagging, chunking, and super-sense tagging.
  2. Text classification: sentiment prediction, sarcasm detection, and abusive content detection.
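To make the distinction between the two task families concrete, here is a minimal sketch of what sequence-tagging output looks like: one label per token, which is then grouped into entity spans. It assumes the common BIO tagging scheme; the example tweet, the tags, and the helper name `bio_to_spans` are hypothetical and not taken from the tutorial materials. Text classification, by contrast, would assign a single label (for example, a sentiment) to the whole tweet.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity text, entity type) spans.

    Assumption: tags follow the BIO scheme ("B-X", "I-X", "O").
    An "I-" tag that does not continue an open span of the same type
    is treated as outside any entity in this simplified sketch.
    """
    spans = []
    current_tokens: List[str] = []
    current_type = None

    def flush():
        # Close the span being built, if any.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            current_tokens.append(token)
        else:
            flush()
            current_tokens = []
            current_type = None
    flush()
    return spans


# Hypothetical tokenized tweet with one tag per token.
tokens = ["Watching", "the", "Chicago", "Bulls", "in", "Champaign", "tonight"]
tags = ["O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]

entities = bio_to_spans(tokens, tags)
print(entities)
```

The same token-level representation applies to part-of-speech tagging and chunking; only the label set changes.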

Applications of information extraction (15 mins)

This includes:

  1. Query-based search on text corpora.
  2. Visualizing temporal trends in information.
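Both applications can be illustrated with a small sketch that works on already-extracted information. It assumes each tweet is stored as a dictionary with `created_at`, `text`, and `entities` fields; this record layout, the sample tweets, and the function names `search_tweets` and `daily_counts` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: tweet text plus entities found by an IE system.
records = [
    {"created_at": "2019-09-01T10:00:00", "text": "Chicago Bulls win again",
     "entities": ["Chicago Bulls"]},
    {"created_at": "2019-09-01T18:30:00", "text": "Great night for the Chicago Bulls",
     "entities": ["Chicago Bulls"]},
    {"created_at": "2019-09-02T09:15:00", "text": "Chicago Bulls trade rumors",
     "entities": ["Chicago Bulls"]},
    {"created_at": "2019-09-02T12:00:00", "text": "Rainy day in Champaign",
     "entities": ["Champaign"]},
]


def search_tweets(records, query):
    """Return records whose text contains the query (case-insensitive)."""
    q = query.lower()
    return [r for r in records if q in r["text"].lower()]


def daily_counts(records, entity):
    """Count how many tweets mention the entity on each calendar day."""
    counts = Counter()
    for r in records:
        if entity in r["entities"]:
            day = datetime.fromisoformat(r["created_at"]).date().isoformat()
            counts[day] += 1
    # Sort by date so the result can be plotted as a time series.
    return dict(sorted(counts.items()))


matches = search_tweets(records, "bulls")
trend = daily_counts(records, "Chicago Bulls")
print(len(matches), trend)
```

The per-day counts returned by `daily_counts` are the kind of series one would pass to a plotting library to show a temporal trend.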

Responsible and compliant data use of tweets (15 mins)

  1. Overview of available annotated tweet datasets.
  2. Clarification of terms of service, regulations such as privacy policies, and norms for working with tweets.

Break (15 mins)

Hands-on session (1 hr. 30 mins)

  1. Set up Google Colaboratory and install the required dependencies (takes 15 mins).
  2. Collecting and sharing samples of tweet data, with a focus on following Twitter's terms of service and additional community norms. Covered in slides.
  3. Efficiently annotating classification data using active human-in-the-loop learning.
  4. Using TwitterNER for feature-based, high-accuracy named entity recognition for tweets.
  5. Using multi-task learning for sequence tagging.
  6. Using multi-task learning for text classification.
  7. Visualizing extracted information and tweets using temporal network visualizations. Covered in slides. See:

NOTE: Access to the SocialMediaIE library used for multi-task learning was provided privately to the tutorial participants. We plan to release it as an open-source library in the coming months. You can check the status at:

Conclusion (15 mins)

Resources to follow up and questions from participants.