Hypertext Tutorial

Tutorial description

This will be a 3-hour tutorial session using Python-based, open-source tools. The tutorial will be structured as follows:

Introduction (15 mins)

Familiarize participants with various information extraction (IE) tasks for tweets, e.g.:

  1. Sequence tagging: named entity detection and classification, part-of-speech tagging, chunking, and super-sense tagging.
  2. Text classification: sentiment prediction, sarcasm detection, and abusive content detection.
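The difference between the two task families can be sketched with a small example. This is only an illustration: the tweet, the BIO tag set, the `positive` label, and the helper `bio_to_spans` are assumptions made for this sketch, not material from the tutorial notebooks. Sequence tagging assigns one label per token, while text classification assigns one label to the whole tweet.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) pairs from per-token BIO tags.

    Stray I- tags that do not continue an open entity are ignored.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continue the open entity.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent tag) closes any open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical tweet, already tokenized.
tokens = ["Watching", "the", "Chicago", "Bulls", "with", "Ana", "tonight"]

# Sequence tagging: one tag per token (named entities in BIO format).
tags = ["O", "O", "B-ORG", "I-ORG", "O", "B-PER", "O"]
entities = bio_to_spans(tokens, tags)

# Text classification: one label for the whole tweet (e.g. sentiment).
tweet_label = "positive"

print(entities, tweet_label)
```

The same per-token layout applies to part-of-speech tagging, chunking, and super-sense tagging; only the tag inventory changes.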

Applications of information extraction (15 mins)

This includes:

  1. Query-based search on text corpora.
  2. Visualizing temporal trends in information.
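Both applications can be sketched on top of already-extracted information. The records below, their field names (`day`, `text`, `entities`), and the helpers `search` and `daily_counts` are hypothetical and exist only for this illustration; a real pipeline would fill `entities` from a tagger's output.

```python
from collections import Counter
from datetime import date
from typing import Dict, List

# Hypothetical tweets with entities already extracted by some tagger.
records: List[Dict] = [
    {"day": date(2020, 1, 1), "text": "Flu season is here", "entities": ["flu"]},
    {"day": date(2020, 1, 1), "text": "Got my flu shot in Chicago", "entities": ["flu", "Chicago"]},
    {"day": date(2020, 1, 2), "text": "Snow in Chicago again", "entities": ["Chicago"]},
    {"day": date(2020, 1, 3), "text": "Flu cases are rising", "entities": ["flu"]},
]


def search(records: List[Dict], query: str) -> List[Dict]:
    """Return records whose extracted entities contain the query (case-insensitive)."""
    wanted = query.lower()
    return [r for r in records if wanted in {e.lower() for e in r["entities"]}]


def daily_counts(records: List[Dict]) -> Counter:
    """Count how many records fall on each day."""
    return Counter(r["day"] for r in records)


flu_tweets = search(records, "Flu")
flu_trend = daily_counts(flu_tweets)

# Print the trend in chronological order, ready to be plotted.
for day in sorted(flu_trend):
    print(day.isoformat(), flu_trend[day])
```

Searching over extracted entities rather than raw text is a design choice of this sketch: it lets a query match the entity regardless of how the surrounding tweet is worded.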

Responsible and compliant data use of tweets (15 mins)

  1. Overview of available annotated tweet datasets.
  2. Clarification of terms of service, regulations such as privacy policies, and norms for working with tweets.

Break (15 mins)

Hands-on session (1 hr 30 mins)

  1. Setting up Google Colaboratory and installing required dependencies (takes 15 mins) - https://colab.research.google.com/drive/1YHMyGsnzUjTQ2GcRomGY5SD5eVPA1siR
  2. Collecting and sharing samples of tweet data, with a focus on following Twitter's terms of service and additional community norms. - Covered in slides.
  3. Efficiently annotating classification data using active human-in-the-loop learning. - Covered in slides.
  4. Using TwitterNER for feature-based, high-accuracy named entity recognition for tweets
  5. Using multi-task learning for sequence tagging - https://colab.research.google.com/drive/1YHMyGsnzUjTQ2GcRomGY5SD5eVPA1siR
  6. Using multi-task learning for text classification - https://colab.research.google.com/drive/1YHMyGsnzUjTQ2GcRomGY5SD5eVPA1siR
  7. Visualizing extracted information and tweets using temporal network visualizations. - Covered in slides. See: https://shubhanshu.com/social-comm-temporal-graph/
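As a rough illustration of the active-learning step in the list above, the sketch below uses uncertainty sampling, one common strategy for deciding which tweets a human should annotate next. The tutorial's own annotation method is covered in the slides and may differ; the tweet ids, the probabilities, and the helpers `entropy` and `select_for_annotation` are assumptions made for this example.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(pool_probs: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k unlabeled tweets the model is least certain about."""
    ranked = sorted(pool_probs, key=lambda tid: entropy(pool_probs[tid]), reverse=True)
    return ranked[:k]


# Hypothetical class probabilities from a current classifier
# for four unlabeled tweets (two classes each).
pool_probs = {
    "t1": [0.98, 0.02],
    "t2": [0.55, 0.45],
    "t3": [0.70, 0.30],
    "t4": [0.50, 0.50],
}

# Ask a human to label the two most uncertain tweets, then retrain and repeat.
to_label = select_for_annotation(pool_probs, k=2)
print(to_label)
```

In a full loop, the newly labeled tweets would be added to the training set, the classifier retrained, and the selection repeated until the annotation budget is spent.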

Additional notebooks

NOTE: Because Colab does not share VMs between notebooks, these notebooks do not work on their own. You need to copy their code into the notebook that installs the libraries.

Conclusion (15 mins)

Resources for follow-up and questions from participants.