Advances in internet technologies and the spread of smartphones have made information easily accessible in many forms, such as text, images, audio, and video. Communication has become so fast that an event in any part of the world can be reported to the rest of the world within seconds or minutes. For example, the recent bomb blast in Paris was known to the world within an hour, and the event was broadcast across various media. There is a great need to automatically identify events such as bomb blasts, floods, cyclones, fires, and political events reported in newswire and social media text.

The objective of this track is to encourage the development of systems that identify events in various types of text, such as newswire, Facebook, and Twitter, for Indian languages. We will provide data that is manually annotated with various types of events along with their spans. The focus is on Indian language texts, but English documents will also be provided. There has been considerable work on event extraction in English, including evaluation exercises such as ACE and TAC by NIST and the CLEF evaluation tracks.

The objectives of this evaluation exercise are:

  • Creating benchmark data for event extraction from Indian language text drawn from sources such as newswires, blogs, Facebook, and Twitter, since no event-annotated data currently exists for Indian languages.
  • Providing researchers with an opportunity to develop event extraction systems for Indian language text.

Task Description

The task is to identify events such as sports events, terrorist events, natural disasters, crime events, corporate events, political events, and accidents in a given text. The texts can come from various sources such as newswires, blogs, and microblogs. They can be written in Roman script or UTF-8 and can be code-mixed, where an Indian language is mixed with English.
For this initiative we have chosen three Indian languages: Hindi, Malayalam, and Tamil. In the training phase we will provide two files: a raw text file and a file containing the annotations. The annotation file is in column format with 5 columns: Tweet ID, User ID, Event String, Event character start index, and length offset. In the testing phase participants will be provided with the raw text file only and are to submit an annotation file in the same format for the test data.
Participants may use any pre-processing tools that are open source or developed in-house, but may not use external training data.
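
As an illustration of the five-column annotation format, the following minimal Python sketch builds one annotation row from a text and an event string. It rests on assumptions that the task description does not state: that columns are tab-separated, that the start index is a 0-based character offset, and that the length is counted in characters. The IDs, the sample text, and the names EventAnnotation, annotate, and to_rows are hypothetical.

```python
import csv
import io
from typing import Iterable, NamedTuple, Optional


class EventAnnotation(NamedTuple):
    tweet_id: str
    user_id: str
    event_string: str
    start: int   # assumed: 0-based character offset into the raw text
    length: int  # assumed: number of characters in the event string


def annotate(tweet_id: str, user_id: str, text: str,
             event_string: str) -> Optional[EventAnnotation]:
    """Locate the first occurrence of event_string in text, if any."""
    start = text.find(event_string)
    if start < 0:
        return None
    return EventAnnotation(tweet_id, user_id, event_string,
                           start, len(event_string))


def to_rows(annotations: Iterable[EventAnnotation]) -> str:
    """Serialize annotations as tab-separated rows (assumed delimiter)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for ann in annotations:
        writer.writerow(ann)
    return buffer.getvalue()


# Hypothetical example; real IDs and texts come from the distributed data.
text = "Heavy floods hit Chennai today"
ann = annotate("1001", "42", text, "floods")
print(to_rows([ann]), end="")
```

Python string indices count Unicode code points, so the same sketch applies to Hindi, Malayalam, and Tamil text once the file is decoded from UTF-8, provided the organizers count offsets the same way.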

Training Corpus

Training data distributed to Participants

Test Corpus

Test data distributed to Participants


Registration is now closed.
Please register by sending an email with the subject line "Registration for EventXtraction-IL 2017" and the following details:
"Team Leader Name", "Team Affiliation", "Team Contact Person Name", "Email ID", and "Languages for which participating".

Submission Format

The training data will be in column format, as explained in the Task Description. The test data will be provided in the same format as the training data, except that the event annotations will be absent. Participants must submit their test runs in the same format as the training data.
Note: The format of the test run submission file must not be changed or altered in any way.
Each team can submit a maximum of 3 test runs for each language.
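
Because the run file format must not be altered, it may help to check a run before submitting it. The sketch below is one possible check, not an official validator: it assumes tab-separated columns and 0-based character offsets, and it assumes the raw texts have already been loaded into a dictionary keyed by Tweet ID. The function name check_run_lines and the sample data are hypothetical.

```python
from typing import Dict, List


def check_run_lines(lines: List[str], raw_texts: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            problems.append(
                f"line {number}: expected 5 columns, found {len(fields)}")
            continue
        tweet_id, _user_id, event, start_field, length_field = fields
        try:
            start, length = int(start_field), int(length_field)
        except ValueError:
            problems.append(f"line {number}: start and length must be integers")
            continue
        if start < 0 or length < 0:
            problems.append(f"line {number}: start and length must not be negative")
            continue
        text = raw_texts.get(tweet_id)
        if text is None:
            problems.append(f"line {number}: unknown Tweet ID {tweet_id}")
        elif text[start:start + length] != event:
            # The span given by (start, length) should reproduce the event string.
            problems.append(f"line {number}: span does not match event string")
    return problems


# Hypothetical data: one valid row, one wrong offset, one truncated row.
raw_texts = {"1001": "Heavy floods hit Chennai today"}
run = [
    "1001\t42\tfloods\t6\t6\n",
    "1001\t42\tfloods\t7\t6\n",
    "1001\t42\tfloods\n",
]
for problem in check_run_lines(run, raw_texts):
    print(problem)
```

If the distributed annotation files turn out to use a different delimiter or 1-based offsets, only the split call and the slice need adjusting.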

Evaluation Criteria

To be announced soon.

Task Coordinators - Organizing Committee

Computational Linguistics Research Group (CLRG),
AU-KBC Research Centre

Pattabhi RK Rao, AU-KBC Research Centre, Chennai, India.
Sobha Lalitha Devi, AU-KBC Research Centre, Chennai, India.