Today, communication is fast and happens in real time: an event occurring in any part of the world is communicated to the rest of the world within seconds or minutes. For example, the recent bomb blast in Syria was known worldwide within a few minutes and was broadcast across various media. There is a great need to automatically identify events such as bomb blasts, floods, cyclones, fires, political events, etc., reported in newswires and social media text. This is the 2nd edition of the track.

    The first edition of this track was conducted last year at FIRE 2017. In that edition, the task was only to identify the event and its span in the given data. Going further in this edition, along with identifying the event and its span, it is necessary to identify the causes and effects of a given event. Real-time applications will benefit only if the full information related to the event is identified. For example, for a bomb blast, it is necessary to know where it occurred, when it occurred, who and what were affected, and whether there were any casualties. In this edition of the track we provide data annotated with the cause and effect details of each event, and participants are required to identify these details along with the event itself. As in the last edition, the focus is on Indian-language text from social media.

The objectives of EventXtract-IL are:

  • Create benchmark data for Event Extraction in Indian-language text from sources such as newswires, blogs, Facebook, Twitter, etc.
  • Encourage researchers to develop novel systems for Event Extraction.
  • Provide researchers an opportunity to compare different machine learning techniques, as well as other approaches.

Task Description

    The task is to identify various events such as sports events, terrorist events, natural disasters, crime events, corporate events, political events, accidents, etc., in a given text. The texts can come from various sources such as newswires, blogs, and microblogs. They can be written in Roman script or in native script (UTF-8), and may be code-mixed, where an Indian language is mixed with English.
    In this initiative we have chosen three Indian languages: Hindi, Malayalam, and Tamil. In the training phase we will provide two files: a raw text file and a file containing the annotations. The annotation file (second file) will be in column format. The file format will be announced when the sample data is released. In the testing phase, participants will be provided with the raw text file only and must submit an annotation file for the test data in the same format.
    Participants are allowed to use any pre-processing tools that are open source or developed in-house. However, they are not allowed to use external training data.

Training Corpus

Training data will be released on June 5th. A sample of the data will be posted here soon.


     Please register by sending an email to with the subject line "Registration for EventXtraction-IL2 2018", including the following details:
"Team Leader Name:"
"Team Affiliation (Proper full Address of the Organization):"
"Team Contact Person name:" and "Email ID:"
"Languages for which participating:"
"Team Members Names:"
(Note: a maximum of 4 members is allowed per team)

Submission Format

    The training data will be in column format, as explained in the Task Description. The test data will be provided in the same format as the training data, except that the event annotations will be absent. Participants must submit their test runs in the same format as the training data.
Note: There should be no changes or alterations to the format of the test run submission file.
Each team can submit a maximum of 3 test runs per language.

Evaluation Criteria

    We plan to use the standard evaluation metrics of Precision, Recall, and F-measure, computed as field-based average scores. For example, suppose an event E1 has six fields: Event Type, Event Location, Event Date, Event Actors/Participants, Causes, and Effects. If all six fields are identified correctly, the system receives the full score of 6/6; otherwise, the score is reduced according to how many fields were identified correctly. Finally, micro- and macro-averages of Precision and Recall are calculated to arrive at the final score.
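The field-based averaging described above can be sketched as follows. This is a minimal illustration, not the official scorer: the field names, the dictionary representation of events, and exact-match comparison of field values are all assumptions made for the example.

```python
# Hypothetical sketch of field-based scoring (not the official evaluation script).
# Events are represented as dicts; field names are assumed for illustration.

FIELDS = ["type", "location", "date", "actors", "causes", "effects"]

def field_score(gold: dict, predicted: dict) -> float:
    """Fraction of an event's fields identified correctly (e.g. 5/6)."""
    correct = sum(1 for f in FIELDS if predicted.get(f) == gold.get(f))
    return correct / len(FIELDS)

def macro_average(gold_events: list, pred_events: list) -> float:
    """Average of per-event scores: each event weighted equally."""
    scores = [field_score(g, p) for g, p in zip(gold_events, pred_events)]
    return sum(scores) / len(scores)

def micro_average(gold_events: list, pred_events: list) -> float:
    """Pool all fields across all events: each field weighted equally."""
    correct = sum(1 for g, p in zip(gold_events, pred_events)
                  for f in FIELDS if p.get(f) == g.get(f))
    return correct / (len(FIELDS) * len(gold_events))

# Example: one event with the location field predicted incorrectly.
gold = {"type": "flood", "location": "Chennai", "date": "2015-12-01",
        "actors": "residents", "causes": "heavy rain", "effects": "damage"}
pred = dict(gold, location="Mumbai")
print(field_score(gold, pred))  # -> 0.8333... (5 of 6 fields correct)
```

With every event carrying the same six fields, micro- and macro-averages coincide; they differ once events have varying numbers of annotated fields or once precision and recall are computed over unmatched predictions.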

Task Coordinators - Organizing Committee

Computational Linguistics Research Group (CLRG),
AU-KBC Research Centre

Pattabhi RK Rao, AU-KBC Research Centre, Chennai, India.
Sobha Lalitha Devi, AU-KBC Research Centre, Chennai, India.