Task Description


    Machine translation (MT) is one of the most active research areas in Natural Language Processing worldwide, and MT for Indian languages has picked up over the last decade. In developing a machine translation system, translating the verb phrase from the source language to the target language is a challenging task. Verb phrases include finite verbs, non-finite verbs, auxiliary verbs, main verbs, verbal particles and negation constructions. Besides the main verb, verb phrases also carry information on tense, aspect, modality, and PNG (person, number and gender). The characteristics of verbs vary between languages. In languages such as Tamil, Telugu, Kannada and Hindi, the subject and the finite verb of the sentence agree in PNG. In languages such as English and Malayalam there is little or no such agreement between the subject and the finite verb. Languages also differ in word order, such as SVO and SOV. These characteristics make the translation of verb phrases from one language to another a challenging task.
    The objective of this shared task is to boost research on machine translation in Indian languages. We have narrowed the scope of the track to the translation of verb phrases from English to Tamil and from Hindi to Tamil. English and Hindi belong to different branches of the Indo-European family (Germanic and Indo-Aryan, respectively), while Tamil is a Dravidian language. Sentence structures and the characteristics of verbs vary widely across these languages, which makes the task both interesting and challenging. Researchers can come up with various methodologies, such as rule-based, machine learning and hybrid techniques. Participants may use any pre-processing tools that are open source or developed in-house.

Training Corpus


Training data will be released on June 1st.
The data will consist of 2500 to 3000 parallel verb phrases each for English to Tamil and Hindi to Tamil. A sample of the data will be posted here soon.


Registration



Please register by sending an email to sobha@au-kbc.org with the subject line "Registration for VPT-IL 2018" and the following details:
"Team Leader Name"
"Team Affiliation (Proper full Address of the Organization)"
"Team Contact Person name" and "Email ID"
"Language pairs in which the team is participating"

Submission Format


To be announced.

Evaluation Criteria



    We plan to follow the established evaluation metrics for MT, namely BLEU and METEOR.
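
    For participants who want to check their output before submission, the following is a minimal sketch of a BLEU-style score (modified n-gram precision with a brevity penalty), assuming whitespace-tokenized text and a single reference per phrase. It is not the official scoring script: the function name corpus_bleu, the max_n parameter and the placeholder tokens in the usage note are assumptions made for illustration, and the organizers' tokenization and settings may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """BLEU-style score over parallel lists of strings (one reference each).

    Assumption: tokens are separated by whitespace. No smoothing is applied,
    so the score is 0.0 if any n-gram order has no match in the whole corpus.
    """
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    if hyp_len == 0 or any(m == 0 for m in matches):
        return 0.0

    # Geometric mean of the modified precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

    As a usage example with placeholder tokens, corpus_bleu(["a b c"], ["a b d"], max_n=2) has a unigram precision of 2/3 and a bigram precision of 1/2. Because verb phrases are often only a few tokens long, 4-gram matches can be rare, so a lower max_n or a smoothed variant may be more informative during development.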

Task Coordinators - Organizing Committee


Computational Linguistics Research Group (CLRG),
AU-KBC Research Centre



Vijay Sundar Ram, AU-KBC Research Centre, Chennai, India.
Sobha Lalitha Devi, AU-KBC Research Centre, Chennai, India.