Machine Translation - Covid-19 MLIA @ Eval

Task Description

The goal of the Machine Translation (MT) task is to evaluate systems focused on the Covid-19 related text. The Covid-19 MT task addresses the following language pairs:

English-German.
English-French.
English-Spanish.
English-Italian.
English-Modern Greek.
English-Swedish.
English-Arabic. (New for round 2.)

All languages pairs only in the direction translating from English to the other language. The main challenge is that the text to be translated is specialized on the new and high-relevant topic of Covid-19. The task is open for beginners and established research groups from any area of interest in the scientific community, the public administration and the industry. At the end of each round, participants will write/update an incremental report explaining their system. The report will highlight which methods data have been used.

To participate in the Machine Translation task, groups need to register at the following link:

Register

Important Dates - Round 2:

Round starts: June 21, 2021.

Release of training data: June 21, 2021.

Release of test data: ~~October 8, 2021.~~ -> October 11, 2021.

Translations submission deadline: ~~October 15, 2021.~~ -> October 18, 2021.

Translations scored: October 22, 2021.

Rolling report submission deadline (camera ready): November 19, 2021.

Slot for a virtual meeting to discuss the results: February 17, 2022.

Round ends: December 2, 2021.

Important Dates - Round 1:

Round starts: October 23, 2020.

Release of training data: October 23, 2020.

Release of test data: November 20, 2020.

Translations submission deadline: ~~November 27, 2020~~ -> extended to December 2, 2020.

Translations scored: December 4, 2020.

Rolling report submission deadline (preliminary version): December 23, 2020.

Rolling report submission deadline (camera ready): January 8, 2021.

Slot for a virtual meeting to discuss the results: January 12-14, 2021.

Round ends: January 15, 2021.

Participation Guidelines

Organizers will provide training data for all language pairs. Participants must submit at least a system trained only with the provided data (constrained) for each of the language pairs they would like to participate. This includes constrained data from previous rounds. Additionally, participants can use additional training data (not provided by the organisers) or existing translation systems specifying a flag that the system uses additional data (unconstrained). System submissions that used the provided training data (constrained) will be distinguished from submissions that used additional data resources (unconstrained). Note that basic linguistic tools such as taggers, parsers, or morphological analyzers or multilingual systems are allowed in the constrained condition.

Participants will use their systems to translate a test set of unseen sentences in the source language. The translation quality is measured by various automatic evaluation metrics (BLEU will be the main evaluation metric for the 1st round). You may participate in any or all of the language pairs. Organizers will provide a framework to show the results to be compared.

Participant Repository:

Participants are provided with a single repository for all the tasks they take part in. The repository contains the runs, resources, code, and report of each participant.

The repository is organised as follows:

submission: this folder contains the runs submitted for the different tasks in the different evaluation rounds.
score: this folder contains the performance scores of the submitted runs.
code: this folder contains the source code of the developed system.
resource: this folder contains (language) resources created during the participation.
report: this folder contains the rolling technical report describing the techniques applied and insights gained during participation, round after round.

Covid-19 MLIA Eval consists of three tasks run in three rounds. Therefore, the submission and score folders are organized into sub-folders for each task and round as follows:

submission/task1/round1: for the runs submitted to the first round of the first task. Similar structure for the other tasks and rounds.
score/task1/round1: for the performance scores of the runs submitted to the first round of the first task. Similar structure for the other tasks and rounds.

Participants which do not take part in a given task or round can simply delete the corresponding sub-folders.

The goal of Covid-19 MLIA Eval is to speed up the creation of multilingual information access systems and (language) resources for Covid-19 as well as openly share these systems and resources as much as possible. Therefore, participants are more than encouraged to share their code and any additional (language) resources they have used or created.

All the contents of these repositories are released under the Creative Commons Attribution-ShareAlike 4.0 International License.

Rolling Technical Report:

The rolling technical report should be formatted according to the Springer LNCS format, using either the LaTeX template or the Word template. LaTeX is the preferred format.

Corpora:

Round 1:
- Training.
- Validation.
- Test.
For each tmx file for validation, we also provide a .filt file. This is a tsv file containing a TU per line. These .filt files hold per line:
- The score for a TU
- The EN part of a TU, the language X part of a TU, separated by tab
Moreover, for each language pair, we have merged the .filt files, and sorted the TUs according to their score (higher score implies better quality). These are the .sort files.

Round 2:
- Training.
- Validation.
- Test.
The training dataset consists of:
- Web-acquired data from multi-lingual websites of international health organizations and national health agencies.
- Web-acquired data from broadcast websites based on the MEDISYS metadata collection (comparable corpora).
It is likely that training data include:
- Cases in which an en sentence has been paired with more than one sentences of the target language, and vice versa.
- Comparable sentence pairs (with at least one major divergence).
All data is available in tmx format (a tool for converting then to plain text is available at the utilities section). Training is split into several 100000 segments files for an easier management.

Utilities:

Translation Memory eXchange reader: script for extracting source and target sentences from a tmx file.
SGM reader: script for extracting segments from a sgm file.
wrap-xml.perl: script for converting a translation hypotheses into the required SGML format (see submission guidelines).

Automatic Evaluation:

A ranking with the preliminary results of the automatic evaluation is available at this website. This ranking will be updated periodically until the translations submission deadline has passed. Final results have been published in the findings rolling report.

Submission Guidelines

Participating teams should satisfy the following guidelines:

Translated submissions should be recased, detokenized and submitted in SGML format: The script wrap-xml.perl makes the conversion of an output file in one-segment-per-line format into the required SGML file very easy:

Format: wrap-xml.perl LANGUAGE SRC_SGML_FILE SYSTEM_NAME < IN > OUT
Example: wrap-xml.perl en test.de.sgm Google < decoder-output > decoder-output.sgm

Each group can submit a maximum of five systems per language pair, one submission per language should be trained with constrained data;

Submission Upload:

Runs should be uploaded in the repository provided by the organizers. Following the repository structure discussed above, for example, a run submitted for the first round of the Machine Translation task should be included in submission/task3/round1.

Runs should be uploaded with the following name convention: <teamname>_task3_<round>_<languagedirection>_<constrainedfield>_<descriptionfield>.sgm where:

teamname is the name of the participating team;
task3 is the identifier of the Machine Translation task:
round is the round of Covid-19 MLIA @ Eval the run is submitted to. It could be round1, round2, or round3;
languagedirectionis the identifier of the source and target languages using ISO 639-1 codes . For this round, we always have English (en) as source and possible targets are German (de), French (fr), Spanish (es), Italian (it), Modern Greek (el), and Swedish (sv);
- For example: en2de stands for English to German translation;
constrainedfield is the flag to specify if the system is constrained (constrained) or unconstrained (uncontrained);
descriptionfield is a description field that participants can use to describe each submission.
.sgm indicates the file extension.

For example, a complete run identifier may look like pangeanic_task3_round1_en2de_constrained_bt.sgm where:

pangeanic is the name team;
task3 is fixed;
round1 indicates that the run has been submitted to the first round;
en2de indicates translation from English to German;
bt suggests that participants have used backtranslation (bt);
.sgm is fixed.

Performance scores for the submitted runs will be returned by the organizers in the score folder, which follows the same structure as the submission folder.

The rolling technical report has to be uploaded and kept update in the report folder.

Here, you can find a sample participant repository to get a better idea of its layout.

Evaluation:

The quality of the submitted systems will be evaluated using the following metrics:

BLEU.
TER.
BEER.

Organizers

Francisco Casacuberta, Universitat Politècnica de València, Spain
fcnprhlt.upv.es

Miguel Domingo, Universitat Politècnica de València, Spain
midobalprhlt.upv.es

Mercedes García-Martínez, Pangeanic, Spain
m.garciapangeanic.com

Manuel Herranz, Pangeanic, Spain
m.herranzpangeanic.es