24 February 2017

ICDAR2017 Competition on Post-OCR Text Correction

More information: http://l3i.univ-larochelle.fr/ICDAR2017PostOCR

*** GOAL ***

[…]se OCR-ed texts, based on an original corpus of OCR-ed documents comprising 12M characters (2.1M tokens) aligned with their corresponding Gold Standard, split equally between English- and French-language material. You can participate in English, French, or both. Given the noisy OCR output of printed text, participants are invited to take part in the following two independent subtasks: 1) DETECTION and 2) CORRECTION. Participants are encouraged to compute separate scores for the English and French parts of the collection, so that language-specific approaches can be evaluated.

- TASK 1) Detection of OCR errors: given the raw OCR-ed text, participants are asked to provide the position and the length of each suspected error.

- TASK 2) Correction of OCR errors: given the OCR errors in their context, participants are asked to provide, for each error, either a) a single correction or b) a ranked list of correction candidates. Providing multiple candidates allows the indirect evaluation of semi-automated techniques with a human in the loop.
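
To make the two subtasks concrete, here is a minimal Python sketch. It rests on assumptions that are ours, not the organizers': each suspected error is a whitespace-delimited OCR token identified by its (position, length) in characters, the OCR text and the Gold Standard are character-aligned strings of equal length, and the scores used (F1 for detection, top-1 accuracy and mean reciprocal rank for correction) are generic illustrative measures. The function names are hypothetical; the official submission format and evaluation metrics are those published on the competition website.

```python
# Illustrative sketch only: the (position, length) span format, the alignment
# assumption and the metrics are hypothetical, not the official evaluation.
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (character position, length) of one suspected error


def error_spans(ocr: str, gold: str) -> Set[Span]:
    """Task 1 reference: spans of OCR tokens that differ from the gold text.

    Assumes both strings are character-aligned (equal length) and that
    tokens are delimited by whitespace in the OCR text.
    """
    if len(ocr) != len(gold):
        raise ValueError("OCR and gold texts must be character-aligned")
    spans: Set[Span] = set()
    start = None
    for i, ch in enumerate(ocr + " "):  # sentinel space flushes the last token
        if ch.isspace():
            if start is not None and ocr[start:i] != gold[start:i]:
                spans.add((start, i - start))
            start = None
        elif start is None:
            start = i
    return spans


def detection_f1(predicted: Set[Span], expected: Set[Span]) -> float:
    """Task 1: F1 over exact (position, length) matches."""
    tp = len(predicted & expected)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(expected) if expected else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def top1_accuracy(ranked: Dict[Span, List[str]], fixes: Dict[Span, str]) -> float:
    """Task 2a: share of errors whose first candidate is the gold correction."""
    hits = sum(1 for span, fix in fixes.items() if ranked.get(span, [])[:1] == [fix])
    return hits / len(fixes) if fixes else 0.0


def mean_reciprocal_rank(ranked: Dict[Span, List[str]], fixes: Dict[Span, str]) -> float:
    """Task 2b: average of 1/rank of the gold correction (0 if it is absent)."""
    total = 0.0
    for span, fix in fixes.items():
        candidates = ranked.get(span, [])
        if fix in candidates:
            total += 1.0 / (candidates.index(fix) + 1)
    return total / len(fixes) if fixes else 0.0


if __name__ == "__main__":
    ocr, gold = "Tbe quick brown f0x", "The quick brown fox"
    expected = error_spans(ocr, gold)  # {(0, 3), (16, 3)}
    print(detection_f1({(0, 3), (4, 5)}, expected))  # 0.5
    fixes = {(p, n): gold[p:p + n] for p, n in expected}
    ranked = {(0, 3): ["The", "Tbe"], (16, 3): ["f0x", "fox"]}
    print(top1_accuracy(ranked, fixes), mean_reciprocal_rank(ranked, fixes))  # 0.5 0.75
```

In this sketch a single-correction submission (option a) is simply a one-element candidate list, so both correction modes share one structure; mean reciprocal rank is just one possible way to give partial credit when the right candidate is ranked below the top.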

*** IMPORTANT DATES ***

• Training dataset available online: end of February 2017
• Registration deadline: 30 March 2017
• Result submission: 15 June 2017
• ICDAR2017 Conference: 10 November 2017

A competition report, including a description of the methodology and a comparative analysis of the participants' performance, will be submitted for publication at the ICDAR2017 conference.

*** CONTACTS ***

• Guillaume CHIRON (BnF/L3i): guillaume.chiron(at)univ-lr.fr
• Antoine DOUCET (L3i): antoine.doucet(at)univ-lr.fr
• Jean-Philippe MOREUX (BnF): jean-philippe.moreux(at)bnf.fr
