Arabic Handwriting Recognition Competition

Recognition of Arabic handwritten words on the basis of the IFN/ENIT-database

The ICDAR 2007 Arabic Handwriting Recognition Competition aims to bring together researchers working on Arabic handwriting recognition.

Since 2002 the freely available IFN/ENIT-database has been used by more than 30 groups all over the world to develop Arabic handwriting recognition systems. This competition is the second in a series of competitions to establish the state of the art in recognizing Arabic handwritten words. Following the first competition at ICDAR 2005, this second competition at ICDAR 2007 gives the opportunity to further develop methods and discuss results.

A comparison and discussion of different algorithms and implementations should stimulate progress in the field of Arabic handwritten word recognition.

   Evaluation Process

The objective is to run each Arabic handwritten word recognizer (trained on a part of version 2.0 of the IFN/ENIT-database, available November 15th 2006) on an already published part of the IFN/ENIT-database and on a new sample not yet published. The word-level recognition results of each system are compared on the basis of correctly recognised words, or more precisely their assigned ZIP (Post) Codes(*). A dictionary can be used and should include all 937 different Tunisian town/village names(*).

(*) Details are given in the database description (www.ifnenit.com).

A recognizer may return up to 10 candidates for each classification, so that the comparison can use not only the first-ranked result but also whether the correct result appears among the first 5 or 10 candidates.
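The top-N comparison can be sketched as follows. This is a minimal illustration, not the organisers' evaluation code: the function name top_n_rate, the in-memory data layout and the toy data are assumptions made for the example.

```python
def top_n_rate(candidates, truth, n):
    """Fraction of images whose true ZIP code is among the first n candidates.

    candidates: dict mapping image name -> list of ZIP codes, best first
    truth:      dict mapping image name -> correct ZIP code
    (Both layouts are assumptions made for this sketch.)
    """
    if not truth:
        return 0.0
    hits = sum(
        1
        for image, zip_code in truth.items()
        if zip_code in candidates.get(image, [])[:n]
    )
    return hits / len(truth)


# Hypothetical toy data, not taken from the IFN/ENIT-database.
candidates = {
    "word/1/1.tif": ["1000", "2000", "3000"],
    "word/1/2.tif": [],
    "word/1/3.tif": ["2000", "4000"],
}
truth = {
    "word/1/1.tif": "2000",
    "word/1/2.tif": "1000",
    "word/1/3.tif": "2000",
}

for n in (1, 5, 10):
    print(f"top-{n} rate: {top_n_rate(candidates, truth, n):.3f}")
```

An image with no candidates simply counts as a miss at every N, which keeps the denominator equal to the number of test images.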

In a first comparison we use no reject. If a reject is implemented in the recognizer, the recognition rate in relation to the reject rate will be compared in a second run.
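One way to relate recognition rate to reject rate is sketched below. The text does not say how a reject is signalled or how the rates are normalised, so everything here is an assumption: an image counts as rejected when it has no output or when its top confidence falls below a threshold, and both rates are taken relative to all test images. The function name rates_with_reject and the toy data are hypothetical.

```python
def rates_with_reject(results, truth, threshold):
    """Return (recognition_rate, reject_rate) for one confidence threshold.

    results: dict image name -> (zip_code, confidence) of the top candidate,
             or None when the recognizer produced no output
    truth:   dict image name -> correct ZIP code
    ASSUMPTION: an image is rejected when it has no output or when its top
    confidence is below `threshold`; both rates are relative to all images.
    """
    total = len(truth)
    if total == 0:
        return 0.0, 0.0
    correct = 0
    rejected = 0
    for image, zip_code in truth.items():
        top = results.get(image)
        if top is None or top[1] < threshold:
            rejected += 1
        elif top[0] == zip_code:
            correct += 1
    return correct / total, rejected / total


# Hypothetical toy data for illustration only.
results = {
    "a.tif": ("1000", 0.9),
    "b.tif": ("2000", 0.5),
    "c.tif": ("3000", 0.3),
    "d.tif": None,
}
truth = {"a.tif": "1000", "b.tif": "2100", "c.tif": "3000", "d.tif": "4000"}

for threshold in (0.0, 0.4, 0.6):
    recognition, reject = rates_with_reject(results, truth, threshold)
    print(f"threshold {threshold}: recognition {recognition:.2f}, reject {reject:.2f}")
```

Sweeping the threshold gives one (reject rate, recognition rate) point per setting, which is a common way to compare systems that reject differently.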

   Running a Recognizer

We run your recognizer (called myrec here) by invoking it from the command line as follows:

 myrec dataset.txt output.txt

The plain text file formats are as follows:

    dataset.txt

The dataset file is simply a list of relative paths to the binary *.tif or *.bmp images to be recognized. For example:

word/1/1.tif

word/1/2.tif

  ...
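Reading such a list inside a recognizer could look like the sketch below. The helper name read_dataset is hypothetical, and skipping blank lines is a robustness assumption that the format description does not state.

```python
import os
import tempfile


def read_dataset(path):
    """Return the image paths listed in a dataset file, in file order.

    Blank lines and surrounding whitespace are ignored (an assumption made
    for robustness; the format description does not mention them).
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demonstration with a temporary file standing in for dataset.txt.
with tempfile.TemporaryDirectory() as folder:
    dataset_path = os.path.join(folder, "dataset.txt")
    with open(dataset_path, "w", encoding="utf-8") as handle:
        handle.write("word/1/1.tif\n\nword/1/2.tif\n")
    images = read_dataset(dataset_path)

print(images)
```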

    output.txt

The output file should have one line for each input image. Each line should show the name of the image file that was recognized, followed by the responses (ZIP-Codes of the Tunisian town/village names) for that image.

Each response is given as a pair of values: the ZIP-Code, followed by the confidence. In the following example, the first line shows that for image word/1/1.tif the recognizer has produced three word hypotheses: the towns/villages with ZIP-Codes 1000, 2000 and 3000, with confidences of 1.0, 0.8 and 0.4 respectively.

word/1/1.tif 1000 1.0 2000 0.8 3000 0.4

word/1/2.tif

Note that the recognizer failed to produce an output for word/1/2.tif - so we get the file name, but nothing else on that line.
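The line format can be written and read back with a few lines of code. The sketch below is only an illustration: the helper names format_line and parse_line are hypothetical, fields are assumed to be separated by single spaces as in the example, and image paths are assumed to contain no spaces.

```python
def format_line(image, hypotheses):
    """Build one output line: image name, then 'ZIP-Code confidence' pairs.

    hypotheses: list of (zip_code, confidence) tuples, best first.
    Only the first 10 are written, since at most 10 candidates are allowed.
    """
    fields = [image]
    for zip_code, confidence in hypotheses[:10]:
        fields += [str(zip_code), str(float(confidence))]
    return " ".join(fields)


def parse_line(line):
    """Split one output line into (image, [(zip_code, confidence), ...]).

    ASSUMPTION: fields are whitespace-separated and the image path itself
    contains no spaces.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    image, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError(f"unpaired ZIP-Code/confidence in line: {line!r}")
    pairs = [(rest[i], float(rest[i + 1])) for i in range(0, len(rest), 2)]
    return image, pairs


# An image with three hypotheses, and one for which recognition failed.
print(format_line("word/1/1.tif", [("1000", 1.0), ("2000", 0.8), ("3000", 0.4)]))
print(format_line("word/1/2.tif", []))
print(parse_line("word/1/1.tif 1000 1.0 2000 0.8 3000 0.4"))
```

A failed recognition is represented by an empty hypothesis list, which yields a line holding only the file name.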

   Uploading your Recognizer and Registration

To upload your recognizer, package all the files needed to run it in a zip file.

Then register and upload the zip file here.

Note that you may register for and enter the competition even if you do not plan to attend ICDAR 2007.

Once you've registered, we can then inform you of any updates.

Implementations must be Windows or Linux executables (with statically linked libraries), or Java .jar archives.

   Important Dates

· Submission of system: March 01 2007.

· Results announced at ICDAR 2007.

   Any Problems

Please contact Volker Märgner if you have any problems concerning the procedure.

   Organiser

          Volker Märgner, Haikal El Abed
Technical University Braunschweig, Germany
August 01 2006