ICDAR#

Format specification#

ICDAR is a dataset for text recognition task, it’s available for download here. There is exists two most popular version of this dataset: ICDAR13 and ICDAR15, Datumaro supports both of them.

Original dataset contains the following subformats:

  • ICDAR word recognition;

  • ICDAR text localization;

  • ICDAR text segmentation.

Supported types of annotations:

  • ICDAR word recognition

    • Caption

  • ICDAR text localization

    • Polygon, Bbox

  • ICDAR text segmentation

    • Mask

Supported attributes:

  • ICDAR text localization

    • text: transcription of text is inside a Polygon/Bbox.

  • ICDAR text segmentation

    • index: identifier of the annotation object, which is encoded in the mask and coincides with the line number in which the description of this object is written;

    • text: transcription of text is inside a Mask;

    • color: RGB values of the color corresponding text in the mask image (three numbers separated by space);

    • center: coordinates of the center of text (two numbers separated by space).

Convert ICDAR dataset#

There is few ways to convert ICDAR dataset with Datumaro:

  • With convert command

datum convert --input-format icdar_text_localization --input-path <text_localization_dataset> \
    --output-format <desired_format> --output-dir <output/dir>
datum convert --input-format icdar_text_segmentation --input-path <text_segmentation_dataset> \
    --output-format <desired_format> --output-dir <output/dir>
datum convert --input-format icdar_word_recognition --input-path <word_recognition_dataset> \
    --output-format <desired_format> --output-dir <output/dir>
  • With Python API

import datumaro as dm
data1 = dm.Dataset.import_from('text_localization_path', 'icdar_text_localization')
data2 = dm.Dataset.import_from('text_segmentation_path', 'icdar_text_segmentation')
data3 = dm.Dataset.import_from('word_recognition_path', 'icdar_word_recognition')

For icdar_word_recognition

<dataset_path>/
├── <subset_name_1>
│   ├── gt.txt
│   └── images
│       ├── word_1.png
│       ├── word_2.png
│       ├── ...
├── <subset_name_2>
├── ...

For icdar_text_localization

<dataset_path>/
├── <subset_name_1>
│   ├── gt_img_1.txt
│   ├── gt_img_2.txt
│   ├── ...
│   └── images
│       ├── img_1.png
│       ├── img_2.png
│       ├── ...
├── <subset_name_2>
│   ├── ...
├── ...

For icdar_text_segmentation

<dataset_path>/
├── <subset_name_1>
│   ├── image_1_GT.bmp # mask for image_1
│   ├── image_1_GT.txt # description of mask objects on the image_1
│   ├── image_2_GT.bmp
│   ├── image_2_GT.txt
│   ├── ...
│   └── images
│       ├── image_1.png
│       ├── image_2.png
│       ├── ...
├── <subset_name_2>
│   ├── ...
├── ...

Export to other formats#

Datumaro can convert ICDAR dataset into any other format Datumaro supports. Examples:

# converting ICDAR text segmentation dataset into the VOC with `convert` command
datum convert -if icdar_text_segmentation -i source_dataset \
    -f voc -o export_dir -- --save-media

Note: some formats have extra export options. For particular format see the docs to get information about it.

With Datumaro you can also convert your dataset to one of the ICDAR formats, but to get expected result, the source dataset should contain required attributes, described in previous section.

Note: in case with icdar_text_segmentation format, if your dataset contains masks without attribute color then it will be generated automatically.

Available extra export options for ICDAR dataset formats:

  • --save-media allow to export dataset with saving media files (by default False)

  • --image-ext IMAGE_EXT allow to specify image extension for exporting dataset (by default - keep original)