CIFAR
Format specification
CIFAR format specification is available here.
Supported annotation types:
Label
Datumaro supports the Python version of CIFAR-10/100.
The difference between CIFAR-10 and CIFAR-100 is how labels are stored
in the meta files (batches.meta or meta) and in the annotation files.
The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image
comes with a “fine” label (the class to which it belongs) and a “coarse” label
(the superclass to which it belongs). In CIFAR-10 there are no superclasses.
CIFAR formats contain 32 x 32 images. As an extension, Datumaro supports reading and writing of arbitrary-sized images.
Convert CIFAR dataset
The CIFAR dataset is available for free download:
cifar-10-python.tar.gz: CIFAR-10 Python version
cifar-100-python.tar.gz: CIFAR-100 Python version
A CIFAR dataset can be converted in the following way:
datum convert --input-format cifar --input-path <path/to/dataset> \
--output-format <desired_format> --output-dir <output/dir>
A CIFAR-10 dataset directory should have the following structure:
└─ Dataset/
├── dataset_meta.json # a list of non-format labels (optional)
├── batches.meta
├── <subset_name1>
├── <subset_name2>
└── ...
A CIFAR-100 dataset directory should have the following structure:
└─ Dataset/
├── dataset_meta.json # a list of non-format labels (optional)
├── meta
├── <subset_name1>
├── <subset_name2>
└── ...
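The two layouts differ only in the name of the meta file, so the variant can be recognized from the directory contents. The following is a minimal sketch of that check. It is not Datumaro's own format detector. The helper names `detect_cifar_variant` and `list_subsets` are hypothetical, and the rule that subset files are the extension-less files other than `meta` is an assumption of this sketch.

```python
import tempfile
from pathlib import Path


def detect_cifar_variant(dataset_dir):
    """Hypothetical helper: guess the CIFAR variant from which meta file exists."""
    root = Path(dataset_dir)
    if (root / "batches.meta").is_file():
        return "cifar-10"
    if (root / "meta").is_file():
        return "cifar-100"
    return None


def list_subsets(dataset_dir):
    """Assumption: subset files have no extension and are not the `meta` file."""
    return sorted(
        p.name
        for p in Path(dataset_dir).iterdir()
        if p.is_file() and p.suffix == "" and p.name != "meta"
    )


# Build a throwaway CIFAR-10-like directory and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("batches.meta", "data_batch_1", "test_batch", "dataset_meta.json"):
        (Path(tmp) / name).touch()
    print(detect_cifar_variant(tmp))  # cifar-10
    print(list_subsets(tmp))          # ['data_batch_1', 'test_batch']
```

Files such as `batches.meta` and `dataset_meta.json` are skipped because they have an extension.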
Dataset files use the Pickle data format.
Meta files:
CIFAR-10:
num_cases_per_batch: 10000
label_names: list of strings (['airplane', 'automobile', 'bird', ...])
num_vis: 3072
CIFAR-100:
fine_label_names: list of strings (['apple', 'aquarium_fish', ...])
coarse_label_names: list of strings (['aquatic_mammals', 'fish', ...])
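Since both meta files are pickled dictionaries, the variant can also be told apart by the keys they contain. The sketch below assumes already-unpickled dicts with `str` keys. The archives from the CIFAR website were pickled under Python 2, so loading them in Python 3 typically needs `pickle.load(f, encoding='latin1')` to get `str` keys, or `encoding='bytes'` to get `bytes` keys. The helper `read_label_names` is hypothetical, and the label lists are truncated for brevity.

```python
import pickle


def read_label_names(meta):
    """Hypothetical helper: return (fine_names, coarse_names) from a CIFAR meta dict.

    CIFAR-10 (batches.meta) has 'label_names' and no superclasses;
    CIFAR-100 (meta) has 'fine_label_names' and 'coarse_label_names'.
    """
    if "fine_label_names" in meta:
        return list(meta["fine_label_names"]), list(meta["coarse_label_names"])
    return list(meta["label_names"]), []


# Truncated example contents; other keys of the real files are omitted.
cifar10_meta = {"label_names": ["airplane", "automobile", "bird"], "num_vis": 3072}
cifar100_meta = {
    "fine_label_names": ["apple", "aquarium_fish"],
    "coarse_label_names": ["aquatic_mammals", "fish"],
}

# Round-trip through pickle, as the meta files are stored pickled.
for meta in (cifar10_meta, cifar100_meta):
    fine, coarse = read_label_names(pickle.loads(pickle.dumps(meta)))
    print(fine, coarse)
```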
Annotation files:
Common:
'batch_label': 'training batch 1 of <N>'
'data': numpy.ndarray of uint8, layout N x C x H x W
'filenames': list of strings
If images have a non-default size (other than 32x32) (Datumaro extension):
'image_sizes': list of (H, W) tuples
CIFAR-10:
'labels': list of integers
CIFAR-100:
'fine_labels': list of integers
'coarse_labels': list of integers
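To show how the `data` rows relate to images, the next block is a minimal round-trip sketch. It is based on the N x C x H x W layout described above: each image is stored channel-first and flattened into one row, so a 32 x 32 RGB image becomes 3072 values. The helpers `encode_batch` and `decode_images` are hypothetical and are not Datumaro functions. The sketch assumes RGB channel order and treats `image_sizes` as an optional per-item list of (H, W) tuples, as listed above.

```python
import pickle

import numpy as np

CIFAR_SIZE = (32, 32)


def encode_batch(images, labels, filenames, batch_label="training batch 1 of 1"):
    """Hypothetical helper: pack N x 32 x 32 x 3 uint8 images into a CIFAR-10-style dict."""
    images = np.asarray(images, dtype=np.uint8)
    # N x H x W x C -> N x C x H x W, then flatten each image into one row.
    data = images.transpose(0, 3, 1, 2).reshape(len(images), -1)
    return {
        "batch_label": batch_label,
        "data": data,                  # N x 3072 for 32 x 32 RGB images
        "filenames": list(filenames),
        "labels": list(labels),        # integer class indices
    }


def decode_images(batch):
    """Hypothetical helper: return a list of H x W x C uint8 images from a batch dict."""
    sizes = batch.get("image_sizes")  # optional Datumaro extension
    images = []
    for i, row in enumerate(batch["data"]):
        h, w = sizes[i] if sizes is not None else CIFAR_SIZE
        chw = np.asarray(row, dtype=np.uint8).reshape(3, h, w)
        images.append(chw.transpose(1, 2, 0))
    return images


rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(2, 32, 32, 3), dtype=np.uint8)
batch = encode_batch(imgs, [3, 8], ["img_0.png", "img_1.png"])

# Batch files are plain pickled dicts.
loaded = pickle.loads(pickle.dumps(batch))
restored = decode_images(loaded)
print(loaded["data"].shape)                                       # (2, 3072)
print(all(np.array_equal(a, b) for a, b in zip(restored, imgs)))  # True
```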
To add custom classes, you can use dataset_meta.json.
Export to other formats
Datumaro can convert a CIFAR dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports the classification task (e.g. MNIST, ImageNet, PascalVOC, etc.).
There are several ways to convert a CIFAR dataset to other dataset formats. Using the CLI:
datum convert --input-format cifar --input-path <path/to/dataset> \
--output-format imagenet --output-dir <output/dir> -- --save-media
Or, using the Python API:
import datumaro as dm
dataset = dm.Dataset.import_from('<path/to/dataset>', 'cifar')
dataset.export('save_dir', 'imagenet', save_media=True)
Export to CIFAR
A dataset in another format can be converted to CIFAR format as follows:
# converting to CIFAR format from other format
datum convert --input-format imagenet --input-path <path/to/dataset> \
--output-format cifar --output-dir <output/dir> -- --save-media
Extra options for exporting to CIFAR format:
--save-media - export the dataset together with its media files (default: False)
--image-ext <IMAGE_EXT> - the image extension to use when exporting the dataset (default: .png)
--save-dataset-meta - export the dataset together with a dataset meta file (default: False)
The format (CIFAR-10 or CIFAR-100) in which the dataset will be
exported depends on the presence of superclasses in the LabelCategories.
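The decision described above can be sketched as follows. This is an illustration, not Datumaro's exporter code. `LabelInfo` is a hypothetical stand-in for a label entry, and the sketch assumes each label records its superclass as an optional `parent`. If any label has a parent, a CIFAR-100 `meta` dict with fine and coarse names is produced; otherwise a CIFAR-10 `batches.meta` dict is produced. Collecting coarse names in first-seen order is also an assumption of this sketch.

```python
from typing import List, NamedTuple, Optional


class LabelInfo(NamedTuple):
    """Hypothetical stand-in for a label entry: a name and an optional superclass."""
    name: str
    parent: Optional[str] = None


def build_meta(labels: List[LabelInfo]):
    """Return (meta_file_name, meta_dict) for the given labels."""
    fine = [label.name for label in labels]
    coarse = []
    for label in labels:
        if label.parent and label.parent not in coarse:
            coarse.append(label.parent)  # first-seen order (assumption)
    if coarse:
        return "meta", {"fine_label_names": fine, "coarse_label_names": coarse}
    return "batches.meta", {"label_names": fine}


print(build_meta([LabelInfo("cat"), LabelInfo("dog")]))
# ('batches.meta', {'label_names': ['cat', 'dog']})
print(build_meta([LabelInfo("beaver", "aquatic_mammals"), LabelInfo("trout", "fish")]))
# ('meta', {'fine_label_names': ['beaver', 'trout'], 'coarse_label_names': ['aquatic_mammals', 'fish']})
```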
Examples
Datumaro supports filtering, transformation, merging etc. for all formats and for the CIFAR format in particular. Follow the user manual to get more information about these operations.
There are several examples of using Datumaro operations to solve particular problems with a CIFAR dataset:
Example 1. How to create a custom CIFAR-like dataset
import numpy as np
import datumaro as dm
dataset = dm.Dataset.from_iterable([
dm.DatasetItem(id=0, image=np.ones((32, 32, 3)),
annotations=[dm.Label(3)]
),
dm.DatasetItem(id=1, image=np.ones((32, 32, 3)),
annotations=[dm.Label(8)]
)
], categories=['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck'])
dataset.export('./dataset', format='cifar')
Example 2. How to filter and convert a CIFAR dataset to ImageNet
Convert a CIFAR dataset to ImageNet format, keeping only images with the
dog class present:
# Download CIFAR-10 dataset:
# https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
datum convert --input-format cifar --input-path <path/to/cifar> \
--output-format imagenet --output-dir <output/dir> \
--filter '/item[annotation/label="dog"]'
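For intuition, the selection that this filter makes can be sketched on a raw CIFAR-10 batch dict. Datumaro itself evaluates the XPath expression over its own dataset items, so this is only an illustration of the result. The helper `keep_items_with_label` and the sample batch are hypothetical. The label order matches the CIFAR-10 class list used in Example 1, where `dog` has index 5.

```python
def keep_items_with_label(batch, label_names, wanted="dog"):
    """Hypothetical helper: indices of items whose label name equals `wanted`."""
    target = label_names.index(wanted)
    return [i for i, label in enumerate(batch["labels"]) if label == target]


label_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

# Tiny made-up batch: labels are integer class indices.
batch = {"filenames": ["a.png", "b.png", "c.png"], "labels": [5, 3, 5]}

keep = keep_items_with_label(batch, label_names)
print([batch["filenames"][i] for i in keep])  # ['a.png', 'c.png']
```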
Examples of using this format from code can be found in the format tests.