Level 1: Dataset download#
Datumaro supports downloading public datasets from multiple sources: TensorFlow Datasets and Kaggle Datasets
Prepare installation#
To use Datumaro download
feature, you should install Datumaro with [tf,tfds]
extras for TensorFlow Datasets or [kaggle]
for Kaggle Datasets:
pip install datumaro[tf,tfds]
pip install datumaro[kaggle]
Which datasets are available?#
You can browse the list of available TensorFlow Datasets here or using the command below. For Kaggle Datasets, you can check here.
You can see the list of available DATASET_ID
using the following command.
datum download tfds describe [--report-format {text,json}] [--report-file REPORT_FILE]
How can we download datasets?#
You can actually download the dataset using the following command.
You have to input -i DATASET_ID
according to the id of dataset you want to download.
Additionally, you can specify the output format (-f OUTPUT_FORMAT
) and path (-o DST_DIR
).
datum download tfds get [-h] -i DATASET_ID [-f OUTPUT_FORMAT] [-o DST_DIR] [--overwrite] [-s SUBSET] ...
Note
By default, download
does not export the media files (e.g. images).
We recommand you to run this command with --save-media
option to export the media files as well,
for example, datum download tfds get -i tfds:mnist -- --save-media
.
You can actually download the dataset using the following command.
You have to input -i DATASET_ID
according to the id of dataset you want to download.
Additionally, you can specify the output format (-f OUTPUT_FORMAT
) and path (-o DST_DIR
).
datum download kaggle get [-h] -i DATASET_ID [-f OUTPUT_FORMAT] [-o DST_DIR] [--overwrite] [-s SUBSET] ...
Note
By default, download
does not export the media files (e.g. images).
We recommand you to run this command with --save-media
option to export the media files as well,
for example, datum download kaggle get -i tfds:mnist -- --save-media
.
In the next level, we will look into how to import and export the dataset using Datumaro!