Release Notes

Release 1.12.0

This release streamlines Datumaro by removing a number of lesser-used features, helping to simplify the tool and reduce its dependencies. These changes are part of an effort to keep Datumaro focused on its core strengths: dataset management and integration with machine learning frameworks. As part of this update, inference-related features have been removed. For inference tasks, we recommend using the [OpenVINO model API](open-edge-platform/model_api). If you rely on a specific feature that is no longer available, you can still access it from [the previous version of Datumaro](open-edge-platform/datumaro).

Removed features

CLI commands:
  - explain, explore, generate, prune
  - model: add, remove, run, info
  - project: add, create, export, import, remove, checkout, commit, log, info, status
  - source: import, add, remove
API features:
  - Model inference
  - Model-based transformations
  - Crypter
  - Synthetic dataset generation
  - Data exploration
  - BBox to mask using SAM
  - Telemetry
  - Anchor generation
  - Missing annotation detection
  - Model inference explanation
  - Near-duplicate removal
  - Pruning
  - Pseudo-labels
  - Project
  - Noisy label detection
  - Data shift analysis
SAM Docker image

New features

Experimental dataset class

Enhancements

Mark several dependencies as optional
Removal of unneeded dependencies
Documentation tidy-up
DCO introduction; README, PR template, and contribution guide tidy-up
Fix code coverage upload to Codecov in the CI
Fix crashes with certain datasets in the compare command
Added Semgrep security scan in the CI
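
Marking a dependency as optional typically means importing it only when the feature that needs it is actually used, and failing with a clear message if it is missing. The sketch below illustrates that general pattern; the helper `require_optional` and its `package` parameter are hypothetical and not part of Datumaro's API.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, package: str) -> ModuleType:
    """Import an optional dependency on demand (hypothetical helper).

    `package` is the pip distribution name shown in the error message,
    which can differ from the import name.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with an actionable hint while keeping the original cause.
        raise ImportError(
            f"'{module_name}' is needed for this feature; "
            f"install it with: pip install {package}"
        ) from err
```

For example, a feature built on pyemd (marked optional in v1.11.0) could call `require_optional("pyemd", package="pyemd")` at the point of use, so the base installation never imports it.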

v1.11.1 (2024 Q3)

Enhancements

Bump NumPy and OpenVINO versions
Support for Python 3.13 and macOS

v1.11.0 (2024 Q3)

This release includes a significant number of deprecations in the CLI and API. This is a one-off action to remove unused features, as well as features such as inference that do not fit well in Datumaro. We intend to remove these features in Datumaro 1.12.0.

New features

Convert Cuboid2D annotation to/from 3D data
Add label groups for hierarchical classification in ImageNet

Enhancements

Add non-strict mode to JsonPageMapper in the Rust API and enable it for COCO
Enhance ‘id_from_image_name’ transform to ensure each identifier is unique
Optimize path assignment to handle point cloud in JSON without images
Add documentation for framework conversion
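
The uniqueness guarantee in the `id_from_image_name` item above can be pictured as collision handling over derived ids. A minimal sketch, assuming ids are taken from the file name without its extension and that collisions receive an incrementing numeric suffix; the function name and suffix scheme are illustrative, and Datumaro's transform may resolve collisions differently.

```python
from pathlib import PurePosixPath


def unique_ids_from_image_names(image_paths: list[str]) -> list[str]:
    """Derive an item id from each image file name, keeping ids unique.

    Assumptions: paths use "/" separators, and a collision is resolved by
    appending "_1", "_2", ... until the id is unused.
    """
    seen: set[str] = set()
    ids: list[str] = []
    for path in image_paths:
        base = PurePosixPath(path).stem  # "train/cat.jpg" -> "cat"
        candidate, n = base, 1
        while candidate in seen:
            candidate = f"{base}_{n}"
            n += 1
        seen.add(candidate)
        ids.append(candidate)
    return ids
```

Note that checking against every id already issued, not only against the original names, is what keeps a derived id such as `cat_1` from clashing with an image that was literally named `cat_1.jpg`.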

Bug fixes

Fix assertion to compare hashkeys against expected value
Mark pyemd as optional since it does not support Python 3.12

Deprecations

The following CLI commands are deprecated:
  - explain, explore, generate, prune
  - model: add, remove, run, info
  - project: add, create, export, import, remove, checkout, commit, log, info, status
  - source: import, add, remove
The following APIs are deprecated:
  - Model inference
  - Model-based transformations
  - Crypter
  - Synthetic dataset generation
  - Data exploration
  - BBox to mask using SAM
  - Telemetry
  - Anchor generation
  - Missing annotation detection
  - Model inference explanation
  - Near-duplicate removal
  - Pruning
  - Pseudo-labels
  - Projects
  - SAM Docker image

v1.10.0 (2024 Q4)

New features

Support KITTI 3D format
Add PseudoLabeling transform for unlabeled dataset

Enhancements

Raise an appropriate error when exporting a Datumaro dataset whose subset name contains path separators.
Update docs for transform plugins
Update the OpenVINO IR model for the explorer OpenVINO launcher to the CLIP ViT-L/14@336px model
Optimize path assignment to handle point cloud in JSON without images
Set TabularTransform to process clean transform in parallel
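
Checking a subset name before export amounts to rejecting any path separator, since a name such as `../train` could otherwise be written outside the export directory. A minimal sketch with a hypothetical `validate_subset_name` helper; raising `ValueError` is an assumption, as the note does not name the error type Datumaro uses.

```python
def validate_subset_name(name: str) -> str:
    """Reject subset names containing path separators (hypothetical helper).

    Both "/" and "\\" are checked so the result does not depend on the OS
    the export runs on.
    """
    for sep in ("/", "\\"):
        if sep in name:
            # Assumed error type; the release note only says an error is raised.
            raise ValueError(f"Subset name {name!r} must not contain {sep!r}")
    return name
```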

Bug fixes

Fix datumaro format to load visibility information from Points annotations

v1.9.1 (2024 Q3)

Enhancements

Support multiple labels for the Kaggle format
Use DataFrame.map instead of DataFrame.applymap

Bug fixes

Fix StreamDataset merging when importing in eager mode

v1.9.0 (2024 Q3)

New features

Add a new CLI command: datum format
Support language datasets in DmTorchDataset

Enhancements

Change _Shape to Shape and add comments for subclasses of Shape

Bug fixes

Fix KITTI-3D importer and exporter

v1.8.0 (2024 Q3)

New features

Add TabularValidator
Add Clean Transform for tabular data type

Enhancements

Set label name with parents to avoid duplicates for AstypeAnnotations
Pass keyword arguments to TabularDataBase

Bug fixes

Preserve end_frame information of a video when it is zero.
Changed the Datumaro format to ensure exported videos have relative paths and to prevent the same video from being overwritten.
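
A frequent cause of the kind of bug fixed in the end_frame item above is a truthiness check that treats 0 like a missing value. The sketch contrasts the two checks; the function names are hypothetical, and this illustrates the general pitfall rather than the actual Datumaro code.

```python
from typing import Optional


def end_frame_buggy(end_frame: Optional[int], default: int) -> int:
    # `or` falls back to the default for ANY falsy value, so 0 is lost.
    return end_frame or default


def end_frame_fixed(end_frame: Optional[int], default: int) -> int:
    # Fall back only when the value is genuinely missing.
    return default if end_frame is None else end_frame
```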

v1.7.0 (2024 Q2)

New features

Add ann_types property for dataset
Add AnnotationType.rotated_bbox for oriented object detection
Add DOTA data format for oriented object detection task
Add AstypeAnnotations Transform
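
The rotated_bbox annotation type and the DOTA format fit together: DOTA labels each object as one line of four corner points followed by the category name and a difficulty flag. The sketch below converts a rotated box to such a line, assuming the box is given as centre, width, height, and a rotation angle in degrees applied as a standard 2-D rotation; the function names are hypothetical, and the angle convention may differ from Datumaro's rotated_bbox annotation.

```python
import math


def rotated_bbox_to_corners(cx, cy, w, h, angle_deg):
    """Return the 4 corners of a w x h box centred at (cx, cy), rotated by angle_deg.

    Assumption: the rotation is applied to the axis-aligned corner offsets
    with the usual matrix [[cos, -sin], [sin, cos]].
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]


def to_dota_line(corners, category, difficult=0):
    """Format corners as a DOTA line: x1 y1 x2 y2 x3 y3 x4 y4 category difficult."""
    coords = " ".join(f"{v:.1f}" for xy in corners for v in xy)
    return f"{coords} {category} {difficult}"
```

With a zero angle, `rotated_bbox_to_corners(10, 20, 4, 2, 0)` yields the axis-aligned corners `(8, 19)`, `(12, 19)`, `(12, 21)`, `(8, 21)`.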

Enhancements

Fix ambiguous COCO format detector
Get target information for tabular dataset
Add ExtractedMask and update the importers that can use it

v1.6.1 (2024.05)

Enhancements

Prevent AcLauncher for OpenVINO 2024.0

Bug fixes

Modify lxml dependency constraint
Fix CLI error occurring when installed with default option only
Relax Pillow dependency constraint
Modify NumPy dependency constraint
Relax old pandas version constraint

v1.6.0 (2024.04)

New features

Changed supported Python version range (>=3.9, <=3.11)
Support MMDetection COCO format
Develop JsonSectionPageMapper in Rust API
Add filtering via user-provided Python functions
Remove support for the macOS platform
Support Kaggle image data (KaggleImageCsvBase, KaggleImageTxtBase, KaggleImageMaskBase, KaggleVocBase, KaggleYoloBase)
Add __getitem__() for random access with O(1) time complexity
Add Data-aware Anchor Generator
Support bounding box import within Kaggle extractors and add KaggleCocoBase
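
Constant-time random access generally relies on a hash index built once over the items. The container below is a hypothetical sketch of that idea, supporting both positional and `(id, subset)` lookups; the `Item` and `IndexedItems` classes are illustrative and are not Datumaro's `Dataset` implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical dataset item identified by (id, subset)."""

    id: str
    subset: str
    annotations: list = field(default_factory=list)


class IndexedItems:
    """Hypothetical container with O(1) average-time lookups."""

    def __init__(self, items):
        self._items = list(items)
        # Built once in O(n); each later lookup is a single dict access.
        self._index = {(it.id, it.subset): pos for pos, it in enumerate(self._items)}

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        # Integers address items by position; (id, subset) tuples use the index.
        if isinstance(key, int):
            return self._items[key]
        return self._items[self._index[key]]
```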

Enhancements

Optimize Python import to make CLI entrypoint faster
Add ImageColorScale context manager
Enhance visualizer to toggle plot title visibility
Enhance Datumaro data format detect() to be memory-bounded and performant
Change RoIImage and MosaicImage to have np.uint8 dtype as default
Enable image backend and color channel format to be selectable
Speed up CityscapesBase and KaggleImageMaskBase by dropping np.unique
Enhance RISE algorithm for explainable AI
Enhance explore unit test to use real dataset from ImageNet
Fix the comparator so that each of its methods can be used separately

Bug fixes

Fix wrong example of Datumaro dataset creation in document
Fix wrong command to install datumaro from github
Update document to correct wrong datum project import command and add filtering example to filter out items containing annotations.
Fix label compare of distance method
Fix Datumaro visualizer’s import errors after introducing lazy import
Fix broken link to supported formats in readme
Fix Kinetics data format to have media data
Handle undefined labels in annotation statistics
Add unit test for item rename
Fix a bug in the previous behavior when importing nested datasets in the project
Fix Kaggle importer when adding duplicated labels
Fix input tensor shape in model interpreter for OpenVINO 2023.3
Add default value for target in prune CLI
Remove deprecated MediaManager
Fix explore command without project

v1.5.2 (2024.01)

Enhancements

Add memory bounded datumaro data format detect
Remove Protobuf version limitation (<4)

v1.5.1 (2023.11)

Enhancements

Enhance Datumaro data format stream importer performance
Change image default dtype from float32 to uint8
Add comparison level-up doc
Add ImportError to catch GitPython import error

Bug fixes

Modify the draw function in the visualizer not to raise an error for unsupported annotation types.
Correct explore path in the related document.
Fix errata in the voc document. Color values in the labelmap.txt should be separated by commas, not colons.
Fix hyperlink errors in the document.
Fix memory unbounded Arrow data format export/import.
Update CVAT format doc to bypass warning.
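
The corrected VOC documentation item refers to `labelmap.txt`, where each line has the form `label:color_rgb:parts:actions`: fields are separated by colons and the colour components by commas. A minimal parser sketch, assuming all four fields are present on every non-comment line; the function name and the returned dict layout are illustrative.

```python
def parse_labelmap_line(line: str):
    """Parse one `label:color_rgb:parts:actions` line of a VOC labelmap.txt.

    Returns None for blank lines and comments. Assumes exactly four
    colon-separated fields; an empty colour field yields None.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    name, color, parts, actions = line.split(":")
    # Colour components are comma-separated, e.g. "128,0,0".
    rgb = tuple(int(c) for c in color.split(",")) if color else None
    return {
        "name": name,
        "color": rgb,
        "parts": [p for p in parts.split(",") if p],
        "actions": [a for a in actions.split(",") if a],
    }
```

For instance, `parse_labelmap_line("aeroplane:128,0,0::")` returns the name `aeroplane`, the colour `(128, 0, 0)`, and empty parts and actions lists.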

v1.5.0 (2023.09)

New features

Add tabular data import/export
Support video annotation import/export
Add multiframework (PyTorch, TensorFlow) converter
Add SAM OVMS and Triton server Docker image builders
Add SAMBboxToInstanceMask transform
Add ConfigurableValidator

Enhancements

Enhance ClassificationValidator for multi-label classification datasets with label_groups
Replace Roboflow xml.etree with defusedxml
Define GroupType with IntEnum, where 0 is EXCLUSIVE
Add Rust API to optimize COCOPageMapper performance
Support a dictionary input in addition to a single image input for the model launcher to support Segment Anything Model
Remove deprecated features announced for removal in 1.5.0
Add multi-threading option to ModelTransform and SAMBboxToInstanceMask
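
Defining GroupType as an IntEnum means each member is also a plain integer, so the value 0 round-trips through JSON or other serialized formats and maps back to `EXCLUSIVE`. The sketch below is illustrative: only `EXCLUSIVE = 0` comes from the note, and the second member `INCLUSIVE = 1` is an assumption.

```python
import json
from enum import IntEnum


class GroupType(IntEnum):
    """Illustrative label-group type; only EXCLUSIVE = 0 is from the release note."""

    EXCLUSIVE = 0  # labels in the group are mutually exclusive
    INCLUSIVE = 1  # assumed member: labels in the group may co-occur


# IntEnum members serialize as plain integers...
encoded = json.dumps({"group_type": GroupType.EXCLUSIVE})
# ...and the stored integer converts straight back to the member.
decoded = GroupType(json.loads(encoded)["group_type"])
```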

Bug fixes

Fix bugs for Tile transform
Disable Roboflow TFRecord format when TensorFlow is not installed
Raise VcsAlreadyExists error if vcs directory exists

v1.4.1 (2023.07)

Bug fixes

Report errors for COCO (stream) and Datumaro importers

v1.4.0 (2023.07)

New features

Add documentation and notebook example for Prune API
Changed supported Python version range (>=3.8, <=3.11)
Migrate to OpenVINO v2023.0.0
Add Roboflow data format support (COCO JSON, Pascal VOC XML, YOLOv5-PyTorch, YOLOv7-PyTorch, YOLOv8, YOLOv5 Oriented Bounding Boxes, Multiclass CSV, TFRecord, CreateML JSON)
Add MissingAnnotationDetection transform
Add OVMSLauncher
Add Prune API
Add TritonLauncher
Migrate to DVC v3.0.0
Stream dataset import/export
Support mask annotations for CVAT data format

Enhancements

Support list query for explorer
Update contributing.md
Update 3rd-party.txt for release 1.4.0
Announce that deprecated features will be removed in datumaro==1.5.0
Unify COCO, Datumaro, VOC, YOLO importer/exporter progress reporter descriptions
Enhance import performance for built-in plugins
Change default dtype of load_image() to np.uint8
Add OTX ATSS detector model interpreter & refactor interfaces
Refactor Launcher and ModelInterpreter
Add CVAT data format document
Reduce peak memory usage when importing COCO and Datumaro formats
Enhance the error message for datum stats to be more user friendly
Refactor dataset.py to separate DatasetStorage

Bug fixes

Create the cache directory only on a writable filesystem
Fix: Dataset infos() can be broken if a transform not redefining infos() is stacked on the top
Fix warnings in test_visualizer.py
Fix LabelMe data format
Prevent installing protobuf>=4
Fix UnionMerge

v1.3.2 (2023.06)

Enhancements

Let CocoBase continue even if an InvalidAnnotationError is raised

Bug fixes

Pin DVC to version 2.x
Replace np.append() in Validator

v1.3.1 (2023.05)

Bug fixes

Fix Cityscapes format mis-detection problem

v1.3.0 (2023.05)

New features

Add CocoRoboflowImporter
Add SynthiaSfImporter and SynthiaAlImporter
Add intermediate skill document for filter
Add VocInstanceSegmentationImporter and VocInstanceSegmentationExporter
Add Segment Anything data format support
Add Correct transformation
Add ReindexAnnotations transform
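
ReindexAnnotations can be thought of as assigning fresh, consecutive ids while leaving everything else about each annotation unchanged. A minimal sketch on a hypothetical frozen `Annotation` dataclass; the `start` parameter and the choice to keep the input order are assumptions, not the transform's documented behaviour.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Annotation:
    """Hypothetical minimal annotation with an id and a label index."""

    id: int
    label: int


def reindex_annotations(annotations, start=0):
    """Return copies of `annotations` with consecutive ids from `start`.

    `dataclasses.replace` builds new objects, so the inputs stay untouched.
    """
    return [replace(ann, id=start + i) for i, ann in enumerate(annotations)]
```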

Enhancements

Use autosummary for fully-automatic Python module docs generation
Enrich stack trace for better user experience when importing
Save and load hashkey for explorer
Add MOT and MOTS data format documents
Improve RemoveAnnotations to remove specific annotations with ids
Add Jupyter notebook example of noisy label detection for detection tasks
Add Jupyter notebook examples for importing/exporting detection and segmentation data

Bug fixes

Fix Mapillary Vistas data format
Fix bytes property returning None if function is given to data
Fix Synthia-Rand data format
Fix person_layout categories and action_classification attributes in imported Pascal-VOC dataset
Drop a malformed transform from StackedTransform automatically
Fix Cityscapes to drop ImgsFine directory

v1.2.1 (2023.05)

Bug fixes

Fix project level CVAT for images format import
Fix an info message when using the convert CLI command with no args.input_format
Fix media contents not returning bytes in arrow format

v1.2.0 (2023.04)

New features

Add Skill Up section to documentation
Add LossDynamicsAnalyzer for noisy label detection
Add Apache Arrow format support
Add sort transform

Enhancements

Add multiprocessing to DatumaroBinaryBase
Refactor merge code
Refactor download CLI commands
Refactor CLI commands w/ and w/o project
Refactor Media to be initialized from explicit sources
Refactor hl_ops.py
Add tfds:uc_merced and tfds:eurosat download
Migrate documentation framework to Sphinx
Update merge tutorial for a real-life use case
Abbreviate “detect-format” to “detect” for prettifying

Bug fixes

Add UserWarning if an invalid media_type comes to image statistics computation
Fix negated is_encrypted
Save extra images of PointCloud when exporting to datumaro format
Fix log issue when importing celeba and align celeba dataset

v1.1.0 (2023.03)

New features

Add with_subset_dirs decorator (Add ImagenetWithSubsetDirsImporter)
Add CommonSemanticSegmentationWithSubsetDirsImporter
Add DatumaroBinary format
Add Searcher CLI documentation
Add version to dataset exported as datumaro format
Add Ava action data format support
Add Shift Analyzer (both covariate and label shifts)
Add YOLO Loose format
Add Ultralytics YOLO format
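
For detection, the YOLO-family formats listed above store one text line per object: `class x_center y_center width height`, with coordinates normalised by the image width and height. A minimal conversion sketch to absolute pixel boxes; the function name and the `(class_id, x, y, w, h)` output layout, with `(x, y)` as the top-left corner, are illustrative choices.

```python
def yolo_to_abs_bbox(line: str, img_w: int, img_h: int):
    """Convert one YOLO label line to (class_id, x, y, w, h) in pixels.

    Input coordinates are normalised to [0, 1]; (x, y) in the result is
    the top-left corner of the box.
    """
    class_id, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(class_id), xc - w / 2, yc - h / 2, w, h
```

For a 640x480 image, the line `2 0.5 0.5 0.25 0.5` describes a 160x240 box of class 2 whose top-left corner is at (240, 120).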

Enhancements

Refactor Datumaro format code and test code

Bug fixes

Fix image filenames and anomaly mask appearance in MVTec exporter
Fix CIFAR10 and 100 detect function
Fix celeba and align_celeba detect function
Choose the top priority detect format for all directory depths
Fix MVTec format detect function
Fix wrong __len__() of Subset when the item is removed
Fix mask visualization bug

v1.0.0 (2023.02)

New features

Add Data Explorer
Add Ellipse annotation type
Add MVTec anomaly data support

Enhancements

Refactor existing tests
Raise ImportError on importing malformed COCO directory
Remove the duplicated and cyclical category context in documentation

Bug fixes

Fix for importing CVAT image 1.1 data format exported to project level
Fix a problem on setting log-level via CLI
Fix code format with the latest black==23.1.0
Fix ‘Explain command cannot find the model’
Fix a problem found on model remove CLI command

Note

Release notes for the version under development can be found in CHANGELOG.md on the develop branch.