datumaro.experimental.legacy#

Legacy dataset conversion functionality.

This module converts legacy Datumaro datasets to the new experimental dataset format, with automatic schema inference and type conversion, and converts experimental datasets back to the legacy format.

Functions

analyze_experimental_dataset(...)

Analyze experimental dataset schema to determine legacy format.

analyze_legacy_dataset(legacy_dataset)

Analyze legacy dataset and generate schema using registered converters.

convert_from_legacy(legacy_dataset)

Convert legacy dataset to experimental format with automatic schema inference.

convert_to_legacy(experimental_dataset)

Convert experimental dataset to legacy format.

get_forward_annotation_converter(...)

Get forward converter for an annotation type from the dataset.

get_forward_media_converter(dataset)

Get forward converter for a dataset by trying registered converters.

register_backward_annotation_converter(...)

Register a backward converter class for an annotation type.

register_backward_media_converter(...)

Register a backward converter class for a media type.

register_builtin_backward_converters()

Register built-in backward converters.

register_builtin_forward_converters()

Register built-in forward converters for common types.

register_forward_annotation_converter(...)

Register a forward converter class for annotation types it supports.

register_forward_media_converter(converter_class)

Register a forward converter class for media types it supports.

Classes

AnalysisResult(schema, media_converter, ...)

Result of legacy dataset analysis.

BackwardAnalysisResult(media_type, ...)

Result of experimental dataset analysis for backward conversion.

BackwardAnnotationConverter()

Base class for backward annotation type converters.

BackwardBboxAnnotationConverter(bboxes_attr, ...)

Backward converter for Bbox annotations.

BackwardImageMediaConverter(image_path_attr)

Backward converter for Image media type.

BackwardMediaConverter()

Base class for backward media type converters.

BackwardPolygonAnnotationConverter(...)

Backward converter for Polygon annotations.

BackwardRotatedBboxAnnotationConverter(...)

Backward converter for RotatedBbox annotations.

ForwardAnnotationConverter()

Base class for forward annotation type converters.

ForwardBboxAnnotationConverter(...)

Forward converter for Bbox annotations.

ForwardImageMediaConverter(media_mixin, ...)

Forward converter for Image media type supporting both file paths and byte data.

ForwardKeypointAnnotationConverter(...)

Forward converter for Points (keypoints) annotations.

ForwardLabelAnnotationConverter(label_attribute)

Forward converter for Label (single label classification) annotations.

ForwardMaskAnnotationConverter(...)

Forward converter for mask annotations handling both semantic and instance segmentation.

ForwardMediaConverter()

Base class for forward media type converters.

ForwardPolygonAnnotationConverter(...)

Forward converter for Polygon annotations.

ForwardRotatedBboxAnnotationConverter(...[, ...])

Forward converter for RotatedBbox annotations.

class datumaro.experimental.legacy.ForwardMediaConverter[source]#

Bases: ABC

Base class for forward media type converters.

abstract classmethod get_supported_media_types() list[Type[MediaElement[Any]]][source]#

Return list of media types this converter can handle.

abstract classmethod create(dataset: Dataset) ForwardMediaConverter | None[source]#

Create converter instance if dataset is supported, None otherwise.

abstract get_schema_attributes() dict[str, AttributeInfo][source]#

Return schema attributes for this media type.

abstract convert_item_media(item: DatasetItem) dict[str, Any][source]#

Convert media from a DatasetItem to experimental format.
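The four abstract methods above describe a contract that a custom media converter would fill in. The sketch below illustrates that contract with local stand-ins: `AttributeInfo`, `DatasetItem`, `Dataset`, and the base class are simplified mirrors defined here so the example runs on its own, and `PathOnlyImageConverter`, `media_kind`, and `media_path` are hypothetical names, not part of Datumaro.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


# Stand-ins for the Datumaro types named in the signatures above
# (assumptions, reduced to what this sketch needs).
@dataclass
class AttributeInfo:
    py_type: type


@dataclass
class DatasetItem:
    id: str
    media_path: Optional[str] = None  # hypothetical field


@dataclass
class Dataset:
    items: list
    media_kind: str  # hypothetical tag, e.g. "image" or "audio"


class ForwardMediaConverter(ABC):
    """Local mirror of the documented abstract interface."""

    @classmethod
    @abstractmethod
    def get_supported_media_types(cls) -> list: ...

    @classmethod
    @abstractmethod
    def create(cls, dataset: Dataset) -> Optional["ForwardMediaConverter"]: ...

    @abstractmethod
    def get_schema_attributes(self) -> dict: ...

    @abstractmethod
    def convert_item_media(self, item: DatasetItem) -> dict: ...


class PathOnlyImageConverter(ForwardMediaConverter):
    """Hypothetical converter that stores only the image path."""

    @classmethod
    def get_supported_media_types(cls) -> list:
        return ["image"]

    @classmethod
    def create(cls, dataset: Dataset) -> Optional["PathOnlyImageConverter"]:
        # Return None when the dataset is not supported, as the contract asks.
        if dataset.media_kind not in cls.get_supported_media_types():
            return None
        return cls()

    def get_schema_attributes(self) -> dict:
        return {"image_path": AttributeInfo(py_type=str)}

    def convert_item_media(self, item: DatasetItem) -> dict:
        return {"image_path": item.media_path}
```

Returning None from `create` rather than raising lets a caller probe several converter classes in turn and keep the first one that accepts the dataset.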

class datumaro.experimental.legacy.ForwardAnnotationConverter[source]#

Bases: ABC

Base class for forward annotation type converters.

abstract classmethod get_supported_annotation_types() list[AnnotationType][source]#

Return list of annotation types this converter can handle.

abstract classmethod create(dataset: Dataset) ForwardAnnotationConverter | None[source]#

Create converter instance if dataset supports this annotation type.

abstract get_schema_attributes() dict[str, AttributeInfo][source]#

Return schema attributes for this annotation type.

abstract convert_annotations(annotations: list[Annotation], item: DatasetItem) dict[str, Any][source]#

Convert annotations of this type to experimental format.

datumaro.experimental.legacy.register_forward_media_converter(converter_class: Type[ForwardMediaConverter]) None[source]#

Register a forward converter class for media types it supports.

datumaro.experimental.legacy.register_forward_annotation_converter(converter_class: Type[ForwardAnnotationConverter]) None[source]#

Register a forward converter class for annotation types it supports.

datumaro.experimental.legacy.get_forward_media_converter(dataset: Dataset) ForwardMediaConverter | None[source]#

Get forward converter for a dataset by trying registered converters.

datumaro.experimental.legacy.get_forward_annotation_converter(annotation_type: AnnotationType, dataset: Dataset) ForwardAnnotationConverter | None[source]#

Get forward converter for an annotation type from the dataset.

Parameters:
  • annotation_type – The type of annotation to get a converter for

  • dataset – The legacy dataset to create a converter from

Returns:

A forward converter instance if one can handle the annotation type, None otherwise

class datumaro.experimental.legacy.ForwardImageMediaConverter(media_mixin: type, has_image_info: bool, has_callable_data: bool = False)[source]#

Bases: ForwardMediaConverter

Forward converter for Image media type supporting both file paths and byte data.

Initialize converter with format preference and image info availability.

classmethod get_supported_media_types() list[Type[MediaElement[Any]]][source]#

Return list of media types this converter can handle.

classmethod create(dataset: Dataset) ForwardImageMediaConverter | None[source]#

Create converter instance, detecting whether to use paths or bytes.

get_schema_attributes() dict[str, AttributeInfo][source]#

Return schema attributes for this media type.

convert_item_media(item: DatasetItem) dict[str, Any][source]#

Convert media from a DatasetItem to experimental format.

class datumaro.experimental.legacy.ForwardBboxAnnotationConverter(bbox_attribute: AttributeInfo, bbox_labels_attribute: AttributeInfo | None)[source]#

Bases: ForwardAnnotationConverter

Forward converter for Bbox annotations.

Initialize with the bbox attribute and an optional bbox labels attribute.

classmethod get_supported_annotation_types() list[AnnotationType][source]#

Return list of annotation types this converter can handle.

classmethod create(dataset: Dataset) ForwardBboxAnnotationConverter | None[source]#

Create converter instance for bbox annotations.

get_schema_attributes() dict[str, AttributeInfo][source]#

Return schema attributes for this annotation type.

convert_annotations(annotations: list[Annotation], item: DatasetItem) dict[str, Any][source]#

Convert annotations of this type to experimental format.
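As a rough illustration of what a forward bbox conversion might produce, the sketch below turns a list of legacy-style boxes into a dictionary of per-sample fields. `LegacyBbox` and `convert_bboxes` are hypothetical stand-ins. The key names `bboxes` and `bbox_labels` follow the field names used elsewhere on this page, and the corner-coordinate layout is an assumption rather than a documented fact.

```python
from dataclasses import dataclass


@dataclass
class LegacyBbox:
    """Stand-in for a legacy bbox annotation (x, y, w, h, label)."""

    x: float
    y: float
    w: float
    h: float
    label: int


def convert_bboxes(annotations, bboxes_key="bboxes", labels_key="bbox_labels"):
    # Assumption: the experimental side stores corner coordinates
    # (x1, y1, x2, y2); adjust if the actual layout differs.
    boxes = [[a.x, a.y, a.x + a.w, a.y + a.h] for a in annotations]
    labels = [a.label for a in annotations]
    return {bboxes_key: boxes, labels_key: labels}


result = convert_bboxes([LegacyBbox(10, 20, 30, 40, label=2)])
```

Keeping boxes and labels as parallel sequences means index i in one always refers to the same object as index i in the other.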

class datumaro.experimental.legacy.ForwardRotatedBboxAnnotationConverter(rotated_bbox_attribute: AttributeInfo, rotated_bbox_labels_attribute: AttributeInfo | None = None)[source]#

Bases: ForwardAnnotationConverter

Forward converter for RotatedBbox annotations.

Initialize converter with rotated bbox attributes.

classmethod get_supported_annotation_types() list[AnnotationType][source]#

Return list of annotation types this converter can handle.

classmethod create(dataset: Dataset) ForwardRotatedBboxAnnotationConverter | None[source]#

Create converter instance from dataset.

get_schema_attributes() dict[str, AttributeInfo][source]#

Return schema attributes for this annotation type.

convert_annotations(annotations: list[Annotation], item: DatasetItem) dict[str, Any][source]#

Convert annotations of this type to experimental format.

class datumaro.experimental.legacy.ForwardPolygonAnnotationConverter(polygon_attribute: AttributeInfo, polygon_labels_attribute: AttributeInfo | None)[source]#

Bases: ForwardAnnotationConverter

Forward converter for Polygon annotations.

Initialize with polygon attributes and label attribute.

classmethod get_supported_annotation_types() list[AnnotationType][source]#

Return list of annotation types this converter can handle.

classmethod create(dataset: Dataset) ForwardPolygonAnnotationConverter | None[source]#

Create converter instance for polygon annotations.

get_schema_attributes() dict[str, AttributeInfo][source]#

Return schema attributes for this annotation type.

convert_annotations(annotations: list[Annotation], item: DatasetItem) dict[str, Any][source]#

Convert annotations of this type to experimental format.

class datumaro.experimental.legacy.ForwardLabelAnnotationConverter(label_attribute: AttributeInfo)[source]#

Bases: ForwardAnnotationConverter

Forward converter for Label (single label classification) annotations.

Initialize with label attribute.

classmethod create(dataset: Dataset) ForwardLabelAnnotationConverter | None[source]#

Create converter instance for label annotations.

classmethod get_supported_annotation_types() list[AnnotationType][source]#

Return list of annotation types this converter can handle.

get_schema_attributes() dict[str, AttributeInfo][source]#

Return schema attributes for this annotation type.

convert_annotations(annotations: list[Annotation], item: DatasetItem) dict[str, Any][source]#

Convert annotations of this type to experimental format.

class datumaro.experimental.legacy.ForwardKeypointAnnotationConverter(keypoints_attribute: AttributeInfo, keypoints_labels_attribute: AttributeInfo | None)[source]#

Bases: ForwardAnnotationConverter

Forward converter for Points (keypoints) annotations.

Initialize with the keypoints attribute and an optional keypoints labels attribute.

classmethod create(dataset: Dataset) ForwardKeypointAnnotationConverter | None[source]#

Create converter instance for keypoints annotations.

classmethod get_supported_annotation_types() list[AnnotationType][source]#

Return list of annotation types this converter can handle.

get_schema_attributes() dict[str, AttributeInfo][source]#

Return schema attributes for this annotation type.

convert_annotations(annotations: list[Annotation], item: DatasetItem) dict[str, Any][source]#

Convert annotations of this type to experimental format.

datumaro.experimental.legacy.register_builtin_forward_converters()[source]#

Register built-in forward converters for common types.

class datumaro.experimental.legacy.AnalysisResult(schema: Schema, media_converter: ForwardMediaConverter | None, ann_converters: dict[AnnotationType, ForwardAnnotationConverter])[source]#

Bases: object

Result of legacy dataset analysis.

schema: Schema#

media_converter: ForwardMediaConverter | None#

ann_converters: dict[AnnotationType, ForwardAnnotationConverter]#

datumaro.experimental.legacy.analyze_legacy_dataset(legacy_dataset: Dataset) AnalysisResult[source]#

Analyze legacy dataset and generate schema using registered converters.

Parameters:

legacy_dataset – The legacy Datumaro dataset to analyze

Returns:

AnalysisResult containing the inferred schema and converters
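One way to picture the analysis step is as a loop that asks each converter class whether it accepts the dataset and merges the schema attributes of those that do. The sketch below models that idea only. `ToyAnalysisResult`, `analyze`, `ImageConv`, and `BboxConv` are hypothetical, the dataset is a plain dict, and the real function may work differently.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToyAnalysisResult:
    """Simplified mirror of the three documented AnalysisResult fields."""

    schema: dict
    media_converter: Optional[Any]
    ann_converters: dict = field(default_factory=dict)


def analyze(dataset, media_classes, ann_classes):
    """Collect converters that accept the dataset and merge their attributes."""
    attributes = {}

    media_converter = None
    for cls in media_classes:
        media_converter = cls.create(dataset)
        if media_converter is not None:
            attributes.update(media_converter.get_schema_attributes())
            break

    ann_converters = {}
    for ann_type, cls in ann_classes.items():
        converter = cls.create(dataset)
        if converter is not None:
            ann_converters[ann_type] = converter
            attributes.update(converter.get_schema_attributes())

    return ToyAnalysisResult(attributes, media_converter, ann_converters)


class ImageConv:
    @classmethod
    def create(cls, dataset):
        return cls() if dataset["media"] == "image" else None

    def get_schema_attributes(self):
        return {"image_path": str}


class BboxConv:
    @classmethod
    def create(cls, dataset):
        return cls() if "bbox" in dataset["ann_types"] else None

    def get_schema_attributes(self):
        return {"bboxes": list, "bbox_labels": list}


result = analyze(
    {"media": "image", "ann_types": {"bbox"}},
    media_classes=[ImageConv],
    ann_classes={"bbox": BboxConv},
)
```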

datumaro.experimental.legacy.convert_from_legacy(legacy_dataset: Dataset) Dataset[Sample][source]#

Convert legacy dataset to experimental format with automatic schema inference.

Parameters:

legacy_dataset – The legacy Datumaro dataset to convert

Returns:

A new experimental Dataset with inferred schema and converted data

Example

>>> from datumaro.components.dataset import Dataset
>>> from datumaro.experimental.legacy import convert_from_legacy
>>> legacy_ds = Dataset.import_from("path/to/coco", "coco")
>>> experimental_ds = convert_from_legacy(legacy_ds)
>>> sample = experimental_ds[0]
>>> print(sample.image_path)
>>> print(sample.bboxes.shape)

class datumaro.experimental.legacy.BackwardMediaConverter[source]#

Bases: ABC

Base class for backward media type converters.

abstract classmethod create_from_schema(schema: Schema) BackwardMediaConverter | None[source]#

Create converter instance if schema is supported, None otherwise.

abstract get_media_type() Type[MediaElement[Any]][source]#

Get the legacy media type this converter produces.

abstract convert_to_legacy_media(sample: Sample) MediaElement[Any][source]#

Convert experimental sample media to legacy MediaElement.

class datumaro.experimental.legacy.BackwardAnnotationConverter[source]#

Bases: ABC

Base class for backward annotation type converters.

abstract classmethod create_from_schema(schema: Schema) BackwardAnnotationConverter | None[source]#

Create converter instance if schema is supported, None otherwise.

abstract get_annotation_type() AnnotationType[source]#

Get the legacy annotation type this converter produces.

abstract infer_categories(experimental_dataset: Dataset[Sample]) Dict[AnnotationType, Categories][source]#

Infer legacy categories from experimental dataset.

abstract convert_to_legacy_annotations(sample: Sample, categories: Dict[AnnotationType, Categories]) list[Annotation][source]#

Convert experimental sample annotations to legacy format.

datumaro.experimental.legacy.register_backward_media_converter(converter_class: Type[BackwardMediaConverter]) None[source]#

Register a backward converter class for a media type.

datumaro.experimental.legacy.register_backward_annotation_converter(converter_class: Type[BackwardAnnotationConverter]) None[source]#

Register a backward converter class for an annotation type.

class datumaro.experimental.legacy.ForwardMaskAnnotationConverter(mask_attribute: AttributeInfo, instance_mask_attribute: AttributeInfo, mask_labels_attribute: AttributeInfo | None, is_semantic: bool)[source]#

Bases: ForwardAnnotationConverter

Forward converter for mask annotations handling both semantic and instance segmentation.

For semantic segmentation:

  • Creates a single uint8 mask where pixel values = class labels

For instance segmentation:

  • Creates N binary masks (N = number of instances)

  • Each mask represents a single instance

  • Labels array stores class label for each instance

Initialize with mask attributes and optional label attribute.
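The two layouts described above can be illustrated with plain nested lists. This is a minimal sketch, assuming per-object binary masks as input and a background value of 0. The function names are hypothetical, and the real converter's array types and background handling are not shown in this reference.

```python
def to_semantic_mask(binary_masks, labels, background=0):
    """Merge per-object binary masks into one mask of class labels."""
    height, width = len(binary_masks[0]), len(binary_masks[0][0])
    semantic = [[background] * width for _ in range(height)]
    for mask, label in zip(binary_masks, labels):
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    semantic[r][c] = label  # later masks overwrite earlier ones
    return semantic


def to_instance_masks(binary_masks, labels):
    """Keep one binary mask per instance plus a parallel list of labels."""
    masks = [[[1 if v else 0 for v in row] for row in mask] for mask in binary_masks]
    return masks, list(labels)


cat = [[1, 1, 0], [0, 0, 0]]
dog = [[0, 0, 0], [0, 1, 1]]
semantic = to_semantic_mask([cat, dog], labels=[1, 2])
instance_masks, instance_labels = to_instance_masks([cat, dog], labels=[1, 2])
```

The semantic layout loses the distinction between two objects of the same class, while the instance layout keeps it at the cost of storing N masks.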

classmethod create(dataset: Dataset) ForwardMaskAnnotationConverter | None[source]#

Create converter instance for mask annotations.

Determines if the dataset uses semantic or instance segmentation by checking if mask indices match their labels across all mask annotations.
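The check described above could look roughly like the sketch below. It is a guess at the shape of the heuristic, not the library's code: `LegacyMask`, its `index` field, and `looks_semantic` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LegacyMask:
    """Stand-in for a legacy mask annotation; both fields are assumptions."""

    index: int  # hypothetical per-mask index
    label: int  # class label id


def looks_semantic(mask_annotations) -> bool:
    # Treat the dataset as semantic segmentation only if every mask's index
    # equals its label. Note that all() over an empty list is True, so a
    # caller would need to handle datasets without masks separately.
    return all(m.index == m.label for m in mask_annotations)


semantic_like = [LegacyMask(index=1, label=1), LegacyMask(index=2, label=2)]
instance_like = [LegacyMask(index=0, label=1), LegacyMask(index=1, label=1)]
```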

classmethod get_supported_annotation_types() list[AnnotationType][source]#

Return list of annotation types this converter can handle.

get_schema_attributes() dict[str, AttributeInfo][source]#

Return schema attributes.

convert_annotations(annotations: list[Annotation], item: DatasetItem) dict[str, Any][source]#

Convert legacy mask annotations to either semantic or instance segmentation format.

class datumaro.experimental.legacy.BackwardImageMediaConverter(image_path_attr: str)[source]#

Bases: BackwardMediaConverter

Backward converter for Image media type.

Initialize with the name of the image path attribute.

classmethod create_from_schema(schema: Schema) BackwardImageMediaConverter | None[source]#

Create converter instance if schema contains image_path field.

get_media_type() Type[MediaElement[Any]][source]#

Get the legacy media type this converter produces.

convert_to_legacy_media(sample: Sample) MediaElement[Any][source]#

Convert image_path back to Image MediaElement.

class datumaro.experimental.legacy.BackwardBboxAnnotationConverter(bboxes_attr: str, bbox_labels_attr: str)[source]#

Bases: BackwardAnnotationConverter

Backward converter for Bbox annotations.

Initialize with the names of the bbox-related attributes.

classmethod create_from_schema(schema: Schema) BackwardBboxAnnotationConverter | None[source]#

Create converter instance if schema contains bbox-related fields.

get_annotation_type() AnnotationType[source]#

Get the legacy annotation type this converter produces.

convert_to_legacy_annotations(sample: Sample, categories: Dict[AnnotationType, Categories]) list[Annotation][source]#

Convert bboxes and bbox_labels back to legacy Bbox annotations.

infer_categories(experimental_dataset: Dataset[Sample]) Dict[AnnotationType, Categories][source]#

Infer label categories from bbox_labels.
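The backward direction can be sketched in the same spirit: collect the label ids that occur in `bbox_labels`, then rebuild one legacy-style box per row. Everything here is illustrative. `LegacyBbox`, `infer_label_ids`, and `to_legacy_bboxes` are hypothetical, samples are plain dicts, the corner-coordinate layout is an assumption, and the real `infer_categories` returns `Categories` objects rather than a list of ids.

```python
from dataclasses import dataclass


@dataclass
class LegacyBbox:
    """Stand-in for a legacy bbox annotation (x, y, w, h, label)."""

    x: float
    y: float
    w: float
    h: float
    label: int


def infer_label_ids(samples, labels_key="bbox_labels"):
    """Return the sorted set of label ids seen across all samples."""
    return sorted({label for sample in samples for label in sample[labels_key]})


def to_legacy_bboxes(sample, bboxes_key="bboxes", labels_key="bbox_labels"):
    # Assumption: boxes are stored as corners (x1, y1, x2, y2).
    return [
        LegacyBbox(x1, y1, x2 - x1, y2 - y1, label)
        for (x1, y1, x2, y2), label in zip(sample[bboxes_key], sample[labels_key])
    ]


samples = [
    {"bboxes": [[10, 20, 40, 60]], "bbox_labels": [2]},
    {"bboxes": [[0, 0, 5, 5], [1, 1, 3, 4]], "bbox_labels": [0, 2]},
]
label_ids = infer_label_ids(samples)
legacy = to_legacy_bboxes(samples[0])
```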

class datumaro.experimental.legacy.BackwardRotatedBboxAnnotationConverter(rotated_bboxes_attr: str, rotated_bbox_labels_attr: str | None)[source]#

Bases: BackwardAnnotationConverter

Backward converter for RotatedBbox annotations.

Initialize with the names of the rotated bbox-related attributes.

classmethod create_from_schema(schema: Schema) BackwardRotatedBboxAnnotationConverter | None[source]#

Create converter if schema contains rotated bbox fields.

get_annotation_type() AnnotationType[source]#

Get the legacy annotation type this converter produces.

convert_to_legacy_annotations(sample: Sample, categories: Dict[AnnotationType, Categories]) list[Annotation][source]#

Convert experimental rotated bbox data to legacy RotatedBbox annotations.

infer_categories(experimental_dataset: Dataset[Sample]) Dict[AnnotationType, Categories][source]#

Infer label categories from rotated_bbox_labels.

class datumaro.experimental.legacy.BackwardPolygonAnnotationConverter(polygons_attr: str, polygon_labels_attr: str | None)[source]#

Bases: BackwardAnnotationConverter

Backward converter for Polygon annotations.

Initialize with the names of the polygon-related attributes.

classmethod create_from_schema(schema: Schema) BackwardPolygonAnnotationConverter | None[source]#

Create converter instance if schema contains polygon-related fields.

get_annotation_type() AnnotationType[source]#

Get the legacy annotation type this converter produces.

convert_to_legacy_annotations(sample: Sample, categories: Dict[AnnotationType, Categories]) list[Annotation][source]#

Convert polygons and polygon_labels back to legacy Polygon annotations.

infer_categories(experimental_dataset: Dataset[Sample]) Dict[AnnotationType, Categories][source]#

Infer label categories from polygon_labels.

class datumaro.experimental.legacy.BackwardAnalysisResult(media_type: Type[MediaElement[Any]] | None, ann_types: set[AnnotationType], categories: Dict[AnnotationType, Categories], media_converter: BackwardMediaConverter | None, ann_converters: dict[AnnotationType, BackwardAnnotationConverter])[source]#

Bases: object

Result of experimental dataset analysis for backward conversion.

media_type: Type[MediaElement[Any]] | None#

ann_types: set[AnnotationType]#

categories: Dict[AnnotationType, Categories]#

media_converter: BackwardMediaConverter | None#

ann_converters: dict[AnnotationType, BackwardAnnotationConverter]#

datumaro.experimental.legacy.analyze_experimental_dataset(experimental_dataset: Dataset[Sample]) BackwardAnalysisResult[source]#

Analyze experimental dataset schema to determine legacy format.

Parameters:

experimental_dataset – The experimental dataset to analyze

Returns:

BackwardAnalysisResult containing legacy format information
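Backward analysis can be pictured as probing the schema with each converter's `create_from_schema` and keeping whichever converters accept it. The sketch below models that with a schema represented as a set of field names. The classes, the `analyze_backward` helper, and matching purely by field name are assumptions made for illustration.

```python
class ImagePathBackward:
    """Hypothetical backward media converter keyed on an image path field."""

    def __init__(self, image_path_attr):
        self.image_path_attr = image_path_attr

    @classmethod
    def create_from_schema(cls, schema):
        return cls("image_path") if "image_path" in schema else None


class BboxBackward:
    """Hypothetical backward bbox converter keyed on two field names."""

    def __init__(self, bboxes_attr, bbox_labels_attr):
        self.bboxes_attr = bboxes_attr
        self.bbox_labels_attr = bbox_labels_attr

    @classmethod
    def create_from_schema(cls, schema):
        if "bboxes" in schema and "bbox_labels" in schema:
            return cls("bboxes", "bbox_labels")
        return None


def analyze_backward(schema, media_classes, ann_classes):
    """Return the first accepting media converter and all accepting annotation converters."""
    media_converter = None
    for cls in media_classes:
        media_converter = cls.create_from_schema(schema)
        if media_converter is not None:
            break

    ann_converters = {}
    for ann_type, cls in ann_classes.items():
        converter = cls.create_from_schema(schema)
        if converter is not None:
            ann_converters[ann_type] = converter
    return media_converter, ann_converters


media, anns = analyze_backward(
    {"image_path", "bboxes", "bbox_labels"},
    media_classes=[ImagePathBackward],
    ann_classes={"bbox": BboxBackward},
)
```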

datumaro.experimental.legacy.convert_to_legacy(experimental_dataset: Dataset[Sample]) Dataset[source]#

Convert experimental dataset to legacy format.

Parameters:

experimental_dataset – The experimental Dataset to convert

Returns:

A new legacy Datumaro Dataset with converted data

Example

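>>> from datumaro.experimental.legacy import convert_to_legacy
>>> # Dataset is the experimental Dataset here; MySchema is defined elsewhere
>>> experimental_ds = Dataset(MySchema)
>>> # ... add samples to experimental_ds
>>> legacy_ds = convert_to_legacy(experimental_ds)
>>> legacy_ds.export("output", "coco")

datumaro.experimental.legacy.register_builtin_backward_converters()[source]#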

Register built-in backward converters.