datumaro.experimental.converter_registry

Converter system for transforming data between different field representations.

This module provides the foundation for data transformation pipelines, including converter registration, schema mapping, and automatic conversion path discovery using graph algorithms.

Functions

converter()

Register a converter class and configure its lazy loading behavior.

find_conversion_path(from_schema, to_schema)

Find an optimal sequence of converters using A* search, grouped by semantic.

Classes

AttributeSpec(name, field)

Specification for an attribute used in converters.

ConversionPaths(batch_converters, ...)

Container for separated batch and lazy conversion paths.

Converter(**kwargs)

Base class for data converters with input/output specifications.

ConverterRegistry()

Registry for managing and discovering data converters.

Exceptions

ConversionError

Exception raised when conversion fails.

class datumaro.experimental.converter_registry.ConversionPaths(batch_converters: List[Converter], lazy_converters: List[Converter])

Bases: NamedTuple

Container for separated batch and lazy conversion paths.

The batch converters can be applied immediately to the entire DataFrame, while lazy converters must be deferred and applied at sample access time.

Create new instance of ConversionPaths(batch_converters, lazy_converters)

batch_converters: List[Converter]

Alias for field number 0

lazy_converters: List[Converter]

Alias for field number 1
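The following is a minimal sketch of how a two-field NamedTuple like this one behaves. `ConverterStub` and the converter names are hypothetical stand-ins so the snippet runs without datumaro installed; real code would use the classes from this module.

```python
from typing import List, NamedTuple


class ConverterStub:
    """Hypothetical stand-in for a Converter instance."""

    def __init__(self, name: str) -> None:
        self.name = name


class ConversionPaths(NamedTuple):
    """Stand-in mirroring the documented two-field container."""

    batch_converters: List[ConverterStub]
    lazy_converters: List[ConverterStub]


paths = ConversionPaths(
    batch_converters=[ConverterStub("normalize_bbox")],
    lazy_converters=[ConverterStub("load_image")],
)

# Fields are reachable by name, by index, or by tuple unpacking.
batch, lazy = paths
assert paths[0] is paths.batch_converters
assert paths[1] is paths.lazy_converters
```

Unpacking keeps call sites short when both lists are needed, while attribute access reads better when only one of them is.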

class datumaro.experimental.converter_registry.AttributeSpec(name: str, field: TField)

Bases: Generic[TField]

Specification for an attribute used in converters.

Links an attribute name with its corresponding field type definition, providing the complete specification needed for converter operations.

Parameters:

TField – The specific Field type, defaults to Field

name: str

The attribute name

field: TField

The field type specification
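As a rough illustration of the name/field pairing, the sketch below models AttributeSpec as a generic dataclass. `Field` and `ImageField` here are hypothetical stand-ins, and modelling the class as a dataclass is an assumption made for the example, not something this page confirms.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar


class Field:
    """Hypothetical stand-in for the library's Field base class."""


class ImageField(Field):
    """Hypothetical field subtype used only for illustration."""

    def __init__(self, channels: int) -> None:
        self.channels = channels


TField = TypeVar("TField", bound=Field)


@dataclass
class AttributeSpec(Generic[TField]):
    """Stand-in pairing an attribute name with its field definition."""

    name: str
    field: TField


# The type parameter lets a checker know which Field subtype `field` holds.
spec: AttributeSpec[ImageField] = AttributeSpec(name="image", field=ImageField(channels=3))
```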
class datumaro.experimental.converter_registry.Converter(**kwargs: Any)

Bases: ABC

Base class for data converters with input/output specifications.

Converters transform data between different field representations by implementing the convert() method and optionally filtering their applicability through filter_output_spec().

Initialize converter with input and output AttributeSpec instances.

Parameters:

**kwargs – AttributeSpec instances for converter inputs/outputs based on input_*/output_* class attributes

lazy: bool = False

Whether this converter performs lazy operations.

Lazy converters defer expensive operations (like loading images from disk) until data is actually accessed. When a lazy converter is in the conversion path, all dependent converters must also be executed lazily.
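The propagation rule can be sketched as a single pass over an ordered path. `Step` and `split_path` are hypothetical names invented for this example, and the library's actual bookkeeping may differ; the sketch only shows the idea that anything reading a lazily produced attribute must itself be deferred.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Step:
    """Hypothetical stand-in for one converter in an ordered path."""

    name: str
    inputs: Set[str]
    outputs: Set[str]
    lazy: bool = False


def split_path(steps: List[Step]) -> Tuple[List[Step], List[Step]]:
    """Split an ordered path into (batch, deferred) lists.

    A step is deferred if it is lazy itself or reads any attribute
    produced by an already-deferred step.
    """
    batch: List[Step] = []
    deferred: List[Step] = []
    lazy_attrs: Set[str] = set()
    for step in steps:
        if step.lazy or step.inputs & lazy_attrs:
            deferred.append(step)
            lazy_attrs |= step.outputs
        else:
            batch.append(step)
    return batch, deferred


path = [
    Step("rescale_bbox", {"bbox"}, {"bbox_norm"}),
    Step("load_image", {"image_path"}, {"image"}, lazy=True),
    Step("to_tensor", {"image"}, {"tensor"}),
]
batch, deferred = split_path(path)
```

Here `to_tensor` is not marked lazy, but it consumes the output of `load_image`, so it ends up in the deferred list alongside it.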

classmethod get_from_types() -> dict[str, Type[Field]]

Extract input field types from input_* class attributes.

Returns:

Dictionary mapping input attribute names to their Field types

classmethod get_to_types() -> dict[str, Type[Field]]

Extract output field types from output_* class attributes.

Returns:

Dictionary mapping output attribute names to their Field types
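One plausible way to extract field types from prefixed class attributes is to read the class's type hints. The sketch below assumes the attributes are annotated as `AttributeSpec[SomeField]` and that the prefix is stripped from the dictionary keys; both are assumptions for illustration, and `prefixed_field_types` is a hypothetical helper, not part of the library.

```python
from typing import Dict, Generic, Type, TypeVar, get_args, get_type_hints


class Field:
    """Hypothetical stand-in for the library's Field base class."""


class ImagePathField(Field):
    pass


class ImageField(Field):
    pass


TField = TypeVar("TField", bound=Field)


class AttributeSpec(Generic[TField]):
    """Stand-in; only the generic parameter matters here."""


def prefixed_field_types(cls: type, prefix: str) -> Dict[str, Type[Field]]:
    """Map each `<prefix>*` annotation on cls to its Field type argument."""
    result: Dict[str, Type[Field]] = {}
    for attr, hint in get_type_hints(cls).items():
        if not attr.startswith(prefix):
            continue
        args = get_args(hint)  # (ImageField,) for AttributeSpec[ImageField]
        # Fall back to the base Field when the annotation is unparameterised.
        result[attr[len(prefix):]] = args[0] if args else Field
    return result


class LoadImage:
    input_path: AttributeSpec[ImagePathField]
    output_image: AttributeSpec[ImageField]


from_types = prefixed_field_types(LoadImage, "input_")
to_types = prefixed_field_types(LoadImage, "output_")
```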

abstract convert(df: DataFrame) -> DataFrame

Convert a DataFrame using the stored AttributeSpec instances.

Parameters:

df – Input DataFrame

Returns:

Converted DataFrame

filter_output_spec() -> bool

Filter and modify the converter’s output specification in-place.

This method lets a converter inspect its configured input specifications and adjust its output specifications to match. It should return True if the converter can handle the resulting input/output combination.

Returns:

True if the converter is applicable, False otherwise
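A sketch of the inspect-then-adjust pattern is shown below. `RgbToBgr`, `ImageField`, and its `layout` attribute are hypothetical and do not come from the library; the class does not subclass the real Converter so that the snippet stays self-contained.

```python
from dataclasses import dataclass


@dataclass
class ImageField:
    """Hypothetical field description carrying a channel layout."""

    layout: str  # e.g. "RGB" or "BGR"


@dataclass
class AttributeSpec:
    """Simplified, non-generic stand-in for the documented class."""

    name: str
    field: ImageField


class RgbToBgr:
    """Stand-in converter that only accepts RGB input."""

    def __init__(self, input_image: AttributeSpec, output_image: AttributeSpec) -> None:
        self.input_image = input_image
        self.output_image = output_image

    def filter_output_spec(self) -> bool:
        # Not applicable unless the input is RGB.
        if self.input_image.field.layout != "RGB":
            return False
        # Adjust the output spec in place to describe what convert() would emit.
        self.output_image.field.layout = "BGR"
        return True


accepted = RgbToBgr(
    AttributeSpec("image", ImageField("RGB")),
    AttributeSpec("image", ImageField("unknown")),
)
rejected = RgbToBgr(
    AttributeSpec("image", ImageField("GRAY")),
    AttributeSpec("image", ImageField("unknown")),
)
accepted_ok = accepted.filter_output_spec()
rejected_ok = rejected.filter_output_spec()
```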

get_output_attr_specs() -> List[AttributeSpec[Field]]

Get the current output AttributeSpec instances from output_* attributes.

Returns:

List of output AttributeSpec instances currently configured on the converter

class datumaro.experimental.converter_registry.ConverterRegistry

Bases: object

Registry for managing and discovering data converters.

This class maintains a global registry of converter classes and provides functionality for finding and instantiating appropriate converters for schema transformations.

classmethod add_converter(converter: Type[Converter])

Add a converter class to the registry.

classmethod remove_converter(converter: Type[Converter]) -> None

Remove a converter class from the registry.

Parameters:

converter – The converter class to remove

Raises:

ValueError – If the converter is not found in the registry

classmethod list_converters() -> Sequence[Type[Converter]]

List all registered converter classes as an immutable sequence.
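The three class methods above can be mimicked with class-level state. The sketch below is a stand-in, not the library's implementation: backing the registry with a plain list and returning a tuple are assumptions chosen to reproduce the documented ValueError and the immutable return value.

```python
from typing import List, Sequence, Type


class Converter:
    """Hypothetical stand-in for the Converter base class."""


class ConverterRegistry:
    """Stand-in registry keeping converter classes in class-level state."""

    _converters: List[Type[Converter]] = []

    @classmethod
    def add_converter(cls, converter: Type[Converter]) -> None:
        cls._converters.append(converter)

    @classmethod
    def remove_converter(cls, converter: Type[Converter]) -> None:
        # list.remove raises ValueError when the item is absent,
        # which matches the documented failure mode.
        cls._converters.remove(converter)

    @classmethod
    def list_converters(cls) -> Sequence[Type[Converter]]:
        # Return a tuple so callers cannot mutate the registry through it.
        return tuple(cls._converters)


class A(Converter):
    pass


class B(Converter):
    pass


ConverterRegistry.add_converter(A)
ConverterRegistry.add_converter(B)
ConverterRegistry.remove_converter(A)
registered = ConverterRegistry.list_converters()

try:
    ConverterRegistry.remove_converter(A)  # A is already gone
    second_removal_failed = False
except ValueError:
    second_removal_failed = True
```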

datumaro.experimental.converter_registry.converter(cls: Type[Converter], /) -> Type[Converter]
datumaro.experimental.converter_registry.converter(*, lazy: bool = False) -> Callable[[Type[Converter]], Type[Converter]]

Register a converter class and configure its lazy loading behavior.

This decorator automatically registers converter classes with the global converter registry and sets their lazy evaluation mode. The converter class must define at least one output_* attribute with type hints.

Parameters:

lazy – If True, this converter will only be applied during lazy evaluation in Dataset.__getitem__. If False, it will be applied during batch conversion operations. Lazy converters automatically make all dependent converters lazy as well.

Usage:

    import polars as pl

    from datumaro.experimental.converter_registry import AttributeSpec, Converter, converter

    @converter
    class ImageToTensorConverter(Converter):
        input_image: AttributeSpec
        output_tensor: AttributeSpec

        def convert(self, df: pl.DataFrame) -> pl.DataFrame:
            # conversion logic
            return df

    @converter(lazy=True)
    class ImagePathToImageConverter(Converter):
        input_path: AttributeSpec
        output_image: AttributeSpec

        def convert(self, df: pl.DataFrame) -> pl.DataFrame:
            # lazy conversion logic
            return df
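The two overloads (bare `@converter` and `@converter(lazy=True)`) follow a common decorator pattern, sketched below under simplifying assumptions. `REGISTRY` is a hypothetical stand-in for the global registry, and the sketch omits the documented check for at least one output_* attribute.

```python
from typing import Callable, List, Optional, Type, Union


class Converter:
    """Hypothetical stand-in for the Converter base class."""

    lazy: bool = False


REGISTRY: List[Type[Converter]] = []  # hypothetical stand-in for the global registry


def converter(
    cls: Optional[Type[Converter]] = None, /, *, lazy: bool = False
) -> Union[Type[Converter], Callable[[Type[Converter]], Type[Converter]]]:
    def register(target: Type[Converter]) -> Type[Converter]:
        target.lazy = lazy
        REGISTRY.append(target)
        return target

    # Bare form: Python passes the decorated class directly.
    if cls is not None:
        return register(cls)
    # Parameterised form: return the decorator to be applied next.
    return register


@converter
class Eager(Converter):
    pass


@converter(lazy=True)
class Deferred(Converter):
    pass
```

Making `cls` positional-only and `lazy` keyword-only keeps the two call forms unambiguous, which is also what the signatures above suggest.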

exception datumaro.experimental.converter_registry.ConversionError

Bases: Exception

Exception raised when conversion fails.

datumaro.experimental.converter_registry.find_conversion_path(from_schema: Schema, to_schema: Schema) -> ConversionPaths

Find an optimal sequence of converters using A* search, grouped by semantic.

Fields with the same semantic can be converted between each other, but conversion across semantic boundaries is not allowed.

Parameters:
  • from_schema – Source schema

  • to_schema – Target schema

Returns:

ConversionPaths with separated batch and lazy converter lists

Raises:

ConversionError – If no conversion path is found
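To illustrate the search itself, the sketch below runs A* over a toy graph in which nodes are field types within a single semantic and edges are converters. The graph, the costs, the heuristic, and the `find_path` helper are all hypothetical; the library's real state representation and cost model are not described on this page.

```python
import heapq
from typing import Dict, List, Tuple


class ConversionError(Exception):
    """Stand-in for the documented exception."""


# Hypothetical converter graph: source type -> [(converter name, target type, cost)].
EDGES: Dict[str, List[Tuple[str, str, int]]] = {
    "image_path": [("load_image", "image_uint8", 1)],
    "image_uint8": [("to_float", "image_float", 1), ("to_tensor_slow", "tensor", 3)],
    "image_float": [("to_tensor", "tensor", 1)],
}


def heuristic(state: str, goal: str) -> int:
    # Admissible as long as every edge costs at least 1.
    return 0 if state == goal else 1


def find_path(start: str, goal: str) -> List[str]:
    """Return the cheapest list of converter names from start to goal."""
    # Queue entries are (estimated total, cost so far, state, converters used).
    frontier: List[Tuple[int, int, str, List[str]]] = [(heuristic(start, goal), 0, start, [])]
    best_cost: Dict[str, int] = {start: 0}
    while frontier:
        _, cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if cost > best_cost.get(state, cost):
            continue  # stale queue entry, a cheaper route was found later
        for name, target, step_cost in EDGES.get(state, []):
            new_cost = cost + step_cost
            if target not in best_cost or new_cost < best_cost[target]:
                best_cost[target] = new_cost
                estimate = new_cost + heuristic(target, goal)
                heapq.heappush(frontier, (estimate, new_cost, target, path + [name]))
    raise ConversionError(f"no conversion path from {start!r} to {goal!r}")


cheapest = find_path("image_path", "tensor")
```

In this toy graph the three-step route costs 3 while the route through `to_tensor_slow` costs 4, so the search prefers the former. Asking for a path in the reverse direction raises ConversionError because no edges lead out of `tensor`.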