datumaro.experimental.converter_registry
Converter system for transforming data between different field representations.
This module provides the foundation for data transformation pipelines, including converter registration, schema mapping, and automatic conversion path discovery using graph algorithms.
Functions
converter – Register a converter class and configure its lazy loading behavior.
find_conversion_path – Find an optimal sequence of converters using A* search, grouped by semantic.
Classes
AttributeSpec – Specification for an attribute used in converters.
ConversionPaths – Container for separated batch and lazy conversion paths.
Converter – Base class for data converters with input/output specifications.
ConverterRegistry – Registry for managing and discovering data converters.
Exceptions
ConversionError – Exception raised when conversion fails.
- class datumaro.experimental.converter_registry.ConversionPaths(batch_converters: List[Converter], lazy_converters: List[Converter])
Bases: NamedTuple
Container for separated batch and lazy conversion paths.
The batch converters can be applied immediately to the entire DataFrame, while lazy converters must be deferred and applied at sample access time.
Create new instance of ConversionPaths(batch_converters, lazy_converters)
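The split between the two lists can be illustrated with a small sketch. It is a hypothetical stand-in, not datumaro's code: a list of dicts plays the role of the DataFrame, converters are plain callables, and `ConversionPathsSketch` and `LazyDataset` are invented names. Batch converters run once over the whole table; lazy converters run per sample when it is indexed.

```python
from typing import Callable, Dict, List, NamedTuple

Row = Dict[str, object]
Table = List[Row]  # hypothetical stand-in for a DataFrame
ConverterFn = Callable[[Table], Table]


class ConversionPathsSketch(NamedTuple):
    """Hypothetical mirror of ConversionPaths with callables instead of Converter objects."""

    batch_converters: List[ConverterFn]
    lazy_converters: List[ConverterFn]


class LazyDataset:
    """Hypothetical dataset that applies batch converters eagerly and lazy ones on access."""

    def __init__(self, table: Table, paths: ConversionPathsSketch) -> None:
        # Batch converters are applied once, immediately, to the entire table.
        for conv in paths.batch_converters:
            table = conv(table)
        self._table = table
        self._lazy = paths.lazy_converters

    def __getitem__(self, index: int) -> Row:
        # Lazy converters are deferred until a single sample is accessed.
        sample = [self._table[index]]
        for conv in self._lazy:
            sample = conv(sample)
        return sample[0]


calls: List[int] = []  # records how many rows each lazy call processed


def add_label_id(table: Table) -> Table:
    # Cheap, vectorizable step: suitable for batch conversion.
    mapping = {"cat": 0, "dog": 1}
    return [{**row, "label_id": mapping[row["label"]]} for row in table]


def load_image(table: Table) -> Table:
    # Expensive step (e.g. reading from disk): deferred to access time.
    calls.append(len(table))
    return [{**row, "image": f"<pixels of {row['path']}>"} for row in table]


ds = LazyDataset(
    [{"path": "a.jpg", "label": "cat"}, {"path": "b.jpg", "label": "dog"}],
    ConversionPathsSketch(batch_converters=[add_label_id], lazy_converters=[load_image]),
)
```

Constructing the dataset never calls `load_image`; each `ds[i]` access runs it on exactly one row.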
- class datumaro.experimental.converter_registry.AttributeSpec(name: str, field: TField)
Bases: Generic[TField]
Specification for an attribute used in converters.
Links an attribute name with its corresponding field type definition, providing the complete specification needed for converter operations.
- Parameters:
TField – The specific Field type, defaults to Field
- field: TField
The field type specification.
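A minimal sketch of how a generic name/field pair like this can be declared, assuming a dataclass and a `TypeVar` bound to a `Field` base class. `Field`, `ImageField` (with its `format` attribute) and `AttributeSpecSketch` are hypothetical stand-ins, not datumaro's definitions.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar


class Field:
    """Hypothetical stand-in for datumaro's Field base class."""


@dataclass
class ImageField(Field):
    """Hypothetical concrete field carrying one piece of type information."""

    format: str = "RGB"


# Type parameter restricted to Field subclasses, mirroring TField.
TField = TypeVar("TField", bound=Field)


@dataclass
class AttributeSpecSketch(Generic[TField]):
    """Links an attribute name with its field type definition."""

    name: str
    field: TField


# The subscript lets a type checker know spec.field is an ImageField.
spec: AttributeSpecSketch[ImageField] = AttributeSpecSketch(
    name="image", field=ImageField(format="BGR")
)
```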
- class datumaro.experimental.converter_registry.Converter(**kwargs: Any)
Bases: ABC
Base class for data converters with input/output specifications.
Converters transform data between different field representations by implementing the convert() method and optionally filtering their applicability through filter_output_spec().
Initialize converter with input and output AttributeSpec instances.
- Parameters:
**kwargs – AttributeSpec instances for converter inputs/outputs based on input_*/output_* class attributes
- lazy: bool = False
Whether this converter performs lazy operations.
Lazy converters defer expensive operations (like loading images from disk) until data is actually accessed. When a lazy converter is in the conversion path, all dependent converters must also be executed lazily.
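The propagation rule can be sketched as a single pass over an ordered conversion path: a step becomes lazy if it is lazy itself or if it consumes any attribute produced by an earlier lazy step. This is an illustrative assumption about how the rule might be applied, not datumaro's implementation; `Step`, `Split` and `split_path` are hypothetical names, and attributes are plain strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, NamedTuple, Set


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for a converter in a resolved, ordered path."""

    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    lazy: bool = False


class Split(NamedTuple):
    batch: List[Step]
    lazy: List[Step]


def split_path(path: List[Step]) -> Split:
    """Partition an ordered path, propagating laziness to dependent steps."""
    batch: List[Step] = []
    lazy: List[Step] = []
    lazy_outputs: Set[str] = set()  # attributes produced, directly or transitively, by lazy steps
    for step in path:
        if step.lazy or step.inputs & lazy_outputs:
            lazy.append(step)
            lazy_outputs |= step.outputs  # anything downstream of this step is lazy too
        else:
            batch.append(step)
    return Split(batch, lazy)


path = [
    Step("load_image", frozenset({"path"}), frozenset({"image"}), lazy=True),
    Step("image_to_tensor", frozenset({"image"}), frozenset({"tensor"})),
    Step("map_labels", frozenset({"label"}), frozenset({"label_id"})),
]
result = split_path(path)
```

Here `image_to_tensor` is not declared lazy, but it lands in the lazy list because it reads the `image` produced by `load_image`; `map_labels` stays in the batch list.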
- classmethod get_from_types() → dict[str, Type[Field]]
Extract input field types from input_* class attributes.
- Returns:
Dictionary mapping input attribute names to their Field types
- classmethod get_to_types() → dict[str, Type[Field]]
Extract output field types from output_* class attributes.
- Returns:
Dictionary mapping output attribute names to their Field types
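One way such extraction can work is to read the class's type hints with `typing.get_type_hints`, keep the `input_*` or `output_*` names, and unwrap the `AttributeSpec[...]` subscript with `typing.get_args`. The sketch below assumes that approach; it is not taken from datumaro, and `Field`, `ImageField`, `TensorField`, `SketchConverter` and `ImageToTensor` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, Generic, Type, TypeVar, get_args, get_type_hints


class Field:
    """Hypothetical stand-in for datumaro's Field base class."""


class ImageField(Field):
    pass


class TensorField(Field):
    pass


TField = TypeVar("TField", bound=Field)


@dataclass
class AttributeSpec(Generic[TField]):
    name: str
    field: TField


class SketchConverter:
    @classmethod
    def _field_types(cls, prefix: str) -> Dict[str, Type[Field]]:
        result: Dict[str, Type[Field]] = {}
        for attr, hint in get_type_hints(cls).items():
            if attr.startswith(prefix):
                args = get_args(hint)  # (ImageField,) for AttributeSpec[ImageField]
                # Fall back to the base Field when the hint is unparameterized.
                result[attr] = args[0] if args else Field
        return result

    @classmethod
    def get_from_types(cls) -> Dict[str, Type[Field]]:
        return cls._field_types("input_")

    @classmethod
    def get_to_types(cls) -> Dict[str, Type[Field]]:
        return cls._field_types("output_")


class ImageToTensor(SketchConverter):
    input_image: AttributeSpec[ImageField]
    output_tensor: AttributeSpec[TensorField]
```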
- abstract convert(df: DataFrame) → DataFrame
Convert a DataFrame using the stored AttributeSpec instances.
- Parameters:
df – Input DataFrame
- Returns:
Converted DataFrame
- filter_output_spec() → bool
Filter and modify the converter’s output specification in-place.
This method allows converters to inspect and modify their output specifications based on input characteristics. It should return True if the converter can handle the given input/output combination.
- Returns:
True if the converter is applicable, False otherwise
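As an illustration of the contract described above (inspect the input, adjust the output spec in place, report applicability), here is a hypothetical converter that only accepts RGB images and rewrites its output spec to a grayscale format. `ImageField`, its `format` attribute, and `RGBToGrayscaleSketch` are invented for this sketch and are not datumaro types.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ImageField:
    """Hypothetical field type with a color-format attribute."""

    format: str


@dataclass
class AttributeSpec:
    """Simplified, non-generic stand-in for AttributeSpec."""

    name: str
    field: Any


class RGBToGrayscaleSketch:
    def __init__(self, input_image: AttributeSpec, output_image: AttributeSpec) -> None:
        self.input_image = input_image
        self.output_image = output_image

    def filter_output_spec(self) -> bool:
        # Reject inputs this converter cannot handle.
        if self.input_image.field.format != "RGB":
            return False
        # Update the output spec in place to describe what convert() would produce.
        self.output_image.field = ImageField(format="GRAY")
        return True


rgb = RGBToGrayscaleSketch(
    input_image=AttributeSpec("image", ImageField("RGB")),
    output_image=AttributeSpec("image", ImageField("RGB")),
)
bgr = RGBToGrayscaleSketch(
    input_image=AttributeSpec("image", ImageField("BGR")),
    output_image=AttributeSpec("image", ImageField("BGR")),
)
```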
- get_output_attr_specs() → List[AttributeSpec[Field]]
Get the current output AttributeSpec instances from output_* attributes.
- Returns:
List of output AttributeSpec instances currently configured on the converter
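A minimal sketch of the keyword-argument constructor and `get_output_attr_specs()`, assuming the converter stores each `input_*`/`output_*` keyword as an instance attribute of the same name and later collects the `output_*` ones. `ConverterSketch` is hypothetical, and rejecting other keywords with `TypeError` is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class AttributeSpec:
    """Simplified, non-generic stand-in for AttributeSpec."""

    name: str
    field: Any


class ConverterSketch:
    def __init__(self, **kwargs: Any) -> None:
        for key, spec in kwargs.items():
            # Assumption: only input_*/output_* keywords are meaningful.
            if not key.startswith(("input_", "output_")):
                raise TypeError(f"unexpected argument {key!r}")
            setattr(self, key, spec)

    def get_output_attr_specs(self) -> List[AttributeSpec]:
        # vars() preserves insertion order, so specs come back in keyword order.
        return [value for key, value in vars(self).items() if key.startswith("output_")]


conv = ConverterSketch(
    input_path=AttributeSpec("path", "str"),
    output_image=AttributeSpec("image", "img"),
)
```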
- class datumaro.experimental.converter_registry.ConverterRegistry
Bases: object
Registry for managing and discovering data converters.
This class maintains a global registry of converter classes and provides functionality for finding and instantiating appropriate converters for schema transformations.
- classmethod add_converter(converter: Type[Converter])
Add a converter class to the registry.
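A global registry of this kind is commonly a class-level list mutated through classmethods. The sketch below assumes that design; `ConverterRegistrySketch`, its `_converters` attribute, and the `list_converters()` helper are hypothetical, and skipping duplicate registrations is an illustrative choice.

```python
from typing import ClassVar, List, Type


class Converter:
    """Hypothetical stand-in for the Converter base class."""

    lazy: bool = False


class ConverterRegistrySketch:
    # Shared by every caller: one process-wide list of converter classes.
    _converters: ClassVar[List[Type[Converter]]] = []

    @classmethod
    def add_converter(cls, converter: Type[Converter]) -> None:
        # Ignore repeated registrations of the same class.
        if converter not in cls._converters:
            cls._converters.append(converter)

    @classmethod
    def list_converters(cls) -> List[Type[Converter]]:
        # Hypothetical helper: return a copy so callers cannot mutate the registry.
        return list(cls._converters)


class Resize(Converter):
    pass


ConverterRegistrySketch.add_converter(Resize)
ConverterRegistrySketch.add_converter(Resize)
```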
- datumaro.experimental.converter_registry.converter(cls: Type[Converter], /) → Type[Converter]
- datumaro.experimental.converter_registry.converter(*, lazy: bool = False) → Callable[[Type[Converter]], Type[Converter]]
Register a converter class and configure its lazy loading behavior.
This decorator automatically registers converter classes with the global converter registry and sets their lazy evaluation mode. The converter class must define at least one output_* attribute with type hints.
- Parameters:
lazy – If True, this converter will only be applied during lazy evaluation in Dataset.__getitem__. If False, it will be applied during batch conversion operations. Lazy converters automatically make all dependent converters lazy as well.
- Usage:
```python
import polars as pl
from datumaro.experimental.converter_registry import AttributeSpec, Converter, converter

@converter
class ImageToTensorConverter(Converter):
    input_image: AttributeSpec
    output_tensor: AttributeSpec

    def convert(self, df: pl.DataFrame) -> pl.DataFrame:
        # conversion logic
        return df

@converter(lazy=True)
class ImagePathToImageConverter(Converter):
    input_path: AttributeSpec
    output_image: AttributeSpec

    def convert(self, df: pl.DataFrame) -> pl.DataFrame:
        # lazy conversion logic
        return df
```
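The two overloads (bare `@converter` and `@converter(lazy=True)`) follow a common dual-mode decorator pattern: when called with the class, register it directly; when called with keywords only, return the registering function. The sketch below assumes that pattern together with the documented check for at least one type-hinted `output_*` attribute. `Converter`, `REGISTRY` and the plain `int` annotations are hypothetical simplifications, not datumaro's code.

```python
from typing import Callable, List, Optional, Type, Union, get_type_hints


class Converter:
    """Hypothetical stand-in for the Converter base class."""

    lazy: bool = False


REGISTRY: List[Type[Converter]] = []  # hypothetical global registry


def converter(
    cls: Optional[Type[Converter]] = None, /, *, lazy: bool = False
) -> Union[Type[Converter], Callable[[Type[Converter]], Type[Converter]]]:
    def register(c: Type[Converter]) -> Type[Converter]:
        # Require at least one type-hinted output_* attribute.
        if not any(name.startswith("output_") for name in get_type_hints(c)):
            raise TypeError(f"{c.__name__} must define at least one output_* attribute")
        c.lazy = lazy  # configure lazy evaluation on the class itself
        REGISTRY.append(c)
        return c

    if cls is None:
        # Called as @converter(lazy=...): return the actual decorator.
        return register
    # Called as bare @converter: register immediately with lazy=False.
    return register(cls)


@converter
class ImageToTensor(Converter):
    input_image: int
    output_tensor: int


@converter(lazy=True)
class PathToImage(Converter):
    input_path: int
    output_image: int
```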
- exception datumaro.experimental.converter_registry.ConversionError
Bases: Exception
Exception raised when conversion fails.
- datumaro.experimental.converter_registry.find_conversion_path(from_schema: Schema, to_schema: Schema) → ConversionPaths
Find an optimal sequence of converters using A* search, grouped by semantic.
Fields with the same semantic can be converted between each other, but conversion across semantic boundaries is not allowed.
- Parameters:
from_schema – Source schema
to_schema – Target schema
- Returns:
ConversionPaths with separated batch and lazy converter lists
- Raises:
ConversionError – If no conversion path is found
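The search can be pictured as A* over sets of available field types, run separately for each semantic. The sketch below is one possible formulation under stated assumptions, not datumaro's algorithm. A field is modeled as a hypothetical `(semantic, type_name)` tuple, each converter costs 1, and a state is the set of fields produced so far. The heuristic `ceil(missing / max_outputs)` never overestimates, because one converter can add at most `max_outputs` new fields. Converters that touch another semantic are excluded from each group's search. `ConverterSpec`, `a_star_path` and `find_path_by_semantic` are hypothetical names, and the result is a flat list rather than `ConversionPaths`.

```python
import heapq
import itertools
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

FieldKey = Tuple[str, str]  # hypothetical (semantic, field type name)


@dataclass(frozen=True)
class ConverterSpec:
    name: str
    inputs: FrozenSet[FieldKey]
    outputs: FrozenSet[FieldKey]


class ConversionError(Exception):
    pass


def a_star_path(start: FrozenSet[FieldKey], goal: FrozenSet[FieldKey],
                converters: List[ConverterSpec]) -> List[ConverterSpec]:
    max_out = max((len(c.outputs) for c in converters), default=1) or 1

    def h(state: FrozenSet[FieldKey]) -> int:
        # Admissible: each converter adds at most max_out missing fields.
        return math.ceil(len(goal - state) / max_out)

    tie = itertools.count()  # tie-breaker so heapq never compares frozensets
    frontier = [(h(start), next(tie), 0, start, [])]
    best_cost = {start: 0}
    while frontier:
        _, _, cost, state, path = heapq.heappop(frontier)
        if goal <= state:
            return path
        if cost > best_cost[state]:
            continue  # stale entry superseded by a cheaper route
        for conv in converters:
            if conv.inputs <= state and not conv.outputs <= state:
                nxt = state | conv.outputs
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, math.inf):
                    best_cost[nxt] = new_cost
                    heapq.heappush(frontier, (new_cost + h(nxt), next(tie), new_cost, nxt, path + [conv]))
    raise ConversionError(f"no conversion path to {sorted(goal - start)}")


def find_path_by_semantic(from_fields: FrozenSet[FieldKey], to_fields: FrozenSet[FieldKey],
                          converters: List[ConverterSpec]) -> List[ConverterSpec]:
    path: List[ConverterSpec] = []
    for semantic in sorted({s for s, _ in to_fields}):
        start = frozenset(f for f in from_fields if f[0] == semantic)
        goal = frozenset(f for f in to_fields if f[0] == semantic)
        # Conversion across semantic boundaries is not allowed.
        allowed = [c for c in converters if all(f[0] == semantic for f in c.inputs | c.outputs)]
        path.extend(a_star_path(start, goal, allowed))
    return path


converters = [
    ConverterSpec("path_to_image", frozenset({("image", "path")}), frozenset({("image", "pil")})),
    ConverterSpec("image_to_tensor", frozenset({("image", "pil")}), frozenset({("image", "tensor")})),
    ConverterSpec("image_to_label", frozenset({("image", "pil")}), frozenset({("label", "int")})),
]
```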