datumaro.experimental.fields

Field implementations for various data types including tensors, images, and bounding boxes.

This module provides concrete field implementations that handle serialization to/from Polars DataFrames for different data types commonly used in machine learning and computer vision applications.

Functions

bbox_field(dtype[, format, normalize, semantic])

Create a BBoxField instance with the specified parameters.

image_field(dtype[, format, semantic])

Create an ImageField instance with the specified parameters.

image_info_field([semantic])

Create an ImageInfoField instance for storing image width and height.

image_path_field([semantic])

Create an ImagePathField instance with the specified semantic tags.

tensor_field(dtype[, semantic])

Create a TensorField instance with the specified semantic tags and data type.

Classes

BBoxField(semantic, dtype, format, normalize)

Represents a bounding box field with format and normalization options.

ImageField(semantic, dtype, format)

Represents an image tensor field with format information.

ImageInfo(width, height)

Container for image metadata (width and height).

ImageInfoField(semantic)

Represents image metadata (width, height) as a Polars struct.

ImagePathField(semantic)

Represents a field containing the file path to an image on disk.

TensorField(semantic, dtype)

Represents a tensor field with semantic tags and data type information.

class datumaro.experimental.fields.TensorField(semantic: Semantic, dtype: Any)

Bases: Field

Represents a tensor field with semantic tags and data type information.

This field handles n-dimensional tensor data by flattening it for storage and preserving shape information separately for reconstruction.

semantic

Semantic tags describing the tensor’s purpose

Type:

datumaro.experimental.schema.Semantic

dtype

Polars data type for tensor elements

Type:

Any

semantic: Semantic

dtype: Any

to_polars_schema(name: str) → dict[str, DataType]

Generate Polars schema with separate columns for data and shape.

to_polars(name: str, value: Any) → dict[str, Series]

Convert tensor to flattened data and shape information.

from_polars(name: str, row_index: int, df: DataFrame, target_type: type[T]) → T

Reconstruct tensor from flattened data using stored shape.
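
The flatten-and-restore idea described for TensorField can be illustrated without Polars. The following is a minimal sketch using plain Python lists. The helper names flatten_with_shape and unflatten are hypothetical and not part of datumaro, and the actual column layout produced by to_polars may differ.

```python
from math import prod
from typing import Any


def flatten_with_shape(nested: Any) -> tuple[list, list[int]]:
    """Flatten a regular nested list and return (flat_values, shape).

    Hypothetical helper for illustration only; not a datumaro API.
    """
    # Read the shape by walking down the first element of each level.
    shape: list[int] = []
    level = nested
    while isinstance(level, list):
        shape.append(len(level))
        if not level:
            break
        level = level[0]

    flat: list = []

    def visit(node: Any, depth: int) -> None:
        if depth == len(shape):
            if isinstance(node, list):
                raise ValueError("ragged input: deeper than the detected shape")
            flat.append(node)
            return
        if not isinstance(node, list) or len(node) != shape[depth]:
            raise ValueError("ragged input: not a regular n-dimensional array")
        for child in node:
            visit(child, depth + 1)

    visit(nested, 0)
    return flat, shape


def unflatten(flat: list, shape: list[int]) -> Any:
    """Rebuild the nested list from the flat values and the stored shape."""
    if prod(shape) != len(flat):
        raise ValueError("shape does not match the number of elements")

    def build(offset: int, dims: list[int]) -> Any:
        if not dims:
            return flat[offset]
        step = prod(dims[1:])  # number of elements in one sub-block
        return [build(offset + i * step, dims[1:]) for i in range(dims[0])]

    return build(0, shape)


# Round trip of a 2x3 tensor.
flat_values, stored_shape = flatten_with_shape([[1, 2, 3], [4, 5, 6]])
restored = unflatten(flat_values, stored_shape)
```

Storing the shape next to the flat values is what makes the round trip lossless: the same six values could otherwise be read back as 2x3, 3x2, or 6 elements.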

datumaro.experimental.fields.tensor_field(dtype: Any, semantic: Semantic = Semantic.Default) → Any

Create a TensorField instance with the specified semantic tags and data type.

Parameters:
  • dtype – Polars data type for tensor elements

  • semantic – Semantic tags describing the tensor’s purpose (optional)

Returns:

TensorField instance configured with the given parameters

class datumaro.experimental.fields.ImageField(semantic: Semantic, dtype: Any, format: str)

Bases: TensorField

Represents an image tensor field with format information.

Extends TensorField to include image-specific metadata such as color format (RGB, BGR, etc.).

format

Image color format (e.g., “RGB”, “BGR”, “RGBA”)

Type:

str

format: str

datumaro.experimental.fields.image_field(dtype: Any, format: str = 'RGB', semantic: Semantic = Semantic.Default) → Any

Create an ImageField instance with the specified parameters.

Parameters:
  • dtype – Polars data type for pixel values

  • format – Image color format (defaults to “RGB”)

  • semantic – Semantic tags describing the image’s purpose (optional)

Returns:

ImageField instance configured with the given parameters
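
To illustrate why the color format is recorded alongside the pixel data, here is a small sketch that reorders channels between two formats. The function convert_channels is hypothetical and not part of datumaro. It assumes a channel-last image stored as nested lists (rows of pixels), and it only handles formats that are permutations of each other.

```python
def convert_channels(image: list, src: str, dst: str) -> list:
    """Reorder the channels of a channel-last image from src to dst format.

    Hypothetical helper for illustration only; not a datumaro API.
    """
    if sorted(src) != sorted(dst):
        raise ValueError(f"cannot permute {src!r} into {dst!r}")
    # For each destination channel, find its position in the source layout.
    order = [src.index(channel) for channel in dst]
    return [[[pixel[i] for i in order] for pixel in row] for row in image]


# A single pure-blue pixel stored in BGR order becomes (0, 0, 255) in RGB.
bgr_image = [[[255, 0, 0]]]
rgb_image = convert_channels(bgr_image, "BGR", "RGB")
```

Without the format tag, the two images above are indistinguishable as raw tensors, which is the ambiguity the format attribute resolves.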

class datumaro.experimental.fields.BBoxField(semantic: Semantic, dtype: Any, format: str, normalize: bool)

Bases: Field

Represents a bounding box field with format and normalization options.

Handles bounding box data, supporting different coordinate formats and optional normalization to the [0,1] range.

semantic

Semantic tags describing the bounding box’s purpose

Type:

datumaro.experimental.schema.Semantic

dtype

Polars data type for coordinate values

Type:

Any

format

Coordinate format (e.g., “x1y1x2y2”, “xywh”)

Type:

str

normalize

Whether coordinates are normalized to the [0,1] range

Type:

bool

semantic: Semantic

dtype: Any

format: str

normalize: bool

to_polars_schema(name: str) → dict[str, DataType]

Generate schema for bounding box as list of 4-element arrays.

to_polars(name: str, value: Any) → dict[str, Series]

Convert bounding box tensor to Polars list format.

from_polars(name: str, row_index: int, df: DataFrame, target_type: type[T]) → T

Reconstruct bounding box tensor from Polars data.
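
The format and normalize options describe how the four stored numbers should be interpreted. The sketch below shows one common interpretation in plain Python. It assumes that “xywh” means top-left corner plus width and height, and that normalization divides by the image width and height; neither is confirmed by this page. The helper names are hypothetical, and whether BBoxField converts values itself is not stated here.

```python
def xywh_to_x1y1x2y2(box: list) -> list:
    """Convert [x, y, width, height] to [x1, y1, x2, y2].

    Assumes (x, y) is the top-left corner; illustration only.
    """
    x, y, w, h = box
    return [x, y, x + w, y + h]


def normalize_box(box: list, image_width: int, image_height: int) -> list:
    """Scale absolute [x1, y1, x2, y2] pixel coordinates into the [0,1] range."""
    x1, y1, x2, y2 = box
    return [x1 / image_width, y1 / image_height,
            x2 / image_width, y2 / image_height]


# A 30x40 box at (10, 20) inside a 100x200 image.
corners = xywh_to_x1y1x2y2([10, 20, 30, 40])     # [10, 20, 40, 60]
normalized = normalize_box(corners, 100, 200)    # [0.1, 0.1, 0.4, 0.3]
```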

datumaro.experimental.fields.bbox_field(dtype: Any, format: str = 'x1y1x2y2', normalize: bool = False, semantic: Semantic = Semantic.Default) → Any

Create a BBoxField instance with the specified parameters.

Parameters:
  • dtype – Polars data type for coordinate values

  • format – Coordinate format (defaults to “x1y1x2y2”)

  • normalize – Whether coordinates are normalized to the [0,1] range (defaults to False)

  • semantic – Semantic tags describing the bounding box’s purpose (optional)

Returns:

BBoxField instance configured with the given parameters

class datumaro.experimental.fields.ImageInfo(width: int, height: int)

Bases: object

Container for image metadata (width and height).

width: int

height: int

class datumaro.experimental.fields.ImageInfoField(semantic: Semantic)

Bases: Field

Represents image metadata (width, height) as a Polars struct.

semantic: Semantic

to_polars_schema(name: str) → dict[str, DataType]

Generate Polars schema definition for this field.

Parameters:

name – The column name for this field

Returns:

Dictionary mapping column names to Polars data types

Raises:

NotImplementedError – Must be implemented by subclasses

to_polars(name: str, value: ImageInfo) → dict[str, Series]

Convert the field value to Polars-compatible format.

Parameters:
  • name – The column name for this field

  • value – The value to convert

Returns:

Dictionary mapping column names to Polars Series

from_polars(name: str, row_index: int, df: DataFrame, target_type: type) → ImageInfo

Convert from Polars-compatible format back to the field’s value.

Parameters:
  • name – The column name for this field

  • row_index – The row index to extract

  • df – The source DataFrame

  • target_type – The target type to convert to

Returns:

The converted value in the target type
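
The three methods above form a round trip: a value goes into a named column and is later read back by column name and row index. The sketch below mimics that round trip with plain dictionaries standing in for a Polars struct column. ImageInfoSketch, to_struct_column, and from_struct_column are hypothetical stand-ins, not the datumaro classes or methods, and the real implementation returns Polars Series rather than lists.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ImageInfoSketch:
    """Stand-in for ImageInfo with the same two attributes."""
    width: int
    height: int


def to_struct_column(name: str, infos: list) -> dict:
    """Store each value as a {'width', 'height'} record under one column name."""
    return {name: [asdict(info) for info in infos]}


def from_struct_column(name: str, row_index: int, columns: dict) -> ImageInfoSketch:
    """Read one record back by column name and row index."""
    row = columns[name][row_index]
    return ImageInfoSketch(width=row["width"], height=row["height"])


columns = to_struct_column(
    "image_info",
    [ImageInfoSketch(640, 480), ImageInfoSketch(1920, 1080)],
)
second = from_struct_column("image_info", 1, columns)
```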

datumaro.experimental.fields.image_info_field(semantic: Semantic = Semantic.Default) → Any

Create an ImageInfoField instance for storing image width and height.

Parameters:

semantic – Optional semantic tags for disambiguation (e.g., Semantic.Left)

Returns:

ImageInfoField instance configured with the given semantic tags

class datumaro.experimental.fields.ImagePathField(semantic: Semantic)

Bases: Field

Represents a field containing the file path to an image on disk.

This field stores image file paths as strings and is typically used as input for lazy loading operations where images are loaded on-demand.

semantic

Semantic tags describing the image path’s purpose

Type:

datumaro.experimental.schema.Semantic

semantic: Semantic

to_polars_schema(name: str) → dict[str, DataType]

Generate schema for string path column.

to_polars(name: str, value: Any) → dict[str, Series]

Convert path string to Polars Series.

from_polars(name: str, row_index: int, df: DataFrame, target_type: type)

Extract path string from Polars data.
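
The on-demand loading pattern mentioned for ImagePathField can be sketched as follows. Only the path string is kept until the data is first requested. LazyImage and the loader callback are hypothetical names used for illustration; this page does not describe how datumaro implements lazy loading.

```python
from pathlib import Path
from typing import Any, Callable


class LazyImage:
    """Hold an image path and load its contents on first access only."""

    def __init__(self, path: str, loader: Callable[[str], Any]) -> None:
        self.path = str(path)
        self._loader = loader
        self._data: Any = None
        self._loaded = False

    def get(self) -> Any:
        # Call the loader once and cache the result for later calls.
        if not self._loaded:
            self._data = self._loader(self.path)
            self._loaded = True
        return self._data


def read_bytes(path: str) -> bytes:
    """Example loader: read the raw file bytes (decoding is left out)."""
    return Path(path).read_bytes()


# Constructing the object touches no file; read_bytes runs on the first get().
image = LazyImage("images/example.png", read_bytes)
```

Keeping only the path in the table keeps rows small, and the cost of reading the file is paid only for the images that are actually accessed.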

datumaro.experimental.fields.image_path_field(semantic: Semantic = Semantic.Default) → Any

Create an ImagePathField instance with the specified semantic tags.

Parameters:

semantic – Semantic tags describing the image path’s purpose (optional)

Returns:

ImagePathField instance configured with the given semantic tags