datumaro.experimental.dataset

Functions

convert_sample_to_schema(sample, ...)

Convert a sample to a new schema using registered converters.

has_schema(dataset, target_dtype_or_schema)

Check if a dataset has the specified schema.

Classes

Dataset(dtype_or_schema)

Represents a typed dataset with schema validation and conversion capabilities.

Sample(**kwargs)

Base class for all samples in a dataset.

class datumaro.experimental.dataset.Sample(**kwargs: Any)

Bases: object

Base class for all samples in a dataset.

This class provides a foundation for creating sample objects with schema inference capabilities and flexible attribute assignment.

Initialize sample with provided attributes.

classmethod infer_schema() → Schema

Infer schema from this Sample class definition.

Returns:

The inferred schema containing attribute information

Return type:

Schema

Raises:

TypeError – If attributes don’t have proper Field annotations
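
To make the idea of annotation-driven schema inference concrete, here is a minimal standard-library sketch. It is not Datumaro's implementation: `Field`, `ToySample`, `LabeledSample`, and the dictionary used as a schema are hypothetical stand-ins, and the real `Field` and `Schema` types may behave differently.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Dict, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Field:
    """Hypothetical field descriptor, standing in for a real Field annotation."""

    dtype: str


class ToySample:
    """Hypothetical base class that mimics the documented behaviour."""

    def __init__(self, **kwargs: Any) -> None:
        # Flexible attribute assignment: every keyword becomes an attribute.
        for name, value in kwargs.items():
            setattr(self, name, value)

    @classmethod
    def infer_schema(cls) -> Dict[str, Field]:
        """Build a name -> Field mapping from the class annotations."""
        schema: Dict[str, Field] = {}
        hints = get_type_hints(cls, include_extras=True)
        for name, hint in hints.items():
            # Annotated[T, meta...] carries its metadata after the first argument.
            metadata = get_args(hint)[1:] if get_origin(hint) is Annotated else ()
            fields = [meta for meta in metadata if isinstance(meta, Field)]
            if not fields:
                raise TypeError(f"Attribute {name!r} has no Field annotation")
            schema[name] = fields[0]
        return schema


class LabeledSample(ToySample):
    image_path: Annotated[str, Field("string")]
    label: Annotated[int, Field("int64")]
```

In this sketch, `LabeledSample.infer_schema()` returns one `Field` per annotated attribute, while a subclass with a bare annotation such as `x: int` raises `TypeError`, mirroring the documented error condition.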

class datumaro.experimental.dataset.Dataset(dtype_or_schema: Schema | Type[DType])

Bases: Generic[DType]

Represents a typed dataset with schema validation and conversion capabilities.

This class provides a strongly-typed container for tabular data with support for complex field types, schema inference, and automatic conversions between different schema representations.

Parameters:

DType – The sample type this dataset contains

Initialize dataset with either a schema or sample type.

Parameters:

dtype_or_schema – Either a Schema instance or a Sample class type

classmethod from_dataframe(df: DataFrame, dtype_or_schema: Schema | Type[DTargetType], lazy_converters: List[Converter] | None = None) → Dataset[DTargetType]

Create a Dataset from an existing DataFrame and lazy converters.

Parameters:
  • df – The Polars DataFrame containing the data

  • dtype_or_schema – Either a Schema instance or a Sample class type

  • lazy_converters – Optional list of lazy converters to apply during sample access

Returns:

A new Dataset instance with the provided DataFrame and converters
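
The phrase "apply during sample access" can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: `LazyRows` and `add_area` are hypothetical, plain dictionaries stand in for DataFrame rows, and Datumaro's actual `Converter` interface is not shown.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
RowConverter = Callable[[Row], Row]


class LazyRows:
    """Hypothetical container that defers conversions until a row is read."""

    def __init__(
        self, rows: List[Row], lazy_converters: Optional[List[RowConverter]] = None
    ) -> None:
        self._rows = rows
        self._lazy_converters = list(lazy_converters or [])

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, index: int) -> Row:
        # Copy the stored row so the underlying data is never modified.
        row = dict(self._rows[index])
        # Converters run here, at access time, not at construction time.
        for convert in self._lazy_converters:
            row = convert(row)
        return row


def add_area(row: Row) -> Row:
    """Hypothetical converter: derive an 'area' field from width and height."""
    return {**row, "area": row["width"] * row["height"]}


rows = LazyRows([{"width": 4, "height": 5}], lazy_converters=[add_area])
first = rows[0]  # the converter is applied only now
```

Deferring the work this way means that rows which are never read are never converted, which is the usual motivation for a lazy design.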

property schema: Schema

Get the schema of this dataset.

property lazy_converters: Sequence[Converter]

Get the list of lazy converters applied to this dataset.

append(sample: DType)

Add a new sample to the dataset.

Parameters:

sample – The sample instance to add to the dataset

convert_to_schema(target_dtype_or_schema: Schema | Type[DTargetType]) → Dataset[DTargetType]

Convert this dataset to a new schema using registered converters.

Parameters:

target_dtype_or_schema – The target schema or sample type to convert to

Returns:

A new Dataset instance with the converted schema

datumaro.experimental.dataset.convert_sample_to_schema(sample: Sample, source_schema: Schema, target_dtype_or_schema: Schema | Type[DTargetType]) → DTargetType

Convert a sample to a new schema using registered converters.

This function wraps the sample in a temporary dataset, converts that dataset, and returns the converted sample. It is useful for one-off conversions where you do not want to build a dataset yourself.

Parameters:
  • sample – The sample instance to convert

  • source_schema – The source schema of the sample

  • target_dtype_or_schema – The target schema or sample type to convert to

Returns:

A new Sample instance with the converted schema
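
The "wrap, convert, unwrap" pattern described above can be sketched in plain Python. This is an illustration under stated assumptions, not Datumaro's code: the `CONVERTERS` registry, `register_converter`, `convert_rows`, and `convert_one` are all hypothetical names, schemas are reduced to lists of field names, and samples are plain dictionaries.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical registry: (source field, target field) -> conversion function.
CONVERTERS: Dict[Tuple[str, str], Callable[[Any], Any]] = {}


def register_converter(source: str, target: str):
    """Register a function that turns one field into another."""

    def decorator(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
        CONVERTERS[(source, target)] = func
        return func

    return decorator


@register_converter("bbox_xywh", "bbox_xyxy")
def xywh_to_xyxy(box: Tuple[int, int, int, int]) -> Tuple[int, int, int, int]:
    x, y, w, h = box
    return (x, y, x + w, y + h)


def convert_rows(rows: List[Row], source: List[str], target: List[str]) -> List[Row]:
    """Convert every row from the source field list to the target field list."""
    converted = []
    for row in rows:
        new_row: Row = {}
        for target_field in target:
            if target_field in source:
                # Field exists in both schemas: copy it through unchanged.
                new_row[target_field] = row[target_field]
                continue
            for source_field in source:
                func = CONVERTERS.get((source_field, target_field))
                if func is not None:
                    new_row[target_field] = func(row[source_field])
                    break
            else:
                raise ValueError(f"No converter produces {target_field!r}")
        converted.append(new_row)
    return converted


def convert_one(sample: Row, source: List[str], target: List[str]) -> Row:
    # Wrap the sample in a temporary one-row "dataset", convert it, unwrap it.
    return convert_rows([sample], source, target)[0]


result = convert_one(
    {"path": "a.jpg", "bbox_xywh": (10, 20, 30, 40)},
    source=["path", "bbox_xywh"],
    target=["path", "bbox_xyxy"],
)
```

Reusing the dataset-level conversion for a single sample keeps one code path for both cases, at the cost of building a throwaway container per call.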

datumaro.experimental.dataset.has_schema(dataset: Dataset[Any], target_dtype_or_schema: Schema | Type[DTargetType]) → TypeGuard[Dataset[DTargetType]]

Check if a dataset has the specified schema.

This function performs schema compatibility checking and serves as a type guard for type narrowing.

Parameters:
  • dataset – The dataset to check

  • target_dtype_or_schema – The target schema or sample type to check against

Returns:

True if the dataset has the specified schema, False otherwise