datumaro.experimental.dataset
Functions
| convert_sample_to_schema | Convert a sample to a new schema using registered converters. |
| has_schema | Check if a dataset has the specified schema. |
Classes
| Dataset | Represents a typed dataset with schema validation and conversion capabilities. |
| Sample | Base class for all samples in a dataset. |
- class datumaro.experimental.dataset.Sample(**kwargs: Any)
Bases: object
Base class for all samples in a dataset.
This class provides a foundation for creating sample objects with schema inference capabilities and flexible attribute assignment.
Initialize the sample, assigning the provided keyword arguments as attributes.
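The sketch below illustrates what "schema inference" and "flexible attribute assignment" can look like in plain Python. `ToySample` and `infer_schema` are hypothetical names, not Datumaro's API; the assumptions are that keyword arguments become attributes and that the schema is read from class annotations.

```python
from typing import Any, Dict, get_type_hints


class ToySample:
    """Hypothetical stand-in for a Sample base class (not Datumaro's implementation)."""

    def __init__(self, **kwargs: Any) -> None:
        # Flexible attribute assignment: every keyword argument becomes an attribute.
        for name, value in kwargs.items():
            setattr(self, name, value)

    @classmethod
    def infer_schema(cls) -> Dict[str, type]:
        # Schema inference (assumed): field names and types come from class annotations.
        return get_type_hints(cls)


class LabeledText(ToySample):
    text: str
    label: int


s = LabeledText(text="hello", label=1)
print(s.text, s.label)  # hello 1
print(LabeledText.infer_schema())
```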
- class datumaro.experimental.dataset.Dataset(dtype_or_schema: Schema | Type[DType])
Bases: Generic[DType]
Represents a typed dataset with schema validation and conversion capabilities.
This class provides a strongly typed container for tabular data with support for complex field types, schema inference, and automatic conversions between different schema representations.
- Type parameters:
DType – The sample type this dataset contains
Initialize the dataset with either a schema or a sample type.
- Parameters:
dtype_or_schema – Either a Schema instance or a Sample class type
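A minimal sketch of accepting either a schema or a sample class, assuming (hypothetically) that a sample class is turned into a schema by reading its annotations. `ToySchema`, `ToySample`, and `ToyDataset` are illustrative stand-ins, not Datumaro classes.

```python
from dataclasses import dataclass
from typing import Dict, Generic, Type, TypeVar, Union, get_type_hints


@dataclass(frozen=True)
class ToySchema:
    """Hypothetical schema: a mapping of field name -> Python type."""
    fields: Dict[str, type]


class ToySample:
    """Hypothetical sample base; subclasses declare fields as annotations."""


T = TypeVar("T", bound=ToySample)


class ToyDataset(Generic[T]):
    def __init__(self, dtype_or_schema: Union[ToySchema, Type[T]]) -> None:
        if isinstance(dtype_or_schema, ToySchema):
            # An explicit schema is used as-is.
            self.schema = dtype_or_schema
        elif isinstance(dtype_or_schema, type) and issubclass(dtype_or_schema, ToySample):
            # A sample class: derive the schema from its annotations.
            self.schema = ToySchema(get_type_hints(dtype_or_schema))
        else:
            raise TypeError("expected a ToySchema or a ToySample subclass")


class Point(ToySample):
    x: float
    y: float


a = ToyDataset(Point)
b = ToyDataset(ToySchema({"x": float, "y": float}))
print(a.schema == b.schema)  # True
```

Both construction paths end in the same schema object, so later code only has to deal with one representation.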
- classmethod from_dataframe(df: DataFrame, dtype_or_schema: Schema | Type[DTargetType], lazy_converters: List[Converter] | None = None) → Dataset[DTargetType]
Create a Dataset from an existing Polars DataFrame and an optional list of lazy converters.
- Parameters:
df – The Polars DataFrame containing the data
dtype_or_schema – Either a Schema instance or a Sample class type
lazy_converters – Optional list of lazy converters to apply during sample access
- Returns:
A new Dataset instance with the provided DataFrame and converters
- property lazy_converters: Sequence[Converter]
Get the list of lazy converters applied to this dataset.
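The sketch below shows the idea behind `from_dataframe` and `lazy_converters`: existing data is wrapped as-is, and converters run only when a sample is read. It is a hypothetical model, not Datumaro's implementation: a dict of column lists stands in for the Polars DataFrame, a converter is assumed to be a plain row-to-row function, and `from_columns` is an invented name.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]
# Hypothetical converter shape: a function mapping one row to a new row.
LazyConverter = Callable[[Row], Row]


class ToyDataset:
    """Hypothetical columnar dataset; a dict of lists stands in for a Polars DataFrame."""

    def __init__(self, columns: Dict[str, List[Any]],
                 lazy_converters: Optional[List[LazyConverter]] = None) -> None:
        self._columns = columns
        # Stored as a tuple so callers cannot mutate the converter chain.
        self._lazy: Tuple[LazyConverter, ...] = tuple(lazy_converters or [])

    @classmethod
    def from_columns(cls, columns: Dict[str, List[Any]],
                     lazy_converters: Optional[List[LazyConverter]] = None) -> "ToyDataset":
        # Wrap the existing columns as-is; no conversion happens here.
        return cls(columns, lazy_converters)

    @property
    def lazy_converters(self) -> Sequence[LazyConverter]:
        return self._lazy

    def __len__(self) -> int:
        return len(next(iter(self._columns.values()), []))

    def __getitem__(self, index: int) -> Row:
        row = {name: column[index] for name, column in self._columns.items()}
        # Lazy converters run only when a sample is accessed.
        for convert in self._lazy:
            row = convert(row)
        return row


def xywh_to_xyxy(row: Row) -> Row:
    # Example converter: replace an (x, y, w, h) box with (x1, y1, x2, y2).
    x, y, w, h = row["bbox_xywh"]
    rest = {k: v for k, v in row.items() if k != "bbox_xywh"}
    return {**rest, "bbox_xyxy": (x, y, x + w, y + h)}


ds = ToyDataset.from_columns({"bbox_xywh": [(0, 0, 2, 3)], "label": [1]}, [xywh_to_xyxy])
print(ds[0])  # {'label': 1, 'bbox_xyxy': (0, 0, 2, 3)}
```

Returning a tuple from the property keeps the converter chain read-only, which fits the `Sequence[Converter]` annotation.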
- append(sample: DType)
Add a new sample to the dataset.
- Parameters:
sample – The sample instance to add to the dataset
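Since the class is described as a tabular container with schema validation, one plausible shape for `append` is to validate a sample's fields and then extend one column per field. The sketch assumes that behavior; `ToyDataset` and its `isinstance`-based check are hypothetical, and Datumaro's actual validation rules may differ.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


class ToyDataset:
    """Hypothetical columnar store with one Python list per schema field."""

    def __init__(self, schema: Dict[str, type]) -> None:
        self.schema = schema
        self.columns: Dict[str, List[Any]] = {name: [] for name in schema}

    def append(self, sample: Any) -> None:
        # Validate every field before writing, so a bad sample leaves no partial row.
        values = {}
        for name, expected in self.schema.items():
            value = getattr(sample, name)
            if not isinstance(value, expected):
                raise TypeError(
                    f"{name}: expected {expected.__name__}, got {type(value).__name__}"
                )
            values[name] = value
        for name, value in values.items():
            self.columns[name].append(value)

    def __len__(self) -> int:
        return len(next(iter(self.columns.values()), []))


ds = ToyDataset({"path": str, "label": int})
ds.append(SimpleNamespace(path="img_0.jpg", label=3))
print(len(ds), ds.columns)  # 1 {'path': ['img_0.jpg'], 'label': [3]}
```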
- convert_to_schema(target_dtype_or_schema: Schema | Type[DTargetType]) → Dataset[DTargetType]
Convert this dataset to a new schema using registered converters.
- Parameters:
target_dtype_or_schema – The target schema or sample type to convert to
- Returns:
A new Dataset instance with the converted schema
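"Registered converters" suggests a registry whose entries can be chained when no single converter goes directly from the source schema to the target. The sketch below models that with a breadth-first search over a hypothetical registry keyed by schema names; `register`, `find_path`, `convert_rows`, and the bounding-box formats are illustrative assumptions, not Datumaro's converter API.

```python
from collections import deque
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Converter = Callable[[Row], Row]
# Hypothetical registry: (source schema name, target schema name) -> row converter.
REGISTRY: Dict[Tuple[str, str], Converter] = {}


def register(src: str, dst: str) -> Callable[[Converter], Converter]:
    def decorator(fn: Converter) -> Converter:
        REGISTRY[(src, dst)] = fn
        return fn
    return decorator


def find_path(src: str, dst: str) -> List[Converter]:
    """Breadth-first search for the shortest chain of registered converters."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        node, chain = queue.popleft()
        if node == dst:
            return chain
        for (a, b), fn in REGISTRY.items():
            if a == node and b not in seen:
                seen.add(b)
                queue.append((b, chain + [fn]))
    raise ValueError(f"no converter path from {src!r} to {dst!r}")


def convert_rows(rows: List[Row], src: str, dst: str) -> List[Row]:
    # Build a new list; converters return new dicts, so the input is untouched.
    chain = find_path(src, dst)
    out = []
    for row in rows:
        for fn in chain:
            row = fn(row)
        out.append(row)
    return out


@register("xywh", "xyxy")
def xywh_to_xyxy(row: Row) -> Row:
    x, y, w, h = row["bbox"]
    return {**row, "bbox": (x, y, x + w, y + h)}


@register("xyxy", "cxcywh")
def xyxy_to_cxcywh(row: Row) -> Row:
    x1, y1, x2, y2 = row["bbox"]
    return {**row, "bbox": ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)}


print(convert_rows([{"bbox": (0, 0, 4, 2)}], "xywh", "cxcywh"))
# [{'bbox': (2.0, 1.0, 4, 2)}]
```

Breadth-first search picks the shortest chain, and producing new rows instead of mutating the input mirrors the fact that `convert_to_schema` returns a new Dataset.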
- datumaro.experimental.dataset.convert_sample_to_schema(sample: Sample, source_schema: Schema, target_dtype_or_schema: Schema | Type[DTargetType]) → DTargetType
Convert a sample to a new schema using registered converters.
This function wraps the sample in a temporary single-sample dataset, converts that dataset, and returns the converted sample. It is convenient for one-off conversions where you do not want to build a full dataset yourself.
- Parameters:
sample – The sample instance to convert
source_schema – The source schema of the sample
target_dtype_or_schema – The target schema or sample type to convert to
- Returns:
A new sample instance conforming to the target schema
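The temporary-dataset pattern described for `convert_sample_to_schema` can be sketched as follows. `ToyDataset`, `convert`, and `convert_one` are hypothetical, and a single row-to-row function stands in for schema conversion.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ToyDataset:
    """Hypothetical list-backed dataset; `convert` applies one row converter."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def convert(self, fn: Callable[[Row], Row]) -> "ToyDataset":
        # Return a new dataset; each row is copied so the originals stay untouched.
        return ToyDataset([fn(dict(r)) for r in self.rows])


def convert_one(sample: Row, fn: Callable[[Row], Row]) -> Row:
    # Wrap the single sample in a temporary one-element dataset,
    # run the dataset-level conversion, then unwrap the result.
    converted = ToyDataset([sample]).convert(fn)
    return converted.rows[0]


def add_gray_flag(row: Row) -> Row:
    row["is_gray"] = row.get("channels") == 1
    return row


print(convert_one({"channels": 1}, add_gray_flag))
# {'channels': 1, 'is_gray': True}
```

Wrapping one sample per call reuses the dataset-level conversion path at the cost of building a small dataset each time; for many samples, calling `convert_to_schema` once on the whole dataset is likely the cheaper route.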
- datumaro.experimental.dataset.has_schema(dataset: Dataset[Any], target_dtype_or_schema: Schema | Type[DTargetType]) → TypeGuard[Dataset[DTargetType]]
Check if a dataset has the specified schema.
This function performs a schema compatibility check and acts as a TypeGuard: when it returns True, static type checkers narrow dataset to Dataset[DTargetType].
- Parameters:
dataset – The dataset to check
target_dtype_or_schema – The target schema or sample type to check against
- Returns:
True if the dataset has the specified schema, False otherwise