datumaro.components.transformer
Classes
| ItemTransform | |
| TabularTransform | A transformation class for processing dataset items in batches with optional parallelism. |
| Transform | A base class for dataset transformations that change dataset items or their annotations. |
- class datumaro.components.transformer.Transform(extractor: IDataset)
Bases: DatasetBase, CliPlugin
A base class for dataset transformations that change dataset items or their annotations.
- class datumaro.components.transformer.ItemTransform(extractor: IDataset)
Bases: Transform
- transform_item(item: DatasetItem) → DatasetItem | None
Returns a modified copy of the input item.
Do not modify the input item in place and return it, because this can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.
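The copy-instead-of-mutate contract can be illustrated with a minimal, self-contained sketch. The `Item` and `LowercaseId` classes below are hypothetical stand-ins written for this example; they are not Datumaro classes, and the `wrap()` shown here only imitates the idea of returning a new item with some fields replaced. Treating a `None` return value as "drop the item" is an assumption based on the `DatasetItem | None` return type.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for DatasetItem (not the Datumaro class)."""

    id: str
    attributes: dict = field(default_factory=dict)

    def wrap(self, **kwargs) -> "Item":
        # Build a new item with the given fields replaced; self stays untouched.
        return replace(self, **kwargs)


class LowercaseId:
    """Hypothetical transform that follows the transform_item() contract."""

    def transform_item(self, item: Item) -> Optional[Item]:
        if not item.id:
            # Assumption: returning None signals that the item is skipped.
            return None
        # Return a modified copy instead of editing the input item.
        return item.wrap(id=item.id.lower())


original = Item(id="IMG_001")
result = LowercaseId().transform_item(original)
skipped = LowercaseId().transform_item(Item(id=""))
```

Because the transform returns a new object, `original` keeps its id while `result` carries the lowercased one, so other consumers of the source dataset never see the change.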
- class datumaro.components.transformer.TabularTransform(extractor: IDataset, batch_size: int = 1, num_workers: int = 0)
Bases: Transform
A transformation class for processing dataset items in batches with optional parallelism.
This class takes a dataset extractor, a batch size, and a number of workers to process dataset items. With num_workers set to 0, items are processed sequentially (single-process mode); with a larger value, they are processed in parallel (multi-process mode), which can speed up batch transformations.
- Parameters:
extractor – The dataset extractor to obtain items from.
batch_size – The batch size for processing items. Default is 1.
num_workers – The number of worker threads to use for parallel processing. Set to 0 for single-process mode. Default is 0.
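How batch_size and num_workers might interact can be sketched with the standard library alone. This is an illustration of the general pattern under stated assumptions, not Datumaro's implementation: `batched`, `transform_in_batches`, and `double_even` are hypothetical names, worker threads are assumed for the parallel path, and `None` results are assumed to be dropped.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def transform_in_batches(
    items: Iterable[T],
    transform_item: Callable[[T], Optional[T]],
    batch_size: int = 1,
    num_workers: int = 0,
) -> Iterator[T]:
    """Apply transform_item batch by batch, skipping None results."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    if num_workers < 0:
        raise ValueError("num_workers must be a non-negative integer")

    if num_workers == 0:
        # Sequential path: transform each batch in the calling thread.
        for batch in batched(items, batch_size):
            yield from (r for r in map(transform_item, batch) if r is not None)
    else:
        # Parallel path (assumption: a thread pool); map() keeps input order.
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            for batch in batched(items, batch_size):
                results = pool.map(transform_item, batch)
                yield from (r for r in results if r is not None)


def double_even(x: int) -> Optional[int]:
    # Hypothetical per-item transform: keep even numbers, doubled.
    return x * 2 if x % 2 == 0 else None


sequential = list(transform_in_batches(range(6), double_even, batch_size=2))
parallel = list(
    transform_in_batches(range(6), double_even, batch_size=4, num_workers=2)
)
```

Both calls produce the same items in the same order; only the way each batch is executed differs.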
- transform_item(item: DatasetItem) → DatasetItem | None
Returns a modified copy of the input item.
Do not modify the input item in place and return it, because this can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.