datumaro.components.algorithms.hash_key_inference.prune

Functions

match_num_item_for_cluster(ratio, ...)

Classes

`Centroid`(*args, **kwargs)	Select items through clustering with centers targeting the desired number.
`ClusteredRandom`(*args, **kwargs)	Select items through clustering and choose randomly within each cluster.
`Entropy`(*args, **kwargs)	Select items through clustering and choose them based on label entropy in each cluster.
`NDRSelect`(*args, **kwargs)	Select items based on NDR among each subset.
`Prune`(dataset[, cluster_method, hash_type])	Prune a dataset into a representative and manageable subset.
`PruneBase`(*args, **kwargs)
`QueryClust`(*args, **kwargs)	Select items through clustering with inits that imply each label.
`RandomSelect`(*args, **kwargs)	Select items randomly from the dataset.

datumaro.components.algorithms.hash_key_inference.prune.match_num_item_for_cluster(ratio, dataset_len, cluster_num_item_list)
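
The signature suggests that this helper converts the overall `ratio` into a number of items to keep per cluster. The following is a minimal sketch only, assuming a proportional split with largest-remainder rounding. The rounding rule is not documented here, and `allocate_per_cluster` is a hypothetical name, not part of Datumaro.

```python
import math


def allocate_per_cluster(ratio, dataset_len, cluster_num_item_list):
    """Hypothetical stand-in: split the number of items to keep across clusters."""
    total = sum(cluster_num_item_list)
    if total == 0:
        return [0] * len(cluster_num_item_list)
    # Assumed target: keep ceil(ratio * dataset_len) items overall.
    target = math.ceil(ratio * dataset_len)
    # Integer share and remainder for each cluster, proportional to its size.
    shares = [size * target // total for size in cluster_num_item_list]
    remainders = [size * target % total for size in cluster_num_item_list]
    # Give the leftover items to the largest remainders (ties go to the lower index).
    leftover = target - sum(shares)
    order = sorted(range(len(shares)), key=lambda i: (-remainders[i], i))
    for i in order[:leftover]:
        shares[i] += 1
    return shares


print(allocate_per_cluster(0.5, 10, [6, 3, 1]))  # prints [3, 2, 0]
```

Integer arithmetic is used for the shares so that the per-cluster counts always sum to the target, regardless of floating-point rounding.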

class datumaro.components.algorithms.hash_key_inference.prune.PruneBase(*args, **kwargs)

Bases: ABC

Abstract base class that executes a pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – Label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.

class datumaro.components.algorithms.hash_key_inference.prune.RandomSelect(*args, **kwargs)

Bases: PruneBase

Select items randomly from the dataset.

base(ratio, num_centers, labels, database_keys, item_list, source)

Executes the pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – Label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.
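
Random selection by ratio can be illustrated with the standard library alone. This is a sketch under stated assumptions, not Datumaro's code: `random_select` and its `seed` argument are hypothetical, and rounding the item count up is an assumption.

```python
import math
import random


def random_select(item_list, ratio, seed=None):
    """Hypothetical sketch: keep a random fraction of the items."""
    rng = random.Random(seed)
    # Assumed rounding: keep ceil(ratio * len(item_list)) items.
    num_selected = math.ceil(len(item_list) * ratio)
    # sample() draws without replacement, so no item is picked twice.
    return rng.sample(item_list, num_selected)


items = [f"item_{i}" for i in range(10)]
selected = random_select(items, ratio=0.5, seed=0)
print(len(selected))  # prints 5
```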

class datumaro.components.algorithms.hash_key_inference.prune.Centroid(*args, **kwargs)

Bases: PruneBase

Select items through clustering with centers targeting the desired number.

base(ratio, num_centers, labels, database_keys, item_list, source)

Executes the pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – Label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.
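
When clustering produces as many centers as items wanted, one natural selection rule is to keep the item closest to each center. The sketch below shows only that selection step, on toy binary hash keys compared by Hamming distance. The clustering itself is omitted, and the function names and the choice of distance are assumptions rather than Datumaro's implementation.

```python
def hamming(a, b):
    """Number of positions where two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_to_centers(hash_keys, centers):
    """Hypothetical sketch: for each center, return the index of the closest key."""
    selected = []
    for center in centers:
        distances = [hamming(key, center) for key in hash_keys]
        selected.append(distances.index(min(distances)))
    return selected


hash_keys = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0)]
centers = [(0, 0, 0, 0), (1, 1, 1, 1)]  # assumed output of a clustering step
print(nearest_to_centers(hash_keys, centers))  # prints [0, 2]
```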

class datumaro.components.algorithms.hash_key_inference.prune.ClusteredRandom(*args, **kwargs)

Bases: PruneBase

Select items through clustering and choose randomly within each cluster.

base(ratio, num_centers, labels, database_keys, item_list, source)

Executes the pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – Label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.
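
The second half of this strategy, choosing randomly within each cluster, might look like the sketch below. It assumes the cluster assignment and the per-cluster quotas are already known; `clustered_random`, `cluster_ids`, and `quotas` are hypothetical names used for illustration only.

```python
import random
from collections import defaultdict


def clustered_random(cluster_ids, quotas, seed=None):
    """Hypothetical sketch: sample item indices at random inside each cluster.

    cluster_ids[i] is the cluster of item i; quotas[c] is how many items
    to keep from cluster c.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        members[cluster].append(index)
    selected = []
    for cluster, indices in members.items():
        # Never ask for more items than the cluster holds.
        count = min(quotas.get(cluster, 0), len(indices))
        selected.extend(rng.sample(indices, count))
    return sorted(selected)


cluster_ids = [0, 0, 0, 1, 1, 2]
quotas = {0: 2, 1: 1, 2: 0}  # assumed per-cluster counts
selected = clustered_random(cluster_ids, quotas, seed=0)
print(len(selected))  # prints 3
```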

class datumaro.components.algorithms.hash_key_inference.prune.QueryClust(*args, **kwargs)

Bases: PruneBase

Select items through clustering with inits that imply each label.

base(ratio, num_centers, labels, database_keys, item_list, source)

Executes the pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – Label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.

class datumaro.components.algorithms.hash_key_inference.prune.Entropy(*args, **kwargs)

Bases: PruneBase

Select items through clustering and choose them based on label entropy in each cluster.

base(ratio, num_centers, labels, database_keys, item_list, source)

Executes the pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – Label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.
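
Label entropy in a cluster can be read as the Shannon entropy of the label distribution among that cluster's items. The sketch below computes only this score. How the score then drives item selection is not described here, so that step is left out; `label_entropy_per_cluster` is a hypothetical name.

```python
import math
from collections import Counter, defaultdict


def label_entropy_per_cluster(cluster_ids, labels):
    """Hypothetical sketch: Shannon entropy (in bits) of labels in each cluster."""
    grouped = defaultdict(list)
    for cluster, label in zip(cluster_ids, labels):
        grouped[cluster].append(label)
    entropies = {}
    for cluster, cluster_labels in grouped.items():
        total = len(cluster_labels)
        counts = Counter(cluster_labels)
        # H = sum over labels of -p * log2(p), with p the label frequency.
        entropies[cluster] = sum(
            -(n / total) * math.log2(n / total) for n in counts.values()
        )
    return entropies


cluster_ids = [0, 0, 1, 1]
labels = ["cat", "dog", "cat", "cat"]
print(label_entropy_per_cluster(cluster_ids, labels))  # prints {0: 1.0, 1: 0.0}
```

A cluster whose items all share one label scores 0, while a cluster split evenly between two labels scores 1 bit.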

class datumaro.components.algorithms.hash_key_inference.prune.NDRSelect(*args, **kwargs)

Bases: PruneBase

Select items based on NDR among each subset.

base(ratio, num_centers, labels, database_keys, item_list, source)

Executes the pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – Label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.

class datumaro.components.algorithms.hash_key_inference.prune.Prune(dataset: Dataset, cluster_method: str = 'random', hash_type: str = 'img')

Bases: HashInference

Prune a dataset into a representative and manageable subset.

get_pruned(ratio: float = 0.5) → Dataset
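
A typical call sequence, based only on the signatures above, might look like the following sketch. The wrapper `prune_dataset`, the dataset path, and the format name are hypothetical. Hash-key inference may also need extra runtime dependencies that are not covered here.

```python
def prune_dataset(path, data_format, ratio=0.5):
    """Sketch: import a dataset and return a pruned copy (requires datumaro)."""
    # Imported lazily so this snippet can be loaded without datumaro installed.
    from datumaro.components.algorithms.hash_key_inference.prune import Prune
    from datumaro.components.dataset import Dataset

    dataset = Dataset.import_from(path, data_format)
    # Defaults taken from the signature above.
    pruner = Prune(dataset, cluster_method="random", hash_type="img")
    return pruner.get_pruned(ratio=ratio)


# Example call (hypothetical path and format):
# pruned = prune_dataset("path/to/dataset", "imagenet", ratio=0.5)
```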