Action Classification#

Description#

The ActionClassificationModel is a wrapper class designed for action classification models. This class encapsulates preprocessing and postprocessing for action classification OpenVINO models satisfying a certain specification. Unlike ImageModel, it accepts video clips as input, so it performs data preparation outside the OpenVINO graph.

Parameters#

The following parameters can be provided via Python API or RT Info embedded into an OpenVINO model:

labels (list[str]): List of class labels.
path_to_labels (str): Path to file with labels. Labels are overridden if this is set.
mean_values (list[int | float]): Normalization values subtracted from image channels during preprocessing.
pad_value (int): Pad value used during the resize_image_letterbox operation embedded within the model.
resize_type (str): Resizing method. Valid options include crop, standard, fit_to_window, and fit_to_window_letterbox.
reverse_input_channels (bool): Whether to reverse input channel order.
scale_values (list[int | float]): Normalization values used to divide image channels during preprocessing.
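
As a rough illustration of how mean_values, scale_values, and reverse_input_channels interact, the sketch below normalizes a single pixel in pure Python. The helper name normalize_pixel is hypothetical, and the order of operations (channel reversal first, then mean subtraction and scale division) is an assumption rather than something this page states.

```python
def normalize_pixel(pixel, mean_values, scale_values, reverse_input_channels=False):
    """Normalize one pixel given as a sequence of channel values.

    Hypothetical helper for illustration; not part of model_api.
    """
    channels = list(pixel)
    if reverse_input_channels:
        # Assumption: channels are swapped (e.g. BGR -> RGB) before normalization.
        channels.reverse()
    # Subtract the per-channel mean, then divide by the per-channel scale.
    return [(c - m) / s for c, m, s in zip(channels, mean_values, scale_values)]


bgr_pixel = (10, 128, 250)
normalized = normalize_pixel(
    bgr_pixel,
    mean_values=[0.0, 128.0, 0.0],
    scale_values=[255.0, 64.0, 125.0],
    reverse_input_channels=True,
)
```

In the wrapper the same arithmetic would apply to every pixel of every frame in the clip, not to one pixel at a time.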

OpenVINO Model Specifications#

Inputs#

A single 6D tensor in NSCTHW layout, where:

N: Batch size.
S: Number of clips x number of crops.
C: Number of channels.
T: Time.
H: Height.
W: Width.

NSTHWC layout is also supported.

Outputs#

A single tensor containing softmax-activated logits.

Wrapper input-output specifications#

Inputs#

A single clip in THWC format.
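
To make the relationship between the THWC wrapper input and the 6D model input concrete, the sketch below maps a clip shape to the corresponding model input shape for both supported layouts. The helper model_input_shape is hypothetical, and it assumes N and S are both 1, as the class description states for a single clip.

```python
def model_input_shape(clip_shape, layout="NSCTHW"):
    """Return the 6D model input shape for a single THWC clip.

    Hypothetical helper for illustration; not part of model_api.
    """
    if len(clip_shape) != 4:
        raise ValueError("expected a 4D clip shape in THWC order")
    if layout not in ("NSCTHW", "NSTHWC"):
        raise ValueError(f"unsupported layout: {layout}")
    t, h, w, c = clip_shape
    # A single clip means batch size N and clips-x-crops S are both 1.
    dims = {"N": 1, "S": 1, "C": c, "T": t, "H": h, "W": w}
    return tuple(dims[axis] for axis in layout)


clip_shape = (8, 224, 224, 3)  # 8 frames of 224x224 pixels, 3 channels
nscthw = model_input_shape(clip_shape, "NSCTHW")
nsthwc = model_input_shape(clip_shape, "NSTHWC")
```

In NumPy terms, the NSCTHW case would correspond to np.transpose(clip, (3, 0, 1, 2))[None, None], while the NSTHWC case only needs the two leading axes added.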

Outputs#

The output is represented as a ClassificationResult object, which includes the indices, labels, and logits of the top predictions. At present, saliency maps, feature vectors, and raw scores are not provided.
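
The idea of "top predictions" can be sketched without the library: rank the per-class scores and keep the best few together with their indices and labels. The helper top_predictions and the tuple shape it returns are assumptions for illustration; the actual field names of ClassificationResult are not shown on this page.

```python
def top_predictions(scores, labels, top_k=3):
    """Return (index, label, score) tuples for the highest-scoring classes.

    Hypothetical helper for illustration; not part of model_api.
    """
    # Sort class indices by descending score, then keep the first top_k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, labels[i], scores[i]) for i in ranked[:top_k]]


labels = ["walking", "running", "jumping", "sitting"]
scores = [0.05, 0.70, 0.20, 0.05]  # made-up softmax output for one clip
top = top_predictions(scores, labels, top_k=2)
```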

Example#

import cv2
import numpy as np

from model_api.adapters import OpenvinoAdapter, create_core
from model_api.models import ActionClassificationModel


model_path = "action_classification.xml"
inference_adapter = OpenvinoAdapter(create_core(), model_path, device="CPU")
action_cls_model = ActionClassificationModel(inference_adapter, preload=True)

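# Read the first `clip_size` frames; OpenCV returns each frame as an HWC array in BGR order.
cap = cv2.VideoCapture("sample.mp4")
frames = []
while len(frames) < action_cls_model.clip_size:
    ok, frame = cap.read()
    if not ok:
        raise RuntimeError("sample.mp4 has fewer frames than the model's clip size")
    frames.append(frame)
cap.release()

# Stack the frames into a single THWC clip and run inference.
input_data = np.stack(frames)
results = action_cls_model(input_data)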
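
The example above builds the clip from the first clip_size frames of the video. If the clip should cover the whole video instead, one option is to pick evenly spaced frame indices first. The sketch below shows that selection in pure Python; sample_frame_indices is a hypothetical helper, and this sampling strategy is an assumption, not something the wrapper requires.

```python
def sample_frame_indices(total_frames, clip_size):
    """Pick `clip_size` evenly spaced frame indices from a video.

    Hypothetical helper for illustration; not part of model_api.
    """
    if clip_size <= 0 or total_frames < clip_size:
        raise ValueError("video is too short for the requested clip size")
    step = total_frames / clip_size
    # Take the frame at the start of each of `clip_size` equal segments.
    return [int(i * step) for i in range(clip_size)]


indices = sample_frame_indices(total_frames=120, clip_size=8)
```

The selected frames would then be stacked in order into a THWC array, exactly as in the example above.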

class model_api.models.action_classification.ActionClassificationModel(inference_adapter, configuration={}, preload=False)#

Bases: Model

A wrapper for an action classification model

The model given by inference_adapter can have one of two input layouts: ‘NSCTHW’ or ‘NSTHWC’. The letters mean the following: N => batch size / S => number of clips x number of crops / C => number of channels / T => time / H => height / W => width. The ActionClassificationModel accepts a single video clip as a 4D tensor, which means N and S must both be 1.

Video format differs from image format, so the OpenVINO PrePostProcessor isn’t available. For that reason, preprocessing operations such as resizing and normalization are performed in this class.

image_blob_names#

names of all image-like inputs (6D tensors)

Type: List[str]

image_blob_name#

name of the first image input

Type: str

resize_type#

the type for image resizing (see RESIZE_TYPE for info)

Type: str

resize#

resizing function corresponding to the resize_type

Type: function

input_transform#

instance of the InputTransform for image normalization

Type: InputTransform

Action classification model constructor

Parameters:

inference_adapter (InferenceAdapter) – allows working with the specified executor
configuration (dict[str, Any]) – values for parameters accepted by the specific wrapper (labels, mean_values, etc.), which are set as data attributes
preload (bool) – whether the model is loaded to the device during initialization. If preload=False, the model must be loaded via the load method before inference

Raises:

WrapperError – if the wrapper failed to define appropriate inputs for images

classmethod parameters()#

Defines the description and type of configurable data parameters for the wrapper.

See types.py to find available types of the data parameter. For each parameter the type, default value and description must be provided.

An example of a possible data parameter:

'confidence_threshold': NumericalValue(default_value=0.5, description="Threshold value for detection box confidence")

The method must be implemented in each specific inherited wrapper.

Return type:

dict[str, Any]

Returns:

the dictionary with defined wrapper data parameters
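
The pattern described here, where each wrapper extends the parameter dictionary of its parent, can be sketched as follows. The NumericalValue class below is a minimal stand-in written for this example, not the class from types.py, and BaseWrapper and DetectionWrapper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class NumericalValue:
    """Stand-in for a parameter descriptor; not the class from types.py."""

    default_value: float
    description: str


class BaseWrapper:
    @classmethod
    def parameters(cls) -> dict[str, Any]:
        # The base wrapper declares no configurable parameters in this sketch.
        return {}


class DetectionWrapper(BaseWrapper):
    @classmethod
    def parameters(cls) -> dict[str, Any]:
        # Start from the parent's parameters and add wrapper-specific ones.
        params = super().parameters()
        params.update(
            {
                "confidence_threshold": NumericalValue(
                    default_value=0.5,
                    description="Threshold value for detection box confidence",
                ),
            }
        )
        return params


declared = DetectionWrapper.parameters()
```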

base_preprocess(inputs)#

Data preprocess method

It performs basic preprocessing of a single video:

Resizes the image to fit the model input size via the defined resize type
Normalizes the image: subtracts means, divides by scales, switches channels BGR-RGB
Changes the image layout according to the model input layout

It also keeps the sizes of the original and resized images as original_shape and resized_shape in the metadata dictionary.
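
The geometry behind an aspect-preserving resize with letterbox padding can be sketched as below. Both helpers are hypothetical, and the rounding and centered padding are assumptions; the resize functions used by the wrapper may differ in those details.

```python
def fit_to_window_size(original_hw, target_hw):
    """Largest (height, width) with the original aspect ratio that fits the target.

    Hypothetical helper for illustration; not part of model_api.
    """
    orig_h, orig_w = original_hw
    target_h, target_w = target_hw
    # The smaller of the two ratios keeps both sides inside the target window.
    scale = min(target_h / orig_h, target_w / orig_w)
    return round(orig_h * scale), round(orig_w * scale)


def letterbox_padding(resized_hw, target_hw):
    """Padding (top, bottom, left, right) that centers the resized frame."""
    pad_h = target_hw[0] - resized_hw[0]
    pad_w = target_hw[1] - resized_hw[1]
    return pad_h // 2, pad_h - pad_h // 2, pad_w // 2, pad_w - pad_w // 2


original = (480, 640)  # height, width of each input frame
target = (224, 224)    # height, width expected by the model
resized = fit_to_window_size(original, target)
padding = letterbox_padding(resized, target)
```

For a video clip the same sizes apply to every frame, so they only need to be computed once per clip.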

Note

It supports only models with a single image input. If the model has more image inputs or additional supported inputs, preprocess should be overridden in a specific wrapper.

Parameters:

inputs (ndarray) – a single video clip as a 4D array.

Returns:

a dictionary of the form {'input_layer_name': preprocessed_image}, and the input metadata, which might be used in the postprocess method

Return type:

tuple[dict[str, ndarray], dict[str, tuple[int, ...]]]

postprocess(outputs, meta)#

Post-process.

Return type: ClassificationResult

property clip_size: int#