Action Classification#
Description#
The ActionClassificationModel is a wrapper class designed for action classification models.
This class encapsulates preprocessing and postprocessing for action classification OpenVINO models satisfying a certain specification.
Unlike ImageModel, it accepts video clips as input, so it performs data preparation outside the OpenVINO graph.
Parameters#
The following parameters can be provided via Python API or RT Info embedded into an OpenVINO model:
labels (list[str]): List of class labels.
path_to_labels (str): Path to file with labels. Labels are overridden if this is set.
mean_values (list[int | float]): Normalization values subtracted from image channels during preprocessing.
pad_value (int): Pad value used during the resize_image_letterbox operation embedded within the model.
resize_type (str): Resizing method. Valid options include crop, standard, fit_to_window, and fit_to_window_letterbox.
reverse_input_channels (bool): Whether to reverse input channel order.
scale_values (list[int | float]): Normalization values used to divide image channels during preprocessing.
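To make the effect of mean_values, scale_values, and reverse_input_channels concrete, here is a minimal NumPy sketch of the normalization on a THWC clip. The helper name normalize_clip is hypothetical, and the order of operations (channel reversal first, then mean subtraction and scale division) is an assumption of this sketch rather than a statement about the wrapper's internals.

```python
import numpy as np


def normalize_clip(clip, mean_values=None, scale_values=None, reverse_input_channels=False):
    """Normalize a THWC clip (hypothetical helper, not part of model_api)."""
    out = clip.astype(np.float32)
    if reverse_input_channels:
        # Reverse the last (channel) axis, e.g. BGR -> RGB.
        out = out[..., ::-1]
    if mean_values is not None:
        # Per-channel means broadcast over T, H, and W.
        out = out - np.asarray(mean_values, dtype=np.float32)
    if scale_values is not None:
        out = out / np.asarray(scale_values, dtype=np.float32)
    return out


# An 8-frame clip whose first channel is 0 and whose other channels are 255.
clip = np.full((8, 4, 4, 3), 255, dtype=np.uint8)
clip[..., 0] = 0

normalized = normalize_clip(
    clip,
    mean_values=[0, 0, 0],
    scale_values=[255, 255, 255],
    reverse_input_channels=True,
)
```

After the call, the channel that was first in the input ends up last, and all values lie in the range [0, 1].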
OpenVINO Model Specifications#
Inputs#
A single 6D tensor with the NSCTHW layout:
N: Batch size.
S: Number of clips x number of crops.
C: Number of channels.
T: Time.
H: Height.
W: Width.
NSTHWC layout is also supported.
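The relationship between a THWC clip and the two 6D layouts can be sketched with NumPy as follows. The function clip_to_model_input is a hypothetical helper written for illustration, and it assumes N = S = 1.

```python
import numpy as np


def clip_to_model_input(clip, layout="NSCTHW"):
    """Turn a THWC clip into a 6D tensor with N = S = 1 (illustrative helper)."""
    if clip.ndim != 4:
        raise ValueError("expected a 4D THWC clip")
    if layout == "NSTHWC":
        # Only the leading N and S axes need to be added.
        return clip[np.newaxis, np.newaxis]
    if layout == "NSCTHW":
        # THWC -> CTHW, then add the N and S axes.
        return clip.transpose(3, 0, 1, 2)[np.newaxis, np.newaxis]
    raise ValueError(f"unsupported layout: {layout}")


clip = np.zeros((8, 224, 224, 3), dtype=np.float32)
nscthw = clip_to_model_input(clip, "NSCTHW")
nsthwc = clip_to_model_input(clip, "NSTHWC")
```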
Outputs#
A single tensor containing softmax-activated logits.
Wrapper input-output specifications#
Inputs#
A single clip in THWC format.
Outputs#
The output is represented as a ClassificationResult object, which includes the indices, labels, and logits of the top predictions.
At present, saliency maps, feature vectors, and raw scores are not provided.
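As an illustration of how top predictions can be derived from the model's output scores, here is a minimal NumPy sketch. The function top_predictions and the (index, label, score) tuple form are assumptions made for this example and do not reflect the actual structure of ClassificationResult.

```python
import numpy as np


def top_predictions(scores, labels, top_k=1):
    """Return (index, label, score) tuples for the top_k highest scores."""
    scores = np.asarray(scores).reshape(-1)
    # Sort indices by descending score and keep the first top_k.
    order = np.argsort(scores)[::-1][:top_k]
    return [(int(i), labels[int(i)], float(scores[i])) for i in order]


scores = np.array([[0.1, 0.7, 0.2]])  # shape (1, num_classes)
labels = ["walk", "run", "jump"]      # hypothetical label list
top = top_predictions(scores, labels, top_k=2)
```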
Example#
import cv2
import numpy as np
from model_api.adapters import OpenvinoAdapter, create_core
from model_api.models import ActionClassificationModel

# Load the model and create the wrapper.
model_path = "action_classification.xml"
inference_adapter = OpenvinoAdapter(create_core(), model_path, device="CPU")
action_cls_model = ActionClassificationModel(inference_adapter, preload=True)

# Read clip_size consecutive frames and stack them into a THWC clip.
cap = cv2.VideoCapture("sample.mp4")
input_data = np.stack([cap.read()[1] for _ in range(action_cls_model.clip_size)])
cap.release()

results = action_cls_model(input_data)
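The example reads frames without checking whether the read succeeded, so a video shorter than clip_size would put None into the stack. A more defensive variant might look like the sketch below. The helper read_clip and the FakeCapture class are hypothetical. FakeCapture only mimics the (success, frame) return value of cv2.VideoCapture.read() so that the snippet runs without a video file.

```python
import numpy as np


def read_clip(capture, clip_size):
    """Read clip_size frames from an object with a cv2-style read() method."""
    frames = []
    for _ in range(clip_size):
        ok, frame = capture.read()
        if not ok:
            raise ValueError(
                f"video ended after {len(frames)} frames, need {clip_size}"
            )
        frames.append(frame)
    # Stack HWC frames into a single THWC clip.
    return np.stack(frames)


class FakeCapture:
    """Stand-in for cv2.VideoCapture that yields a fixed number of blank frames."""

    def __init__(self, num_frames):
        self._remaining = num_frames

    def read(self):
        if self._remaining > 0:
            self._remaining -= 1
            return True, np.zeros((4, 4, 3), dtype=np.uint8)
        return False, None


clip = read_clip(FakeCapture(8), clip_size=8)
```

With a real video, a cv2.VideoCapture instance would be passed in place of FakeCapture.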
- class model_api.models.action_classification.ActionClassificationModel(inference_adapter, configuration={}, preload=False)#
Bases: Model
A wrapper for an action classification model
The model given by inference_adapter can have one of two input formats: NSCTHW or NSTHWC. The letters mean the following: N => batch size / S => number of clips x number of crops / C => number of channels / T => time / H => height / W => width. The ActionClassificationModel takes a single video as input, that is, a 4D tensor, which means N and S must be 1.
Video format is different from image format, so the OpenVINO PrePostProcessor isn’t available. For that reason, preprocessing operations such as resize and normalize are performed in this class.
- image_blob_names#
names of all image-like inputs (6D tensors)
- Type:
List[str]
- image_blob_name#
name of the first image input
- Type:
str
- resize_type#
the type for image resizing (see RESIZE_TYPE for info)
- Type:
str
- resize#
resizing function corresponding to the resize_type
- Type:
function
- input_transform#
instance of the InputTransform for image normalization
- Type:
InputTransform
Action classification model constructor
- Parameters:
inference_adapter (InferenceAdapter) – allows working with the specified executor
configuration (dict[str,Any]) – contains values for parameters accepted by the specific wrapper (labels, mean_values, etc.), which are set as data attributes
preload (bool) – a flag indicating whether the model is loaded to the device during initialization. If preload=False, the model must be loaded via the load method before inference
- Raises:
WrapperError – if the wrapper failed to define appropriate inputs for images
- classmethod parameters()#
Defines the description and type of configurable data parameters for the wrapper.
See types.py to find available types of the data parameter. For each parameter the type, default value and description must be provided.
- An example of a possible data parameter:
- 'confidence_threshold': NumericalValue(
default_value=0.5, description="Threshold value for detection box confidence"
)
The method must be implemented in each specific inherited wrapper.
- Return type:
dict[str,Any]
- Returns:
the dictionary with defined wrapper data parameters
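The pattern described for parameters() can be sketched as follows: a subclass extends the dictionary returned by its parent. The NumericalValue dataclass, BaseWrapper, and DetectionWrapper here are hypothetical stand-ins written so the snippet is self-contained. They are not the real classes from types.py or model_api.

```python
from dataclasses import dataclass


@dataclass
class NumericalValue:
    """Hypothetical stand-in for the NumericalValue parameter type."""

    default_value: float
    description: str


class BaseWrapper:
    """Hypothetical base wrapper with no configurable parameters."""

    @classmethod
    def parameters(cls):
        return {}


class DetectionWrapper(BaseWrapper):
    """Hypothetical wrapper that adds one parameter on top of its parent's."""

    @classmethod
    def parameters(cls):
        params = super().parameters()
        params.update(
            {
                "confidence_threshold": NumericalValue(
                    default_value=0.5,
                    description="Threshold value for detection box confidence",
                ),
            }
        )
        return params


params = DetectionWrapper.parameters()
```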
- base_preprocess(inputs)#
Data preprocess method
- It performs basic preprocessing of a single video:
Resizes the image to fit the model input size via the defined resize type
Normalizes the image: subtracts means, divides by scales, switches channels BGR-RGB
Changes the image layout according to the model input layout
It also keeps the sizes of the original image and the resized one as original_shape and resized_shape in the metadata dictionary.
Note
It supports only models with single image input. If the model has more image inputs or has additional supported inputs, the preprocess should be overloaded in a specific wrapper.
- Parameters:
inputs (ndarray) – a single video clip as a 4D array.
- Returns:
the preprocessed input as a dictionary:
{
'input_layer_name': preprocessed_image
}
the input metadata, which might be used in the postprocess method
- Return type:
tuple[dict[str,ndarray],dict[str,tuple[int,...]]]
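The shape of the values returned by base_preprocess can be illustrated with a rough NumPy sketch. It assumes a plain per-frame resize (a nearest-neighbour stand-in for the real resize function), an NSCTHW model layout, and omits normalization. The names resize_frame_nearest and preprocess_clip and the input name "input" are hypothetical.

```python
import numpy as np


def resize_frame_nearest(frame, height, width):
    """Nearest-neighbour resize of one HWC frame (stand-in for a real resize)."""
    rows = np.arange(height) * frame.shape[0] // height
    cols = np.arange(width) * frame.shape[1] // width
    return frame[rows][:, cols]


def preprocess_clip(clip, input_name, height, width):
    """Resize every frame, record shapes, and convert THWC to NSCTHW."""
    resized = np.stack([resize_frame_nearest(f, height, width) for f in clip])
    meta = {
        "original_shape": clip.shape,
        "resized_shape": resized.shape,
    }
    # THWC -> CTHW, then add the N and S axes (both equal to 1).
    blob = resized.transpose(3, 0, 1, 2)[np.newaxis, np.newaxis].astype(np.float32)
    return {input_name: blob}, meta


clip = np.random.randint(0, 256, size=(8, 240, 320, 3), dtype=np.uint8)
dict_inputs, meta = preprocess_clip(clip, "input", 224, 224)
```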
- postprocess(outputs, meta)#
Post-process.
- Return type:
ClassificationResult
- property clip_size: int#