Model#

exception model_api.models.model.WrapperError(wrapper_name, message)#

Bases: Exception

The exception class for errors that occur in Model API wrappers

class model_api.models.model.Model(inference_adapter, configuration={}, preload=False)#

Bases: object

An abstract model wrapper

The abstract model wrapper is free of any executor dependencies. It wraps the provided InferenceAdapter instance together with the model and defines the model inputs/outputs.

Next, it loads the provided configuration variables and sets them as wrapper attributes. The keys of the configuration dictionary must be listed among those returned by the parameters method.

It also wraps the following adapter interface:
  • Loading the model to the device

  • Model reshaping

  • Synchronous model inference

  • Asynchronous model inference

The preprocess and postprocess methods must be implemented in a specific inherited wrapper.
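
As an illustration of that contract (the layer names, normalization, and threshold below are invented for the sketch and are not part of the API), an inherited wrapper's hooks typically follow this shape:

```python
import numpy as np

def preprocess(image):
    # Sketch of a wrapper's preprocess hook: raw image in, a dict keyed by
    # input layer name out, plus metadata that postprocess can use later.
    meta = {"original_shape": image.shape}
    blob = image.astype(np.float32) / 255.0          # illustrative normalization
    blob = blob.transpose(2, 0, 1)[np.newaxis, ...]  # HWC -> NCHW with batch dim
    return {"input_layer_name_1": blob}, meta

def postprocess(outputs, meta):
    # Sketch of a postprocess hook: raw output dict in, wrapper-defined result out.
    scores = outputs["output_layer_name_1"]
    return scores[scores > 0.5]  # keep detections above an illustrative threshold
```

A concrete wrapper implements these as methods and declares its thresholds and labels through parameters().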

logger#

instance of the Logger

Type:

Logger

inference_adapter#

allows working with the specified executor

Type:

InferenceAdapter

inputs#

keeps the model input names and the Metadata structure for each one

Type:

dict

outputs#

keeps the model output names and the Metadata structure for each one

Type:

dict

model_loaded#

a flag indicating whether the model is loaded to the device

Type:

bool

params#

provides attribute-style access to model parameters

Type:

ParameterDescriptor

Model constructor

Parameters:
  • inference_adapter (InferenceAdapter) – allows working with the specified executor

  • configuration (dict) – values for parameters accepted by the specific wrapper (confidence_threshold, labels, etc.), which are set as data attributes

  • preload (bool) – a flag indicating whether the model is loaded to the device during initialization. If preload=False, the model must be loaded via the load method before inference

Raises:

WrapperError – if the wrapper configuration is incorrect

classmethod available_wrappers()#

Prepares a list of all discoverable wrapper names (including custom ones inherited from the core wrappers).

Return type:

list[str]

classmethod create_model(model, model_type=None, configuration={}, preload=True, core=None, weights_path=None, adaptor_parameters={}, device='AUTO', nstreams='1', nthreads=None, max_num_requests=0, precision='FP16', download_dir=None, cache_dir=None)#

Create an instance of the Model API model

Parameters:
  • model (str | InferenceAdapter) – model name from OpenVINO Model Zoo, path to model, OVMS URL, or an adapter

  • configuration (dict[str, Any]) – dictionary of model config with model properties, for example confidence_threshold, labels

  • model_type (Any | None) – name of model wrapper to create (e.g. “ssd”)

  • preload (bool) – whether to call load_model(). Can be set to False to reshape the model before loading.

  • core (Any | None) – openvino.Core instance, passed to OpenvinoAdapter

  • weights_path (PathLike | None) – path to .bin file with model weights

  • adaptor_parameters (dict[str, Any]) – parameters of ModelAdaptor

  • device (str) – name of the OpenVINO device (e.g. “CPU”, “GPU”)

  • nstreams (str) – number of inference streams

  • nthreads (int | None) – number of threads to use for inference on CPU

  • max_num_requests (int) – number of infer requests for asynchronous inference

  • precision (str) – inference precision (e.g. “FP16”)

  • download_dir (PathLike | None) – directory where to store downloaded models

  • cache_dir (PathLike | None) – directory where to store compiled models to reduce the load time before the inference.

Return type:

Any

Returns:

Model object
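
A typical call might look like the sketch below; the model path and configuration values are placeholders, and the snippet assumes model_api with an OpenVINO runtime is installed (it falls back to model = None otherwise):

```python
try:
    from model_api.models import Model  # assumes the package re-exports Model here

    # "model.xml" is a placeholder path to an OpenVINO IR file
    model = Model.create_model(
        "model.xml",
        model_type="ssd",  # or None to auto-detect via detect_model_type()
        configuration={"confidence_threshold": 0.6},
        device="CPU",
        preload=True,
    )
except Exception:  # package missing, or the placeholder model file not found
    model = None
```

Once created, calling `model(image)` runs the full preprocess/infer/postprocess pipeline in one step.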

classmethod detect_model_type(inference_adapter)#

Detects the model type based on the available information

Return type:

str

classmethod from_pretrained(pretrained_model_name_or_path, *, cache_dir=None, force_download=False, local_files_only=False, token=None, revision=None, local_dir=None, subfolder=None, repo_type='model', filename=None, model_type=None, configuration={}, preload=True, core=None, weights_path=None, adaptor_parameters={}, device='AUTO', nstreams='1', nthreads=None, max_num_requests=0, precision='FP16')#

Load a model from a Hugging Face Hub repository.

Downloads the model files from the given Hugging Face repository and creates a Model API wrapper using create_model(). Supports private repositories via token authentication and all standard Hugging Face download options.

Parameters:
  • pretrained_model_name_or_path (str) – Hugging Face repository identifier (e.g. user/model-name).

  • cache_dir (str | PathLike | None) – Custom Hugging Face cache directory.

  • force_download (bool) – Re-download even if the files are cached.

  • local_files_only (bool) – Only use locally cached files; error if the requested files are not available.

  • token (str | bool | None) – Authentication token for private repos. True reads from the cached HF login.

  • revision (str | None) – Git revision — branch, tag, or full-length commit hash.

  • local_dir (str | PathLike | None) – Download files into this local directory with their original layout.

  • subfolder (str | None) – Subfolder inside the repository to look for the model.

  • repo_type (str) – Repository type (model, dataset, or space). Defaults to model.

  • filename (str | None) – Model API-specific parameter for selecting a specific model file to download (e.g. model.xml). When omitted, the repository is scanned for .xml / .onnx files automatically.

  • model_type (Any | None) – name of model wrapper to create (e.g. ssd). Detected automatically when omitted.

  • configuration (dict[str, Any]) – dictionary of model config with model properties, e.g. confidence_threshold, labels.

  • preload (bool) – Whether to load the model onto the device immediately.

  • core (Any | None) – openvino.Core instance, passed to OpenvinoAdapter.

  • weights_path (PathLike | None) – explicit path to .bin weights file.

  • adaptor_parameters (dict[str, Any]) – extra parameters for the inference adapter.

  • device (str) – OpenVINO device name (e.g. CPU, GPU, AUTO).

  • nstreams (str) – Number of inference streams.

  • nthreads (int | None) – Number of CPU threads.

  • max_num_requests (int) – Maximum number of asynchronous inference requests.

  • precision (str) – Inference precision (e.g. FP16).

Return type:

Any

Returns:

A fully initialized Model API wrapper.

Raises:
  • ImportError – If huggingface_hub is not installed.

  • FileNotFoundError – If no model file is found in the repository.

  • huggingface_hub.errors.RepositoryNotFoundError – If the repository does not exist or is not accessible.

  • ValueError – If multiple model files are found and filename is not specified.
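
A hedged usage sketch follows; the repository id is a placeholder, huggingface_hub and model_api must be installed, and the snippet falls back to None when they are not:

```python
try:
    from model_api.models import Model  # assumes the package re-exports Model here

    # "user/model-name" is a placeholder repository id
    model = Model.from_pretrained(
        "user/model-name",
        filename="model.xml",  # avoids ValueError when the repo holds several files
        revision="main",
        device="CPU",
    )
except Exception:  # missing dependency or unreachable placeholder repo
    model = None
```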

classmethod get_model_class(name)#

Retrieves a wrapper class by a given wrapper name.

Parameters:

name (str) – Wrapper name.

Returns:

Model class.

Return type:

Type

classmethod get_subclasses()#

Retrieves all subclasses of the given model class.

Return type:

list[Any]

classmethod parameters()#

Defines the description and type of configurable data parameters for the wrapper.

See types.py to find available types of the data parameter. For each parameter the type, default value and description must be provided.

An example of a possible data parameter:

'confidence_threshold': NumericalValue(
    default_value=0.5, description="Threshold value for detection box confidence"
)

The method must be implemented in each specific inherited wrapper.

Return type:

dict[str, Any]

Returns:

  • the dictionary with defined wrapper data parameters

classmethod raise_error(message)#

Raises the WrapperError.

Parameters:

message (str) – error message to be shown in the following format: “WrapperName: message”

Return type:

NoReturn

__call__(inputs)#

Applies the preprocessing, synchronous inference, and postprocessing routines in a single call.

Parameters:

inputs (ndarray) – raw input data, the data type is defined by wrapper

Returns:

  • postprocessed data in the format defined by wrapper

await_all()#

Waits for all async inference requests to be completed.

await_any()#

Waits for the model to become available for an async infer request.

base_preprocess(inputs)#

Interface for preprocess method.

Parameters:

inputs – raw input data, the data type is defined by wrapper

Returns:

  • the preprocessed data which is submitted to the model for inference and has the following format:

    {
        'input_layer_name_1': data_1,
        'input_layer_name_2': data_2,
        ...
    }

  • the input metadata, which might be used in postprocess method

get_cached_parameters()#

Get cached parameters, initializing cache if needed.

Returns:

Dictionary of parameter definitions

Return type:

dict[str, Any]

get_model()#

Returns underlying adapter-specific model.

Returns:

Model object.

Return type:

Any

get_param(name)#

Gets a parameter value, either from instance attribute (if set by config) or from parameter default.

Parameters:

name (str) – parameter name

Returns:

parameter value

Return type:

Any

get_performance_metrics()#

Returns performance metrics of the model.

Returns:

Performance metrics object.

Return type:

PerformanceMetrics

infer_async(input_data, user_data)#

Runs asynchronous model inference.

Parameters:
  • input_data (dict) – Input dict containing model input name as keys and data object as values.

  • user_data (Any) – data to be passed to the callback alongside the inference results.
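
The async flow pairs set_callback() with infer_async() and await_all(). The stub below is a pure-Python stand-in (StubAsyncModel is not part of the API) that only mimics the calling pattern, delivering queued results when await_all() runs:

```python
from typing import Any, Callable

class StubAsyncModel:
    """Pure-Python stand-in that mimics the async calling pattern."""

    def __init__(self):
        self._callback = None
        self._pending = []

    def set_callback(self, callback_fn: Callable):
        self._callback = callback_fn

    def infer_async(self, input_data: dict, user_data: Any):
        # A real adapter would queue the request on the device;
        # the stub just records it.
        self._pending.append((input_data, user_data))

    def await_all(self):
        # Deliver results to the callback; the stub echoes inputs as "results".
        for input_data, user_data in self._pending:
            self._callback(input_data, user_data)
        self._pending.clear()

results = []
model = StubAsyncModel()
model.set_callback(lambda result, user_data: results.append((user_data, result)))
for i, frame in enumerate([{"data": 1}, {"data": 2}]):
    model.infer_async(frame, user_data=i)
model.await_all()
```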

infer_async_raw(dict_data, callback_data)#

Runs asynchronous inference on raw data skipping preprocess() call.

Parameters:
  • dict_data (dict) – data to be passed to the model

  • callback_data (Any) – data to be passed to the callback alongside the inference results.

infer_batch(inputs)#

Applies preprocessing, asynchronous inference, postprocessing routines to a collection of inputs.

Parameters:

inputs (list) – a list of inputs for inference

Returns:

a list of inference results

Return type:

list[Any]

infer_sync(dict_data)#

Performs the synchronous model inference. This is a blocking call. See the InferenceAdapter documentation for details.

Return type:

dict[str, ndarray]

is_ready()#

Checks if model is ready for async inference.

load(force=False)#

Prepares the model to be executed by the inference adapter.

Parameters:

force (bool) – Forces the process even if the model is ready. Defaults to False.

Return type:

None

log_layers_info()#

Prints the shape, precision and layout for all model inputs/outputs.

postprocess(outputs, meta)#

Interface for postprocess method.

Parameters:
  • outputs (dict[str, Any]) – model raw output in the following format:

    {
        'output_layer_name_1': raw_result_1,
        'output_layer_name_2': raw_result_2,
        ...
    }

  • meta (dict[str, Any]) – the input metadata obtained from preprocess method

Returns:

  • postprocessed data in the format defined by wrapper

preprocess(dict_inputs, meta)#

Interface for preprocess hook.

Parameters:
  • dict_inputs – preprocessed data

  • meta – input metadata

Returns:

  • the preprocessed data

  • the input metadata

reshape(new_shape)#

Reshapes the model inputs to fit the new input shape.

Parameters:
  • new_shape (dict) – a dictionary with input names as keys and lists of new shape values as values.

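For instance, with a hypothetical input named "image", the argument would look like:

```python
# Hypothetical input name; each value is the new shape as a list of dimensions
# (here in NCHW order: batch, channels, height, width).
new_shape = {"image": [1, 3, 480, 640]}
# model.reshape(new_shape) would then resize that input accordingly.
```
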
save(path, weights_path=None, version=None)#

Serializes the model to the filesystem. The model format depends on the InferenceAdapter being used.

Parameters:
  • path (str) – Path to write the resulting model.

  • weights_path (str | None) – Optional path to save weights if they are stored separately.

  • version (str | None) – Optional model version.

set_callback(callback_fn)#

Sets callback that grabs results of async inference.

Parameters:

callback_fn (Callable) – callback function that receives the results of async inference

params#

Descriptor that provides parameter access for models.