otx.data.transform_libs.torchvision#

Helper to support TorchVision data transform functions.

Functions

custom_query_size(flat_inputs)

Classes

CachedMixUp([img_scale, ratio_range, ...])

Implementation of mmdet.datasets.transforms.CachedMixUp with torchvision format.

CachedMosaic([img_scale, ...])

Implementation of mmdet.datasets.transforms.CachedMosaic with torchvision format.

Compose(transforms)

Re-implementation of torchvision.transforms.v2.Compose.

EfficientNetRandomCrop(scale[, min_covered, ...])

EfficientNet style RandomResizedCrop.

MinIoURandomCrop([min_ious, min_crop_size, ...])

Implementation of mmdet.datasets.transforms.MinIoURandomCrop with torchvision format.

NumpytoTVTensorMixin()

Convert numpy to tv tensors.

Pad([size, size_divisor, pad_to_square, ...])

Implementation of mmdet.datasets.transforms.Pad with torchvision format.

PhotoMetricDistortion([brightness_delta, ...])

Implementation of mmdet.datasets.transforms.PhotoMetricDistortion with torchvision format.

RandomAffine([max_rotate_degree, ...])

Implementation of mmdet.datasets.transforms.RandomAffine with torchvision format.

RandomCrop(crop_size[, crop_type, ...])

Implementation of mmdet.datasets.transforms.RandomCrop with torchvision format.

RandomFlip([prob, direction, ...])

Implementation of mmdet.datasets.transforms.RandomFlip with torchvision format.

RandomIoUCrop([min_scale, max_scale, ...])

Random IoU crop with the option to set probability.

RandomResize(scale[, ratio_range, ...])

Implementation of mmcv.transforms.RandomResize with torchvision format.

RandomResizedCrop(scale[, crop_ratio_range, ...])

Crop the given image to random scale and aspect ratio.

Resize([scale, scale_factor, keep_ratio, ...])

Implementation of mmdet.datasets.transforms.Resize with torchvision format.

TopdownAffine(input_size[, ...])

Get the bbox image as the model input by affine transform.

TorchVisionTransformLib()

Helper to support TorchVision transforms (only V2) in OTX.

YOLOXHSVRandomAug([hue_delta, ...])

Implementation of mmdet.datasets.transforms.YOLOXHSVRandomAug with torchvision format.

class otx.data.transform_libs.torchvision.CachedMixUp(img_scale: tuple[int, int] | list[int] = (640, 640), ratio_range: tuple[float, float] = (0.5, 1.5), flip_ratio: float = 0.5, pad_val: float = 114.0, max_iters: int = 15, bbox_clip_border: bool = True, max_cached_images: int = 20, random_pop: bool = True, prob: float = 1.0, is_numpy_to_tvtensor: bool = False)[source]#

Bases: Transform, NumpytoTVTensorMixin

Implementation of mmdet.datasets.transforms.CachedMixUp with torchvision format.

Reference: open-mmlab/mmdetection

TODO: optimize logic to torchvision pipeline

Parameters:
  • img_scale (Sequence[int]) – Image output size after mixup pipeline. The shape order should be (height, width). Defaults to (640, 640).

  • ratio_range (Sequence[float]) – Scale ratio of mixup image. Defaults to (0.5, 1.5).

  • flip_ratio (float) – Horizontal flip ratio of mixup image. Defaults to 0.5.

  • pad_val (float) – Pad value. Defaults to 114.0.

  • max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets like MOT17, the gt bboxes are allowed to cross the border of images, so the gt bboxes do not need to be clipped in these cases. Defaults to True.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 20.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

  • is_numpy_to_tvtensor (bool) – Whether to convert outputs to tensors. Defaults to False.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*_inputs: OTXDataItem) OTXDataItem | None[source]#

MixUp transform function.

class otx.data.transform_libs.torchvision.CachedMosaic(img_scale: tuple[int, int] | list[int] = (640, 640), center_ratio_range: tuple[float, float] = (0.5, 1.5), bbox_clip_border: bool = True, pad_val: float = 114.0, prob: float = 1.0, max_cached_images: int = 40, random_pop: bool = True, is_numpy_to_tvtensor: bool = False)[source]#

Bases: Transform, NumpytoTVTensorMixin

Implementation of mmdet.datasets.transforms.CachedMosaic with torchvision format.

Reference: open-mmlab/mmdetection

TODO: optimize logic to torchvision pipeline

Parameters:
  • img_scale (Sequence[int]) – Image size before mosaic pipeline of single image. The shape order should be (height, width). Defaults to (640, 640).

  • center_ratio_range (tuple[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets like MOT17, the gt bboxes are allowed to cross the border of images, so the gt bboxes do not need to be clipped in these cases. Defaults to True.

  • pad_val (float) – Pad value. Defaults to 114.0.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 40.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.

  • is_numpy_to_tvtensor (bool) – Whether to convert outputs to tensors. Defaults to False.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*_inputs: OTXDataItem) OTXDataItem | None[source]#

Forward for CachedMosaic.
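
A minimal pipeline sketch combining the two cached transforms (the YOLOX-style values are illustrative, not a prescribed recipe):

    from otx.data.transform_libs.torchvision import (
        CachedMixUp,
        CachedMosaic,
        Compose,
    )

    # Mosaic first, then mixup, as in YOLOX-style training pipelines.
    pipeline = Compose([
        CachedMosaic(img_scale=(640, 640), max_cached_images=40),
        CachedMixUp(img_scale=(640, 640), ratio_range=(0.5, 1.5), prob=0.5),
    ])
    # Each transform takes and returns an OTXDataItem (or None), so the
    # pipeline is applied per data item: item = pipeline(item)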

class otx.data.transform_libs.torchvision.Compose(transforms: Sequence[Callable])[source]#

Bases: Compose

Re-implementation of torchvision.transforms.v2.Compose.

MMCV-style transforms can produce None, so the result must be skipped when that happens.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*inputs: OTXDataItem) OTXDataItem | None[source]#

Forward with skipping None.
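
A short sketch of why the None handling matters, assuming (as with the MMCV originals) that a crop transform may give up and return None:

    from otx.data.transform_libs.torchvision import (
        Compose,
        MinIoURandomCrop,
        RandomFlip,
    )

    # If the crop fails to satisfy the IoU constraints and yields None,
    # this Compose handles it rather than passing None to RandomFlip.
    pipeline = Compose([
        MinIoURandomCrop(min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3),
        RandomFlip(prob=0.5, direction="horizontal"),
    ])
    # result = pipeline(item)  # OTXDataItem, or None if the item was dropped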

class otx.data.transform_libs.torchvision.EfficientNetRandomCrop(scale: int, min_covered: float = 0.1, crop_padding: int = 32, interpolation: str = 'bicubic', **kwarg)[source]#

Bases: RandomResizedCrop

EfficientNet style RandomResizedCrop.

This class implements mmpretrain.datasets.transforms.EfficientNetRandomCrop, reimplemented as a torchvision transform.

Parameters:
  • scale (int) – Desired output scale of the crop. Only an int is accepted; a square crop (scale, scale) is made.

  • min_covered (Number) – Minimum ratio of the cropped area to the original area. Defaults to 0.1.

  • crop_padding (int) – The crop padding parameter in efficientnet style center crop. Defaults to 32.

  • crop_ratio_range (tuple) – Range of the random size of the cropped image compared to the original image. Defaults to (0.08, 1.0).

  • aspect_ratio_range (tuple) – Range of the random aspect ratio of the cropped image compared to the original image. Defaults to (3. / 4., 4. / 3.).

  • max_attempts (int) – Maximum number of attempts before falling back to Central Crop. Defaults to 10.

  • interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bicubic’.

  • backend (str) – The image resize backend type, accepted values are ‘cv2’ and ‘pillow’. Defaults to ‘cv2’.

Initialize internal Module state, shared by both nn.Module and ScriptModule.
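
A usage sketch with the defaults listed above:

    from otx.data.transform_libs.torchvision import EfficientNetRandomCrop

    # 224x224 square training crop with EfficientNet-style bicubic resizing.
    crop = EfficientNetRandomCrop(scale=224, min_covered=0.1, crop_padding=32)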

class otx.data.transform_libs.torchvision.MinIoURandomCrop(min_ious: Sequence[float] = (0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size: float = 0.3, bbox_clip_border: bool = True, is_numpy_to_tvtensor: bool = False, prob: float = 1.0)[source]#

Bases: Transform, NumpytoTVTensorMixin

Implementation of mmdet.datasets.transforms.MinIoURandomCrop with torchvision format.

Reference: open-mmlab/mmdetection

Parameters:
  • min_ious (Sequence[float]) – minimum IoU threshold for all intersections with bounding boxes.

  • min_crop_size (float) – Minimum crop size (i.e., h,w := a*h, a*w, where a >= min_crop_size).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to True.

  • is_numpy_to_tvtensor (bool) – Whether to convert outputs to tensors. Defaults to False.

  • prob (float) – probability of applying this transformation. Defaults to 1.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*_inputs: OTXDataItem) OTXDataItem | None[source]#

Forward for MinIoURandomCrop.

class otx.data.transform_libs.torchvision.NumpytoTVTensorMixin[source]#

Bases: object

Convert numpy to tv tensors.

convert(inputs: OTXDataItem | None) OTXDataItem | None[source]#

Convert numpy to tv tensors.

class otx.data.transform_libs.torchvision.Pad(size: tuple[int, int] | None = None, size_divisor: int | None = None, pad_to_square: bool = False, pad_val: int | float | dict | None = None, padding_mode: str = 'constant', transform_point: bool = False, transform_mask: bool = False, is_numpy_to_tvtensor: bool = False)[source]#

Bases: Transform, NumpytoTVTensorMixin

Implementation of mmdet.datasets.transforms.Pad with torchvision format.

Reference: open-mmlab/mmdetection

TODO: optimize logic to torchvision pipeline

Parameters:
  • size (tuple, optional) – Fixed padding size. Expected padding shape (height, width). Defaults to None.

  • size_divisor (int, optional) – The divisor of padded size. Defaults to None.

  • pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Defaults to False.

  • pad_val (int | float | dict[str, int | float], optional) –

    Padding value used when padding_mode is "constant". If it is a single number, the value to pad the image is the number and to pad the semantic segmentation map is 255. If it is a dict, it should have the following keys:

    • img: The value to pad the image.

    • seg: The value to pad the semantic segmentation map.

    Defaults to dict(img=0, seg=255).

  • padding_mode (str) –

    Type of padding. Should be: constant, edge, reflect or symmetric. Defaults to ‘constant’.

    • constant: pads with a constant value, this value is specified with pad_val.

    • edge: pads with the last value at the edge of the image.

    • reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].

    • symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3].

  • transform_point (bool) – Whether to transform points. Defaults to False.

  • transform_mask (bool) – Whether to transform masks. Defaults to False.

  • is_numpy_to_tvtensor (bool) – Whether to convert outputs to tensors. Defaults to False.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*_inputs: OTXDataItem) OTXDataItem | None[source]#

Forward function to pad images.
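
The reflect/symmetric distinction above matches numpy's padding modes of the same names, which can be used to verify the expected border values:

    import numpy as np

    x = np.array([1, 2, 3, 4])
    np.pad(x, 2, mode="reflect")    # -> [3 2 1 2 3 4 3 2]
    np.pad(x, 2, mode="symmetric")  # -> [2 1 1 2 3 4 4 3]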

class otx.data.transform_libs.torchvision.PhotoMetricDistortion(brightness_delta: int = 32, contrast_range: Sequence[int | float] = (0.5, 1.5), saturation_range: Sequence[int | float] = (0.5, 1.5), hue_delta: int = 18, is_numpy_to_tvtensor: bool = False)[source]#

Bases: Transform, NumpytoTVTensorMixin

Implementation of mmdet.datasets.transforms.PhotoMetricDistortion with torchvision format.

Reference: open-mmlab/mmdetection

TODO: optimize logic to torchvision pipeline

Apply photometric distortions to the image sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.

  1. random brightness

  2. random contrast (mode 0)

  3. convert color from BGR to HSV

  4. random saturation

  5. random hue

  6. convert color from HSV to BGR

  7. random contrast (mode 1)

  8. randomly swap channels

Parameters:
  • brightness_delta (int) – delta of brightness.

  • contrast_range (sequence) – range of contrast.

  • saturation_range (sequence) – range of saturation.

  • hue_delta (int) – delta of hue.

  • is_numpy_to_tvtensor (bool) – Whether to convert outputs to tensors. Defaults to False.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*_inputs: OTXDataItem) OTXDataItem | None[source]#

Transform function to perform photometric distortion on images.
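
A usage sketch with the documented defaults; each sub-transform then fires independently with probability 0.5:

    from otx.data.transform_libs.torchvision import PhotoMetricDistortion

    distort = PhotoMetricDistortion(
        brightness_delta=32,
        contrast_range=(0.5, 1.5),
        saturation_range=(0.5, 1.5),
        hue_delta=18,
    )
    # item = distort(item)  # item: OTXDataItem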

class otx.data.transform_libs.torchvision.RandomAffine(max_rotate_degree: float = 10.0, max_translate_ratio: float = 0.1, scaling_ratio_range: tuple[float, float] = (0.5, 1.5), max_shear_degree: float = 2.0, border: tuple[int, int] = (0, 0), border_val: tuple[int, int, int] = (114, 114, 114), bbox_clip_border: bool = True, is_numpy_to_tvtensor: bool = False)[source]#

Bases: Transform, NumpytoTVTensorMixin

Implementation of mmdet.datasets.transforms.RandomAffine with torchvision format.

Reference: open-mmlab/mmdetection

RandomAffine only supports images and bounding boxes in mmdetection.

TODO: optimize logic to torchvision pipeline

Parameters:
  • max_rotate_degree (float) – Maximum degrees of rotation transform. Defaults to 10.

  • max_translate_ratio (float) – Maximum ratio of translation. Defaults to 0.1.

  • scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Defaults to (0.5, 1.5).

  • max_shear_degree (float) – Maximum degrees of shear transform. Defaults to 2.

  • border (tuple[int]) – Distance from height and width sides of input image to adjust output shape. Only used in mosaic dataset. Defaults to (0, 0).

  • border_val (tuple[int]) – Border padding values of 3 channels. Defaults to (114, 114, 114).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets like MOT17, the gt bboxes are allowed to cross the border of images, so the gt bboxes do not need to be clipped in these cases. Defaults to True.

  • is_numpy_to_tvtensor (bool) – Whether to convert outputs to tensors. Defaults to False.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*_inputs: OTXDataItem) OTXDataItem | None[source]#

Forward for RandomAffine.

class otx.data.transform_libs.torchvision.RandomCrop(crop_size: tuple[int, int], crop_type: str = 'absolute', cat_max_ratio: int | float = 1, allow_negative_crop: bool = False, recompute_bbox: bool = False, bbox_clip_border: bool = True, ignore_index: int = 255, is_numpy_to_tvtensor: bool = False)[source]#

Bases: Transform, NumpytoTVTensorMixin

Implementation of mmdet.datasets.transforms.RandomCrop with torchvision format.

Reference: open-mmlab/mmcv

The absolute crop_size is sampled based on crop_type and image_size, then the cropped results are generated.

Parameters:
  • crop_size (tuple[int, int]) – The relative ratio or absolute pixels of (height, width).

  • crop_type (str, optional) – One of “relative_range”, “relative”, “absolute”, “absolute_range”. “relative” randomly crops (h * crop_size[0], w * crop_size[1]) part from an input of size (h, w). “relative_range” uniformly samples relative crop size from range [crop_size[0], 1] and [crop_size[1], 1] for height and width respectively. “absolute” crops from an input with absolute size (crop_size[0], crop_size[1]). “absolute_range” uniformly samples crop_h in range [crop_size[0], min(h, crop_size[1])] and crop_w in range [crop_size[0], min(w, crop_size[1])]. Defaults to “absolute”.

  • cat_max_ratio (float) – The maximum ratio that a single category could occupy.

  • allow_negative_crop (bool, optional) – Whether to allow a crop that does not contain any bbox area. Defaults to False.

  • recompute_bbox (bool, optional) – Whether to re-compute the boxes based on cropped instance masks. Defaults to False.

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to True.

  • ignore_index (int) – The label index to be ignored. Defaults to 255.

  • is_numpy_to_tvtensor (bool) – Whether to convert outputs to tensors. Defaults to False.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*_inputs: OTXDataItem) OTXDataItem | None[source]#

Transform function to randomly crop images, bounding boxes, masks, and polygons.
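
The crop_type sampling described above can be sketched as follows; this restates the parameter docs for illustration and is not the OTX source (the clamp to the image size under "absolute" is an assumption):

    import numpy as np

    def sample_crop_size(h: int, w: int, crop_size, crop_type: str = "absolute"):
        """Sample an absolute (crop_h, crop_w) from crop_size and crop_type."""
        if crop_type == "absolute":
            # Assumed clamp: never crop larger than the image itself.
            return min(crop_size[0], h), min(crop_size[1], w)
        if crop_type == "absolute_range":
            crop_h = np.random.randint(crop_size[0], min(h, crop_size[1]) + 1)
            crop_w = np.random.randint(crop_size[0], min(w, crop_size[1]) + 1)
            return crop_h, crop_w
        if crop_type == "relative":
            return int(h * crop_size[0]), int(w * crop_size[1])
        if crop_type == "relative_range":
            return (int(h * np.random.uniform(crop_size[0], 1.0)),
                    int(w * np.random.uniform(crop_size[1], 1.0)))
        raise ValueError(f"unknown crop_type: {crop_type}")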

class otx.data.transform_libs.torchvision.RandomFlip(prob: float | Iterable[float] | None = None, direction: str | Sequence[str | None] = 'horizontal', is_numpy_to_tvtensor: bool = False)[source]#

Bases: Transform, NumpytoTVTensorMixin

Implementation of mmdet.datasets.transforms.RandomFlip with torchvision format.

Reference: open-mmlab/mmdetection

TODO: optimize logic to torchvision pipeline

There are three flip modes:

  • prob is a float, direction is a string: the image is flipped in the given direction with probability prob. E.g., prob=0.5, direction='horizontal': the image is horizontally flipped with probability 0.5.

  • prob is a float, direction is a list of strings: the image is flipped in direction[i] with probability prob/len(direction). E.g., prob=0.5, direction=['horizontal', 'vertical']: the image is horizontally flipped with probability 0.25 and vertically flipped with probability 0.25.

  • prob is a list of floats, direction is a list of strings: given len(prob) == len(direction), the image is flipped in direction[i] with probability prob[i]. E.g., prob=[0.3, 0.5], direction=['horizontal', 'vertical']: the image is horizontally flipped with probability 0.3 and vertically flipped with probability 0.5.

Parameters:
  • prob (float | list[float], optional) – The flipping probability. Defaults to None.

  • direction (str | list[str]) – The flipping direction. If the input is a list, its length must equal that of prob, and each element of prob gives the flip probability of the corresponding direction. Defaults to 'horizontal'.

  • is_numpy_to_tvtensor (bool) – Whether to convert outputs to tensors. Defaults to False.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*_inputs: OTXDataItem) OTXDataItem | None[source]#

Flip images, bounding boxes, and semantic segmentation map.
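
The three prob/direction modes translate directly into configurations:

    from otx.data.transform_libs.torchvision import RandomFlip

    # One direction: horizontal flip with probability 0.5.
    flip_h = RandomFlip(prob=0.5, direction="horizontal")

    # Shared probability: each direction fires with 0.5 / 2 = 0.25.
    flip_hv = RandomFlip(prob=0.5, direction=["horizontal", "vertical"])

    # Per-direction probabilities: horizontal 0.3, vertical 0.5.
    flip_weighted = RandomFlip(prob=[0.3, 0.5], direction=["horizontal", "vertical"])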

class otx.data.transform_libs.torchvision.RandomIoUCrop(min_scale: float = 0.3, max_scale: float = 1, min_aspect_ratio: float = 0.5, max_aspect_ratio: float = 2, sampler_options: list[float] | None = None, trials: int = 40, p: float = 1.0)[source]#

Bases: RandomIoUCrop

Random IoU crop with the option to set probability.

Parameters:
  • min_scale (float, optional) – the same as RandomIoUCrop. Defaults to 0.3.

  • max_scale (float, optional) – the same as RandomIoUCrop. Defaults to 1.

  • min_aspect_ratio (float, optional) – the same as RandomIoUCrop. Defaults to 0.5.

  • max_aspect_ratio (float, optional) – the same as RandomIoUCrop. Defaults to 2.

  • sampler_options (list[float] | None, optional) – the same as RandomIoUCrop. Defaults to None.

  • trials (int, optional) – the same as RandomIoUCrop. Defaults to 40.

  • p (float, optional) – probability. Defaults to 1.0.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

__call__(*inputs: Any) Any[source]#

Apply the transform to the given inputs.

class otx.data.transform_libs.torchvision.RandomResize(scale: Sequence[int | tuple[int, int]], ratio_range: tuple[float, float] | None = None, is_numpy_to_tvtensor: bool = False, **resize_kwargs)[source]#

Bases: Transform, NumpytoTVTensorMixin

Implementation of mmcv.transforms.RandomResize with torchvision format.

Reference: open-mmlab/mmcv

Parameters:
  • scale (Sequence) – Image scales for resizing, as (height, width).

  • ratio_range (tuple[float], optional) – (min_ratio, max_ratio). Defaults to None.

  • is_numpy_to_tvtensor (bool) – Whether to convert outputs to tensors. Defaults to False.

  • **resize_kwargs – Other keyword arguments for the resize_type.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*_inputs: OTXDataItem) OTXDataItem | None[source]#

Transform function to resize images, bounding boxes, semantic segmentation map.
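
When ratio_range is given, the target scale is obtained by multiplying the base scale with a ratio drawn uniformly from the range; a sketch of those semantics (an assumption based on the mmcv behavior, not the OTX source):

    import random

    def sample_scale(scale, ratio_range):
        ratio = random.uniform(*ratio_range)
        return int(scale[0] * ratio), int(scale[1] * ratio)

    sample_scale((640, 640), (0.5, 2.0))  # e.g. (832, 832)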

class otx.data.transform_libs.torchvision.RandomResizedCrop(scale: Sequence[int] | int, crop_ratio_range: tuple[float, float] = (0.08, 1.0), aspect_ratio_range: tuple[float, float] = (0.75, 1.3333333333333333), max_attempts: int = 10, interpolation: str = 'bilinear', transform_mask: bool = False, is_numpy_to_tvtensor: bool = False)[source]#

Bases: Transform, NumpytoTVTensorMixin

Crop the given image to random scale and aspect ratio.

This class implements mmpretrain.datasets.transforms.RandomResizedCrop, reimplemented as a torchvision transform. A crop of random size (0.08 to 1.0 of the original size by default) and of random aspect ratio (3/4 to 4/3 of the original aspect ratio by default) is made; the crop is finally resized to the given size.

Parameters:
  • scale (Sequence[int] | int) – Desired output scale of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.

  • crop_ratio_range (tuple) – Range of the random size of the cropped image compared to the original image. Defaults to (0.08, 1.0).

  • aspect_ratio_range (tuple) – Range of the random aspect ratio of the cropped image compared to the original image. Defaults to (3. / 4., 4. / 3.).

  • max_attempts (int) – Maximum number of attempts before falling back to Central Crop. Defaults to 10.

  • interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bilinear’.

  • transform_mask (bool) – Whether to transform masks. Defaults to False.

  • is_numpy_to_tvtensor (bool) – Whether to convert outputs to tensors. Defaults to False.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*_inputs: OTXDataItem) OTXDataItem | None[source]#

Transform function to randomly resized crop images and masks.

class otx.data.transform_libs.torchvision.Resize(scale: int | tuple[int, int] | None = None, scale_factor: float | tuple[float, float] | None = None, keep_ratio: bool = False, clip_object_border: bool = True, interpolation: str = 'bilinear', interpolation_mask: str = 'nearest', transform_bbox: bool = False, transform_keypoints: bool = False, transform_mask: bool = False, is_numpy_to_tvtensor: bool = False)[source]#

Bases: Transform, NumpytoTVTensorMixin

Implementation of mmdet.datasets.transforms.Resize with torchvision format.

Reference: open-mmlab/mmdetection

TODO: optimize logic to torchvision pipeline

Parameters:
  • scale (int or tuple) – Image scales for resizing, as (height, width). Defaults to None.

  • scale_factor (float or tuple[float]) – Scale factors for resizing, as (height, width). Defaults to None.

  • keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Defaults to False.

  • clip_object_border (bool) – Whether to clip the objects outside the border of the image. In some datasets like MOT17, the gt bboxes are allowed to cross the border of images, so the gt bboxes do not need to be clipped in these cases. Defaults to True.

  • interpolation (str) – Interpolation method. Defaults to ‘bilinear’.

  • interpolation_mask (str) – Interpolation method for mask. Defaults to ‘nearest’.

  • transform_bbox (bool) – Whether to transform bounding boxes. Defaults to False.

  • transform_keypoints (bool) – Whether to transform keypoints. Defaults to False.

  • transform_mask (bool) – Whether to transform masks. Defaults to False.

  • is_numpy_to_tvtensor (bool) – Whether to convert outputs to tensors. Defaults to False.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*_inputs: OTXDataItem) OTXDataItem | None[source]#

Transform function to resize images, bounding boxes, and masks.
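
With keep_ratio=True, the image is fitted inside scale while its aspect ratio is preserved; a sketch of that computation (an assumption based on the mmdet rescale semantics, not the OTX source):

    def rescale_size(h: int, w: int, scale: tuple[int, int]) -> tuple[int, int]:
        # Largest factor such that the result fits inside (scale_h, scale_w).
        factor = min(scale[0] / h, scale[1] / w)
        return int(h * factor + 0.5), int(w * factor + 0.5)

    rescale_size(480, 640, (640, 640))  # -> (480, 640)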

class otx.data.transform_libs.torchvision.TopdownAffine(input_size: tuple[int, int], affine_transforms_prob: float = 1.0, is_numpy_to_tvtensor: bool = False, shift_factor: float = 0.16, shift_prob: float = 0.3, scale_factor: tuple[float, float] = (0.5, 1.5), scale_prob: float = 1.0, rotate_factor: float = 80.0, rotate_prob: float = 0.5, interpolation: str = 'bilinear')[source]#

Bases: Transform, NumpytoTVTensorMixin

Get the bbox image as the model input by affine transform.

Parameters:
  • input_size (tuple[int, int]) – The size of the model input.

  • affine_transforms_prob (float) – The probability of applying affine transforms. Defaults to 1.0.

  • is_numpy_to_tvtensor (bool) – Whether to convert outputs to tensors. Defaults to False.

  • shift_factor (float) – The factor of shift. Defaults to 0.16.

  • shift_prob (float) – The probability of shift. Defaults to 0.3.

  • scale_factor (tuple[float, float]) – The factor of scale. Defaults to (0.5, 1.5).

  • scale_prob (float) – The probability of scale. Defaults to 1.0.

  • rotate_factor (float) – The factor of rotate. Defaults to 80.0.

  • rotate_prob (float) – The probability of rotate. Defaults to 0.5.

  • interpolation (str) – The interpolation method. Defaults to “bilinear”.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

__call__(*_inputs: OTXDataItem) OTXDataItem | None[source]#

Transform function to affine image through warp matrix.

class otx.data.transform_libs.torchvision.TorchVisionTransformLib[source]#

Bases: object

Helper to support TorchVision transforms (only V2) in OTX.

classmethod generate(config: SubsetConfig) Compose[source]#

Generate TorchVision transforms from the configuration.

classmethod list_available_transforms() list[type[Transform]][source]#

List available TorchVision transform (only V2) classes.
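
A usage sketch; subset_config stands for an OTX SubsetConfig instance (e.g. the train subset of a datamodule configuration) and is assumed to exist:

    from otx.data.transform_libs.torchvision import TorchVisionTransformLib

    # Inspect which torchvision v2 transform classes are available.
    names = [cls.__name__ for cls in TorchVisionTransformLib.list_available_transforms()]

    # Build a Compose pipeline from a subset configuration:
    # pipeline = TorchVisionTransformLib.generate(subset_config)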

class otx.data.transform_libs.torchvision.YOLOXHSVRandomAug(hue_delta: int = 5, saturation_delta: int = 30, value_delta: int = 30, is_numpy_to_tvtensor: bool = False)[source]#

Bases: Transform, NumpytoTVTensorMixin

Implementation of mmdet.datasets.transforms.YOLOXHSVRandomAug with torchvision format.

Reference: open-mmlab/mmdetection

TODO: optimize logic to torchvision pipeline

Parameters:
  • hue_delta (int) – delta of hue. Defaults to 5.

  • saturation_delta (int) – delta of saturation. Defaults to 30.

  • value_delta (int) – delta of value. Defaults to 30.

  • is_numpy_to_tvtensor (bool) – Whether to convert outputs to tensors. Defaults to False.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*_inputs: OTXDataItem) OTXDataItem | None[source]#

Forward for random hsv transform.