otx.data.transform_libs.utils#

Utils for data transform functions.

Functions

area_polygon(x, y)

Compute the area of a component of a polygon.

centers_bboxes(boxes)

Return a tensor representing the centers of boxes.

clip_bboxes(boxes, img_shape)

Clip boxes according to the image shape in-place.

corner2hbox(corners)

Convert box coordinates from corners ((x1, y1), (x2, y1), (x1, y2), (x2, y2)) to (x1, y1, x2, y2).

crop_masks(masks, bbox)

Crop each mask by the given bbox.

crop_polygons(polygons, bbox, height, width)

Crop each polygon by the given bbox.

flip_bboxes(boxes, img_shape[, direction])

Flip boxes horizontally or vertically in-place.

flip_image(img[, direction])

Flip an image horizontally or vertically.

flip_masks(masks[, direction])

Flip masks along the given direction.

flip_polygons(polygons, height, width[, ...])

Flip polygons along the given direction.

fp16_clamp(x[, min, max])

Clamp fp16 tensor.

get_bboxes_from_masks(masks)

Create boxes from masks.

get_bboxes_from_polygons(polygons, height, width)

Create boxes from polygons.

get_image_shape(img)

Get image(s) shape as (height, width).

hbox2corner(boxes)

Convert box coordinates from (x1, y1, x2, y2) to corners ((x1, y1), (x2, y1), (x1, y2), (x2, y2)).

is_inside_bboxes(boxes, img_shape[, ...])

Find boxes inside the image.

overlap_bboxes(bboxes1, bboxes2[, mode, ...])

Calculate overlap between two set of bboxes.

project_bboxes(boxes, homography_matrix)

Geometrically transform boxes in-place.

rescale_bboxes(boxes, scale_factor)

Rescale boxes w.r.t. scale_factor in-place.

rescale_keypoints(keypoints, scale_factor)

Rescale keypoints as large as possible while keeping the aspect ratio.

rescale_masks(masks, scale_factor[, ...])

Rescale masks as large as possible while keeping the aspect ratio.

rescale_polygons(polygons, scale_factor)

Rescale polygons as large as possible while keeping the aspect ratio.

rescale_size(old_size, scale[, return_scale])

Calculate the new size to be rescaled to.

scale_size(size, scale)

Rescale a size by a ratio.

to_np_image(img)

Convert torch.Tensor 3D image to numpy 3D image.

translate_bboxes(boxes, distances)

Translate boxes in-place.

translate_masks(masks, out_shape, offset[, ...])

Translate the masks.

translate_polygons(polygons, out_shape, offset)

Translate polygons.

Classes

cache_randomness(func)

Decorator that marks the method with random return value(s) in a transform class.

class otx.data.transform_libs.utils.cache_randomness(func)[source]#

Bases: object

Decorator that marks the method with random return value(s) in a transform class.

Reference : open-mmlab/mmcv

This decorator is usually used together with the context manager :func:`cache_random_params`. In this context, a decorated method caches its return value(s) the first time it is invoked, and always returns the cached values when invoked again.

Note

Only an instance method can be decorated with cache_randomness.

__call__(*args, **kwargs)[source]#

Call self as a function.
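
A minimal usage sketch (the transform class and method below are hypothetical, not part of the library):

import random

from otx.data.transform_libs.utils import cache_randomness

class RandomFlip:  # hypothetical transform class
    @cache_randomness
    def _choose_direction(self) -> str:
        # Under cache_random_params, the first call caches this random
        # choice and subsequent calls return the cached value.
        return random.choice(["horizontal", "vertical"])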

otx.data.transform_libs.utils.area_polygon(x: ndarray, y: ndarray) ndarray[source]#

Compute the area of a component of a polygon.

Using the shoelace formula: https://stackoverflow.com/questions/24467972/calculate-area-of-polygon-given-x-y-coordinates

Parameters:
  • x (ndarray) – x coordinates of the component

  • y (ndarray) – y coordinates of the component

Returns:

the area of the component

Return type:

(float)
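
For intuition, a minimal sketch of the shoelace formula this function implements (assuming the absolute area is returned):

>>> import numpy as np
>>> x = np.array([0.0, 4.0, 4.0, 0.0])  # a 4 x 3 rectangle
>>> y = np.array([0.0, 0.0, 3.0, 3.0])
>>> # shoelace: 0.5 * |x . roll(y) - y . roll(x)|
>>> print(0.5 * np.abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1))))
12.0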

otx.data.transform_libs.utils.centers_bboxes(boxes: Tensor) Tensor[source]#

Return a tensor representing the centers of boxes.

otx.data.transform_libs.utils.clip_bboxes(boxes: Tensor, img_shape: tuple[int, int]) Tensor[source]#

Clip boxes according to the image shape in-place.

Parameters:

img_shape (tuple[int, int]) – A tuple of image height and width.

Returns:

Clipped boxes.

Return type:

(Tensor)
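
An illustrative sketch (assuming (x1, y1, x2, y2) boxes; the expected values follow from clamping x and y to the image bounds):

>>> import torch
>>> boxes = torch.tensor([[-5.0, 10.0, 120.0, 90.0]])
>>> clipped = clip_bboxes(boxes, (100, 100))  # img_shape is (height, width)
>>> # expected: [[0., 10., 100., 90.]]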

otx.data.transform_libs.utils.corner2hbox(corners: Tensor) Tensor[source]#

Convert box coordinates from corners ((x1, y1), (x2, y1), (x1, y2), (x2, y2)) to (x1, y1, x2, y2).

Reference : open-mmlab/mmdetection

Parameters:

corners (Tensor) – Corner tensor with shape of (…, 4, 2).

Returns:

Horizontal box tensor with shape of (…, 4).

Return type:

Tensor

otx.data.transform_libs.utils.crop_masks(masks: ndarray, bbox: ndarray) ndarray[source]#

Crop each mask by the given bbox.

otx.data.transform_libs.utils.crop_polygons(polygons: list[Polygon], bbox: np.ndarray, height: int, width: int) list[Polygon][source]#

Crop each polygon by the given bbox.

otx.data.transform_libs.utils.flip_bboxes(boxes: Tensor, img_shape: tuple[int, int], direction: str = 'horizontal') Tensor[source]#

Flip boxes horizontally or vertically in-place.

Parameters:
  • boxes (Tensor) – Bounding boxes to be flipped.

  • img_shape (Tuple[int, int]) – A tuple of image height and width.

  • direction (str) – Flip direction, options are “horizontal”, “vertical” and “diagonal”. Defaults to “horizontal”.

Returns:

Flipped bounding boxes.

Return type:

(Tensor)
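
A sketch of the horizontal case (a flip maps x1 -> W - x2 and x2 -> W - x1):

>>> import torch
>>> boxes = torch.tensor([[10.0, 20.0, 50.0, 60.0]])
>>> flipped = flip_bboxes(boxes.clone(), (100, 200))  # (height, width)
>>> # expected: [[150., 20., 190., 60.]] since x1 -> 200 - 50 and x2 -> 200 - 10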

otx.data.transform_libs.utils.flip_image(img: ndarray | list[ndarray], direction: str = 'horizontal') ndarray | list[ndarray][source]#

Flip an image horizontally or vertically.

Parameters:
  • img (ndarray) – Image to be flipped.

  • direction (str) – The flip direction, options are “horizontal”, “vertical” and “diagonal”.

Returns:

The flipped image.

Return type:

ndarray

otx.data.transform_libs.utils.flip_masks(masks: ndarray, direction: str = 'horizontal') ndarray[source]#

Flip masks along the given direction.

otx.data.transform_libs.utils.flip_polygons(polygons: list[Polygon], height: int, width: int, direction: str = 'horizontal') list[Polygon][source]#

Flip polygons along the given direction.

otx.data.transform_libs.utils.fp16_clamp(x: Tensor, min: float | None = None, max: float | None = None) Tensor[source]#

Clamp fp16 tensor.
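
torch.clamp historically lacked a CPU kernel for float16, so the usual workaround (a sketch under that assumption, not necessarily this function's exact implementation) is to round-trip through float32 on CPU:

import torch

def fp16_clamp_sketch(x: torch.Tensor, min=None, max=None) -> torch.Tensor:
    if not x.is_cuda and x.dtype == torch.float16:
        # CPU float16 clamp may be unsupported: cast up, clamp, cast back.
        return x.float().clamp(min, max).half()
    return x.clamp(min, max)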

otx.data.transform_libs.utils.get_bboxes_from_masks(masks: Tensor) ndarray[source]#

Create boxes from masks.
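
A minimal sketch of the usual approach (bbox_from_mask is a hypothetical helper; the library function's handling of empty masks is not shown):

import numpy as np

def bbox_from_mask(mask: np.ndarray) -> np.ndarray:
    # Take the extremes of the foreground pixel coordinates.
    ys, xs = np.nonzero(mask)
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])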

otx.data.transform_libs.utils.get_bboxes_from_polygons(polygons: list[Polygon], height: int, width: int) np.ndarray[source]#

Create boxes from polygons.

otx.data.transform_libs.utils.get_image_shape(img: ndarray | Tensor | list) tuple[int, int][source]#

Get image(s) shape as (height, width).

otx.data.transform_libs.utils.hbox2corner(boxes: Tensor) Tensor[source]#

Convert box coordinates from (x1, y1, x2, y2) to corners ((x1, y1), (x2, y1), (x1, y2), (x2, y2)).

Reference : open-mmlab/mmdetection

Parameters:

boxes (Tensor) – Horizontal box tensor with shape of (…, 4).

Returns:

Corner tensor with shape of (…, 4, 2).

Return type:

Tensor
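
A round-trip sketch (the shapes follow the documented (…, 4) <-> (…, 4, 2) contract):

>>> import torch
>>> boxes = torch.tensor([[10.0, 20.0, 30.0, 40.0]])
>>> corners = hbox2corner(boxes)
>>> assert corners.shape == (1, 4, 2)
>>> assert torch.equal(corner2hbox(corners), boxes)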

otx.data.transform_libs.utils.is_inside_bboxes(boxes: Tensor, img_shape: tuple[int, int], all_inside: bool = False, allowed_border: int = 0) BoolTensor[source]#

Find boxes inside the image.

Parameters:
  • boxes (Tensor) – Bounding boxes to be checked.

  • img_shape (tuple[int, int]) – A tuple of image height and width.

  • all_inside (bool) – Whether the whole box must be inside the image, or only part of it. Defaults to False.

  • allowed_border (int) – Boxes that extend beyond the image boundary by more than allowed_border are considered “outside”. Defaults to 0.

Returns:

A BoolTensor indicating whether each box is inside the image. Assuming the original boxes have shape (m, n, 4), the output has shape (m, n).

Return type:

(BoolTensor)
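
An illustrative sketch (the second box extends past a 100 x 100 image, so with all_inside=True it should be reported as outside):

>>> import torch
>>> boxes = torch.tensor([[10.0, 10.0, 50.0, 50.0],
...                       [90.0, 90.0, 130.0, 130.0]])
>>> inside = is_inside_bboxes(boxes, (100, 100), all_inside=True)
>>> # expected: tensor([True, False])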

otx.data.transform_libs.utils.overlap_bboxes(bboxes1: Tensor, bboxes2: Tensor, mode: str = 'iou', is_aligned: bool = False, eps: float = 1e-06) Tensor[source]#

Calculate overlap between two set of bboxes.

FP16 contributed by open-mmlab/mmdetection#4889.

Note

Assume bboxes1 is M x 4 and bboxes2 is N x 4. When mode is 'iou',
some new variables are generated when calculating IoU
with the overlap_bboxes function:

1) is_aligned is False
    area1: M x 1
    area2: N x 1
    lt: M x N x 2
    rb: M x N x 2
    wh: M x N x 2
    overlap: M x N x 1
    union: M x N x 1
    ious: M x N x 1

    Total memory:
        S = (9 x N x M + N + M) * 4 Byte,

    When using FP16, we can reduce:
        R = (9 x N x M + N + M) * 4 / 2 Byte
        R larger than (N + M) * 4 * 2 always holds when N and M >= 1.
        Obviously, N + M <= N * M < 3 * N * M when N >= 2 and M >= 2,
                   and N + 1 < 3 * N when N or M is 1.

    Given M = 40 (ground truths) and N = 400000 (three anchor boxes
    per grid cell, FPN, R-CNNs),
        R = 275 MB

    A special case (dense detection), M = 512 (ground truth),
        R = 3516 MB = 3.43 GB

    When the batch size is B, reduce:
        B x R

    Therefore, CUDA memory runs out frequently.

    Experiments on GeForce RTX 2080Ti (11019 MiB):

    | dtype |  M  |    N   |    Use   |   Real   |   Ideal  |
    |:-----:|:---:|:------:|:--------:|:--------:|:--------:|
    | FP32  | 512 | 400000 | 8020 MiB |    --    |    --    |
    | FP16  | 512 | 400000 | 4504 MiB | 3516 MiB | 3516 MiB |
    | FP32  |  40 | 400000 | 1540 MiB |    --    |    --    |
    | FP16  |  40 | 400000 | 1264 MiB |  276 MiB |  275 MiB |

2) is_aligned is True
    area1: N x 1
    area2: N x 1
    lt: N x 2
    rb: N x 2
    wh: N x 2
    overlap: N x 1
    union: N x 1
    ious: N x 1

    Total memory:
        S = 11 x N * 4 Byte

    When using FP16, we can reduce:
        R = 11 x N * 4 / 2 Byte

The same holds for 'giou' (which uses more memory than 'iou').

Time-wise, FP16 is generally faster than FP32.

When gpu_assign_thr is not -1, it takes more time on CPU
but does not reduce memory.
Thus, we can halve the memory while keeping the speed.

If is_aligned is False, then calculate the overlaps between each bbox of bboxes1 and bboxes2, otherwise the overlaps between each aligned pair of bboxes1 and bboxes2.

Parameters:
  • bboxes1 (Tensor) – shape (B, m, 4) in <x1, y1, x2, y2> format or empty.

  • bboxes2 (Tensor) – shape (B, n, 4) in <x1, y1, x2, y2> format or empty. B indicates the batch dim, in shape (B1, B2, …, Bn). If is_aligned is True, then m and n must be equal.

  • mode (str) – “iou” (intersection over union), “iof” (intersection over foreground) or “giou” (generalized intersection over union). Default “iou”.

  • is_aligned (bool, optional) – If True, then m and n must be equal. Default False.

  • eps (float, optional) – A value added to the denominator for numerical stability. Default 1e-6.

Returns:

shape (m, n) if is_aligned is False else shape (m,)

Return type:

Tensor

Example

>>> bboxes1 = torch.FloatTensor([
...     [0, 0, 10, 10],
...     [10, 10, 20, 20],
...     [32, 32, 38, 42],
... ])
>>> bboxes2 = torch.FloatTensor([
...     [0, 0, 10, 20],
...     [0, 10, 10, 19],
...     [10, 10, 20, 20],
... ])
>>> overlaps = overlap_bboxes(bboxes1, bboxes2)
>>> assert overlaps.shape == (3, 3)
>>> overlaps = overlap_bboxes(bboxes1, bboxes2, is_aligned=True)
>>> assert overlaps.shape == (3, )

Example

>>> empty = torch.empty(0, 4)
>>> nonempty = torch.FloatTensor([[0, 0, 10, 9]])
>>> assert tuple(overlap_bboxes(empty, nonempty).shape) == (0, 1)
>>> assert tuple(overlap_bboxes(nonempty, empty).shape) == (1, 0)
>>> assert tuple(overlap_bboxes(empty, empty).shape) == (0, 0)
otx.data.transform_libs.utils.project_bboxes(boxes: Tensor, homography_matrix: Tensor | ndarray) Tensor[source]#

Geometrically transform boxes in-place.

Reference : open-mmlab/mmdetection

Parameters:

homography_matrix (Tensor | np.ndarray) – Shape (3, 3) for geometric transformation.

Returns:

Projected bounding boxes.

Return type:

(Tensor)
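
A sketch with a pure-translation homography (the expected values follow from adding (tx, ty) = (5, 10) to both corners):

>>> import torch
>>> matrix = torch.tensor([[1.0, 0.0, 5.0],
...                        [0.0, 1.0, 10.0],
...                        [0.0, 0.0, 1.0]])
>>> boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
>>> projected = project_bboxes(boxes.clone(), matrix)
>>> # expected: [[5., 10., 15., 20.]]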

otx.data.transform_libs.utils.rescale_bboxes(boxes: Tensor, scale_factor: tuple[float, float]) Tensor[source]#

Rescale boxes w.r.t. scale_factor in-place.

Note

Both rescale_ and resize_ will enlarge or shrink boxes w.r.t. scale_factor. The difference is that resize_ only changes the width and the height of boxes, while rescale_ also rescales the box centers simultaneously.

Parameters:
  • boxes (Tensor) – bounding boxes to be rescaled.

  • scale_factor (tuple[float, float]) – Factors for scaling boxes with (height, width); the length should be 2. It is applied after flipping.

Returns:

rescaled bounding boxes.

Return type:

(Tensor)
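
An illustrative sketch (assuming x coordinates are multiplied by the width factor and y coordinates by the height factor):

>>> import torch
>>> boxes = torch.tensor([[10.0, 20.0, 30.0, 40.0]])
>>> rescaled = rescale_bboxes(boxes.clone(), (2.0, 0.5))  # (height, width)
>>> # expected: [[5., 40., 15., 80.]]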

otx.data.transform_libs.utils.rescale_keypoints(keypoints: Tensor, scale_factor: float | tuple[float, float]) Tensor[source]#

Rescale keypoints as large as possible while keeping the aspect ratio.

Parameters:
  • keypoints (Tensor) – Keypoints to be rescaled.

  • scale_factor (float | tuple[float, float]) – Scale factor to be applied to keypoints with (height, width) or single float value.

Returns:

The rescaled keypoints.

Return type:

(Tensor)

otx.data.transform_libs.utils.rescale_masks(masks: ndarray, scale_factor: float | tuple[float, float], interpolation: str = 'nearest') ndarray[source]#

Rescale masks as large as possible while keeping the aspect ratio.

Parameters:
  • masks (np.ndarray) – Masks to be rescaled.

  • scale_factor (float | tuple[float, float]) – Scale factor to be applied to masks with (height, width).

  • interpolation (str) – Interpolation mode. Defaults to nearest.

Returns:

The rescaled masks.

Return type:

(np.ndarray)

otx.data.transform_libs.utils.rescale_polygons(polygons: list[Polygon], scale_factor: float | tuple[float, float]) list[Polygon][source]#

Rescale polygons as large as possible while keeping the aspect ratio.

Parameters:
  • polygons (list[Polygon]) – Polygons to be rescaled.

  • scale_factor (float | tuple[float, float]) – Scale factor to be applied to polygons with (height, width) or single float value.

Returns:

The rescaled polygons.

Return type:

(list[Polygon])

otx.data.transform_libs.utils.rescale_size(old_size: tuple, scale: float | int | tuple[float, float] | tuple[int, int], return_scale: bool = False) tuple[int, int] | tuple[tuple[int, int], float | int][source]#

Calculate the new size to be rescaled to.

Parameters:
  • old_size (tuple[int]) – The old size (height, width) of image.

  • scale (float | int | tuple[float] | tuple[int]) – The scaling factor or maximum size. If it is a float number, an integer, or a tuple of 2 float numbers, then the image will be rescaled by this factor, else if it is a tuple of 2 integers, then the image will be rescaled as large as possible within the scale.

  • return_scale (bool) – Whether to return the scaling factor besides the rescaled image size.

Returns:

The new rescaled image size with (height, width).

If return_scale is True, the actual scaling factor is returned as well.

Return type:

tuple[int]
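
An illustrative sketch of the two modes (the exact rounding may differ):

>>> size = rescale_size((400, 600), 0.5)  # float factor
>>> # expected: (200, 300)
>>> size, factor = rescale_size((400, 600), (800, 800), return_scale=True)
>>> # a (max, max) tuple rescales as large as possible within the bound,
>>> # so the expected size is roughly (533, 800) with factor ~1.33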

otx.data.transform_libs.utils.scale_size(size: tuple[int, int], scale: float | int | tuple[float, float] | tuple[int, int]) tuple[int, int][source]#

Rescale a size by a ratio.

Parameters:
  • size (tuple[int, int]) – The size (height, width) to be scaled.

  • scale (float | int | tuple[float, float] | tuple[int, int]) – The scaling ratio.

Returns:

scaled size with (height, width).

Return type:

tuple[int]

otx.data.transform_libs.utils.to_np_image(img: ndarray | Tensor | list) ndarray | list[ndarray][source]#

Convert torch.Tensor 3D image to numpy 3D image.

TODO (sungchul): move it into base data entity?
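
A sketch of the typical CHW -> HWC conversion such a helper performs (a minimal re-implementation, not necessarily the library's exact code):

import numpy as np
import torch

def to_np_image_sketch(img: torch.Tensor) -> np.ndarray:
    # (C, H, W) tensor -> (H, W, C) numpy array
    return img.permute(1, 2, 0).cpu().numpy()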

otx.data.transform_libs.utils.translate_bboxes(boxes: Tensor, distances: Sequence[float]) Tensor[source]#

Translate boxes in-place.

Parameters:
  • boxes (Tensor) – Bounding boxes to be translated.

  • distances (Sequence[float]) – Translate distances. The first is horizontal distance and the second is vertical distance.

Returns:

Translated bounding boxes.

Return type:

(Tensor)
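
An illustrative sketch (distances are (horizontal, vertical) offsets added to x and y respectively):

>>> import torch
>>> boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
>>> translated = translate_bboxes(boxes.clone(), (5.0, 3.0))
>>> # expected: [[5., 3., 15., 13.]]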

otx.data.transform_libs.utils.translate_masks(masks: ndarray, out_shape: tuple[int, int], offset: int | float, direction: str = 'horizontal', border_value: int | tuple[int] = 0, interpolation: str = 'bilinear') ndarray[source]#

Translate the masks.

Parameters:
  • masks (np.ndarray) – Masks to be translated.

  • out_shape (tuple[int]) – Shape for output mask, format (h, w).

  • offset (int | float) – The offset for translate.

  • direction (str) – The translate direction, either “horizontal” or “vertical”.

  • border_value (int | tuple[int]) – Border value. Default 0 for masks.

  • interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bilinear’.

Returns:

Translated masks.

Return type:

(np.ndarray)

otx.data.transform_libs.utils.translate_polygons(polygons: list[Polygon], out_shape: tuple[int, int], offset: int | float, direction: str = 'horizontal', border_value: int | float = 0) list[Polygon][source]#

Translate polygons.