otx.data.transform_libs.utils#
Utils for data transform functions.
Functions
- area_polygon: Compute the area of a component of a polygon.
- centers_bboxes: Return a tensor representing the centers of boxes.
- clip_bboxes: Clip boxes according to the image shape in-place.
- corner2hbox: Convert box coordinates from corners ((x1, y1), (x2, y1), (x1, y2), (x2, y2)) to (x1, y1, x2, y2).
- crop_masks: Crop each mask by the given bbox.
- crop_polygons: Crop each polygon by the given bbox.
- flip_bboxes: Flip boxes horizontally or vertically in-place.
- flip_image: Flip an image horizontally or vertically.
- flip_masks: Flip masks along the given direction.
- flip_polygons: Flip polygons along the given direction.
- fp16_clamp: Clamp fp16 tensor.
- get_bboxes_from_masks: Create boxes from masks.
- get_bboxes_from_polygons: Create boxes from polygons.
- get_image_shape: Get image(s) shape with (height, width).
- hbox2corner: Convert box coordinates from (x1, y1, x2, y2) to corners ((x1, y1), (x2, y1), (x1, y2), (x2, y2)).
- is_inside_bboxes: Find boxes inside the image.
- overlap_bboxes: Calculate overlap between two sets of bboxes.
- project_bboxes: Geometrically transform boxes in-place.
- rescale_bboxes: Rescale boxes w.r.t. scale_factor in-place.
- rescale_keypoints: Rescale keypoints as large as possible while keeping the aspect ratio.
- rescale_masks: Rescale masks as large as possible while keeping the aspect ratio.
- rescale_polygons: Rescale polygons as large as possible while keeping the aspect ratio.
- rescale_size: Calculate the new size to be rescaled to.
- scale_size: Rescale a size by a ratio.
- to_np_image: Convert torch.Tensor 3D image to numpy 3D image.
- translate_bboxes: Translate boxes in-place.
- translate_masks: Translate the masks.
- translate_polygons: Translate polygons.
Classes
- cache_randomness: Decorator that marks the method with random return value(s) in a transform class.
- class otx.data.transform_libs.utils.cache_randomness(func)[source]#
Bases:
object
Decorator that marks the method with random return value(s) in a transform class.
Reference : open-mmlab/mmcv
This decorator is usually used together with the context manager :func:`cache_random_params`. Within that context, a decorated method caches its return value(s) the first time it is invoked, and always returns the cached values when invoked again.
Note
Only an instance method can be decorated with cache_randomness.
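Example
A minimal usage sketch. The RandomFlip class and _choose_direction method below are hypothetical illustrations; only the decorator itself comes from this module.
>>> import random
>>> from otx.data.transform_libs.utils import cache_randomness
>>> class RandomFlip:
>>>     def __init__(self, prob: float = 0.5) -> None:
>>>         self.prob = prob
>>>     @cache_randomness
>>>     def _choose_direction(self) -> str | None:
>>>         # The random draw lives in a decorated instance method so its
>>>         # result can be cached and replayed across invocations.
>>>         return "horizontal" if random.random() < self.prob else None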
- otx.data.transform_libs.utils.area_polygon(x: ndarray, y: ndarray) ndarray [source]#
Compute the area of a component of a polygon.
Using the shoelace formula: https://stackoverflow.com/questions/24467972/calculate-area-of-polygon-given-x-y-coordinates
- Parameters:
x (ndarray) – x coordinates of the component
y (ndarray) – y coordinates of the component
- Returns:
the area of the component
- Return type:
(float)
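Example
A quick sketch assuming the documented signature; the shoelace formula gives 1.0 for the unit square.
>>> import numpy as np
>>> from otx.data.transform_libs.utils import area_polygon
>>> x = np.array([0.0, 1.0, 1.0, 0.0])  # x coordinates, in vertex order
>>> y = np.array([0.0, 0.0, 1.0, 1.0])  # matching y coordinates
>>> assert float(area_polygon(x, y)) == 1.0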
- otx.data.transform_libs.utils.centers_bboxes(boxes: Tensor) Tensor [source]#
Return a tensor representing the centers of boxes.
- otx.data.transform_libs.utils.clip_bboxes(boxes: Tensor, img_shape: tuple[int, int]) Tensor [source]#
Clip boxes according to the image shape in-place.
- otx.data.transform_libs.utils.corner2hbox(corners: Tensor) Tensor [source]#
Convert box coordinates from corners ((x1, y1), (x2, y1), (x1, y2), (x2, y2)) to (x1, y1, x2, y2).
Reference : open-mmlab/mmdetection
- Parameters:
corners (Tensor) – Corner tensor with shape of (…, 4, 2).
- Returns:
Horizontal box tensor with shape of (…, 4).
- Return type:
Tensor
- otx.data.transform_libs.utils.crop_masks(masks: ndarray, bbox: ndarray) ndarray [source]#
Crop each mask by the given bbox.
- otx.data.transform_libs.utils.crop_polygons(polygons: list[Polygon], bbox: np.ndarray, height: int, width: int) list[Polygon] [source]#
Crop each polygon by the given bbox.
- otx.data.transform_libs.utils.flip_bboxes(boxes: Tensor, img_shape: tuple[int, int], direction: str = 'horizontal') Tensor [source]#
Flip boxes horizontally or vertically in-place.
- otx.data.transform_libs.utils.flip_image(img: ndarray | list[ndarray], direction: str = 'horizontal') ndarray | list[ndarray] [source]#
Flip an image horizontally or vertically.
- Parameters:
img (ndarray) – Image to be flipped.
direction (str) – The flip direction, either "horizontal", "vertical", or "diagonal".
- Returns:
The flipped image.
- Return type:
ndarray
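Example
A minimal usage sketch under the documented signature.
>>> import numpy as np
>>> from otx.data.transform_libs.utils import flip_image
>>> img = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)  # tiny HxWxC image
>>> flipped = flip_image(img, direction="horizontal")     # mirror left-right
>>> assert flipped.shape == img.shape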
- otx.data.transform_libs.utils.flip_masks(masks: ndarray, direction: str = 'horizontal') ndarray [source]#
Flip masks along the given direction.
- otx.data.transform_libs.utils.flip_polygons(polygons: list[Polygon], height: int, width: int, direction: str = 'horizontal') list[Polygon] [source]#
Flip polygons along the given direction.
- otx.data.transform_libs.utils.fp16_clamp(x: Tensor, min: float | None = None, max: float | None = None) Tensor [source]#
Clamp fp16 tensor.
- otx.data.transform_libs.utils.get_bboxes_from_masks(masks: Tensor) ndarray [source]#
Create boxes from masks.
- otx.data.transform_libs.utils.get_bboxes_from_polygons(polygons: list[Polygon], height: int, width: int) np.ndarray [source]#
Create boxes from polygons.
- otx.data.transform_libs.utils.get_image_shape(img: ndarray | Tensor | list) tuple[int, int] [source]#
Get image(s) shape with (height, width).
- otx.data.transform_libs.utils.hbox2corner(boxes: Tensor) Tensor [source]#
Convert box coordinates from (x1, y1, x2, y2) to corners ((x1, y1), (x2, y1), (x1, y2), (x2, y2)).
Reference : open-mmlab/mmdetection
- Parameters:
boxes (Tensor) – Horizontal box tensor with shape of (…, 4).
- Returns:
Corner tensor with shape of (…, 4, 2).
- Return type:
Tensor
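Example
A round-trip sketch with corner2hbox, assuming the documented shapes.
>>> import torch
>>> from otx.data.transform_libs.utils import corner2hbox, hbox2corner
>>> boxes = torch.tensor([[0.0, 0.0, 10.0, 20.0]])  # (1, 4) in x1, y1, x2, y2
>>> corners = hbox2corner(boxes)                    # (1, 4, 2) corner points
>>> assert torch.equal(corner2hbox(corners), boxes)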
- otx.data.transform_libs.utils.is_inside_bboxes(boxes: Tensor, img_shape: tuple[int, int], all_inside: bool = False, allowed_border: int = 0) BoolTensor [source]#
Find boxes inside the image.
- Parameters:
boxes (Tensor) – Bounding boxes to be checked.
img_shape (tuple[int, int]) – A tuple of image height and width.
all_inside (bool) – Whether boxes must be entirely inside the image, or only partially inside. Defaults to False.
allowed_border (int) – Boxes that extend beyond the image boundary by more than allowed_border are considered "outside". Defaults to 0.
- Returns:
A BoolTensor indicating whether each box is inside the image. Assuming the original boxes have shape (m, n, 4), the output has shape (m, n).
- Return type:
(BoolTensor)
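Example
A small sketch assuming the documented behavior: keep only boxes that lie fully inside a 100 x 100 image.
>>> import torch
>>> from otx.data.transform_libs.utils import is_inside_bboxes
>>> boxes = torch.tensor([[10.0, 10.0, 50.0, 50.0],     # fully inside
>>>                       [90.0, 90.0, 120.0, 120.0]])  # spills outside
>>> inside = is_inside_bboxes(boxes, img_shape=(100, 100), all_inside=True)
>>> kept = boxes[inside]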
- otx.data.transform_libs.utils.overlap_bboxes(bboxes1: Tensor, bboxes2: Tensor, mode: str = 'iou', is_aligned: bool = False, eps: float = 1e-06) Tensor [source]#
Calculate overlap between two sets of bboxes.
FP16 contributed by open-mmlab/mmdetection#4889.
Note
Assume bboxes1 is M x 4 and bboxes2 is N x 4. When mode is 'iou', calculating IOU with overlap_bboxes generates the following intermediate variables:
1) is_aligned is False:
area1: M x 1, area2: N x 1, lt: M x N x 2, rb: M x N x 2, wh: M x N x 2, overlap: M x N x 1, union: M x N x 1, ious: M x N x 1.
Total memory: S = (9 x N x M + N + M) * 4 bytes. Using FP16 saves R = (9 x N x M + N + M) * 4 / 2 bytes. R > (N + M) * 4 * 2 always holds when N and M >= 1, since N + M <= N * M < 3 * N * M when N >= 2 and M >= 2, and N + 1 < 3 * N when N or M is 1.
Given M = 40 (ground truths) and N = 400000 (three anchor boxes per grid; FPN, R-CNNs), R = 275 MB. In a special dense-detection case with M = 512 ground truths, R = 3516 MB = 3.43 GB. With batch size B the saving becomes B x R, so CUDA memory frequently runs out without it. Experiments on a GeForce RTX 2080 Ti (11019 MiB):
| dtype | M | N | Use | Real | Ideal |
|:----:|:----:|:----:|:----:|:----:|:----:|
| FP32 | 512 | 400000 | 8020 MiB | -- | -- |
| FP16 | 512 | 400000 | 4504 MiB | 3516 MiB | 3516 MiB |
| FP32 | 40 | 400000 | 1540 MiB | -- | -- |
| FP16 | 40 | 400000 | 1264 MiB | 276 MiB | 275 MiB |
2) is_aligned is True:
area1: N x 1, area2: N x 1, lt: N x 2, rb: N x 2, wh: N x 2, overlap: N x 1, union: N x 1, ious: N x 1.
Total memory: S = 11 x N * 4 bytes. Using FP16 saves R = 11 x N * 4 / 2 bytes.
The same holds for 'giou' (which uses more memory than 'iou'). Time-wise, FP16 is generally faster than FP32. When gpu_assign_thr is not -1, assignment takes more time on CPU but does not reduce memory. Thus we can halve the memory while keeping the speed.
If is_aligned is False, calculate the overlaps between each bbox of bboxes1 and bboxes2; otherwise, calculate the overlaps between each aligned pair of bboxes1 and bboxes2.
- Parameters:
bboxes1 (Tensor) – shape (B, m, 4) in <x1, y1, x2, y2> format or empty.
bboxes2 (Tensor) – shape (B, n, 4) in <x1, y1, x2, y2> format or empty. B indicates the batch dim, in shape (B1, B2, …, Bn). If is_aligned is True, then m and n must be equal.
mode (str) – "iou" (intersection over union), "iof" (intersection over foreground) or "giou" (generalized intersection over union). Default "iou".
is_aligned (bool, optional) – If True, then m and n must be equal. Default False.
eps (float, optional) – A value added to the denominator for numerical stability. Default 1e-6.
- Returns:
shape (m, n) if is_aligned is False else shape (m,)
- Return type:
Tensor
Example
>>> bboxes1 = torch.FloatTensor([
>>>     [0, 0, 10, 10],
>>>     [10, 10, 20, 20],
>>>     [32, 32, 38, 42],
>>> ])
>>> bboxes2 = torch.FloatTensor([
>>>     [0, 0, 10, 20],
>>>     [0, 10, 10, 19],
>>>     [10, 10, 20, 20],
>>> ])
>>> overlaps = overlap_bboxes(bboxes1, bboxes2)
>>> assert overlaps.shape == (3, 3)
>>> overlaps = overlap_bboxes(bboxes1, bboxes2, is_aligned=True)
>>> assert overlaps.shape == (3, )
Example
>>> empty = torch.empty(0, 4)
>>> nonempty = torch.FloatTensor([[0, 0, 10, 9]])
>>> assert tuple(overlap_bboxes(empty, nonempty).shape) == (0, 1)
>>> assert tuple(overlap_bboxes(nonempty, empty).shape) == (1, 0)
>>> assert tuple(overlap_bboxes(empty, empty).shape) == (0, 0)
- otx.data.transform_libs.utils.project_bboxes(boxes: Tensor, homography_matrix: Tensor | ndarray) Tensor [source]#
Geometrically transform boxes in-place.
Reference : open-mmlab/mmdetection
- Parameters:
boxes (Tensor) – Bounding boxes to be transformed.
homography_matrix (Tensor | np.ndarray) – Shape (3, 3) for geometric transformation.
- Returns:
Projected bounding boxes.
- Return type:
(Tensor | np.ndarray)
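Example
A usage sketch assuming the documented signature; the homography below is a pure translation by (5, 10).
>>> import torch
>>> from otx.data.transform_libs.utils import project_bboxes
>>> boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
>>> homography = torch.tensor([[1.0, 0.0, 5.0],    # x' = x + 5
>>>                            [0.0, 1.0, 10.0],   # y' = y + 10
>>>                            [0.0, 0.0, 1.0]])
>>> projected = project_bboxes(boxes, homography)  # -> [[5., 10., 15., 20.]]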
- otx.data.transform_libs.utils.rescale_bboxes(boxes: Tensor, scale_factor: tuple[float, float]) Tensor [source]#
Rescale boxes w.r.t. scale_factor in-place.
Note
Both rescale_ and resize_ will enlarge or shrink boxes w.r.t. scale_factor. The difference is that resize_ only changes the width and the height of boxes, while rescale_ also rescales the box centers simultaneously.
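Example
A quick in-place sketch; a uniform scale factor is used here so the ordering of the two factors does not matter.
>>> import torch
>>> from otx.data.transform_libs.utils import rescale_bboxes
>>> boxes = torch.tensor([[10.0, 10.0, 20.0, 20.0]])
>>> scaled = rescale_bboxes(boxes, scale_factor=(2.0, 2.0))
>>> # boxes is now [[20., 20., 40., 40.]]: corners and centers both scale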
- otx.data.transform_libs.utils.rescale_keypoints(keypoints: Tensor, scale_factor: float | tuple[float, float]) Tensor [source]#
Rescale keypoints as large as possible while keeping the aspect ratio.
- otx.data.transform_libs.utils.rescale_masks(masks: ndarray, scale_factor: float | tuple[float, float], interpolation: str = 'nearest') ndarray [source]#
Rescale masks as large as possible while keeping the aspect ratio.
- otx.data.transform_libs.utils.rescale_polygons(polygons: list[Polygon], scale_factor: float | tuple[float, float]) list[Polygon] [source]#
Rescale polygons as large as possible while keeping the aspect ratio.
- otx.data.transform_libs.utils.rescale_size(old_size: tuple, scale: float | int | tuple[float, float] | tuple[int, int], return_scale: bool = False) tuple[int, int] | tuple[tuple[int, int], float | int] [source]#
Calculate the new size to be rescaled to.
- Parameters:
old_size (tuple[int]) – The old size (height, width) of image.
scale (float | int | tuple[float] | tuple[int]) – The scaling factor or maximum size. If it is a float number, an integer, or a tuple of 2 float numbers, then the image will be rescaled by this factor, else if it is a tuple of 2 integers, then the image will be rescaled as large as possible within the scale.
return_scale (bool) – Whether to return the scaling factor besides the rescaled image size.
- Returns:
The new rescaled image size in (height, width). If return_scale is True, the scale factor is returned as well.
- Return type:
tuple[int, int] | tuple[tuple[int, int], float | int]
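Example
A sketch of the two scale modes, assuming sizes are given as (height, width) as documented; expected results are noted in comments.
>>> from otx.data.transform_libs.utils import rescale_size
>>> half = rescale_size((400, 600), 0.5)        # float factor -> (200, 300)
>>> fit = rescale_size((400, 600), (300, 300))  # fit within 300 x 300 while
>>>                                             # keeping aspect -> (200, 300)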
- otx.data.transform_libs.utils.scale_size(size: tuple[int, int], scale: float | int | tuple[float, float] | tuple[int, int]) tuple[int, int] [source]#
Rescale a size by a ratio.
- otx.data.transform_libs.utils.to_np_image(img: ndarray | Tensor | list) ndarray | list[ndarray] [source]#
Convert torch.Tensor 3D image to numpy 3D image.
TODO (sungchul): move it into base data entity?
- otx.data.transform_libs.utils.translate_bboxes(boxes: Tensor, distances: Sequence[float]) Tensor [source]#
Translate boxes in-place.
- Parameters:
boxes (Tensor) – Bounding boxes to be translated.
distances (Sequence[float]) – Translate distances. The first is horizontal distance and the second is vertical distance.
- Returns:
Translated bounding boxes.
- Return type:
(Tensor)
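Example
A minimal in-place sketch: shift a box 5 px right and 10 px down.
>>> import torch
>>> from otx.data.transform_libs.utils import translate_bboxes
>>> boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
>>> moved = translate_bboxes(boxes, distances=(5.0, 10.0))  # -> [[5., 10., 15., 20.]]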
- otx.data.transform_libs.utils.translate_masks(masks: ndarray, out_shape: tuple[int, int], offset: int | float, direction: str = 'horizontal', border_value: int | tuple[int] = 0, interpolation: str = 'bilinear') ndarray [source]#
Translate the masks.
- Parameters:
masks (np.ndarray) – Masks to be translated.
out_shape (tuple[int]) – Shape of the output mask, in (h, w) format.
offset (int | float) – The offset for translating.
direction (str) – The translate direction, either "horizontal" or "vertical".
border_value (int | tuple[int]) – Border value. Default 0 for masks.
interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bilinear’.
- Returns:
Translated masks.
- Return type:
(np.ndarray)
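Example
A usage sketch assuming the documented signature: shift binary masks 5 px to the right, keeping the output shape.
>>> import numpy as np
>>> from otx.data.transform_libs.utils import translate_masks
>>> masks = np.zeros((2, 32, 32), dtype=np.uint8)  # two empty 32 x 32 masks
>>> masks[:, 10:20, 10:20] = 1                     # draw a square on each
>>> shifted = translate_masks(masks, out_shape=(32, 32), offset=5,
>>>                           direction="horizontal")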