clip_bbox package

clip_bbox.clipbbox module

clip_bbox.clipbbox.get_square_int_factors(n)

clip_bbox.clipbbox.img_fts_to_heatmap(img_fts, txt_fts)

Computes the similarity heatmap between a pair of image and text embeddings from CLIP.

Parameters

img_fts (numpy array) – Image embedding from CLIP.
txt_fts (numpy array) – Text embedding from CLIP.

Returns

Similarity heatmap between the image and text embeddings.

Return type

numpy array
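The source does not say how the similarity is computed. As a minimal sketch, assuming the image embedding is a grid of per-location feature vectors and the text embedding is a single vector of the same length, a cosine-similarity heatmap could look like the following. The name similarity_heatmap and the use of plain Python lists in place of NumPy arrays are illustrative choices, not part of clip_bbox.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_heatmap(img_fts, txt_fts):
    """Hypothetical stand-in for img_fts_to_heatmap.

    Assumes img_fts is a rows x cols grid of feature vectors and txt_fts is one
    feature vector; returns a rows x cols grid of cosine similarities.
    """
    return [[cosine(cell, txt_fts) for cell in row] for row in img_fts]


# A 2 x 2 grid of 2-dimensional features compared against one text vector
grid = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [-1.0, 0.0]]]
heatmap = similarity_heatmap(grid, [1.0, 0.0])
```

Locations whose features point in the same direction as the text vector score close to 1, orthogonal ones close to 0, and opposite ones close to -1.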

clip_bbox.clipbbox.run_clip_bbox(img_path, caption, out_path)

Draws bounding boxes on an input image.

Parameters

img_path (str) – path to input image.
caption (str) – caption of input image.
out_path (str) – path to output image displaying bounding boxes.

Returns

None.
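A possible way to call this function is sketched below. The wrapper annotate_image and its input checks are hypothetical additions for illustration; only the import path and the three-argument call come from the signature above.

```python
from pathlib import Path


def annotate_image(img_path, caption, out_path):
    """Hypothetical wrapper: validate inputs, then delegate to run_clip_bbox."""
    if not Path(img_path).is_file():
        raise FileNotFoundError(f"input image not found: {img_path}")
    if not caption.strip():
        raise ValueError("caption must not be empty")

    # Imported lazily so the checks above run even if clip_bbox is not installed
    from clip_bbox.clipbbox import run_clip_bbox

    # Make sure the output directory exists before the figure is written
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    run_clip_bbox(img_path, caption, out_path)
```

Calling annotate_image with an image path, a caption, and an output path is then expected to write the annotated image to the output path, since run_clip_bbox itself returns None.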

clip_bbox.bbox_utils module

clip_bbox.bbox_utils.heat2bbox(heat_map, original_image_shape)

Calculate bounding boxes on a 2D heatmap.

Parameters

heat_map (numpy array) – 2D heatmap of values.
original_image_shape (tuple[int]) – original image’s dimensions represented as (height, width).

Returns

List of bounding boxes, where each bounding box is a tuple of coordinates (min_x, min_y, max_x, max_y).

Return type

List[tuple[int]]
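The source does not describe how the boxes are derived from the heatmap. One common approach, sketched here under that caveat, is to threshold the heatmap, group the cells above the threshold into 4-connected regions, and scale each region's extent to the original image size. The function name boxes_from_heatmap, the threshold parameter, and the use of nested lists instead of a NumPy array are assumptions made for illustration.

```python
from collections import deque


def boxes_from_heatmap(heat_map, original_image_shape, threshold=0.5):
    """Hypothetical sketch: one (min_x, min_y, max_x, max_y) box per hot region."""
    rows, cols = len(heat_map), len(heat_map[0])
    img_h, img_w = original_image_shape          # (height, width), as documented
    scale_x, scale_y = img_w / cols, img_h / rows
    seen = [[False] * cols for _ in range(rows)]
    boxes = []

    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or heat_map[r][c] < threshold:
                continue
            # Breadth-first flood fill over the 4-connected region containing (r, c)
            queue = deque([(r, c)])
            seen[r][c] = True
            min_r = max_r = r
            min_c = max_c = c
            while queue:
                y, x = queue.popleft()
                min_r, max_r = min(min_r, y), max(max_r, y)
                min_c, max_c = min(min_c, x), max(max_c, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and heat_map[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Scale heatmap cell indices to pixel coordinates of the original image
            boxes.append((int(min_c * scale_x), int(min_r * scale_y),
                          int((max_c + 1) * scale_x), int((max_r + 1) * scale_y)))
    return boxes


heat = [[0.9, 0.8, 0.0, 0.0],
        [0.7, 0.6, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.9]]
print(boxes_from_heatmap(heat, (40, 80)))  # [(0, 0, 40, 20), (60, 30, 80, 40)]
```

Each heatmap cell here covers 20 x 10 pixels of the 80 x 40 image, so the 2 x 2 hot block in the top-left corner maps to a 40 x 20 pixel box.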

clip_bbox.bbox_utils.img_heat_bbox_disp(image, heat_map, save_path, title='', alpha=0.6, cmap='viridis', bboxes=[])

Draw bounding boxes on image and overlay the corresponding heatmap. Saves result to save_path.

Parameters

image (numpy array) – Image stored in RGB format.
heat_map (numpy array) – 2D heatmap of values.
save_path (str) – Save path for generated figure.
title (str) – Title on generated figure.
alpha (float) – Transparency of heatmap.
cmap (str) – Matplotlib color map parameter applied to heatmap.
dot_max (bool) – Show or hide dots where the heatmap reaches max values.
bboxes (List[List[int]]) – List of bounding boxes.
order (str) – The order of coordinates used for bboxes.

Returns

Matplotlib figure.

clip_bbox.clip_model_setup module

clip_bbox.clip_model_setup.get_clip_model(device, model_name='RN50', input_res=(720, 1280))

Downloads pre-trained CLIP model and adjusts model to accept desired input resolution.

Parameters

device (torch.device) – Device Torch should use to run CLIP, for example torch.device("cpu") or torch.device("cuda").
model_name (str) – Name of the CLIP model to download. The options are "RN50", "RN101", "RN50x4", and "ViT-B/32".
input_res (tuple[int]) – Input resolution represented as (height, width).

Returns

Modified Torch model accepting the desired input resolution.
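As a hedged usage sketch, the helper below picks a device and forwards the documented arguments. The helper name load_clip, the SUPPORTED_MODELS tuple (copied from the options listed above), and the argument checks are hypothetical; only get_clip_model and its signature come from this page.

```python
SUPPORTED_MODELS = ("RN50", "RN101", "RN50x4", "ViT-B/32")


def load_clip(model_name="RN50", input_res=(720, 1280)):
    """Hypothetical helper: validate arguments, pick a device, call get_clip_model."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"unknown model {model_name!r}; expected one of {SUPPORTED_MODELS}")
    height, width = input_res                    # (height, width), as documented
    if height <= 0 or width <= 0:
        raise ValueError("input_res must contain positive integers")

    # Imported lazily so the checks above run without torch or clip_bbox installed
    import torch
    from clip_bbox.clip_model_setup import get_clip_model

    # Prefer the GPU when one is available, otherwise fall back to the CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return get_clip_model(device, model_name=model_name, input_res=input_res)
```

Validating the model name before the call surfaces a typo immediately instead of partway through a download.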

clip_bbox.preprocess module

clip_bbox.preprocess.preprocess_imgs(img_path_list, device, input_resolution=None)

Preprocess list of images for CLIP.

Parameters

img_path_list (List[str]) – List of image paths to preprocess.
device (torch.device) – Device Torch should use to run CLIP, for example torch.device("cpu") or torch.device("cuda").
input_resolution (tuple[int]) – Input resolution represented as (height, width). If not specified, the default value will be the first input image’s original dimensions.

Returns

List of images preprocessed as Torch tensors.

Return type

List[Torch tensor]

clip_bbox.preprocess.preprocess_texts(caption_list, context_length, device)

Preprocess list of texts for CLIP.

Parameters

caption_list (List[str]) – List of captions.
context_length (int) – CLIP model parameter.
device (torch.device) – Device Torch should use to run CLIP, for example torch.device("cpu") or torch.device("cuda").

Returns

Array of preprocessed texts.

Return type

Torch tensor
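The two preprocessing functions can be used together, as in the sketch below. The helper prepare_inputs is hypothetical, as is its requirement of one caption per image. The default context_length of 77 is the value used by the released CLIP models and is assumed here, not taken from this page.

```python
def prepare_inputs(img_paths, captions, device, context_length=77, input_resolution=None):
    """Hypothetical helper: preprocess paired images and captions for CLIP."""
    if not img_paths:
        raise ValueError("img_paths must not be empty")
    if len(img_paths) != len(captions):
        # Assumption of this sketch: each image is paired with exactly one caption
        raise ValueError("expected one caption per image")

    # Imported lazily so the checks above run even if clip_bbox is not installed
    from clip_bbox.preprocess import preprocess_imgs, preprocess_texts

    images = preprocess_imgs(img_paths, device, input_resolution=input_resolution)
    texts = preprocess_texts(captions, context_length, device)
    return images, texts
```

Leaving input_resolution as None keeps the documented default, in which the first image's original dimensions are used.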