clip_bbox package
clip_bbox.clipbbox module
- clip_bbox.clipbbox.get_square_int_factors(n)
Finds the pair of integer factors of n that are closest to each other (that is, closest to the square root of n) and whose product is n.
- Parameters
n (int) – The target integer.
- Returns
- A tuple containing:
val (int): An integer factor of n.
val2 (int): Another integer factor of n.
- Return type
tuple
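One reading of the description above, suggested by the function name, is that the function returns the factor pair of n that is closest to a square. The sketch below illustrates that idea under this assumption; `closest_factor_pair` is a hypothetical name, not the package's implementation, and the order of the returned factors (smaller first) is also an assumption.

```python
import math


def closest_factor_pair(n):
    """Return (a, b) with a * b == n and a <= b, where a is as close to sqrt(n) as possible."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Walk down from floor(sqrt(n)); the first divisor found gives the
    # most "square" factor pair, because larger candidates were rejected.
    for a in range(math.isqrt(n), 0, -1):
        if n % a == 0:
            return a, n // a
    raise AssertionError("unreachable: 1 always divides n")


print(closest_factor_pair(12))  # (3, 4)
print(closest_factor_pair(13))  # (1, 13), since 13 is prime
```

For a perfect square such as 49 both factors are equal; for a prime the only available pair is 1 and n.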
- clip_bbox.clipbbox.img_fts_to_heatmap(img_fts, txt_fts)
Computes the similarity heatmap between a pair of image and text embeddings from CLIP.
- Parameters
img_fts (numpy array) – Image embedding from CLIP.
txt_fts (numpy array) – Text embedding from CLIP.
- Returns
Similarity heatmap between the image and text embeddings.
- Return type
numpy array
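The documentation does not state the embedding shapes or the similarity measure. As a rough illustration only, the sketch below assumes the image embedding is a spatial grid of feature vectors with shape (H, W, D), the text embedding is a single vector of length D, and the similarity is cosine similarity. `similarity_heatmap` is a hypothetical helper, not the package's function.

```python
import numpy as np


def similarity_heatmap(img_fts, txt_fts):
    """Cosine similarity between each spatial image feature and one text feature.

    Assumed shapes (not confirmed by the package docs):
      img_fts: (H, W, D), txt_fts: (D,). Returns an (H, W) array.
    """
    img = np.asarray(img_fts, dtype=float)
    txt = np.asarray(txt_fts, dtype=float)
    # L2-normalize along the feature axis so the dot product is a cosine.
    # Zero-length feature vectors would divide by zero and are not handled here.
    img = img / np.linalg.norm(img, axis=-1, keepdims=True)
    txt = txt / np.linalg.norm(txt)
    return img @ txt


# A 2x2 grid of 2-dimensional features compared against one text vector.
grid = [[[1, 0], [0, 1]],
        [[1, 1], [-1, 0]]]
heat = similarity_heatmap(grid, [2.0, 0.0])
print(heat.shape)  # (2, 2)
```

Each cell of the result lies in [-1, 1]; cells whose feature points in the same direction as the text vector score highest.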
- clip_bbox.clipbbox.run_clip_bbox(img_path, caption, out_path)
Draws bounding boxes on an input image for the given caption and saves the result to out_path.
- Parameters
img_path (str) – Path to the input image.
caption (str) – Caption of the input image.
out_path (str) – Path to the output image displaying the bounding boxes.
- Returns
None.
clip_bbox.bbox_utils module
- clip_bbox.bbox_utils.heat2bbox(heat_map, original_image_shape)
Calculate bounding boxes on a 2D heatmap.
- Parameters
heat_map (numpy array) – 2D heatmap of values.
original_image_shape (tuple[int]) – Original image’s dimensions represented as (height, width).
- Returns
List of bounding boxes, where each bounding box is a tuple of coordinates (min_x, min_y, max_x, max_y).
- Return type
List[tuple[int]]
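How the boxes are derived from the heatmap is not described here. A minimal sketch of one common approach follows: min-max normalize the heatmap, keep cells above a relative threshold, and scale the extent of those cells to the original image size. The function name `heatmap_to_bbox`, the `rel_threshold` parameter, and the choice to return a single box around all above-threshold cells are assumptions for illustration; the package may compute several boxes in a different way.

```python
import numpy as np


def heatmap_to_bbox(heat_map, original_image_shape, rel_threshold=0.5):
    """Return [(min_x, min_y, max_x, max_y)] in image pixels, or [] for a flat heatmap."""
    heat_map = np.asarray(heat_map, dtype=float)
    img_h, img_w = original_image_shape  # (height, width), as in the docs
    hm_h, hm_w = heat_map.shape
    lo, hi = heat_map.min(), heat_map.max()
    if hi == lo:
        return []  # no contrast, nothing to localize
    # Cells whose min-max normalized value reaches the threshold.
    mask = (heat_map - lo) / (hi - lo) >= rel_threshold
    rows, cols = np.nonzero(mask)
    # Scale factors from heatmap cells to image pixels.
    sx, sy = img_w / hm_w, img_h / hm_h
    box = (
        int(cols.min() * sx),
        int(rows.min() * sy),
        int((cols.max() + 1) * sx),
        int((rows.max() + 1) * sy),
    )
    return [box]


heat = [[0, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 0]]
print(heatmap_to_bbox(heat, (40, 80)))  # [(40, 10, 80, 30)]
```

The returned tuple follows the (min_x, min_y, max_x, max_y) order documented above, so x values are scaled by width and y values by height.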
- clip_bbox.bbox_utils.img_heat_bbox_disp(image, heat_map, save_path, title='', alpha=0.6, cmap='viridis', bboxes=[])
Draw bounding boxes on image and overlay the corresponding heatmap. Saves result to save_path.
- Parameters
image (numpy array) – Image stored in RGB format.
heat_map (numpy array) – 2D heatmap of values.
save_path (str) – Save path for generated figure.
title (str) – Title on generated figure.
alpha (float) – Transparency of heatmap.
cmap (str) – Matplotlib color map parameter applied to heatmap.
dot_max (bool) – Show or hide dots where the heatmap reaches max values.
bboxes (List[List[int]]) – List of bounding boxes.
order (str) – The order of coordinates used for bboxes.
- Returns
Matplotlib figure.
clip_bbox.clip_model_setup module
- clip_bbox.clip_model_setup.get_clip_model(device, model_name='RN50', input_res=(720, 1280))
Downloads a pre-trained CLIP model and adjusts it to accept the desired input resolution.
- Parameters
device (torch.device) – Device that Torch should use to run CLIP, for example torch.device("cpu") or torch.device("cuda").
model_name (str) – The name of the desired CLIP model to download. The options are “RN50”, “RN101”, “RN50x4”, or “ViT-B/32”.
input_res (tuple[int]) – Input resolution represented as (height, width).
- Returns
Modified Torch model accepting the desired input resolution.
clip_bbox.preprocess module
- clip_bbox.preprocess.preprocess_imgs(img_path_list, device, input_resolution=None)
Preprocesses a list of images for CLIP.
- Parameters
img_path_list (List[str]) – List of image paths to preprocess.
device (torch.device) – Device that Torch should use to run CLIP, for example torch.device("cpu") or torch.device("cuda").
input_resolution (tuple[int]) – Input resolution represented as (height, width). If not specified, the default value will be the first input image’s original dimensions.
- Returns
- A tuple containing:
images (List[Torch tensor]): List of images.
image input (Torch tensor): Array of images preprocessed for input to CLIP.
- Return type
tuple
- clip_bbox.preprocess.preprocess_texts(caption_list, context_length, device)
Preprocesses a list of texts for CLIP.
- Parameters
caption_list (List[str]) – List of captions.
context_length (int) – CLIP model parameter.
device (torch.device) – Device that Torch should use to run CLIP, for example torch.device("cpu") or torch.device("cuda").
- Returns
Array of preprocessed texts.
- Return type
Torch tensor