clip_bbox package

clip_bbox.clipbbox module

clip_bbox.clipbbox.get_square_int_factors(n)
clip_bbox.clipbbox.img_fts_to_heatmap(img_fts, txt_fts)

Computes the similarity heatmap between a pair of image and text embeddings from CLIP.

Parameters
  • img_fts (numpy array) – Image embedding from CLIP.

  • txt_fts (numpy array) – Text embedding from CLIP.

Returns

Similarity heatmap between the image and text embeddings.

Return type

numpy array
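
One common way to compute such a heatmap is the cosine similarity between every spatial image feature and the text embedding. The following is a minimal sketch under stated assumptions, not the package's implementation: it assumes img_fts has shape (H, W, D) and txt_fts has shape (D,), and the name cosine_heatmap is hypothetical.

```python
import numpy as np


def cosine_heatmap(img_fts, txt_fts, eps=1e-8):
    """Cosine similarity between each spatial image feature and a text embedding.

    Assumed shapes (not confirmed by the package docs):
    img_fts is (H, W, D) and txt_fts is (D,). Returns an (H, W) array.
    """
    img_fts = np.asarray(img_fts, dtype=np.float64)
    txt_fts = np.asarray(txt_fts, dtype=np.float64)
    # Normalize along the feature axis so the dot product becomes a cosine.
    img_unit = img_fts / (np.linalg.norm(img_fts, axis=-1, keepdims=True) + eps)
    txt_unit = txt_fts / (np.linalg.norm(txt_fts) + eps)
    # (H, W, D) @ (D,) contracts the feature axis and leaves an (H, W) map.
    return img_unit @ txt_unit
```

Values close to 1 mark locations whose features point in nearly the same direction as the text embedding.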

clip_bbox.clipbbox.run_clip_bbox(img_path, caption, out_path)

Draws bounding boxes on an input image.

Parameters
  • img_path (str) – path to input image.

  • caption (str) – caption of input image.

  • out_path (str) – path to output image displaying bounding boxes.

Returns

None.
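
A call to this function might be wrapped as in the sketch below. The import path and argument order follow the signature documented above. The helper annotate_image, the "_bbox.png" naming scheme, and the example paths are hypothetical and exist only for illustration.

```python
from pathlib import Path


def annotate_image(img_path, caption, out_dir="out"):
    """Run run_clip_bbox on one image and return the path of the output image.

    Hypothetical convenience wrapper; only run_clip_bbox itself comes from
    the package.
    """
    # Imported lazily so this sketch can be loaded without the package installed.
    from clip_bbox.clipbbox import run_clip_bbox

    out_path = Path(out_dir) / f"{Path(img_path).stem}_bbox.png"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # run_clip_bbox is documented to return None and to write to out_path.
    run_clip_bbox(str(img_path), caption, str(out_path))
    return out_path
```

For example, annotate_image("photos/dog.jpg", "a dog on a couch") would request the output file out/dog_bbox.png.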

clip_bbox.bbox_utils module

clip_bbox.bbox_utils.heat2bbox(heat_map, original_image_shape)

Calculate bounding boxes on a 2D heatmap.

Parameters
  • heat_map (numpy array) – 2D heatmap of values.

  • original_image_shape (tuple[int]) – original image’s dimensions represented as (height, width).

Returns

List of bounding boxes, where each bounding box is a tuple of coordinates (min_x, min_y, max_x, max_y).

Return type

List[tuple[int]]
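
The idea of turning a heatmap into boxes can be sketched as: threshold the heatmap, group neighbouring cells into connected regions, and scale each region's extent to the original image size. This is an illustrative sketch, not the package's algorithm. The name heatmap_to_bboxes, the threshold parameter, the min-max normalization, and the 4-connectivity rule are all assumptions.

```python
from collections import deque

import numpy as np


def heatmap_to_bboxes(heat_map, original_image_shape, threshold=0.5):
    """Boxes around connected regions of a normalized heatmap above `threshold`.

    Returns a list of (min_x, min_y, max_x, max_y) tuples in original-image
    pixels. original_image_shape is (height, width).
    """
    heat = np.asarray(heat_map, dtype=np.float64)
    span = heat.max() - heat.min()
    if span == 0:
        return []  # a constant heatmap has no salient region
    mask = (heat - heat.min()) / span > threshold
    h, w = mask.shape
    orig_h, orig_w = original_image_shape
    sy, sx = orig_h / h, orig_w / w  # heatmap cell size in image pixels
    seen = np.zeros(mask.shape, dtype=bool)
    bboxes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        # Breadth-first search over 4-connected neighbours of this seed cell.
        queue = deque([(y0, x0)])
        seen[y0, x0] = True
        min_y = max_y = y0
        min_x = max_x = x0
        while queue:
            y, x = queue.popleft()
            min_y, max_y = min(min_y, y), max(max_y, y)
            min_x, max_x = min(min_x, x), max(max_x, x)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        # Scale cell indices to pixels; +1 so the box covers the last cell fully.
        bboxes.append((int(min_x * sx), int(min_y * sy),
                       int((max_x + 1) * sx), int((max_y + 1) * sy)))
    return bboxes
```

A lower threshold merges more of the heatmap into fewer, larger boxes, while a higher one keeps only the strongest peaks.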

clip_bbox.bbox_utils.img_heat_bbox_disp(image, heat_map, save_path, title='', alpha=0.6, cmap='viridis', bboxes=[])

Draw bounding boxes on image and overlay the corresponding heatmap. Saves result to save_path.

Parameters
  • image (numpy array) – Image stored in RGB format.

  • heat_map (numpy array) – 2D heatmap of values.

  • save_path (str) – Save path for generated figure.

  • title (str) – Title on generated figure.

  • alpha (float) – Transparency of heatmap.

  • cmap (str) – Matplotlib color map parameter applied to heatmap.

  • dot_max (bool) – Show or hide dots where the heatmap reaches max values.

  • bboxes (List[List[int]]) – List of bounding boxes.

  • order (str) – The order of coordinates used for bboxes.

Returns

Matplotlib figure.

clip_bbox.clip_model_setup module

clip_bbox.clip_model_setup.get_clip_model(device, model_name='RN50', input_res=(720, 1280))

Downloads a pre-trained CLIP model and adjusts it to accept the desired input resolution.

Parameters
  • device (torch.device) – Device Torch should use to run CLIP, for example torch.device("cpu") or torch.device("cuda").

  • model_name (str) – The name of the CLIP model to download. The options are "RN50", "RN101", "RN50x4", or "ViT-B/32".

  • input_res (tuple[int]) – Input resolution represented as (height, width).

Returns

Modified Torch model accepting the desired input resolution.

clip_bbox.preprocess module

clip_bbox.preprocess.preprocess_imgs(img_path_list, device, input_resolution=None)

Preprocesses a list of images for CLIP.

Parameters
  • img_path_list (List[str]) – List of image paths to preprocess.

  • device (torch.device) – Device Torch should use to run CLIP, for example torch.device("cpu") or torch.device("cuda").

  • input_resolution (tuple[int]) – Input resolution represented as (height, width). If not specified, it defaults to the original dimensions of the first input image.

Returns

List of images, each preprocessed as a Torch tensor.

Return type

List[Torch tensor]

clip_bbox.preprocess.preprocess_texts(caption_list, context_length, device)

Preprocesses a list of texts for CLIP.

Parameters
  • caption_list (List[str]) – List of captions.

  • context_length (int) – Context length of the CLIP model, i.e. the fixed number of tokens that each caption is encoded to.

  • device (torch.device) – Device Torch should use to run CLIP, for example torch.device("cpu") or torch.device("cuda").

Returns

Array of preprocessed texts.

Return type

Torch tensor
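
To illustrate what a fixed context length means for a batch of captions, the sketch below packs variable-length token id lists into one array of shape (N, context_length). It is not the package's code and does not use CLIP's tokenizer: the function pad_token_ids, the pad_id value, and the token ids in the usage note are hypothetical, and NumPy stands in for Torch to keep the example dependency-light.

```python
import numpy as np


def pad_token_ids(token_lists, context_length, pad_id=0):
    """Pack variable-length token id lists into an (N, context_length) array.

    Shorter sequences are right-padded with pad_id; longer ones are rejected.
    """
    out = np.full((len(token_lists), context_length), pad_id, dtype=np.int64)
    for i, tokens in enumerate(token_lists):
        if len(tokens) > context_length:
            raise ValueError(
                f"caption {i} has {len(tokens)} tokens, "
                f"more than context_length={context_length}"
            )
        out[i, :len(tokens)] = tokens
    return out
```

With made-up ids, pad_token_ids([[5, 6, 7], [9]], context_length=4) gives the rows [5, 6, 7, 0] and [9, 0, 0, 0], so every caption occupies the same number of positions regardless of its length.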