LAB 4.3 - Object detection


The labs require a TensorFlow version lower than the default one in Google Colab. Run the following cell to downgrade TensorFlow accordingly.

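import os

def downgrade_tf_version():
    # plain shell commands: the "!" prefix is notebook syntax and does not belong inside os.system
    os.system("pip uninstall -y tensorflow")
    os.system("pip install tensorflow==2.12.0")
    # kill the current process so that the runtime restarts and picks up the new version
    os.kill(os.getpid(), 9)

downgrade_tf_version()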

!wget -nc --no-cache -O init.py -q https://raw.githubusercontent.com/rramosp/2021.deeplearning/main/content/init.py
import init; init.init(force_download=False); 

from local.lib.rlxmoocapi import submit, session
session.LoginSequence(endpoint=init.endpoint, course_id=init.course_id, lab_id="L04.03", varname="student");

import pandas as pd
import matplotlib.pyplot as plt
from skimage import io
import numpy as np
%matplotlib inline
from IPython.display import Image

TASK 1: Create anchor boxes

Observe how we download the bounding-box annotations of the Open Images V6 dataset and extract the first 20K, out of the total 14M available. We are interested in the width and height of each box, which we obtain by subtracting the box coordinates (XMax - XMin and YMax - YMin), according to the dataset description. Recall that the coordinates are \(\in [0,1]\), as they are relative to the image size.

!wget -nc https://storage.googleapis.com/openimages/v6/oidv6-train-annotations-bbox.csv

!wc oidv6-train-annotations-bbox.csv
!head -5 oidv6-train-annotations-bbox.csv

The following cell builds the NumPy array X, which contains the width and height of each bounding box.

!head -20001 oidv6-train-annotations-bbox.csv > oidv6-train-annotations-bbox-20k.csv
d = pd.read_csv('oidv6-train-annotations-bbox-20k.csv')
w = (d.XMax-d.XMin).values
h = (d.YMax-d.YMin).values
X = np.r_[[w,h]].T
X[:6]

We can plot a sample of them, all centered at the same point:

from matplotlib.patches import Rectangle
plt.figure(figsize=(5,5)); 
ax = plt.subplot(111)

for w,h in np.random.permutation(X)[:25]:
    ax.add_patch(Rectangle((0.5-w/2,0.5-h/2),w,h, linewidth=2,edgecolor='r',facecolor='none'))


Complete the following function such that it creates n anchor boxes from the bounding boxes in X using sklearn.cluster.KMeans with n_clusters set to the number of anchor boxes desired. After fitting KMeans, return cluster centers. Use the random_state passed as argument in KMeans.

def get_anchor_boxes(X, n, random_state=0):
    from sklearn.cluster import KMeans
    # YOUR CODE HERE
    km = ... 
    return ...
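To make explicit what the cluster centers represent, here is a minimal pure-Python sketch of the two alternating steps behind k-means (Lloyd's algorithm) on a handful of made-up (width, height) pairs. It is not scikit-learn's implementation, and `toy_kmeans`, `boxes` and `anchors_toy` are hypothetical names used only for illustration; the task itself should still be solved with sklearn.cluster.KMeans.

```python
# Toy illustration of k-means (Lloyd's algorithm) on box sizes.
# Hypothetical data and helper names; the lab itself uses sklearn.cluster.KMeans.

def toy_kmeans(points, centers, n_iter=10):
    """Alternate between assigning points to their nearest center and
    recomputing each center as the mean of its assigned points."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        # assignment step: nearest center by squared Euclidean distance
        groups = [[] for _ in centers]
        for w, h in points:
            dists = [(w - cw) ** 2 + (h - ch) ** 2 for cw, ch in centers]
            groups[dists.index(min(dists))].append((w, h))
        # update step: mean width and height per group (empty groups keep their center)
        centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

# six made-up (width, height) pairs: three small boxes and three large ones
boxes = [(0.10, 0.12), (0.12, 0.10), (0.08, 0.08),
         (0.60, 0.70), (0.70, 0.60), (0.65, 0.65)]
anchors_toy = toy_kmeans(boxes, centers=[boxes[0], boxes[3]])
print(anchors_toy)  # one representative (w, h) per group of similar boxes
```

Each returned center is the average size of a group of similar boxes, which is why cluster centers can serve as anchor boxes.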

You can visualize your anchor boxes. You should get something similar to this:

Image("local/imgs/anchor_boxes.png", width=300)


from matplotlib.patches import Rectangle
anchors = get_anchor_boxes(X, n=10, random_state=0)
plt.figure(figsize=(5,5)); 
ax = plt.subplot(111)

for w,h in anchors:
    ax.add_patch(Rectangle((0.5-w/2,0.5-h/2),w,h, linewidth=2,edgecolor='r',facecolor='none'))

Submit your solution online

student.submit_task(namespace=globals(), task_id='T1');

TASK 2: Get closest anchor

Assume we have these anchor boxes, sorted by area:

kc = np.array([0.03612632, 0.05025544, 0.0982887 , 0.1392435 , 0.11913009,
       0.28577818, 0.32945173, 0.23846835, 0.18874274, 0.48914381,
       0.25347843, 0.77500826, 0.45506799, 0.51589807, 0.83168319,
       0.39802428, 0.5539543 , 0.86824085, 0.93553054, 0.89561131]).reshape(10,2)
kc = kc[np.argsort(np.prod(kc, axis=1))]  # np.product is a deprecated alias of np.prod
anchors = pd.DataFrame(kc, columns=['w', 'h'])
anchors

Complete the following function so that, given a bounding box XMin, XMax, YMin, YMax and a dataframe of anchors such as anchors above, it returns the index of the anchor most similar to the bounding box.

Recall that all values are \(\in [0,1]\).

Given two boxes (bounding box and anchor) with widths and heights \(w_0, h_0\) and \(w_1, h_1\), we measure their similarity with the following distance, where a lower value means more similar:

\[|w_0-w_1| + |h_0-h_1|\]

Your return value must be an integer between 0 and 9

def get_closest_anchor_box(XMin, XMax, YMin, YMax, anchors):
  anchor_index = ...
  return anchor_index
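As a worked example of the measure above, the following sketch evaluates it on three made-up anchors using plain Python lists. The names `closest_anchor_index` and `toy_anchors` are hypothetical and not part of the lab; with the anchors dataframe the same idea would typically be expressed with NumPy operations.

```python
# Worked example of the |w0 - w1| + |h0 - h1| measure on made-up values.
# `toy_anchors` and `closest_anchor_index` are hypothetical names, not part of the lab.

def closest_anchor_index(xmin, xmax, ymin, ymax, anchor_sizes):
    # width and height of the bounding box from its corner coordinates
    w0, h0 = xmax - xmin, ymax - ymin
    # distance between the box size and each anchor size
    distances = [abs(w0 - w1) + abs(h0 - h1) for w1, h1 in anchor_sizes]
    # the most similar anchor is the one with the smallest distance
    return distances.index(min(distances))

toy_anchors = [(0.10, 0.10), (0.25, 0.45), (0.80, 0.40)]
# a box of width 0.2 and height 0.5; distances are roughly 0.5, 0.1 and 0.7
idx = closest_anchor_index(0.3, 0.5, 0.2, 0.7, toy_anchors)
print(idx)  # 1
```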

You can visualize random annotations and their corresponding anchor boxes, as chosen by your function, with the code below. You should see something like this:

Image("local/imgs/annotations-anchors.png", width=600)


plt.figure(figsize=(12,4)); 
for i in range(3):
    b = d.iloc[np.random.randint(len(d))]
    bx, by = b.XMin + (b.XMax-b.XMin)/2, b.YMin + (b.YMax-b.YMin)/2
    bw, bh = b.XMax - b.XMin, b.YMax - b.YMin
    pw, ph = anchors.values[get_closest_anchor_box(b.XMin, b.XMax, b.YMin, b.YMax, anchors)]

    ax = plt.subplot(1,3,i+1)
    ax.add_patch(Rectangle((bx-bw/2,by-bh/2),bw,bh, linewidth=2,edgecolor='r',facecolor='none', label="annotation"))
    ax.add_patch(Rectangle((bx-pw/2,by-ph/2),pw,ph, linewidth=2,edgecolor='b',facecolor='none', label="anchor box"))
    plt.scatter(bx, by, color="black", label="object center")
    plt.grid(); plt.legend()
    plt.xlim(0,1); plt.ylim(0,1);

Submit your solution online

student.submit_task(namespace=globals(), task_id='T2');

TASK 3: Compute desired model bounding box predictions

Understand the schema of YOLO coordinates below:

\(b_w\), \(b_h\) are the width and height of the annotation we want the model to predict
\(b_x\), \(b_y\) are the \(x\) and \(y\) coordinates of the center of the annotation we want the model to predict
\(p_w\), \(p_h\) are the width and height of its closest anchor box
\(c_x\), \(c_y\) are the \(x\) and \(y\) coordinates of the TOP LEFT corner of the image cell responsible for detecting the annotation

Image("local/imgs/yolo_predictions.png", width=400)


Complete the following function such that, when given \(b_x\), \(b_y\), \(b_w\), \(b_h\), \(p_w\), \(p_h\), \(n_w\), \(n_h\), it returns:

\(n_x \in \{0,1,..,n_w-1\}\), \(n_y \in \{0,1,..,n_h-1\}\): the cell number in which the annotation center (\(b_x\), \(b_y\)) falls.
\(t_x\), \(t_y\), \(t_w\), \(t_h\): the desired model predictions according to the figure above

\(n_w\) and \(n_h\) specify the grid size in terms of number of cells wide and number of cells high.
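Assuming the figure follows the usual YOLO parametrization (an assumption, although it is consistent with the check values given for this task), with \(\sigma\) the sigmoid function and the cell corner expressed in image-relative units as \(c_x = n_x / n_w\), \(c_y = n_y / n_h\), the relation between model predictions and boxes can be sketched as:

\[b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h}\]

Solving for the desired predictions gives:

\[t_x = \log\frac{b_x - c_x}{1 - (b_x - c_x)}, \qquad t_y = \log\frac{b_y - c_y}{1 - (b_y - c_y)}, \qquad t_w = \log\frac{b_w}{p_w}, \qquad t_h = \log\frac{b_h}{p_h}\]

Under this reading, \(t_w\) and \(t_h\) are negative when the annotation is smaller than its anchor box and positive when it is larger, while \(t_x\) and \(t_y\) are negative when the offset from the cell corner is below 0.5.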

def get_model_target_predictions(bx, by, bw, bh, pw, ph, nw, nh):
    # assume all x,y,w,h are in the [0,1] range, and nw, nh > 2
    
    nx = ...
    ny = ...
    tx = ...
    ty = ...
    tw = ...
    th = ...

    return nx, ny, tx, ty, tw, th

Check your code. For the following values, you should get:

\(n_x\), \(n_y\) = 1, 4
\(t_x\), \(t_y\) = -2.63, -3.89
\(t_w\), \(t_h\) = -0.92, 0.14

Make sure the values make sense (why are they positive or negative?).

nw, nh = 7, 5
bx, by = 0.21, 0.82
bw, bh = 0.02, 0.15
pw, ph = 0.05, 0.13

get_model_target_predictions(bx, by, bw, bh, pw, ph, nw, nh)
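The expected values can be reproduced by hand. The sketch below assumes the usual YOLO relations, b = sigmoid(t) + c for the center and b = p * exp(t) for the size, with the cell corner c expressed in image-relative units; this is an assumption about the figure, although it matches the numbers listed above. The helper `logit` is a hypothetical name introduced only for this illustration.

```python
import math

def logit(p):
    # inverse of the sigmoid: logit(sigmoid(t)) == t
    return math.log(p / (1 - p))

nw, nh = 7, 5
bx, by = 0.21, 0.82
bw, bh = 0.02, 0.15
pw, ph = 0.05, 0.13

# grid cell that contains the annotation center
nx, ny = int(bx * nw), int(by * nh)
# top-left corner of that cell, in image-relative units (assumed convention)
cx, cy = nx / nw, ny / nh
# invert b = sigmoid(t) + c for the center and b = p * exp(t) for the size
tx, ty = logit(bx - cx), logit(by - cy)
tw, th = math.log(bw / pw), math.log(bh / ph)

print(nx, ny, round(tx, 2), round(ty, 2), round(tw, 2), round(th, 2))
# 1 4 -2.63 -3.89 -0.92 0.14
```

Here \(t_w\) is negative because the annotation is narrower than its anchor (0.02 < 0.05), and \(t_h\) is positive because it is taller (0.15 > 0.13).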

Try other randomly generated cases:

nw, nh = np.random.randint(6, size=2)+5
bx, by = np.round(np.random.random(size=2)*.4+.3,3)
bw, bh = np.round(np.random.random(size=2)*.25,3)
pw, ph = np.round(np.r_[bw, bh] * (1+np.random.random(size=2)*0.3-0.15),3)

print("inputs", nw, nh, bx, by, bw, bh, pw, ph)
get_model_target_predictions(bx, by, bw, bh, pw, ph, nw, nh)

Submit your solution online

student.submit_task(namespace=globals(), task_id='T3');
