LAB 05.02 - Model evaluation

!wget --no-cache -O init.py -q https://raw.githubusercontent.com/rramosp/ai4eng.v1/main/content/init.py
import init; init.init(force_download=False); init.get_weblink()
from local.lib.rlxmoocapi import submit, session
session.LoginSequence(endpoint=init.endpoint, course_id=init.course_id, lab_id="L05.02", varname="student");

Task 1: Randomly partition numpy arrays

observe how we can select specific rows and/or columns of a numpy array

import numpy as np

x = np.random.randint(100, size=(20,5))
x[:,0] = range(len(x))
x[0,:] = range(x.shape[1])
x
ridxs = np.r_[2,4,5]
x[ridxs]
cidxs = np.r_[1,3]
x[:,cidxs]
x[ridxs][:, cidxs]
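note that x[ridxs][:, cidxs] selects the rows and the columns in two steps. as an aside (not required for this lab), numpy also provides np.ix_ to select the same sub-block in a single indexing step; a quick sketch reusing the x, ridxs and cidxs defined above:

# np.ix_ builds an open mesh from the two index vectors, so this picks
# the same 3x2 sub-block as x[ridxs][:, cidxs]
x[np.ix_(ridxs, cidxs)]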

and the dimensions of the array are accessible through len and shape

len(x), x.shape

observe also how we can partition it

x[:3]
x[3:]

we can do the same thing with vectors

v = np.arange(100,120)
v
v[:5], v[5:]

finally, observe how we can create a random permutation of a specific vector

np.random.permutation(v)

or of the first natural numbers (here, 0 to 19)

p = np.random.permutation(20)
p

how do you interpret this?

v[p[5:]]
x[p[:5]]
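one way to read the two cells above: p is a random reordering of the indexes 0 to 19, so p[:5] and p[5:] are two disjoint groups of indexes, and v[p[5:]] and x[p[:5]] are random selections of elements of v and rows of x. a quick check of this interpretation, reusing the p defined above:

# the two groups of indexes are disjoint and together cover 0..19 exactly once,
# so indexing with them produces a random partition with no repeated elements
print (np.sort(np.r_[p[:5], p[5:]]))    # all indexes 0..19, each exactly once
print (np.intersect1d(p[:5], p[5:]))    # empty array: no index appears in both groups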

assignment

in this task you will have to complete the function split_data below so that:

  • it accepts three arguments: X and y, each of which can be any numpy array (1D, 2D, etc.) of the same length \(n\) (observe the assert statement), and a percentage pct

  • it creates a random permutation of the natural numbers from \(0\) to \(n-1\)

  • it partitions the permutation so that the first partition contains the first n1_elements \(=\) int(n * pct) numbers, and the second partition contains the rest

  • it interprets the components of the permutation partitions as indexes into X and y, so that they are partitioned into X1, X2 and y1, y2 respectively

note that indexes into an array must be of type int. do the following to convert a float to an int (a toy sketch of the whole splitting procedure is given right before the function template below)

a,b = 10,.3
c = a*b
print (c)
c = int(c)
print(c)
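putting these pieces together, here is a minimal sketch of the splitting idea on a toy array (illustrative names only, not the graded function):

# illustrative sketch, not the split_data function you must complete below
import numpy as np
X_toy = np.arange(20).reshape(10, 2)          # 10 rows of data
y_toy = np.arange(10)                         # 10 matching labels
perm  = np.random.permutation(len(X_toy))     # random reordering of the indexes 0..9
n1    = int(len(X_toy) * 0.7)                 # size of the first partition, as an int
idx1, idx2 = perm[:n1], perm[n1:]             # two disjoint random groups of indexes
print (X_toy[idx1].shape, X_toy[idx2].shape)  # (7, 2) (3, 2)
print (y_toy[idx1].shape, y_toy[idx2].shape)  # (7,) (3,)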
def split_data(X, y, pct):
    
    assert len(X)==len(y), "X and y must have the same length"
    assert pct>0 and pct<1, "pct must be in the (0,1) interval"
    
    permutation = ...
    n1_elements = ...
    permutation_partition_1 = ...
    permutation_partition_2 = ...
    X1 = ...
    X2 = ...
    y1 = ...
    y2 = ...
    return X1, X2, y1, y2

check your solution manually with the following code

XX = np.random.randint(100, size=(20,8))
yy = np.arange(100,100+len(XX))
XX[:,0] = range(len(XX))
XX[0,:] = range(XX.shape[1])
print (XX)
print (yy)
Xtr, Xts, ytr, yts = split_data(XX, yy, pct=.7)
# check the partition: the sums over both parts should add up to the totals
print (np.sum(XX), np.sum(Xtr) + np.sum(Xts), np.sum(yy), np.sum(ytr)+np.sum(yts))
print (Xtr, "\n--")
print (Xts, "\n--")
print (ytr, "\n--")
print (yts, "\n--")
Xts

submit your code

student.submit_task(globals(), task_id="task_01");

Task 2: Fit a model and make predictions

observe how we create new data with the synthetic dataset generators available in sklearn

from sklearn.datasets import make_moons
import matplotlib.pyplot as plt
from local.lib import mlutils
%matplotlib inline
X, y = make_moons(200, noise=0.2)
X.shape, y.shape
mlutils.plot_2Ddata(X,y); plt.grid();

observe also how we create an algorithm instance and fit a model

from sklearn.svm import SVC
estimator = SVC(gamma=1)
estimator.fit(X,y)
mlutils.plot_2Ddata_with_boundary(estimator.predict, X, y)

and how we make predictions

preds = estimator.predict(X)
print (preds.shape)
preds

in this task you have to complete the following function so that:

  • it makes two non-random partitions of X and y: one containing the first half of the data and one containing the second half. if the number of elements of X is odd, the second half will contain one more element than the first half (e.g., with 9 elements the halves have 4 and 5 elements)

  • it fits the estimator with the first half of the data

  • it makes predictions on the second half of the data

  • it returns the fitted estimator and the predictions on the second half of the data (see the sketch right after this list)
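a minimal sketch of this flow on toy data (illustrative names and data, not the graded function; it assumes any sklearn-style estimator with fit and predict):

# illustrative sketch: split in half, fit on the first half, predict on the second
import numpy as np
from sklearn.linear_model import LogisticRegression
X_toy = np.arange(9, dtype=float).reshape(-1, 1)   # 9 samples, so the halves have 4 and 5 elements
y_toy = np.array([0, 0, 1, 1, 0, 1, 1, 1, 1])
half = len(X_toy) // 2                             # the first half gets len(X)//2 elements
est = LogisticRegression().fit(X_toy[:half], y_toy[:half])
print (est.predict(X_toy[half:]))                  # predictions on the remaining 5 samples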

def fit_and_predict(estimator, X, y):
    assert len(X)==len(y), "X and y must have the same length"
    
    predictions = ...
    
    return estimator, predictions

check your code. your predictions should be similar to

preds
>> array([0, 0, 0, 0, 1, 0, 1, 1, 1, 0])
X = np.array([[ 0.74799424, -0.5867667 ],
       [-0.64457753,  1.25127894],
       [ 0.53682593,  0.10931563],
       [-0.88825294, -0.06987509],
       [ 0.99612638, -0.52295157],
       [ 1.20586692,  0.01930477],
       [-0.19368482,  0.65121567],
       [ 0.1973759 ,  0.82250723],
       [ 0.94859234, -0.5457241 ],
       [ 1.87967948, -0.22740261],
       [ 0.58766146,  0.3982837 ],
       [ 0.27731571,  1.14369568],
       [-0.67421956,  0.12785382],
       [ 0.56957459,  1.05330376],
       [ 1.52435938, -0.29864338],
       [-0.15973608,  0.21790711],
       [ 1.59037406, -0.56875485],
       [ 0.43257507, -0.48900315],
       [ 1.09440413, -0.73789029],
       [-0.32940869,  0.74671384]])
y = np.array([1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
X.shape, y.shape
from sklearn.linear_model import LogisticRegression
estimator = LogisticRegression()
estimator, preds = fit_and_predict(estimator, X, y)
preds

submit your code

student.submit_task(globals(), task_id="task_02");

Task 3: Select data with indices

Observe how we can create a vector or matrix of True/False (boolean) by applying a condition to any matrix or vector

import numpy as np
y = np.random.randint(10, size=15)
print (y)
y_less_than_5 = y<5
print (y_less_than_5)

and how we can select elements of a vector using a boolean vector of the same length

y[y_less_than_5]
y[y<5]

numpy does not really care how you construct the boolean vector: as long as it has the right length, you can use it to index any other vector or array

v = np.random.randint(20, size=15)
v
v[y<5]

in this task you will complete the function select_per_class such that:

  • it receives an array of data X and a vector of labels y, of the same length

  • the labels in y are binary: they can only take the values 0 or 1

  • it makes two partitions of X, one with the rows where y is 0, and another with the rows where y is 1

  • it returns the two partitions

For instance, for the following X and y

X = np.array([[8, 8, 5, 2, 0, 0],
              [4, 4, 8, 1, 3, 7],
              [4, 5, 3, 6, 9, 6],
              [0, 3, 5, 3, 5, 3],
              [0, 7, 2, 7, 1, 7],
              [5, 7, 7, 1, 8, 5],
              [2, 5, 7, 3, 8, 0],
              [7, 2, 5, 9, 8, 7],
              [1, 6, 6, 1, 6, 0],
              [0, 7, 6, 5, 3, 4]])

y = np.array([0, 0, 0, 0, 1, 1, 0, 0, 1, 1])

your function must return the following two matrices:

[[8 8 5 2 0 0]
 [4 4 8 1 3 7]
 [4 5 3 6 9 6]
 [0 3 5 3 5 3]
 [2 5 7 3 8 0]
 [7 2 5 9 8 7]]
 
[[0 7 2 7 1 7]
 [5 7 7 1 8 5]
 [1 6 6 1 6 0]
 [0 7 6 5 3 4]]
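the two matrices above are just the result of boolean-mask selection; a quick sketch of the idea on a tiny toy array (illustrative names, not the graded function):

# illustrative sketch: boolean masks pick the rows belonging to each class
import numpy as np
X_toy = np.array([[1, 1], [2, 2], [3, 3], [4, 4]])
y_toy = np.array([0, 1, 0, 1])
print (X_toy[y_toy == 0])   # rows where the label is 0 -> [[1 1] [3 3]]
print (X_toy[y_toy == 1])   # rows where the label is 1 -> [[2 2] [4 4]]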
def select_per_class(X, y):
    X1 = ...
    X2 = ...
    return X1, X2

check your code manually

X = np.array([[8, 8, 5, 2, 0, 0],
              [4, 4, 8, 1, 3, 7],
              [4, 5, 3, 6, 9, 6],
              [0, 3, 5, 3, 5, 3],
              [0, 7, 2, 7, 1, 7],
              [5, 7, 7, 1, 8, 5],
              [2, 5, 7, 3, 8, 0],
              [7, 2, 5, 9, 8, 7],
              [1, 6, 6, 1, 6, 0],
              [0, 7, 6, 5, 3, 4]])

y = np.array([0, 0, 0, 0, 1, 1, 0, 0, 1, 1])
a,b = select_per_class(X, y)
print (a)
print (b)

submit your code

student.submit_task(globals(), task_id="task_03");

Task 4: Measure accuracy

complete the following function such that:

  • it receives two binary vectors (composed of 0's and 1's) of the same length

  • it returns the fraction (a number between 0 and 1) of elements that are equal in both vectors

recall that

  • if a and b are vectors of the same length, a==b returns a vector of booleans in which a True at a given position signals that the elements at that position are the same

  • if k is a vector of booleans, sum(k) returns the number of True elements.
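combining these two facts on a toy pair of vectors (illustrative names, not the graded function):

# illustrative sketch: fraction of positions where the two vectors coincide
import numpy as np
a_toy = np.array([1, 0, 1, 1])
b_toy = np.array([1, 1, 1, 0])
matches = a_toy == b_toy                 # boolean vector, True where the elements coincide
print (np.sum(matches) / len(a_toy))     # 2 matches out of 4 positions -> 0.5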

for the following two vectors you should get 0.375

a = np.array([1,0,0,0,1,1,0,0])
b = np.array([1,1,1,1,0,1,0,1])
accuracy(a, b)
>>> 0.375
def accuracy(y_true, y_pred):
    result = ...
    return result
a = np.array([1,0,0,0,1,1,0,0])
b = np.array([1,1,1,1,0,1,0,1])
accuracy(a,b)

submit your code

student.submit_task(globals(), task_id="task_04");

Task 5: Random split, fit and predict

complete the following function so that:

  • it fits the estimator with a random subset of the data X and binary labels y containing a fraction train_pct of the elements. you can use the split_data function developed previously

  • it makes predictions on the test part of the data (the part not used for fitting)

  • it measures the accuracy of those predictions. you may use the accuracy function created previously

  • it returns the fitted estimator, the test parts of X and y, and the measured accuracy (a sketch of the overall flow is given right after this list)
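conceptually, this mirrors what sklearn's own train_test_split plus accuracy_score do. for orientation only (the graded function must use your own split_data and accuracy instead), a self-contained sketch with those library helpers:

# illustrative sketch with sklearn helpers, not the graded split_fit_predict function
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X_toy, y_toy = make_moons(100, noise=0.2)
Xtr, Xts, ytr, yts = train_test_split(X_toy, y_toy, train_size=0.9)  # random 90/10 split
est = LogisticRegression(solver="lbfgs").fit(Xtr, ytr)               # fit on the train part
print (accuracy_score(yts, est.predict(Xts)))                        # accuracy on the test part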

the execution below should return something with the following structure (the actual numbers will change)

(LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='warn', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False), array([[-0.76329684,  0.2572069 ],
        [ 1.02356829,  0.37629873],
        [ 0.32099415,  0.82244488],
        [ 1.08858315, -0.61299904],
        [ 0.58470767,  0.58510559],
        [ 1.60827644, -0.15477173],
        [ 1.53121784,  0.78121504],
        [-0.42734156,  0.87585237],
        [-0.36368682,  0.72152586],
        [ 1.05312619,  0.19835526]]), array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]), 0.6)
def split_fit_predict(estimator, X, y, train_pct):

    def split_data(X, y, pct):
        # your code here
        ...

    def accuracy(y_true, y_pred):
        # your code here
        ...

    Xtr, Xts, ytr, yts = ...   # split the data into train and test parts
    # ... fit the estimator on the train part ...
    preds_ts = ...             # obtain predictions on the test part
    return estimator, Xts, yts, accuracy(yts, preds_ts)
        
        
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression

X, y = make_moons(100, noise=0.2)
estimator = LogisticRegression(solver="lbfgs")
split_fit_predict(estimator, X, y, train_pct=0.9)

submit your code

student.submit_task(globals(), task_id="task_05");