LAB 05.02 - Model evaluation
!wget --no-cache -O init.py -q https://raw.githubusercontent.com/rramosp/ai4eng.v1/main/content/init.py
import init; init.init(force_download=False); init.get_weblink()
from local.lib.rlxmoocapi import submit, session
session.LoginSequence(endpoint=init.endpoint, course_id=init.course_id, lab_id="L05.02", varname="student");
Task 1: Randomly partition numpy arrays
observe how we can select specific rows and/or columns of a numpy array
import numpy as np
x = np.random.randint(100, size=(20,5))
x[:,0] = range(len(x))
x[0,:] = range(x.shape[1])
x
ridxs = np.r_[2,4,5]
x[ridxs]
cidxs = np.r_[1,3]
x[:,cidxs]
x[ridxs][:, cidxs]
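As a side note, numpy also provides np.ix_, which builds the same row/column cross-product in a single indexing step; a quick illustrative sketch using the ridxs and cidxs defined above:
x[np.ix_(ridxs, cidxs)]   # rows 2,4,5 and columns 1,3 at once, same result as x[ridxs][:, cidxs]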
and the dimensions of the array are accessible through len and shape
len(x), x.shape
observe also how we can partition it
x[:3]
x[3:]
we can do the same thing with vectors
v = np.arange(100,120)
v
v[:5], v[5:]
finally, observe how we can create a random permutation of a specific vector
np.random.permutation(v)
or of the first 20 natural numbers
p = np.random.permutation(20)
p
how do you interpret this?
v[p[5:]]
x[p[:5]]
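One way to read the two cells above: p is a random reordering of the indexes 0..19, so slicing it produces two disjoint random groups of indexes, and using them to index v or x picks random, non-overlapping subsets of the data. A small sketch with the arrays defined above:
first, rest = p[:5], p[5:]
print(np.sort(np.concatenate([first, rest])))   # every index 0..19 appears exactly once
print(x[first].shape, x[rest].shape)            # (5, 5) and (15, 5): a random 5/15 row split of x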
assignment
in this task you will have to complete the function split_data below so that it:
- accepts two arrays X and y, each of which can be any numpy array (1D, 2D, etc.) of the same size \(n\) (observe the assert statement), plus a number pct
- creates a random permutation of the natural numbers from \(0\) to \(n-1\)
- partitions the permutation so that the first partition contains the first n1_elements \(=\) int(n * pct) numbers, and the second partition contains the rest
- interprets the components of the permutation partitions as indexes into X and y, so that they are partitioned into X1, X2 and y1, y2 respectively
note that indexes into an array must be of type int. do the following to convert a float to an int
a,b = 10,.3
c = a*b
print (c)      # c is a float
c = int(c)
print (c)      # now c is an int, suitable for indexing
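For instance (just an illustration of the mechanics, not the graded function itself), with \(n=20\) and pct=0.7 the first partition would take the first int(20 * 0.7) = 14 entries of a permutation, and the second partition the remaining 6:
n, pct = 20, .7
n1_elements = int(n*pct)            # 14
perm = np.random.permutation(n)
print(perm[:n1_elements])           # indexes for the first partition (14 of them)
print(perm[n1_elements:])           # indexes for the second partition (6 of them)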
def split_data(X, y, pct):
    assert len(X)==len(y), "X and y must have the same length"
    assert pct>0 and pct<1, "pct must be in the (0,1) interval"
    permutation = ...              # your code here
    n1_elements = ...
    permutation_partition_1 = ...
    permutation_partition_2 = ...
    X1 = ...
    X2 = ...
    y1 = ...
    y2 = ...
    return X1, X2, y1, y2
check your solution manually with the following code
XX = np.random.randint(100, size=(20,8))
yy = np.arange(100,100+len(XX))
XX[:,0] = range(len(XX))
XX[0,:] = range(XX.shape[1])
print (XX)
print (yy)
Xtr, Xts, ytr, yts = split_data(XX, yy, pct=.7)
# sanity check: the element sums of the two parts should add up to the sums of the full XX and yy
np.sum(XX), np.sum(Xtr) + np.sum(Xts), np.sum(yy), np.sum(ytr)+np.sum(yts)
print (Xtr, "\n--")
print (Xts, "\n--")
print (ytr, "\n--")
print (yts, "\n--")
Xts
submit your code
student.submit_task(globals(), task_id="task_01");
Task 2: Fit a model and make predictions
observe how we can generate new data from the synthetic datasets available in sklearn
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt
from local.lib import mlutils
%matplotlib inline
X, y = make_moons(200, noise=0.2)
X.shape, y.shape
mlutils.plot_2Ddata(X,y); plt.grid();
observe also how we create an algorithm instance and fit a model
from sklearn.svm import SVC
estimator = SVC(gamma=1)
estimator.fit(X,y)
mlutils.plot_2Ddata_with_boundary(estimator.predict, X, y)
and how we make predictions
preds = estimator.predict(X)
print (preds.shape)
preds
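As a quick, purely illustrative sanity check, we can compare these training-set predictions against the true labels (this is the kind of comparison Task 4 formalizes):
print(np.mean(preds == y))   # fraction of training points the SVC classifies correctly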
in this task you have to complete the following function so that it:
- makes two non-random partitions of X and y, one containing the first half of the data and one containing the second half. If the number of elements of X is odd, then the second half will contain one more element than the first half (see the short sketch right after this list)
- fits the estimator with the first half of the data
- makes predictions on the second half of the data
- returns the fitted estimator and the predictions on the second half of the data
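As an illustration of the halving rule above (not the graded code itself), integer division with // naturally leaves the extra element in the second half when the length is odd:
n = 7
n_half = n // 2                    # 3
first, second = np.arange(n)[:n_half], np.arange(n)[n_half:]
print(len(first), len(second))     # 3 4  -> the second half gets the extra element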
def fit_and_predict(estimator, X, y):
    assert len(X)==len(y), "X and y must have the same length"
    predictions = ...   # your code here
    return estimator, predictions
check your code. your predictions should be similar to
preds
>> array([0, 0, 0, 0, 1, 0, 1, 1, 1, 0])
X = np.array([[ 0.74799424, -0.5867667 ],
[-0.64457753, 1.25127894],
[ 0.53682593, 0.10931563],
[-0.88825294, -0.06987509],
[ 0.99612638, -0.52295157],
[ 1.20586692, 0.01930477],
[-0.19368482, 0.65121567],
[ 0.1973759 , 0.82250723],
[ 0.94859234, -0.5457241 ],
[ 1.87967948, -0.22740261],
[ 0.58766146, 0.3982837 ],
[ 0.27731571, 1.14369568],
[-0.67421956, 0.12785382],
[ 0.56957459, 1.05330376],
[ 1.52435938, -0.29864338],
[-0.15973608, 0.21790711],
[ 1.59037406, -0.56875485],
[ 0.43257507, -0.48900315],
[ 1.09440413, -0.73789029],
[-0.32940869, 0.74671384]])
y = np.array([1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
X.shape, y.shape
from sklearn.linear_model import LogisticRegression
estimator = LogisticRegression()
estimator, preds = fit_and_predict(estimator, X, y)
preds
submit your code
student.submit_task(globals(), task_id="task_02");
Task 3: Select data with indices
Observe how we can create a vector or matrix of True/False (boolean) by applying a condition to any matrix or vector
import numpy as np
y = np.random.randint(10, size=15)
print (y)
y_less_than_5 = y<5
print (y_less_than_5)
and how we can select elements of a vector using a boolean vector of the same length
y[y_less_than_5]
y[y<5]
numpy doesn’t really care how you construct the vector of booleans used to index another vector or array, as long as the lengths match
v = np.random.randint(20, size=15)
v
v[y<5]
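The same idea selects whole rows of a 2D array, which is exactly what the next task needs; a small sketch with a throwaway array and labels (the names here are just for illustration):
m = np.arange(12).reshape(6, 2)
labels = np.array([0, 1, 0, 1, 1, 0])
print(m[labels == 0])   # the rows of m at positions where the label is 0
print(m[labels == 1])   # the rows of m at positions where the label is 1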
in this task you will complete the function select_per_class such that:
- it receives an array of data X and a vector of labels y, of the same length
- the labels y are binary: they can only take the values 0 or 1
- it makes two partitions of X, one corresponding to the positions where y is 0, and another where y is 1
- it returns the two partitions
For instance, for the following X and y
X = np.array([[8, 8, 5, 2, 0, 0],
[4, 4, 8, 1, 3, 7],
[4, 5, 3, 6, 9, 6],
[0, 3, 5, 3, 5, 3],
[0, 7, 2, 7, 1, 7],
[5, 7, 7, 1, 8, 5],
[2, 5, 7, 3, 8, 0],
[7, 2, 5, 9, 8, 7],
[1, 6, 6, 1, 6, 0],
[0, 7, 6, 5, 3, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 0, 0, 1, 1])
your function must return the following two matrices:
[[8 8 5 2 0 0]
[4 4 8 1 3 7]
[4 5 3 6 9 6]
[0 3 5 3 5 3]
[2 5 7 3 8 0]
[7 2 5 9 8 7]]
[[0 7 2 7 1 7]
[5 7 7 1 8 5]
[1 6 6 1 6 0]
[0 7 6 5 3 4]]
def select_per_class(X, y):
    X1 = ...   # your code here
    X2 = ...
    return X1, X2
check your code manually
X = np.array([[8, 8, 5, 2, 0, 0],
[4, 4, 8, 1, 3, 7],
[4, 5, 3, 6, 9, 6],
[0, 3, 5, 3, 5, 3],
[0, 7, 2, 7, 1, 7],
[5, 7, 7, 1, 8, 5],
[2, 5, 7, 3, 8, 0],
[7, 2, 5, 9, 8, 7],
[1, 6, 6, 1, 6, 0],
[0, 7, 6, 5, 3, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 0, 0, 1, 1])
a,b = select_per_class(X, y)
print (a)
print (b)
submit your code
student.submit_task(globals(), task_id="task_03");
Task 4: Measure accuracy
complete the following function such that it:
- receives two binary vectors (composed of 0’s and 1’s) of the same length
- returns the fraction (between 0 and 1) of elements that are the same in both vectors

recall that:
- if a and b are vectors of the same length, a==b returns a vector of booleans in which the positions set to True signal that the elements in those positions are the same
- if k is a vector of booleans, sum(k) returns the number of True elements
for the following two vectors you should get 0.375
a = np.array([1,0,0,0,1,1,0,0])
b = np.array([1,1,1,1,0,1,0,1])
accuracy(a, b)
>>> 0.375
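To see how the two hints above combine on these vectors (again just an illustration, not the graded function):
a = np.array([1,0,0,0,1,1,0,0])
b = np.array([1,1,1,1,0,1,0,1])
matches = a == b                  # boolean vector, True where both vectors agree
print(matches.sum(), len(a))      # 3 matching positions out of 8
print(matches.sum() / len(a))     # 0.375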
def accuracy(y_true, y_pred):
    result = ...   # your code here
    return result
a = np.array([1,0,0,0,1,1,0,0])
b = np.array([1,1,1,1,0,1,0,1])
accuracy(a,b)
submit your code
student.submit_task(globals(), task_id="task_04");
Task 5: Random split, fit and predict
complete the following function so that it:
- fits the estimator with a random sample of relative size train_pct of the data X and binary labels y. You can use the split_data function developed previously
- makes predictions on the test part of the data
- measures the accuracy of those predictions. you may use the accuracy function created previously
- returns the fitted estimator, the test part of X and y, and the accuracy measured
the execution below should return something with the following structure (the actual numbers will change)
(LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
warm_start=False), array([[-0.76329684, 0.2572069 ],
[ 1.02356829, 0.37629873],
[ 0.32099415, 0.82244488],
[ 1.08858315, -0.61299904],
[ 0.58470767, 0.58510559],
[ 1.60827644, -0.15477173],
[ 1.53121784, 0.78121504],
[-0.42734156, 0.87585237],
[-0.36368682, 0.72152586],
[ 1.05312619, 0.19835526]]), array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]), 0.6)
def split_fit_predict(estimator, X, y, train_pct):

    def split_data(X, y, pct):
        # your code here
        ...

    def accuracy(y_true, y_pred):
        # your code here
        ...

    Xtr, Xts, ytr, yts = ...    # split the data
    ...                         # fit the estimator on the training part
    preds_ts = ...              # obtain predictions on the test part

    return estimator, Xts, yts, accuracy(yts, preds_ts)
from sklearn.linear_model import LogisticRegression
X, y = make_moons(100, noise=0.2)
estimator = LogisticRegression(solver="lbfgs")
split_fit_predict(estimator, X, y, train_pct=0.9)
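Because the split is random, repeated calls will give slightly different accuracies; once your split_fit_predict is implemented, a quick (purely illustrative) way to see that variability is:
accuracies = [split_fit_predict(LogisticRegression(solver="lbfgs"), X, y, train_pct=0.9)[-1]
              for _ in range(10)]
print(accuracies)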
submit your code
student.submit_task(globals(), task_id="task_05");