LAB 3.2 - Low-level TensorFlow
!wget -nc --no-cache -O init.py -q https://raw.githubusercontent.com/rramosp/2021.deeplearning/main/content/init.py
import init; init.init(force_download=False);
from local.lib.rlxmoocapi import submit, session
import inspect
session.LoginSequence(endpoint=init.endpoint, course_id=init.course_id, lab_id="L03.02", varname="student");
import numpy as np
import os
import tensorflow as tf
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Dense, BatchNormalization
tf.__version__
TASK 1: Obtain layer output
COMPLETE the following function so that, when given a TF model and an input X, it returns the output of the model at layer layer_name when X is fed to the model.
You MUST RETURN a numpy array, NOT a tensor.
HINT: use the tf.keras.Model class as in the functional API, taking outputs from the desired layer (see the sketch below).
CHALLENGE: Solve it with a single line of code (not counting the import).
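For reference, this is roughly the pattern the hint points to, shown on a hypothetical model m, layer name "some_layer" and batch X_batch (not the graded solution; adapt it to the function's arguments):
# Sketch of the functional-API pattern from the hint (hypothetical names m,
# "some_layer" and X_batch): build a sub-model that shares the original model's
# input but stops at the chosen intermediate layer, then call it and convert
# the resulting tensor to a numpy array.
from tensorflow.keras.models import Model
sub_model = Model(inputs=m.input, outputs=m.get_layer("some_layer").output)
layer_output = sub_model(X_batch).numpy()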
def output_at_layer(X, model, layer_name):
    from tensorflow.keras.models import Model
    r = ... # YOUR CODE HERE
    return r
Check your answer manually: with the model and weights defined below you should get the following outputs:
>> output_at_layer(X, model, "layer_A")
array([[0.91274303, 0.69886017, 0.8832942 ],
[0.9311633 , 0.7634138 , 0.8924969 ],
[0.85661894, 0.5696809 , 0.8091405 ],
[0.8952345 , 0.6803274 , 0.8326857 ]], dtype=float32)
>> output_at_layer(X, model, "layer_B")
array([[0.87063193, 0.8240411 ],
[0.8774254 , 0.83376545],
[0.84875023, 0.7963983 ],
[0.86286545, 0.81590414]], dtype=float32)
>> output_at_layer(X, model, "layer_C")
array([[0.8959839 , 0.65980244],
[0.9032545 , 0.66435313],
[0.8733646 , 0.646801 ],
[0.8883195 , 0.6559416 ]], dtype=float32)
def get_model(n1, n2, n3):
    model = Sequential()
    model.add(Dense(n1, name="layer_A", activation="tanh", input_dim=2))
    model.add(Dense(n2, name="layer_B", activation="sigmoid"))
    model.add(Dense(n3, name="layer_C", activation="linear"))
    return model
w = [np.array([[0.3336241 , 0.26024526, 0.37238857],
[0.6344426 , 0.67660165, 0.31070882]], dtype=np.float32),
np.array([0.97280777, 0.3447949 , 0.91722184], dtype=np.float32),
np.array([[0.12999585, 0.31851983],
[0.7763866 , 0.8777575 ],
[0.99977154, 0.65771514]], dtype=np.float32),
np.array([0.36222705, 0.05885772], dtype=np.float32),
np.array([[0.75918376, 0.02541249],
[0.21730722, 0.45021895]], dtype=np.float32),
np.array([0.05594416, 0.26667854], dtype=np.float32)]
X = np.array([[0.9269997 , 0.41239464],
[0.8461177 , 0.64935404],
[0.27092433, 0.34251866],
[0.22509325, 0.6301328 ]], dtype=np.float32)
model=get_model(3,2,2)
model.set_weights(w)
output_at_layer(X, model, "layer_A")
# which corresponds to a tanh activation from the input data
np.tanh(X.dot(w[0])+w[1])
output_at_layer(X, model, "layer_B")
# which corresponds to a sigmoid activation from the output of layer A
sigm = lambda x: 1/(1+np.exp(-x))
sigm(output_at_layer(X, model, "layer_A").dot(w[2])+w[3])
output_at_layer(X, model, "layer_C")
# which corresponds to a linear activation from the output of layer B
output_at_layer(X, model, "layer_B").dot(w[-2])+w[-1]
Submit your solution online
student.submit_task(namespace=globals(), task_id='T1');
TASK 2: Implement batch normalization
Observe how we create a ONE LAYER model with TANH activation and batch normalization
def get_model(input_dim, n):
    model = Sequential()
    model.add(Dense(n, name="layer_A", activation="tanh", input_dim=input_dim))
    model.add(BatchNormalization())
    return model
we manually initialize it with random weights and apply it to some random input
input_dim = np.random.randint(3)+2
n = np.random.randint(5)+5
X = np.random.random((6,input_dim)).astype(np.float32)
print ("input_dim", input_dim, ", n", n)
input_dim 4 , n 5
model = get_model(input_dim=input_dim, n=n)
model.set_weights([np.random.random(i.shape) for i in model.get_weights()])
model(X).numpy()
array([[1.080421 , 0.9186753 , 1.4783407 , 0.5158316 , 0.7339326 ],
[1.047442 , 0.8679704 , 1.4229709 , 0.51811504, 0.6608341 ],
[1.0745426 , 0.90660375, 1.4581381 , 0.5141438 , 0.73286605],
[1.0549638 , 0.88741153, 1.3028177 , 0.48024043, 0.66753477],
[1.0836991 , 0.90194285, 1.5070624 , 0.5117415 , 0.7579496 ],
[1.0492579 , 0.88168395, 1.3504746 , 0.5018039 , 0.6871215 ]],
dtype=float32)
and we can extract the weights of the dense layer and the batch normalization layer
W, b, gamma, beta, moving_mean, moving_var = model.get_weights()
W.shape, b.shape, gamma.shape, beta.shape, moving_mean.shape, moving_var.shape
((4, 5), (5,), (5,), (5,), (5,), (5,))
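As a sanity check (assuming the BatchNormalization layer is model.layers[1]), the same parameters are also exposed as attributes of the layer: gamma and beta are the learned scale and offset, while moving_mean and moving_variance are the running statistics used at inference time.
# Quick check of which weight is which (assumes the BN layer is model.layers[1])
bn = model.layers[1]
bn.gamma.shape, bn.beta.shape, bn.moving_mean.shape, bn.moving_variance.shape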
COMPLETE the following function WITHOUT USING TENSORFLOW so that, given the input and the weights, you get the same output as the model above.
Specifically, the Dense layer output must be
\(A = \tanh(XW + b)\)
and the batch normalization applied to that output is
\(\gamma \frac{A - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta\)
where \(\mu\) and \(\sigma^2\) are moving_mean and moving_var, and \(\epsilon\) is the epsilon argument.
You MUST RETURN a numpy array, NOT a tensor.
CHALLENGE: Solve it with one single line of Python code.
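For reference, a minimal sketch of that computation in plain numpy, assuming the inference-time formulas above (not necessarily the one-liner asked for in the challenge):
# Sketch, assuming the dense + inference-time batch norm formulas given above
A = np.tanh(X @ W + b)                                                # Dense layer with tanh
A_bn = gamma * (A - moving_mean) / np.sqrt(moving_var + 1e-3) + beta  # batch normalization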
def apply_model(X, W, b, gamma, beta, moving_mean, moving_var, epsilon=1e-3):
    r = ... # YOUR CODE HERE
    return r
Check your code manually; the output should be the same as the model output above.
apply_model(X, W, b, gamma, beta, moving_mean, moving_var)
Submit your solution online
student.submit_task(namespace=globals(), task_id='T2');
TASK 3: Compute the Hessian
Complete the function below so that it computes the Hessian of a function with respect to a set of variables. Remember that the Hessian is the matrix of all second-order partial derivatives. See https://en.wikipedia.org/wiki/Hessian_matrix
The arguments for your code below:
expression_fn: a Python function that, when executed, returns a Tensor depending on the variables in svars.
svars: a list of \(n\) tf.Variables against which the derivatives are to be taken.
The result: a numpy array of dimension \(n\times n\), containing in each position the value of the corresponding second derivative evaluated at the values currently attached to the variables in svars.
See the example call below to understand what you have to produce.
NOTE: Observe that expression_fn is a function that you must call inside some GradientTape to obtain the expression. This has to be done this way because GradientTape needs to watch how expressions are built in order to access the computational graph and compute the gradient. This technique is used very often in TensorFlow.
WARN: You cannot use tf.hessian or GradientTape.jacobian or sympy. Do not use the name hessian to name any variable within your code.
HINT 1: use a GradientTape inside another GradientTape.
HINT 2: pass unconnected_gradients=tf.UnconnectedGradients.ZERO as an argument to GradientTape.gradient so that variables not participating in an expression get a gradient of zero. For instance, with \(f=xy\) we have \(\frac{\partial^2 f}{\partial y^2}=0\), since \(\frac{\partial f}{\partial y}=x\), which does not depend on \(y\). Or, if we have \(f=x\), we have \(\frac{\partial^2 f}{\partial y \partial x}=0\), since \(\frac{\partial f}{\partial y}=0\). If you do not include this argument, TensorFlow will return these values as None and you would have to fix them later.
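For illustration, here is the nested-tape pattern from the hints applied to a single variable v (a toy example, not the full \(n\times n\) Hessian you have to build):
# Second derivative of f = v**3 at v = 2: d2f/dv2 = 6v = 12
v = tf.Variable(2.0)
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        f = v**3
    df_dv = inner_tape.gradient(f, v, unconnected_gradients=tf.UnconnectedGradients.ZERO)
d2f_dv2 = outer_tape.gradient(df_dv, v, unconnected_gradients=tf.UnconnectedGradients.ZERO)
d2f_dv2.numpy()   # 12.0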
def get_double_derivatives(expression_fn, svars):
    import tensorflow as tf
    result = ... # YOUR CODE HERE
    return result
Check your code. The following expression \(f = 2xy^2 + 3x\cos{y}\) has as first derivatives:
\(\frac{\partial f }{\partial x} = 2y^2 +3\cos{y}\)
\(\frac{\partial f }{\partial y} = 4xy - 3x\sin{y}\)
and as second derivatives:
\(\frac{\partial^2 f}{\partial x^2} = 0\)
\(\frac{\partial^2 f}{\partial x \partial y} = 4y - 3\sin{y}\)
\(\frac{\partial^2 f}{\partial y \partial x} = 4y - 3\sin{y}\)
\(\frac{\partial^2 f}{\partial y^2} = 4x - 3x\cos{y}\)
which, when evaluated at \(x=2\) and \(y=-3\) yields
[[ 0 , -11.58],
[ -11.58 , 13.94]]
x = tf.Variable(2, dtype=tf.float32)
y = tf.Variable(-3, dtype=tf.float32)
expr = lambda: 2*x*y**2 + 3*x*tf.cos(y)
get_double_derivatives(expr, [x,y])
Submit your solution online
student.submit_task(namespace=globals(), task_id='T3');