LAB 3.2 - Low-level TensorFlow

!wget -nc --no-cache -O init.py -q https://raw.githubusercontent.com/rramosp/2021.deeplearning/main/content/init.py
import init; init.init(force_download=False); 
from local.lib.rlxmoocapi import submit, session
import inspect
session.LoginSequence(endpoint=init.endpoint, course_id=init.course_id, lab_id="L03.02", varname="student");
import numpy as np
import os
import tensorflow as tf
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Dense, BatchNormalization

tf.__version__

TASK 1: Obtain layer output

COMPLETE the following function so that, given a TF model and an input X, it returns the output of the model at layer layer_name when X is fed to the model.

You MUST RETURN a numpy array, NOT a tensor.

HINT: Use the tf.keras.Model class as in the functional API, taking the outputs from the desired layer.

CHALLENGE: Solve it with a single line of code (not counting the import)

def output_at_layer(X, model, layer_name):
    from tensorflow.keras.models import Model
    r = ... # YOUR CODE HERE
    return r
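One possible one-liner (a sketch of the functional-API approach from the hint, not necessarily the only solution; the name output_at_layer_sketch is illustrative only):

def output_at_layer_sketch(X, model, layer_name):
    from tensorflow.keras.models import Model
    # build a model from the original inputs to the requested layer's output,
    # apply it to X, and convert the resulting tensor to a numpy array
    return Model(inputs=model.input, outputs=model.get_layer(layer_name).output)(X).numpy()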

Check your answer manually. With the model and weights defined below you should get the following outputs:

    >> output_at_layer(X, model, "layer_A")
    array([[0.91274303, 0.69886017, 0.8832942 ],
           [0.9311633 , 0.7634138 , 0.8924969 ],
           [0.85661894, 0.5696809 , 0.8091405 ],
           [0.8952345 , 0.6803274 , 0.8326857 ]], dtype=float32)

    >> output_at_layer(X, model, "layer_B")
    array([[0.87063193, 0.8240411 ],
           [0.8774254 , 0.83376545],
           [0.84875023, 0.7963983 ],
           [0.86286545, 0.81590414]], dtype=float32)

    >> output_at_layer(X, model, "layer_C")
    array([[0.8959839 , 0.65980244],
           [0.9032545 , 0.66435313],
           [0.8733646 , 0.646801  ],
           [0.8883195 , 0.6559416 ]], dtype=float32)
def get_model(n1,n2,n3):
    model = Sequential()
    model.add(Dense(n1, name="layer_A", activation="tanh", input_dim=2))
    model.add(Dense(n2, name="layer_B", activation="sigmoid"))
    model.add(Dense(n3, name="layer_C", activation="linear"))
    return model


w = [np.array([[0.3336241 , 0.26024526, 0.37238857],
               [0.6344426 , 0.67660165, 0.31070882]], dtype=np.float32),
     np.array([0.97280777, 0.3447949 , 0.91722184], dtype=np.float32),
     
     np.array([[0.12999585, 0.31851983],
               [0.7763866 , 0.8777575 ],
               [0.99977154, 0.65771514]], dtype=np.float32),
     np.array([0.36222705, 0.05885772], dtype=np.float32),

     np.array([[0.75918376, 0.02541249],
               [0.21730722, 0.45021895]], dtype=np.float32),
     np.array([0.05594416, 0.26667854], dtype=np.float32)]


X = np.array([[0.9269997 , 0.41239464],
              [0.8461177 , 0.64935404],
              [0.27092433, 0.34251866],
              [0.22509325, 0.6301328 ]], dtype=np.float32)


model=get_model(3,2,2)
model.set_weights(w)
output_at_layer(X, model, "layer_A")
# which corresponds to a tanh activation from the input data
np.tanh(X.dot(w[0])+w[1])
output_at_layer(X, model, "layer_B")
# which corresponds to a sigmoid activation from the output of layer A
sigm = lambda x: 1/(1+np.exp(-x))
sigm(output_at_layer(X, model, "layer_A").dot(w[2])+w[3])
output_at_layer(X, model, "layer_C")
# which corresponds to a linear activation from the output of layer B
output_at_layer(X, model, "layer_B").dot(w[-2])+w[-1]

Register your solution online

student.submit_task(namespace=globals(), task_id='T1');

TASK 2: Implement batch normalization

Observe how we create a ONE-LAYER model with a TANH activation followed by batch normalization

def get_model(input_dim, n):
    model = Sequential()
    model.add(Dense(n, name="layer_A", activation="tanh", input_dim=input_dim))
    model.add(BatchNormalization())
    return model    

We manually initialize it with random weights and apply it to some random input:

input_dim = np.random.randint(3)+2
n = np.random.randint(5)+5
X = np.random.random((6,input_dim)).astype(np.float32)
print ("input_dim", input_dim, ", n", n)
input_dim 4 , n 5
model = get_model(input_dim=input_dim, n=n)
model.set_weights([np.random.random(i.shape) for i in model.get_weights()])
model(X).numpy()
array([[1.080421  , 0.9186753 , 1.4783407 , 0.5158316 , 0.7339326 ],
       [1.047442  , 0.8679704 , 1.4229709 , 0.51811504, 0.6608341 ],
       [1.0745426 , 0.90660375, 1.4581381 , 0.5141438 , 0.73286605],
       [1.0549638 , 0.88741153, 1.3028177 , 0.48024043, 0.66753477],
       [1.0836991 , 0.90194285, 1.5070624 , 0.5117415 , 0.7579496 ],
       [1.0492579 , 0.88168395, 1.3504746 , 0.5018039 , 0.6871215 ]],
      dtype=float32)

and we can extract the weights of the dense layer and the batch normalization layer

W, b, gamma, beta, moving_mean, moving_var = model.get_weights()
W.shape, b.shape, gamma.shape, beta.shape, moving_mean.shape, moving_var.shape
((4, 5), (5,), (5,), (5,), (5,), (5,))

COMPLETE the following function WITHOUT USING TENSORFLOW such that you get the same output as the model above, given the input and the weights.

Specifically, the Dense layer output must be

\[A = \text{tanh}(XW+b)\]

and the output of the batch normalization layer that follows is

\[\frac{A-m_\mu}{\sqrt{m_\sigma+\varepsilon}}\gamma + \beta\]

You MUST RETURN a numpy array, NOT a tensor.

CHALLENGE: Solve it with a single line of Python code.

def apply_model(X, W, b, gamma, beta, moving_mean, moving_var, epsilon=1e-3):
    r = ... # YOUR CODE HERE
    return r
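For intuition, here is a plain-NumPy sketch of the two formulas above (a dense tanh layer followed by inference-mode batch normalization); the name apply_model_sketch is illustrative only:

def apply_model_sketch(X, W, b, gamma, beta, moving_mean, moving_var, epsilon=1e-3):
    A = np.tanh(X.dot(W) + b)  # dense layer: tanh(XW + b)
    # normalize with the moving statistics, then scale by gamma and shift by beta
    return (A - moving_mean) / np.sqrt(moving_var + epsilon) * gamma + beta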

Check your code manually; the output should be the same as above.

apply_model(X, W, b, gamma, beta, moving_mean, moving_var)

Register your solution online

student.submit_task(namespace=globals(), task_id='T2');

TASK 3: Compute the Hessian

Complete the function below so that it computes the Hessian of a function with respect to a set of variables. Remember that the Hessian is the matrix of all second-order partial derivatives. See https://en.wikipedia.org/wiki/Hessian_matrix

The arguments for your code below:

  • expression_fn: is a Python function that, when executed, will return a Tensor depending on the variables in svars.

  • svars: a list of \(n\) tf.Variables against which the derivatives are to be taken.

The result:

  • a numpy array of dimension \(n\times n\), containing in each position the value of the corresponding second derivative evaluated at the current values of the variables in svars.

See the example call below to understand what you have to produce.

NOTE: Observe that expression_fn is a function that you must call inside a GradientTape context to obtain the expression. It needs to be done this way because GradientTape must watch how the expression is built in order to access the computational graph and compute the gradient. This technique is used very often in TensorFlow.

WARN: You cannot use tf.hessian or GradientTape.jacobian or sympy. Do not use the name hessian to name any variable within your code.

HINT 1: use a GradientTape inside another GradientTape.

HINT 2: use unconnected_gradients=tf.UnconnectedGradients.ZERO as an argument to GradientTape.gradient so that variables not participating in an expression get a gradient of zero. For instance, with \(f=xy\) we have \(\frac{\partial^2 f}{\partial y^2}=0\), since \(\frac{\partial f}{\partial y}=x\), which does not depend on \(y\). Or, if we have \(f=x\), then \(\frac{\partial^2 f}{\partial y \partial x}=0\), since \(\frac{\partial f}{\partial y}=0\). If you do not include this argument, TensorFlow will return these values as None and you would have to fix them later.

def get_double_derivatives(expression_fn,svars):
    import tensorflow as tf
    
    result = ... 
    
    return result
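One possible structure (a sketch following the hints above: an outer persistent GradientTape wrapping an inner one; the name get_double_derivatives_sketch is illustrative only):

def get_double_derivatives_sketch(expression_fn, svars):
    import tensorflow as tf
    import numpy as np
    # the outer tape records how the first derivatives are built;
    # the inner tape records the expression itself
    with tf.GradientTape(persistent=True) as outer:
        with tf.GradientTape() as inner:
            f = expression_fn()
        grads = inner.gradient(f, svars, unconnected_gradients=tf.UnconnectedGradients.ZERO)
    # each first derivative differentiated against every variable gives one row
    rows = [outer.gradient(g, svars, unconnected_gradients=tf.UnconnectedGradients.ZERO)
            for g in grads]
    return np.array([[v.numpy() for v in row] for row in rows])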
    

Check your code. The following expression

\[f = 2xy^2 + 3x\cos{y}\]

has as first derivatives:

  • \(\frac{\partial f }{\partial x} = 2y^2 +3\cos{y}\)

  • \(\frac{\partial f }{\partial y} = 4xy - 3x\sin{y}\)

and as second derivatives:

  • \(\frac{\partial^2 f }{\partial x^2} = 0\)

  • \(\frac{\partial^2 f }{\partial x \partial y} = 4y - 3\sin{y}\)

  • \(\frac{\partial^2 f }{\partial y \partial x} = 4y - 3\sin{y}\)

  • \(\frac{\partial^2 f }{\partial y^2} = 4x - 3x\cos{y}\)

which, when evaluated at \(x=2\) and \(y=-3\) yields

[[  0     ,  -11.58],
 [ -11.58 ,   13.94]]
x = tf.Variable(2, dtype=tf.float32)
y = tf.Variable(-3, dtype=tf.float32)
expr = lambda: 2*x*y**2 + 3*x*tf.cos(y)

get_double_derivatives(expr, [x,y])
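The result should be close to the analytic matrix above, for instance:

# should print True (values match the analytic result up to rounding)
np.allclose(get_double_derivatives(expr, [x, y]), [[0, -11.58], [-11.58, 13.94]], atol=1e-2)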

Register your solution online

student.submit_task(namespace=globals(), task_id='T3');