07.04 - TENSORFLOW¶
A dataset (again)¶
A bigger network¶
different activation functions
a different optimizer
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation='tanh'),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              metrics=['accuracy'])
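The training log below comes from fitting this model for 10 epochs. A minimal sketch of that call, assuming X_train and y_train hold the dataset from the previous section (the names are assumptions):

# train for 10 epochs on the dataset loaded earlier (assumed names X_train, y_train)
history = model.fit(X_train, y_train, epochs=10)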
Epoch 1/10
300/300 [==============================] - 0s 510us/step - loss: 0.3400 - accuracy: 0.8567
Epoch 2/10
300/300 [==============================] - 0s 508us/step - loss: 0.2804 - accuracy: 0.8867
Epoch 3/10
300/300 [==============================] - 0s 487us/step - loss: 0.2450 - accuracy: 0.8967
Epoch 4/10
300/300 [==============================] - 0s 532us/step - loss: 0.1722 - accuracy: 0.9433
Epoch 5/10
300/300 [==============================] - 0s 516us/step - loss: 0.1660 - accuracy: 0.9500
Epoch 6/10
300/300 [==============================] - 0s 527us/step - loss: 0.0917 - accuracy: 0.9700
Epoch 7/10
300/300 [==============================] - 0s 516us/step - loss: 0.1127 - accuracy: 0.9667
Epoch 8/10
300/300 [==============================] - 0s 507us/step - loss: 0.1118 - accuracy: 0.9600
Epoch 9/10
300/300 [==============================] - 0s 526us/step - loss: 0.1076 - accuracy: 0.9500
Epoch 10/10
300/300 [==============================] - 0s 487us/step - loss: 0.0840 - accuracy: 0.9800
Cross entropy - multiclass classification¶
Follow THIS EXAMPLE on the TensorFlow documentation site. Observe that:
the labels correspond to a 10-class classification problem
the network contains 10 output neurons, one per output class
the loss function is SparseCategoricalCrossentropy
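A minimal sketch of that kind of setup (not the exact model from the linked example; the layer sizes and names here are made up), where the last layer produces 10 logits and the loss is configured accordingly:

model10 = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)    # 10 output neurons, one per class, no activation (logits)
])
model10.compile(optimizer='adam',
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])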
Observe how cross entropy works with 4 classes:
first we convert the labels to a one-hot encoding
we create a network with four output neurons, one per class
we interpret each neuron's output as an element of a probability distribution
we normalize that distribution (it must add up to one)
we consider a network output to be better when it assigns more probability to the correct class
expected classes for five data points
convert them to a one-hot encoding
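A sketch of these two steps (the label values here are made up; y_ohe is the variable used below):

y = tf.constant([0, 3, 1, 2, 3])   # expected classes for five data points (made-up values)
y_ohe = tf.one_hot(y, depth=4)     # one-hot encoding, shape (5, 4)
print(y_ohe.numpy())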
simulate some neural network output with NO ACTIVATION function and 4 output neurons, so for each input element (we have five) we get 4 outputs
this is called LOGITS in TensorFlow
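A sketch of this step (the name y_hat and the random values are assumptions):

# simulated raw network output (LOGITS): 5 data points x 4 classes, no activation applied
y_hat = tf.random.normal(shape=(5, 4))
print(y_hat.numpy())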
normalize the LOGITS (the raw outputs of the network's last layer, which has no activation). This is the SOFTMAX activation:

$$\hat{\bar{y}}^{(i)}_k = \frac{e^{\hat{y}^{(i)}_k}}{\sum_{j=0}^{3} e^{\hat{y}^{(i)}_j}}$$

this ensures:

$$\sum_{k=0}^{3} \hat{\bar{y}}^{(i)}_k = 1 \qquad\qquad 0 \leq \hat{\bar{y}}^{(i)}_k \leq 1$$
this way, for each input we have a nice probability distribution over its outputs.
This is implemented in TensorFlow:
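For instance, with tf.nn.softmax (a sketch; the notebook may compute it differently):

# normalize the logits row by row with the softmax function
y_hatb = tf.nn.softmax(y_hat, axis=-1)
print(y_hatb.numpy())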
check sums
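A sketch of that check:

# each row (one per data point) should add up to (approximately) 1
print(tf.reduce_sum(y_hatb, axis=1).numpy())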
How would you now measure how closely y_hatb is to the expected output y_ohe?
cross entropy: just take the probability assigned to the correct class (and pass it through a log function). For data point $i$:

$$\text{xentropy}^{(i)} = -\sum_{k=0}^{3} \bar{y}^{(i)}_k \,\log \hat{\bar{y}}^{(i)}_k$$

where $\bar{y}^{(i)}$ is the one-hot encoding of the expected class (label) for data point $i$, and $\hat{\bar{y}}^{(i)}$ is the normalized (softmax) output of the network.
observe that:
in the one-hot encoding $\bar{y}^{(i)}$ only one of the elements is 1 and the rest are 0's, so the summation above just takes the log of the probability assigned to the correct label
the negative sign accounts for the fact that logs of values < 1 are negative, and we will later want to minimize the loss
This is implemented in TensorFlow:
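A sketch of both the manual computation and the built-in loss (tf.keras.losses.CategoricalCrossentropy expects one-hot labels and, by default, averages over the data points):

# manual cross entropy, one value per data point
xent_manual = -tf.reduce_sum(y_ohe * tf.math.log(y_hatb), axis=1)
print(xent_manual.numpy())

# same computation with the built-in loss (mean over the five data points)
cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_ohe, y_hatb).numpy())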
Observe that TensorFlow also implements the corresponding sparse convenience loss, SparseCategoricalCrossentropy, which works directly with our integer labels (no one-hot encoding needed).
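A sketch, reusing the integer labels y from above:

# same loss computed directly from the integer labels, no one-hot encoding needed
scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce(y, y_hatb).numpy())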