07.04 - TENSORFLOW¶
A dataset (again)¶
A bigger network¶
different activation functions
a different optimizer
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation='tanh'),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              metrics=['accuracy'])
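The training log below comes from fitting this model for 10 epochs. A minimal sketch of that call, assuming X_train and y_train hold the dataset from the previous section (the names are assumptions):

# train for 10 epochs on the dataset loaded earlier (assumed names X_train, y_train)
history = model.fit(X_train, y_train, epochs=10)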
Epoch 1/10
300/300 [==============================] - 0s 510us/step - loss: 0.3400 - accuracy: 0.8567
Epoch 2/10
300/300 [==============================] - 0s 508us/step - loss: 0.2804 - accuracy: 0.8867
Epoch 3/10
300/300 [==============================] - 0s 487us/step - loss: 0.2450 - accuracy: 0.8967
Epoch 4/10
300/300 [==============================] - 0s 532us/step - loss: 0.1722 - accuracy: 0.9433
Epoch 5/10
300/300 [==============================] - 0s 516us/step - loss: 0.1660 - accuracy: 0.9500
Epoch 6/10
300/300 [==============================] - 0s 527us/step - loss: 0.0917 - accuracy: 0.9700
Epoch 7/10
300/300 [==============================] - 0s 516us/step - loss: 0.1127 - accuracy: 0.9667
Epoch 8/10
300/300 [==============================] - 0s 507us/step - loss: 0.1118 - accuracy: 0.9600
Epoch 9/10
300/300 [==============================] - 0s 526us/step - loss: 0.1076 - accuracy: 0.9500
Epoch 10/10
300/300 [==============================] - 0s 487us/step - loss: 0.0840 - accuracy: 0.9800
Cross entropy - multiclass classification¶
Follow THIS EXAMPLE on the TensorFlow documentation site. Observe that:
the labels correspond to a 10-class classification problem
the network contains 10 output neurons, one per output class
the loss function is SparseCategoricalCrossentropy
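A minimal sketch of that kind of setup (not the exact model from the linked example; the layer sizes and names here are made up), where the last layer produces 10 logits and the loss is configured accordingly:

model10 = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)    # 10 output neurons, one per class, no activation (logits)
])
model10.compile(optimizer='adam',
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])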
Observe how cross entropy works with 4 classes:
first we convert the labels to a one-hot encoding
we create a network with four output neurons, one per class
we interpret each neuron's output as an element of a probability distribution
we normalize that distribution (it must add up to one)
we consider a network output to be better when it assigns more probability to the correct class
expected classes for five data points
convert them to a one-hot encoding
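A sketch of these two steps (the label values here are made up; y_ohe is the variable used below):

y = tf.constant([0, 3, 1, 2, 3])   # expected classes for five data points (made-up values)
y_ohe = tf.one_hot(y, depth=4)     # one-hot encoding, shape (5, 4)
print(y_ohe.numpy())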
simulate some neural network output with NO ACTIVATION function and 4 output neurons, so for each input element (we have five) we get 4 outputs
this is called LOGITS in TensorFlow
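A sketch of this step (the name y_hat and the random values are assumptions):

# simulated raw network output (LOGITS): 5 data points x 4 classes, no activation applied
y_hat = tf.random.normal(shape=(5, 4))
print(y_hat.numpy())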
normalize the LOGITS (the raw outputs of the network's last layer, which has no activation). This is the SOFTMAX activation:

$$\hat{\bar{y}}^{(i)}_k = \frac{e^{\hat{y}^{(i)}_k}}{\sum_{j=0}^{3} e^{\hat{y}^{(i)}_j}}$$

this ensures:

$$\sum_{k=0}^{3} \hat{\bar{y}}^{(i)}_k = 1 \qquad\qquad 0 \leq \hat{\bar{y}}^{(i)}_k \leq 1$$
this way, for each input we have a nice probability distribution over its outputs.
This is implemented in TensorFlow:
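For instance, with tf.nn.softmax (a sketch; the notebook may compute it differently):

# normalize the logits row by row with the softmax function
y_hatb = tf.nn.softmax(y_hat, axis=-1)
print(y_hatb.numpy())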
check sums
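A sketch of that check:

# each row (one per data point) should add up to (approximately) 1
print(tf.reduce_sum(y_hatb, axis=1).numpy())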
How would you now measure how closely y_hatb is to the expected output y_ohe?
cross entropy: just take the probability assigned to the correct class (and pass it through a log function). For data point $i$:

$$\text{xentropy}^{(i)} = -\sum_{k=0}^{3} \bar{y}^{(i)}_k \,\log \hat{\bar{y}}^{(i)}_k$$

where $\bar{y}^{(i)}$ is the one-hot encoding of the expected class (label) for data point $i$, and $\hat{\bar{y}}^{(i)}$ is the normalized (softmax) output of the network.
observe that:
in the one-hot encoding $\bar{y}^{(i)}$ only one of the elements is 1 and the rest are 0's, so the summation above just takes the log of the probability assigned to the correct label
the negative sign accounts for the fact that logs of values < 1 are negative, and we will later want to minimize the loss
This is implemented in TensorFlow:
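A sketch of both the manual computation and the built-in loss (tf.keras.losses.CategoricalCrossentropy expects one-hot labels and, by default, averages over the data points):

# manual cross entropy, one value per data point
xent_manual = -tf.reduce_sum(y_ohe * tf.math.log(y_hatb), axis=1)
print(xent_manual.numpy())

# same computation with the built-in loss (mean over the five data points)
cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_ohe, y_hatb).numpy())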
Observe that TensorFlow also implements the corresponding sparse convenience loss, SparseCategoricalCrossentropy, which works directly with our integer labels (no one-hot encoding needed).
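A sketch, reusing the integer labels y from above:

# same loss computed directly from the integer labels, no one-hot encoding needed
scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce(y, y_hatb).numpy())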