# 2.6 - Multimodal architectures

```!wget -nc --no-cache -O init.py -q https://raw.githubusercontent.com/rramosp/2021.deeplearning/main/content/init.py
```
```import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from IPython.display import Image
%matplotlib inline
import tensorflow as tf
tf.__version__
```
```'2.1.0'
```
```mnist = pd.read_csv("local/data/mnist1.5k.csv.gz", compression="gzip", header=None).values
X=mnist[:,1:785]/255.
y=mnist[:,0]
print("dimensions of the images and the classes", X.shape, y.shape)
```
```dimensions of the images and the classes (1500, 784) (1500,)
```
```perm = np.random.permutation(list(range(X.shape[0])))[0:50]
random_imgs   = X[perm]
random_labels = y[perm]
fig = plt.figure(figsize=(10,6))
for i in range(random_imgs.shape[0]):
    ax = fig.add_subplot(5, 10, i+1)  # 5x10 grid, one cell per sampled image
    plt.imshow(random_imgs[i].reshape(28,28), interpolation="nearest", cmap = plt.cm.Greys_r)
    ax.set_title(int(random_labels[i]))
    ax.set_xticklabels([])
    ax.set_yticklabels([])
```

## A regular neural network for classification

```Image(filename='local/imgs/ann1.png')
```

Number of connections:

```INPUT to LAYER 1:    784*50 + 50 (bias) = 39250
LAYER 1 to LAYER 2:   50*30 + 30 (bias) = 1530
LAYER 2 to LAYER 3:   30*20 + 20 (bias) = 620
LAYER 3 to OUTPUT:    20*10 + 10 (bias) = 210

TOTAL 41610
```
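The totals above follow from one rule: a dense layer with `n_in` inputs and `n_out` units has `n_in*n_out` weights plus `n_out` biases. A minimal sketch of that arithmetic, where the helper name `count_dense_params` is ours and not part of the notebook:

```python
def count_dense_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first and output last.
    """
    per_layer = [n_in * n_out + n_out  # weights plus one bias per output unit
                 for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    return per_layer, sum(per_layer)

per_layer, total = count_dense_params([784, 50, 30, 20, 10])
print(per_layer, total)  # [39250, 1530, 620, 210] 41610
```

Nearly all of the parameters sit in the first layer, because the input has 784 pixels.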

Observe that we convert `y` to a one-hot encoding.

```yoh = np.eye(10)[y]
```
```i = np.random.randint(len(y))
y[i], yoh[i]
```
```(0, array([1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]))
```
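The `np.eye(10)[y]` idiom works because row `k` of the 10x10 identity matrix is exactly the one-hot code of class `k`, so indexing the matrix with an array of labels picks one row per label. A small sketch with toy labels (the `labels` array is made up for illustration):

```python
import numpy as np

labels = np.array([3, 0, 9])          # toy integer class labels (hypothetical)
one_hot = np.eye(10)[labels]          # one identity-matrix row per label

print(one_hot.shape)                  # (3, 10)
print(one_hot[0])                     # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(one_hot.argmax(axis=1))         # [3 0 9], the original labels
```

Note that the labels must have an integer dtype for this indexing to work.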
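```from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.8)

# the random split above is immediately overridden by a fixed split:
# the first 300 samples for training, the remaining 1200 for testing
X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]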
y_train_oh = np.eye(10)[y_train]
y_test_oh  = np.eye(10)[y_test]
print(X_train.shape, y_train_oh.shape)
```
```(300, 784) (300, 10)
```

### create the model

```from tensorflow.keras import Sequential, Model
from tensorflow.keras.layers import Dense, Dropout, Flatten, concatenate, Input
from tensorflow.keras.backend import clear_session
```
```def get_model_A(input_dim, s1, s2, s3, s3_activation="relu"):
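    print(input_dim*s1 + s1*s2 + s2*s3 + s3*10 + s1+s2+s3+10)  # expected parameter count
    clear_session()
    model = Sequential()
    # layer sizes follow the model summary; hidden activations mirror get_model_B
    model.add(Dense(s1, activation="relu", input_dim=input_dim))
    model.add(Dense(s2, activation="relu"))
    model.add(Dense(s3, activation=s3_activation))
    model.add(Dense(10, activation="softmax"))
    # assumption: optimizer and loss are not shown in the source
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    model.reset_states()
    return model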
```
```model = get_model_A(input_dim=X.shape[1], s1=50, s2=30, s3=20)
model.summary()
```
```41610
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 50)                39250
_________________________________________________________________
dense_1 (Dense)              (None, 30)                1530
_________________________________________________________________
dense_2 (Dense)              (None, 20)                620
_________________________________________________________________
dense_3 (Dense)              (None, 10)                210
=================================================================
Total params: 41,610
Trainable params: 41,610
Non-trainable params: 0
_________________________________________________________________
```

### fit and display losses

```model.fit(X_train, y_train_oh, epochs=100, batch_size=32, validation_data=(X_test, y_test_oh))
```
```Train on 300 samples, validate on 1200 samples
Epoch 1/100
300/300 [==============================] - 0s 1ms/sample - loss: 2.2274 - val_loss: 2.1210
Epoch 2/100
300/300 [==============================] - 0s 201us/sample - loss: 1.9919 - val_loss: 1.9278
Epoch 3/100
300/300 [==============================] - 0s 224us/sample - loss: 1.7531 - val_loss: 1.7165
Epoch 4/100
300/300 [==============================] - 0s 185us/sample - loss: 1.4943 - val_loss: 1.4922
Epoch 5/100
300/300 [==============================] - 0s 188us/sample - loss: 1.2550 - val_loss: 1.3319
Epoch 6/100
300/300 [==============================] - 0s 196us/sample - loss: 1.0457 - val_loss: 1.2062
Epoch 7/100
300/300 [==============================] - 0s 199us/sample - loss: 0.8917 - val_loss: 1.0992
Epoch 8/100
300/300 [==============================] - 0s 186us/sample - loss: 0.7668 - val_loss: 1.0182
Epoch 9/100
300/300 [==============================] - 0s 184us/sample - loss: 0.6596 - val_loss: 0.9711
Epoch 10/100
300/300 [==============================] - 0s 187us/sample - loss: 0.5542 - val_loss: 0.8945
Epoch 11/100
300/300 [==============================] - 0s 193us/sample - loss: 0.4636 - val_loss: 0.8483
Epoch 12/100
300/300 [==============================] - 0s 187us/sample - loss: 0.3939 - val_loss: 0.7981
Epoch 13/100
300/300 [==============================] - 0s 169us/sample - loss: 0.3208 - val_loss: 0.7711
Epoch 14/100
300/300 [==============================] - 0s 157us/sample - loss: 0.2782 - val_loss: 0.7700
Epoch 15/100
300/300 [==============================] - 0s 172us/sample - loss: 0.2332 - val_loss: 0.7443
Epoch 16/100
300/300 [==============================] - 0s 155us/sample - loss: 0.1945 - val_loss: 0.7321
Epoch 17/100
300/300 [==============================] - 0s 172us/sample - loss: 0.1655 - val_loss: 0.7295
Epoch 18/100
300/300 [==============================] - 0s 162us/sample - loss: 0.1385 - val_loss: 0.7404
Epoch 19/100
300/300 [==============================] - 0s 153us/sample - loss: 0.1214 - val_loss: 0.7357
Epoch 20/100
300/300 [==============================] - 0s 175us/sample - loss: 0.0988 - val_loss: 0.7371
Epoch 21/100
300/300 [==============================] - 0s 171us/sample - loss: 0.0864 - val_loss: 0.7381
Epoch 22/100
300/300 [==============================] - 0s 159us/sample - loss: 0.0759 - val_loss: 0.7365
Epoch 23/100
300/300 [==============================] - 0s 148us/sample - loss: 0.0621 - val_loss: 0.7684
Epoch 24/100
300/300 [==============================] - 0s 176us/sample - loss: 0.0558 - val_loss: 0.7525
Epoch 25/100
300/300 [==============================] - 0s 182us/sample - loss: 0.0470 - val_loss: 0.7554
Epoch 26/100
300/300 [==============================] - 0s 165us/sample - loss: 0.0401 - val_loss: 0.7518
Epoch 27/100
300/300 [==============================] - 0s 174us/sample - loss: 0.0356 - val_loss: 0.7713
Epoch 28/100
300/300 [==============================] - 0s 173us/sample - loss: 0.0321 - val_loss: 0.7725
Epoch 29/100
300/300 [==============================] - 0s 179us/sample - loss: 0.0278 - val_loss: 0.7749
Epoch 30/100
300/300 [==============================] - 0s 181us/sample - loss: 0.0254 - val_loss: 0.7771
Epoch 31/100
300/300 [==============================] - 0s 167us/sample - loss: 0.0226 - val_loss: 0.7835
Epoch 32/100
300/300 [==============================] - 0s 169us/sample - loss: 0.0206 - val_loss: 0.7873
Epoch 33/100
300/300 [==============================] - 0s 169us/sample - loss: 0.0187 - val_loss: 0.7902
Epoch 34/100
300/300 [==============================] - 0s 173us/sample - loss: 0.0170 - val_loss: 0.7993
Epoch 35/100
300/300 [==============================] - 0s 188us/sample - loss: 0.0156 - val_loss: 0.8065
Epoch 36/100
300/300 [==============================] - 0s 170us/sample - loss: 0.0145 - val_loss: 0.8089
Epoch 37/100
300/300 [==============================] - 0s 154us/sample - loss: 0.0134 - val_loss: 0.8054
Epoch 38/100
300/300 [==============================] - 0s 169us/sample - loss: 0.0123 - val_loss: 0.8137
Epoch 39/100
300/300 [==============================] - 0s 213us/sample - loss: 0.0115 - val_loss: 0.8190
Epoch 40/100
300/300 [==============================] - 0s 200us/sample - loss: 0.0107 - val_loss: 0.8212
Epoch 41/100
300/300 [==============================] - 0s 198us/sample - loss: 0.0100 - val_loss: 0.8214
Epoch 42/100
300/300 [==============================] - 0s 184us/sample - loss: 0.0094 - val_loss: 0.8351
Epoch 43/100
300/300 [==============================] - 0s 201us/sample - loss: 0.0088 - val_loss: 0.8428
Epoch 44/100
300/300 [==============================] - 0s 189us/sample - loss: 0.0082 - val_loss: 0.8436
Epoch 45/100
300/300 [==============================] - 0s 202us/sample - loss: 0.0077 - val_loss: 0.8413
Epoch 46/100
300/300 [==============================] - 0s 196us/sample - loss: 0.0073 - val_loss: 0.8521
Epoch 47/100
300/300 [==============================] - 0s 187us/sample - loss: 0.0069 - val_loss: 0.8551
Epoch 48/100
300/300 [==============================] - 0s 225us/sample - loss: 0.0065 - val_loss: 0.8512
Epoch 49/100
300/300 [==============================] - 0s 175us/sample - loss: 0.0061 - val_loss: 0.8589
Epoch 50/100
300/300 [==============================] - 0s 169us/sample - loss: 0.0058 - val_loss: 0.8664
Epoch 51/100
300/300 [==============================] - 0s 154us/sample - loss: 0.0055 - val_loss: 0.8677
Epoch 52/100
300/300 [==============================] - 0s 165us/sample - loss: 0.0053 - val_loss: 0.8700
Epoch 53/100
300/300 [==============================] - 0s 156us/sample - loss: 0.0050 - val_loss: 0.8782
Epoch 54/100
300/300 [==============================] - 0s 176us/sample - loss: 0.0048 - val_loss: 0.8812
Epoch 55/100
300/300 [==============================] - 0s 196us/sample - loss: 0.0046 - val_loss: 0.8815
Epoch 56/100
300/300 [==============================] - 0s 193us/sample - loss: 0.0044 - val_loss: 0.8859
Epoch 57/100
300/300 [==============================] - 0s 196us/sample - loss: 0.0042 - val_loss: 0.8884
Epoch 58/100
300/300 [==============================] - 0s 166us/sample - loss: 0.0040 - val_loss: 0.8921
Epoch 59/100
300/300 [==============================] - 0s 155us/sample - loss: 0.0039 - val_loss: 0.8926
Epoch 60/100
300/300 [==============================] - 0s 157us/sample - loss: 0.0037 - val_loss: 0.9001
Epoch 61/100
300/300 [==============================] - 0s 203us/sample - loss: 0.0035 - val_loss: 0.9011
Epoch 62/100
300/300 [==============================] - 0s 203us/sample - loss: 0.0034 - val_loss: 0.9025
Epoch 63/100
300/300 [==============================] - 0s 164us/sample - loss: 0.0033 - val_loss: 0.9089
Epoch 64/100
300/300 [==============================] - 0s 168us/sample - loss: 0.0032 - val_loss: 0.9113
Epoch 65/100
300/300 [==============================] - 0s 152us/sample - loss: 0.0030 - val_loss: 0.9144
Epoch 66/100
300/300 [==============================] - 0s 146us/sample - loss: 0.0029 - val_loss: 0.9170
Epoch 67/100
300/300 [==============================] - 0s 157us/sample - loss: 0.0028 - val_loss: 0.9206
Epoch 68/100
300/300 [==============================] - 0s 153us/sample - loss: 0.0027 - val_loss: 0.9230
Epoch 69/100
300/300 [==============================] - 0s 164us/sample - loss: 0.0026 - val_loss: 0.9266
Epoch 70/100
300/300 [==============================] - 0s 157us/sample - loss: 0.0025 - val_loss: 0.9271
Epoch 71/100
300/300 [==============================] - 0s 162us/sample - loss: 0.0025 - val_loss: 0.9302
Epoch 72/100
300/300 [==============================] - 0s 165us/sample - loss: 0.0024 - val_loss: 0.9313
Epoch 73/100
300/300 [==============================] - 0s 162us/sample - loss: 0.0023 - val_loss: 0.9379
Epoch 74/100
300/300 [==============================] - 0s 167us/sample - loss: 0.0022 - val_loss: 0.9381
Epoch 75/100
300/300 [==============================] - 0s 154us/sample - loss: 0.0022 - val_loss: 0.9368
Epoch 76/100
300/300 [==============================] - 0s 177us/sample - loss: 0.0021 - val_loss: 0.9389
Epoch 77/100
300/300 [==============================] - 0s 179us/sample - loss: 0.0020 - val_loss: 0.9427
Epoch 78/100
300/300 [==============================] - 0s 169us/sample - loss: 0.0020 - val_loss: 0.9448
Epoch 79/100
300/300 [==============================] - 0s 145us/sample - loss: 0.0019 - val_loss: 0.9469
Epoch 80/100
300/300 [==============================] - 0s 156us/sample - loss: 0.0018 - val_loss: 0.9510
Epoch 81/100
300/300 [==============================] - 0s 143us/sample - loss: 0.0018 - val_loss: 0.9532
Epoch 82/100
300/300 [==============================] - 0s 155us/sample - loss: 0.0017 - val_loss: 0.9547
Epoch 83/100
300/300 [==============================] - 0s 142us/sample - loss: 0.0017 - val_loss: 0.9541
Epoch 84/100
300/300 [==============================] - 0s 149us/sample - loss: 0.0017 - val_loss: 0.9589
Epoch 85/100
300/300 [==============================] - 0s 142us/sample - loss: 0.0016 - val_loss: 0.9624
Epoch 86/100
300/300 [==============================] - 0s 151us/sample - loss: 0.0016 - val_loss: 0.9668
Epoch 87/100
300/300 [==============================] - 0s 147us/sample - loss: 0.0015 - val_loss: 0.9661
Epoch 88/100
300/300 [==============================] - 0s 144us/sample - loss: 0.0015 - val_loss: 0.9678
Epoch 89/100
300/300 [==============================] - 0s 141us/sample - loss: 0.0014 - val_loss: 0.9687
Epoch 90/100
300/300 [==============================] - 0s 146us/sample - loss: 0.0014 - val_loss: 0.9704
Epoch 91/100
300/300 [==============================] - 0s 148us/sample - loss: 0.0014 - val_loss: 0.9751
Epoch 92/100
300/300 [==============================] - 0s 146us/sample - loss: 0.0013 - val_loss: 0.9773
Epoch 93/100
300/300 [==============================] - 0s 147us/sample - loss: 0.0013 - val_loss: 0.9780
Epoch 94/100
300/300 [==============================] - 0s 140us/sample - loss: 0.0013 - val_loss: 0.9792
Epoch 95/100
300/300 [==============================] - 0s 149us/sample - loss: 0.0012 - val_loss: 0.9820
Epoch 96/100
300/300 [==============================] - 0s 145us/sample - loss: 0.0012 - val_loss: 0.9860
Epoch 97/100
300/300 [==============================] - 0s 140us/sample - loss: 0.0012 - val_loss: 0.9876
Epoch 98/100
300/300 [==============================] - 0s 152us/sample - loss: 0.0012 - val_loss: 0.9895
Epoch 99/100
300/300 [==============================] - 0s 151us/sample - loss: 0.0011 - val_loss: 0.9892
Epoch 100/100
300/300 [==============================] - 0s 146us/sample - loss: 0.0011 - val_loss: 0.9936
```
```<tensorflow.python.keras.callbacks.History at 0x7fde385ecf60>
```
```plt.figure(figsize=(20,3))
loss  = model.history.history["loss"]
vloss = model.history.history["val_loss"]
plt.plot(loss, lw=4, alpha=.5, label="loss")
plt.plot(vloss, lw=4, alpha=.5, label="val loss")
plt.grid();
plt.legend();
```

### measure accuracies

• Why are we using `argmax` below?

```preds_train = model.predict(X_train).argmax(axis=1)
preds_test = model.predict(X_test).argmax(axis=1)

print("accuracy train %.3f"%(np.mean(preds_train==y_train)))
print("accuracy test  %.3f"%(np.mean(preds_test==y_test)))
```
```accuracy train 1.000
accuracy test  0.798
```
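On the question above: the network outputs one probability per class for each image, whereas `y_train` and `y_test` hold integer labels, so `argmax` picks the most probable class before the comparison. A toy sketch with made-up probabilities (`probs` and `y_true` are hypothetical, not outputs of the model):

```python
import numpy as np

# made-up softmax outputs for 3 samples and 4 classes (each row sums to 1)
probs = np.array([[0.10, 0.70, 0.10, 0.10],
                  [0.60, 0.20, 0.10, 0.10],
                  [0.05, 0.05, 0.20, 0.70]])
y_true = np.array([1, 2, 3])          # integer labels, not one-hot

preds = probs.argmax(axis=1)          # index of the largest probability per row
accuracy = np.mean(preds == y_true)   # fraction of matching labels

print(preds)                          # [1 0 3]
print("accuracy %.3f" % accuracy)     # accuracy 0.667
```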

## Multimodal network

We now simulate having information about our data from an additional source. This can be the case when we have, for instance, medical images and associated clinical data. In this situation we have multimodal data (images and numeric values).

We would like an architecture into which we can inject both image and numeric data.

In this case, we assume an additional information source that tells us, with a vector of size 2, whether each image contains an even or an odd digit (with values `[1 0]` or `[0 1]`, respectively).

This new information is injected at LAYER 3 by simply concatenating it with that layer's neurons.

```Image(filename='local/imgs/ann2.png')
```

Number of connections:

```INPUT 1 to LAYER 1:              784*50 + 50 (bias) = 39250
LAYER 1 to LAYER 2:               50*30 + 30 (bias) = 1530
LAYER 2 to LAYER 3:               30*20 + 20 (bias) = 620
LAYER 3 + INPUT 2 to OUTPUT:  (20+2)*10 + 10 (bias) = 230

TOTAL 41630
```
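The only change with respect to the previous network is the last step: the 20 activations of LAYER 3 and the 2 extra values are joined into a vector of size 22 before the output layer. A NumPy sketch of that step alone, with random stand-ins for the activations and weights (all names and values here are illustrative, not taken from the trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 4
layer3_out = rng.normal(size=(batch, 20))        # stand-in for LAYER 3 activations
extra = np.eye(2)[np.array([0, 1, 1, 0])]        # stand-in for the size-2 extra input

# join both sources along the feature axis: (batch, 20) + (batch, 2) -> (batch, 22)
merged = np.concatenate([layer3_out, extra], axis=1)

W = rng.normal(size=(22, 10))                    # (20+2)*10 weights
b = np.zeros(10)                                 # 10 biases
logits = merged @ W + b
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)        # softmax over the 10 classes

print(merged.shape, probs.shape, W.size + b.size)   # (4, 22) (4, 10) 230
```

The 20 additional parameters with respect to the first network are the weights that connect the 2 extra inputs to the 10 output units.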

Observe how this new architecture is built, and how the two kinds of information are handled when building the network and when fitting or predicting.

```def get_model_B(input_dim, extra_info_dim,  s1, s2, s3, s3_activation="relu"):
    clear_session()
    # image branch: three dense layers
    inp1 = Input(shape=(input_dim,), name="input_img")
    l11 = Dense(s1, activation="relu", name="dense1")(inp1)
    l12 = Dense(s2, activation="relu", name="dense2")(l11)
    l13 = Dense(s3, activation=s3_activation, name="dense3")(l12)

    # extra-information branch, concatenated with LAYER 3 along the feature axis
    inp2 = Input(shape=(extra_info_dim,), name="input_extra")
    cc1 = tf.concat([l13, inp2], axis=1)
    output = Dense(10, activation='softmax', name="output")(cc1)
    model = Model(inputs=[inp1, inp2], outputs=output)
    # assumption: optimizer and loss are not shown in the source
    model.compile(optimizer="adam", loss="categorical_crossentropy")

    model.reset_states()
    return model
```

We simulate the extra information. There are several ways to encode it, for instance:

• `[ 1, 0] [ 0, 1]` or

• `[ 1,-1] [-1, 1]` or

• `[10, 0] [ 0,10]` among others

Observe how k0, k1 control how the data is represented. Try:

• k0=0, k1=1

• k0=-0.5, k1=2

• k0=0, k1=10

• k0=-0.5, k1=20

to understand how this coding affects the representation
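As a quick check, the sketch below applies the formula used by `get_X_extra`, `(np.eye(2)[y%2] + k0) * k1`, to one even and one odd digit for each suggested pair. The helper name `encode_parity` and the sample digits are ours, for illustration only:

```python
import numpy as np

def encode_parity(labels, k0, k1):
    # shift the one-hot parity code by k0, then scale it by k1
    return (np.eye(2)[labels % 2] + k0) * k1

digits = np.array([4, 7])   # one even and one odd digit
for k0, k1 in [(0, 1), (-0.5, 2), (0, 10), (-0.5, 20)]:
    even_code, odd_code = encode_parity(digits, k0, k1)
    print(f"k0={k0}, k1={k1}: even -> {even_code}, odd -> {odd_code}")
```

In this formula `k0` shifts the code (for example, centering it around zero with `k0=-0.5`) and `k1` sets its scale relative to the LAYER 3 activations it is concatenated with.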

```def get_X_extra(y_train, y_test, k0, k1):
    X_train_extra = (np.eye(2)[y_train%2]+k0)*k1
    X_test_extra  = (np.eye(2)[y_test%2]+k0)*k1
    return X_train_extra, X_test_extra

X_train_extra, X_test_extra = get_X_extra(y_train, y_test, k0=-.5, k1=2)
X_train_extra[:10]
```
```array([[-1.,  1.],
       [ 1., -1.],
       [-1.,  1.],
       [ 1., -1.],
       [ 1., -1.],
       [ 1., -1.],
       [-1.,  1.],
       [-1.,  1.],
       [-1.,  1.],
       [-1.,  1.]])
```
```model = get_model_B(input_dim=X.shape[1], extra_info_dim=X_train_extra.shape[1], s1=50, s2=30, s3=20,
s3_activation="tanh")
model.summary()
```
```Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_img (InputLayer)          [(None, 784)]        0
__________________________________________________________________________________________________
dense1 (Dense)                  (None, 50)           39250       input_img[0][0]
__________________________________________________________________________________________________
dense2 (Dense)                  (None, 30)           1530        dense1[0][0]
__________________________________________________________________________________________________
dense3 (Dense)                  (None, 20)           620         dense2[0][0]
__________________________________________________________________________________________________
input_extra (InputLayer)        [(None, 2)]          0
__________________________________________________________________________________________________
tf_op_layer_concat (TensorFlowO [(None, 22)]         0           dense3[0][0]
                                                                 input_extra[0][0]
__________________________________________________________________________________________________
output (Dense)                  (None, 10)           230         tf_op_layer_concat[0][0]
==================================================================================================
Total params: 41,630
Trainable params: 41,630
Non-trainable params: 0
__________________________________________________________________________________________________
```
```model.fit([X_train, X_train_extra], y_train_oh, epochs=200, batch_size=32,
          validation_data=([X_test, X_test_extra], y_test_oh))
```
```Train on 300 samples, validate on 1200 samples
Epoch 1/200
300/300 [==============================] - 0s 1ms/sample - loss: 2.2208 - val_loss: 2.0656
Epoch 2/200
300/300 [==============================] - 0s 186us/sample - loss: 1.9353 - val_loss: 1.8616
Epoch 3/200
300/300 [==============================] - 0s 197us/sample - loss: 1.6757 - val_loss: 1.6130
Epoch 4/200
300/300 [==============================] - 0s 196us/sample - loss: 1.4212 - val_loss: 1.3889
Epoch 5/200
300/300 [==============================] - 0s 200us/sample - loss: 1.1756 - val_loss: 1.2010
Epoch 6/200
300/300 [==============================] - 0s 193us/sample - loss: 0.9764 - val_loss: 1.0640
Epoch 7/200
300/300 [==============================] - 0s 197us/sample - loss: 0.8130 - val_loss: 0.9314
Epoch 8/200
300/300 [==============================] - 0s 196us/sample - loss: 0.6877 - val_loss: 0.8559
Epoch 9/200
300/300 [==============================] - 0s 193us/sample - loss: 0.5882 - val_loss: 0.7672
Epoch 10/200
300/300 [==============================] - 0s 200us/sample - loss: 0.4998 - val_loss: 0.7380
Epoch 11/200
300/300 [==============================] - 0s 199us/sample - loss: 0.4319 - val_loss: 0.6846
Epoch 12/200
300/300 [==============================] - 0s 188us/sample - loss: 0.3673 - val_loss: 0.6468
Epoch 13/200
300/300 [==============================] - 0s 166us/sample - loss: 0.3206 - val_loss: 0.6241
Epoch 14/200
300/300 [==============================] - 0s 165us/sample - loss: 0.2804 - val_loss: 0.5929
Epoch 15/200
300/300 [==============================] - 0s 169us/sample - loss: 0.2418 - val_loss: 0.5843
Epoch 16/200
300/300 [==============================] - 0s 191us/sample - loss: 0.2141 - val_loss: 0.5577
Epoch 17/200
300/300 [==============================] - 0s 173us/sample - loss: 0.1914 - val_loss: 0.5527
Epoch 18/200
300/300 [==============================] - 0s 168us/sample - loss: 0.1701 - val_loss: 0.5357
Epoch 19/200
300/300 [==============================] - 0s 157us/sample - loss: 0.1517 - val_loss: 0.5244
Epoch 20/200
300/300 [==============================] - 0s 158us/sample - loss: 0.1379 - val_loss: 0.5278
Epoch 21/200
300/300 [==============================] - 0s 159us/sample - loss: 0.1199 - val_loss: 0.5158
Epoch 22/200
300/300 [==============================] - 0s 162us/sample - loss: 0.1085 - val_loss: 0.5092
Epoch 23/200
300/300 [==============================] - 0s 171us/sample - loss: 0.0992 - val_loss: 0.5031
Epoch 24/200
300/300 [==============================] - 0s 180us/sample - loss: 0.0916 - val_loss: 0.4994
Epoch 25/200
300/300 [==============================] - 0s 181us/sample - loss: 0.0836 - val_loss: 0.4974
Epoch 26/200
300/300 [==============================] - 0s 181us/sample - loss: 0.0774 - val_loss: 0.5017
Epoch 27/200
300/300 [==============================] - 0s 182us/sample - loss: 0.0707 - val_loss: 0.4908
Epoch 28/200
300/300 [==============================] - 0s 180us/sample - loss: 0.0659 - val_loss: 0.4878
Epoch 29/200
300/300 [==============================] - 0s 182us/sample - loss: 0.0612 - val_loss: 0.4905
Epoch 30/200
300/300 [==============================] - 0s 171us/sample - loss: 0.0574 - val_loss: 0.4880
Epoch 31/200
300/300 [==============================] - 0s 176us/sample - loss: 0.0536 - val_loss: 0.4856
Epoch 32/200
300/300 [==============================] - 0s 169us/sample - loss: 0.0506 - val_loss: 0.4869
Epoch 33/200
300/300 [==============================] - 0s 158us/sample - loss: 0.0474 - val_loss: 0.4837
Epoch 34/200
300/300 [==============================] - 0s 187us/sample - loss: 0.0448 - val_loss: 0.4839
Epoch 35/200
300/300 [==============================] - 0s 204us/sample - loss: 0.0424 - val_loss: 0.4801
Epoch 36/200
300/300 [==============================] - 0s 183us/sample - loss: 0.0402 - val_loss: 0.4783
Epoch 37/200
300/300 [==============================] - 0s 180us/sample - loss: 0.0380 - val_loss: 0.4838
Epoch 38/200
300/300 [==============================] - 0s 177us/sample - loss: 0.0362 - val_loss: 0.4808
Epoch 39/200
300/300 [==============================] - 0s 165us/sample - loss: 0.0344 - val_loss: 0.4820
Epoch 40/200
300/300 [==============================] - 0s 158us/sample - loss: 0.0327 - val_loss: 0.4817
Epoch 41/200
300/300 [==============================] - 0s 163us/sample - loss: 0.0311 - val_loss: 0.4807
Epoch 42/200
300/300 [==============================] - 0s 178us/sample - loss: 0.0298 - val_loss: 0.4784
Epoch 43/200
300/300 [==============================] - 0s 205us/sample - loss: 0.0285 - val_loss: 0.4799
Epoch 44/200
300/300 [==============================] - 0s 202us/sample - loss: 0.0273 - val_loss: 0.4823
Epoch 45/200
300/300 [==============================] - 0s 205us/sample - loss: 0.0262 - val_loss: 0.4771
Epoch 46/200
300/300 [==============================] - 0s 207us/sample - loss: 0.0251 - val_loss: 0.4828
Epoch 47/200
300/300 [==============================] - 0s 204us/sample - loss: 0.0241 - val_loss: 0.4855
Epoch 48/200
300/300 [==============================] - 0s 202us/sample - loss: 0.0232 - val_loss: 0.4841
Epoch 49/200
300/300 [==============================] - 0s 219us/sample - loss: 0.0223 - val_loss: 0.4847
Epoch 50/200
300/300 [==============================] - 0s 226us/sample - loss: 0.0215 - val_loss: 0.4862
Epoch 51/200
300/300 [==============================] - 0s 206us/sample - loss: 0.0207 - val_loss: 0.4855
Epoch 52/200
300/300 [==============================] - 0s 189us/sample - loss: 0.0200 - val_loss: 0.4839
Epoch 53/200
300/300 [==============================] - 0s 187us/sample - loss: 0.0193 - val_loss: 0.4833
Epoch 54/200
300/300 [==============================] - 0s 182us/sample - loss: 0.0187 - val_loss: 0.4859
Epoch 55/200
300/300 [==============================] - 0s 175us/sample - loss: 0.0180 - val_loss: 0.4905
Epoch 56/200
300/300 [==============================] - 0s 183us/sample - loss: 0.0175 - val_loss: 0.4905
Epoch 57/200
300/300 [==============================] - 0s 184us/sample - loss: 0.0169 - val_loss: 0.4898
Epoch 58/200
300/300 [==============================] - 0s 191us/sample - loss: 0.0165 - val_loss: 0.4860
Epoch 59/200
300/300 [==============================] - 0s 205us/sample - loss: 0.0159 - val_loss: 0.4903
Epoch 60/200
300/300 [==============================] - 0s 181us/sample - loss: 0.0154 - val_loss: 0.4925
Epoch 61/200
300/300 [==============================] - 0s 171us/sample - loss: 0.0150 - val_loss: 0.4925
Epoch 62/200
300/300 [==============================] - 0s 182us/sample - loss: 0.0145 - val_loss: 0.4929
Epoch 63/200
300/300 [==============================] - 0s 169us/sample - loss: 0.0142 - val_loss: 0.4933
Epoch 64/200
300/300 [==============================] - 0s 182us/sample - loss: 0.0138 - val_loss: 0.4935
Epoch 65/200
300/300 [==============================] - 0s 182us/sample - loss: 0.0134 - val_loss: 0.4939
Epoch 66/200
300/300 [==============================] - 0s 196us/sample - loss: 0.0130 - val_loss: 0.4952
Epoch 67/200
300/300 [==============================] - 0s 192us/sample - loss: 0.0127 - val_loss: 0.4958
Epoch 68/200
300/300 [==============================] - 0s 197us/sample - loss: 0.0124 - val_loss: 0.4986
Epoch 69/200
300/300 [==============================] - 0s 187us/sample - loss: 0.0120 - val_loss: 0.4968
Epoch 70/200
300/300 [==============================] - 0s 188us/sample - loss: 0.0117 - val_loss: 0.4984
Epoch 71/200
300/300 [==============================] - 0s 274us/sample - loss: 0.0114 - val_loss: 0.5004
Epoch 72/200
300/300 [==============================] - 0s 193us/sample - loss: 0.0111 - val_loss: 0.5009
Epoch 73/200
300/300 [==============================] - 0s 179us/sample - loss: 0.0109 - val_loss: 0.5017
Epoch 74/200
300/300 [==============================] - 0s 185us/sample - loss: 0.0106 - val_loss: 0.5000
Epoch 75/200
300/300 [==============================] - 0s 161us/sample - loss: 0.0104 - val_loss: 0.5028
Epoch 76/200
300/300 [==============================] - 0s 158us/sample - loss: 0.0101 - val_loss: 0.5029
Epoch 77/200
300/300 [==============================] - 0s 155us/sample - loss: 0.0099 - val_loss: 0.5040
Epoch 78/200
300/300 [==============================] - 0s 173us/sample - loss: 0.0097 - val_loss: 0.5047
Epoch 79/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0095 - val_loss: 0.5053
Epoch 80/200
300/300 [==============================] - 0s 162us/sample - loss: 0.0093 - val_loss: 0.5030
Epoch 81/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0090 - val_loss: 0.5055
Epoch 82/200
300/300 [==============================] - 0s 162us/sample - loss: 0.0089 - val_loss: 0.5067
Epoch 83/200
300/300 [==============================] - 0s 153us/sample - loss: 0.0087 - val_loss: 0.5096
Epoch 84/200
300/300 [==============================] - 0s 152us/sample - loss: 0.0085 - val_loss: 0.5097
Epoch 85/200
300/300 [==============================] - 0s 153us/sample - loss: 0.0083 - val_loss: 0.5113
Epoch 86/200
300/300 [==============================] - 0s 152us/sample - loss: 0.0081 - val_loss: 0.5107
Epoch 87/200
300/300 [==============================] - 0s 161us/sample - loss: 0.0080 - val_loss: 0.5117
Epoch 88/200
300/300 [==============================] - 0s 161us/sample - loss: 0.0078 - val_loss: 0.5123
Epoch 89/200
300/300 [==============================] - 0s 165us/sample - loss: 0.0077 - val_loss: 0.5116
Epoch 90/200
300/300 [==============================] - 0s 167us/sample - loss: 0.0075 - val_loss: 0.5128
Epoch 91/200
300/300 [==============================] - 0s 156us/sample - loss: 0.0074 - val_loss: 0.5151
Epoch 92/200
300/300 [==============================] - 0s 149us/sample - loss: 0.0072 - val_loss: 0.5168
Epoch 93/200
300/300 [==============================] - 0s 150us/sample - loss: 0.0071 - val_loss: 0.5163
Epoch 94/200
300/300 [==============================] - 0s 156us/sample - loss: 0.0069 - val_loss: 0.5167
Epoch 95/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0068 - val_loss: 0.5178
Epoch 96/200
300/300 [==============================] - 0s 166us/sample - loss: 0.0067 - val_loss: 0.5174
Epoch 97/200
300/300 [==============================] - 0s 191us/sample - loss: 0.0066 - val_loss: 0.5176
Epoch 98/200
300/300 [==============================] - 0s 184us/sample - loss: 0.0065 - val_loss: 0.5192
Epoch 99/200
300/300 [==============================] - 0s 160us/sample - loss: 0.0063 - val_loss: 0.5209
Epoch 100/200
300/300 [==============================] - 0s 154us/sample - loss: 0.0062 - val_loss: 0.5203
Epoch 101/200
300/300 [==============================] - 0s 154us/sample - loss: 0.0061 - val_loss: 0.5210
Epoch 102/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0060 - val_loss: 0.5216
Epoch 103/200
300/300 [==============================] - 0s 153us/sample - loss: 0.0059 - val_loss: 0.5229
Epoch 104/200
300/300 [==============================] - 0s 162us/sample - loss: 0.0058 - val_loss: 0.5240
Epoch 105/200
300/300 [==============================] - 0s 158us/sample - loss: 0.0057 - val_loss: 0.5234
Epoch 106/200
300/300 [==============================] - 0s 157us/sample - loss: 0.0056 - val_loss: 0.5236
Epoch 107/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0055 - val_loss: 0.5251
Epoch 108/200
300/300 [==============================] - 0s 152us/sample - loss: 0.0054 - val_loss: 0.5261
Epoch 109/200
300/300 [==============================] - 0s 160us/sample - loss: 0.0054 - val_loss: 0.5260
Epoch 110/200
300/300 [==============================] - 0s 156us/sample - loss: 0.0053 - val_loss: 0.5261
Epoch 111/200
300/300 [==============================] - 0s 151us/sample - loss: 0.0052 - val_loss: 0.5278
Epoch 112/200
300/300 [==============================] - 0s 161us/sample - loss: 0.0051 - val_loss: 0.5286
Epoch 113/200
300/300 [==============================] - 0s 157us/sample - loss: 0.0050 - val_loss: 0.5291
Epoch 114/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0049 - val_loss: 0.5290
Epoch 115/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0049 - val_loss: 0.5306
Epoch 116/200
300/300 [==============================] - 0s 152us/sample - loss: 0.0048 - val_loss: 0.5317
Epoch 117/200
300/300 [==============================] - 0s 158us/sample - loss: 0.0047 - val_loss: 0.5330
Epoch 118/200
300/300 [==============================] - 0s 165us/sample - loss: 0.0047 - val_loss: 0.5343
Epoch 119/200
300/300 [==============================] - 0s 153us/sample - loss: 0.0046 - val_loss: 0.5340
Epoch 120/200
300/300 [==============================] - 0s 163us/sample - loss: 0.0045 - val_loss: 0.5341
Epoch 121/200
300/300 [==============================] - 0s 162us/sample - loss: 0.0044 - val_loss: 0.5345
Epoch 122/200
300/300 [==============================] - 0s 164us/sample - loss: 0.0044 - val_loss: 0.5356
Epoch 123/200
300/300 [==============================] - 0s 161us/sample - loss: 0.0043 - val_loss: 0.5360
Epoch 124/200
300/300 [==============================] - 0s 160us/sample - loss: 0.0043 - val_loss: 0.5370
Epoch 125/200
300/300 [==============================] - 0s 157us/sample - loss: 0.0042 - val_loss: 0.5366
Epoch 126/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0041 - val_loss: 0.5371
Epoch 127/200
300/300 [==============================] - 0s 157us/sample - loss: 0.0041 - val_loss: 0.5377
Epoch 128/200
300/300 [==============================] - 0s 153us/sample - loss: 0.0040 - val_loss: 0.5384
Epoch 129/200
300/300 [==============================] - 0s 149us/sample - loss: 0.0040 - val_loss: 0.5401
Epoch 130/200
300/300 [==============================] - 0s 161us/sample - loss: 0.0039 - val_loss: 0.5395
Epoch 131/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0039 - val_loss: 0.5413
Epoch 132/200
300/300 [==============================] - 0s 157us/sample - loss: 0.0038 - val_loss: 0.5421
Epoch 133/200
300/300 [==============================] - 0s 156us/sample - loss: 0.0038 - val_loss: 0.5424
Epoch 134/200
300/300 [==============================] - 0s 151us/sample - loss: 0.0037 - val_loss: 0.5433
Epoch 135/200
300/300 [==============================] - 0s 155us/sample - loss: 0.0037 - val_loss: 0.5432
Epoch 136/200
300/300 [==============================] - 0s 160us/sample - loss: 0.0036 - val_loss: 0.5441
Epoch 137/200
300/300 [==============================] - 0s 160us/sample - loss: 0.0036 - val_loss: 0.5444
Epoch 138/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0035 - val_loss: 0.5453
Epoch 139/200
300/300 [==============================] - 0s 156us/sample - loss: 0.0035 - val_loss: 0.5463
Epoch 140/200
300/300 [==============================] - 0s 158us/sample - loss: 0.0034 - val_loss: 0.5481
Epoch 141/200
300/300 [==============================] - 0s 160us/sample - loss: 0.0034 - val_loss: 0.5473
Epoch 142/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0033 - val_loss: 0.5481
Epoch 143/200
300/300 [==============================] - 0s 155us/sample - loss: 0.0033 - val_loss: 0.5490
Epoch 144/200
300/300 [==============================] - 0s 149us/sample - loss: 0.0033 - val_loss: 0.5495
Epoch 145/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0032 - val_loss: 0.5515
Epoch 146/200
300/300 [==============================] - 0s 157us/sample - loss: 0.0032 - val_loss: 0.5515
Epoch 147/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0031 - val_loss: 0.5534
Epoch 148/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0031 - val_loss: 0.5539
Epoch 149/200
300/300 [==============================] - 0s 150us/sample - loss: 0.0031 - val_loss: 0.5536
Epoch 150/200
300/300 [==============================] - 0s 150us/sample - loss: 0.0030 - val_loss: 0.5540
Epoch 151/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0030 - val_loss: 0.5543
Epoch 152/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0029 - val_loss: 0.5559
Epoch 153/200
300/300 [==============================] - 0s 169us/sample - loss: 0.0029 - val_loss: 0.5565
Epoch 154/200
300/300 [==============================] - 0s 162us/sample - loss: 0.0029 - val_loss: 0.5574
```
```Epoch 155/200
300/300 [==============================] - 0s 164us/sample - loss: 0.0028 - val_loss: 0.5574
Epoch 156/200
300/300 [==============================] - 0s 171us/sample - loss: 0.0028 - val_loss: 0.5584
Epoch 157/200
300/300 [==============================] - 0s 165us/sample - loss: 0.0028 - val_loss: 0.5589
Epoch 158/200
300/300 [==============================] - 0s 167us/sample - loss: 0.0027 - val_loss: 0.5586
Epoch 159/200
300/300 [==============================] - 0s 171us/sample - loss: 0.0027 - val_loss: 0.5589
Epoch 160/200
300/300 [==============================] - 0s 160us/sample - loss: 0.0027 - val_loss: 0.5604
Epoch 161/200
300/300 [==============================] - 0s 170us/sample - loss: 0.0027 - val_loss: 0.5613
Epoch 162/200
300/300 [==============================] - 0s 157us/sample - loss: 0.0026 - val_loss: 0.5614
Epoch 163/200
300/300 [==============================] - 0s 167us/sample - loss: 0.0026 - val_loss: 0.5617
Epoch 164/200
300/300 [==============================] - 0s 153us/sample - loss: 0.0026 - val_loss: 0.5627
Epoch 165/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0025 - val_loss: 0.5633
Epoch 166/200
300/300 [==============================] - 0s 157us/sample - loss: 0.0025 - val_loss: 0.5632
Epoch 167/200
300/300 [==============================] - 0s 149us/sample - loss: 0.0025 - val_loss: 0.5639
Epoch 168/200
300/300 [==============================] - 0s 157us/sample - loss: 0.0025 - val_loss: 0.5641
Epoch 169/200
300/300 [==============================] - 0s 154us/sample - loss: 0.0024 - val_loss: 0.5652
Epoch 170/200
300/300 [==============================] - 0s 152us/sample - loss: 0.0024 - val_loss: 0.5664
Epoch 171/200
300/300 [==============================] - 0s 155us/sample - loss: 0.0024 - val_loss: 0.5668
Epoch 172/200
300/300 [==============================] - 0s 160us/sample - loss: 0.0023 - val_loss: 0.5678
Epoch 173/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0023 - val_loss: 0.5681
Epoch 174/200
300/300 [==============================] - 0s 159us/sample - loss: 0.0023 - val_loss: 0.5694
Epoch 175/200
300/300 [==============================] - 0s 158us/sample - loss: 0.0023 - val_loss: 0.5694
Epoch 176/200
300/300 [==============================] - 0s 157us/sample - loss: 0.0022 - val_loss: 0.5701
Epoch 177/200
300/300 [==============================] - 0s 154us/sample - loss: 0.0022 - val_loss: 0.5706
Epoch 178/200
300/300 [==============================] - 0s 151us/sample - loss: 0.0022 - val_loss: 0.5718
Epoch 179/200
300/300 [==============================] - 0s 160us/sample - loss: 0.0022 - val_loss: 0.5725
Epoch 180/200
300/300 [==============================] - 0s 161us/sample - loss: 0.0022 - val_loss: 0.5726
Epoch 181/200
300/300 [==============================] - 0s 171us/sample - loss: 0.0021 - val_loss: 0.5733
Epoch 182/200
300/300 [==============================] - 0s 186us/sample - loss: 0.0021 - val_loss: 0.5741
Epoch 183/200
300/300 [==============================] - 0s 185us/sample - loss: 0.0021 - val_loss: 0.5746
Epoch 184/200
300/300 [==============================] - 0s 182us/sample - loss: 0.0021 - val_loss: 0.5744
Epoch 185/200
300/300 [==============================] - 0s 185us/sample - loss: 0.0020 - val_loss: 0.5759
Epoch 186/200
300/300 [==============================] - 0s 180us/sample - loss: 0.0020 - val_loss: 0.5765
Epoch 187/200
300/300 [==============================] - 0s 174us/sample - loss: 0.0020 - val_loss: 0.5769
Epoch 188/200
300/300 [==============================] - 0s 176us/sample - loss: 0.0020 - val_loss: 0.5770
Epoch 189/200
300/300 [==============================] - 0s 185us/sample - loss: 0.0020 - val_loss: 0.5775
Epoch 190/200
300/300 [==============================] - 0s 183us/sample - loss: 0.0019 - val_loss: 0.5776
Epoch 191/200
300/300 [==============================] - 0s 179us/sample - loss: 0.0019 - val_loss: 0.5782
Epoch 192/200
300/300 [==============================] - 0s 185us/sample - loss: 0.0019 - val_loss: 0.5785
Epoch 193/200
300/300 [==============================] - 0s 190us/sample - loss: 0.0019 - val_loss: 0.5795
Epoch 194/200
300/300 [==============================] - 0s 188us/sample - loss: 0.0019 - val_loss: 0.5809
Epoch 195/200
300/300 [==============================] - 0s 180us/sample - loss: 0.0018 - val_loss: 0.5817
Epoch 196/200
300/300 [==============================] - 0s 190us/sample - loss: 0.0018 - val_loss: 0.5827
Epoch 197/200
300/300 [==============================] - 0s 189us/sample - loss: 0.0018 - val_loss: 0.5826
Epoch 198/200
300/300 [==============================] - 0s 186us/sample - loss: 0.0018 - val_loss: 0.5834
Epoch 199/200
300/300 [==============================] - 0s 196us/sample - loss: 0.0018 - val_loss: 0.5836
Epoch 200/200
300/300 [==============================] - 0s 187us/sample - loss: 0.0018 - val_loss: 0.5843
```
```<tensorflow.python.keras.callbacks.History at 0x7fde3c9b98d0>
```
```plt.figure(figsize=(20,3))
# per-epoch losses recorded by Keras during the last call to model.fit
loss  = model.history.history["loss"]
vloss = model.history.history["val_loss"]
plt.plot(loss, lw=4, alpha=.5, label="loss")
plt.plot(vloss, lw=4, alpha=.5, label="val loss")
plt.xlabel("epoch")
plt.grid()
plt.legend();
```
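In the visible part of the training log, the training loss keeps falling (from 0.0058 to 0.0018) while the validation loss rises (from 0.5240 to 0.5843), which is the usual signature of overfitting. The following is a minimal sketch for locating the epoch with the lowest validation loss in a Keras-style history dictionary. The helper name `best_epoch` and the toy values are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

def best_epoch(history):
    """Return (1-based epoch number, value) of the lowest validation loss.

    `history` is assumed to be a dict with a "val_loss" list,
    in the same layout as `model.history.history` in Keras.
    """
    vloss = np.asarray(history["val_loss"], dtype=float)
    i = int(np.argmin(vloss))          # 0-based index of the minimum
    return i + 1, float(vloss[i])      # epochs are reported 1-based

# toy history with made-up values: val_loss bottoms out at epoch 3
toy_history = {
    "loss":     [0.9, 0.4, 0.1, 0.05, 0.01],
    "val_loss": [1.0, 0.6, 0.5, 0.55, 0.6],
}

epoch, value = best_epoch(toy_history)
print("lowest val_loss %.4f at epoch %d" % (value, epoch))
```

With the model trained above, the same helper could be called as `best_epoch(model.history.history)`.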
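```preds_train = model.predict([X_train, X_train_extra]).argmax(axis=1)
preds_test = model.predict([X_test, X_test_extra]).argmax(axis=1)
# the model takes two inputs; argmax over the 10 output units gives the predicted digit

# accuracy = fraction of predictions that match the integer labels
print("accuracy train %.3f"%(np.mean(preds_train==y_train)))
print("accuracy test  %.3f"%(np.mean(preds_test==y_test)))
```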
```accuracy train 1.000
accuracy test  0.835
```
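The overall test accuracy (0.835) does not show which digits account for the errors. A possible way to break it down per class is sketched below. The function name `per_class_accuracy` and the toy arrays are hypothetical, introduced only for this example.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, n_classes=10):
    """Fraction of correctly predicted samples for each true class.

    Classes that do not appear in `y_true` get NaN.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.full(n_classes, np.nan)
    for c in range(n_classes):
        mask = y_true == c                 # samples whose true label is c
        if mask.any():
            acc[c] = np.mean(y_pred[mask] == c)
    return acc

# toy labels and predictions (made-up values, 3 classes)
toy_true = [0, 0, 1, 1, 2]
toy_pred = [0, 1, 1, 1, 0]
toy_acc = per_class_accuracy(toy_true, toy_pred, n_classes=3)
print(toy_acc)
```

In the notebook this could be applied as `per_class_accuracy(y_test, preds_test)`, using the arrays computed in the cell above.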