5.9 CNN-LSTM architectures

```!wget -nc --no-cache -O init.py -q https://raw.githubusercontent.com/rramosp/2021.deeplearning/main/content/init.py
```
```import sys
print ("setting tensorflow version in colab")
%tensorflow_version 2.x
import tensorflow as tf
tf.__version__
```

This type of architecture is useful in several applications, for instance action recognition in video sequences. To show how it is used, we are going to create a synthetic dataset.

The dataset is composed of videos in which a point moves across the frames following one of four patterns: a constant point, a point ascending from the bottom-left corner to the top-right corner, a point descending from the top-left corner to the bottom-right corner, and a point following a sine function.

```import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from keras.utils import np_utils
from tensorflow.keras.layers import LSTM, Conv2D, Dense, TimeDistributed, MaxPooling2D, Flatten
from tensorflow.keras.models import Sequential
```
```Using TensorFlow backend.
```
```import tensorflow as tf
gpus= tf.config.experimental.list_physical_devices('GPU')
# enable memory growth only when a GPU is actually available
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)
```
```%matplotlib notebook
```
```import math
t = np.linspace(math.pi/10, 2*math.pi, num=20)
y = np.sin(t) + 0.05*np.random.randn(1, 20)
y = y.flatten()
i = 1
t2 = 6*i
y2 = int(np.round(23*(y[i-1]+1)+4))
```
```def f(t2, y2):
    # draw a 6x6 white square around column t2, row y2 on a 50x130 frame
    m = np.zeros((50,130))
    m[y2-3:y2+3,t2-3:t2+3] = 255
    return m

def updatefig(*args):
    # animation step for the sine pattern
    global y,i
    if i == 20:
        i = 1
    t2 = 6*i
    y2 = int(np.round(23*(y[i-1]+1)+4))
    i += 1
    im.set_array(f(t2,y2))
    return im,

def updatefig2(*args):
    # animation step for the ascending pattern
    global y,i
    if i == 20:
        i = 1
    t2 = 6*(i+1)
    y2 = int(np.round(2*(i+1) + 2 + np.random.randn(1)))
    i += 1
    im.set_array(f(t2,y2))
    return im,
```
```fig = plt.figure()
im = plt.imshow(f(t2, y2))
ani = animation.FuncAnimation(fig, updatefig, interval=50, frames=20, blit=True)
```

This is an example of a sine pattern.

```from IPython.display import HTML
HTML(ani.to_jshtml())
```
```i=1
t2 = 6*(i+1)
y2 = int(np.round(2*(i+1) + 2 + np.random.randn(1)))

fig = plt.figure()
im = plt.imshow(f(t2, y2))
ani = animation.FuncAnimation(fig, updatefig2, interval=50, frames=20, blit=True)

from IPython.display import HTML
HTML(ani.to_jshtml())
```

The data must have the shape [n_samples, n_times, n_rows, n_columns, n_channels]:
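As a small illustration of that layout, the sketch below stacks a few toy videos into a single 5-D array. The sizes and the `make_video` helper are hypothetical and chosen only to keep the example short; they are not the ones used for the dataset in this section.

```python
import numpy as np

# Hypothetical toy sizes (the dataset in this section uses 20 frames of 50x130 pixels).
n_samples, n_times, n_rows, n_columns, n_channels = 3, 4, 5, 6, 1

def make_video(n_times, n_rows, n_columns):
    """Return a toy grayscale video: one bright pixel moving along the diagonal."""
    video = np.zeros((n_times, n_rows, n_columns))
    for t in range(n_times):
        video[t, t % n_rows, t % n_columns] = 1.0
    return video

# Stack the videos along a new leading (sample) axis ...
videos = np.stack([make_video(n_times, n_rows, n_columns) for _ in range(n_samples)])
# ... and append a trailing channel axis, since the frames are single-channel.
videos = videos[..., np.newaxis]

print(videos.shape)  # (3, 4, 5, 6, 1)
```

The trailing axis of size 1 is what lets per-frame convolutional layers treat every frame as a one-channel image.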

```#Class sine
Videos1 = np.zeros((20,20,50,130,1))
for j in range(20):
    y = np.sin(t) + 0.05*np.random.randn(1, 20)
    y = y.flatten()
    for i in range(20):
        t2 = 6*(i+1)
        y2 = int(np.round(23*(y[i]+1)+4))
        Videos1[j,i,:,:,0] = f(t2,y2)/255
#Class constant
Videos2 = np.zeros((20,20,50,130,1))
for j in range(20):
    for i in range(20):
        # f takes (column, row): keep the point near column 65, row 25 so it stays inside the 50x130 frame
        t2 = int(np.round(65 + np.random.randn(1)))
        y2 = int(np.round(25 + np.random.randn(1)))
        Videos2[j,i,:,:,0] = f(t2,y2)/255
#Class ascending
Videos3 = np.zeros((20,20,50,130,1))
for j in range(20):
    for i in range(20):
        t2 = 6*(i+1)
        y2 = int(np.round(2*(i+1) + 2 + np.random.randn(1)))
        Videos3[j,i,:,:,0] = f(t2,y2)/255
#Class descending
Videos4 = np.zeros((20,20,50,130,1))
for j in range(20):
    for i in range(20):
        t2 = 6*(i+1)
        y2 = int(np.round(2*(20-i)+ 2 + np.random.randn(1)))
        Videos4[j,i,:,:,0] = f(t2,y2)/255
Videos = np.concatenate((Videos1,Videos2,Videos3,Videos4),axis=0)
```
```Videos.shape
```
```(80, 20, 50, 130, 1)
```
```Y = np.r_[np.zeros(20),np.ones(20),2*np.ones(20),3*np.ones(20)]
Y.shape
```
```(80,)
```
```# convert list of labels to binary class matrix
from tensorflow.keras.utils import to_categorical
y_trainOHE = to_categorical(Y)
nb_classes = y_trainOHE.shape[1]
```
```nb_classes
```
```4
```
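To make the "binary class matrix" concrete, here is a minimal NumPy sketch of one-hot encoding. The `one_hot` helper is a hypothetical stand-in written for illustration; it is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Turn integer class labels into a binary class matrix (one row per sample)."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1
    encoded = np.zeros((labels.size, num_classes))
    # put a single 1 in each row, at the column given by that sample's label
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

labels = [0, 1, 2, 3, 1]
encoded = one_hot(labels)
print(encoded.shape)           # (5, 4)
print(encoded.argmax(axis=1))  # recovers the original labels
```

Taking `argmax` along the class axis undoes the encoding, which is also how predicted class probabilities are turned back into labels later in this section.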

There are three ways to define the network:

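```# Option 1: build the CNN as its own model and wrap the whole model in TimeDistributed
# define CNN model
cnn = Sequential()
# (add the Conv2D, MaxPooling2D and Flatten layers to cnn here)
# define LSTM model
model = Sequential()
# (add TimeDistributed(cnn), then the LSTM and Dense layers, to model here)
```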
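```# Option 2: wrap each CNN layer in TimeDistributed inside a single model
model = Sequential()
# define CNN model
# (add TimeDistributed(Conv2D), TimeDistributed(MaxPooling2D) and TimeDistributed(Flatten) here)
# define LSTM model
# (add the LSTM and Dense layers here)
```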
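Both TimeDistributed variants rest on the same idea: one set of per-frame operations is applied independently to every time step, so the LSTM receives one feature vector per frame. The NumPy sketch below illustrates only that idea. `frame_features` is a hypothetical stand-in for the CNN (2x2 max pooling followed by flattening) and `time_distributed` is a hypothetical helper; neither is Keras code.

```python
import numpy as np

def frame_features(frame):
    """Hypothetical stand-in for the per-frame CNN: 2x2 max pooling, then flatten."""
    rows, cols = frame.shape
    # drop a trailing row/column if the size is odd, so 2x2 blocks tile the frame
    frame = frame[:rows - rows % 2, :cols - cols % 2]
    pooled = frame.reshape(rows // 2, 2, cols // 2, 2).max(axis=(1, 3))
    return pooled.ravel()

def time_distributed(fn, videos):
    """Apply fn to every frame of every video: (n, t, rows, cols) -> (n, t, features)."""
    return np.stack([np.stack([fn(frame) for frame in video]) for video in videos])

rng = np.random.default_rng(0)
videos = rng.random((2, 20, 50, 130))      # 2 videos, 20 frames of 50x130 pixels
features = time_distributed(frame_features, videos)

print(features.shape)  # (2, 20, 1625): one 25*65 feature vector per frame
```

The same function (and, in the real network, the same weights) is used at every time step, which is why wrapping the layers does not multiply the number of parameters by the number of frames.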

Let’s define our architecture:

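```rows = 50
columns = 130
channels = 1

model1 = Sequential()
# filters, kernel sizes and pooling are inferred from the model summary shown further down
model1.add(TimeDistributed(Conv2D(5, (4,4), padding='same',
                                  activation='relu',
                                  input_shape=(rows, columns, channels))))
model1.add(TimeDistributed(MaxPooling2D(pool_size=(2,2))))
model1.add(TimeDistributed(Conv2D(5, (8,8), padding='same',
                                  activation='relu',
                                  input_shape=(rows, columns, channels))))
model1.add(TimeDistributed(MaxPooling2D(pool_size=(2,2))))
model1.add(TimeDistributed(Conv2D(5, (8,8), padding='same',
                                  activation='relu',
                                  input_shape=(rows, columns, channels))))
model1.add(TimeDistributed(MaxPooling2D(pool_size=(2,2))))
model1.add(TimeDistributed(Flatten()))

# add the LSTM layer, and a final Dense layer
# (LSTM arguments beyond its 5 units, and the optimizer, are not in the source: defaults and 'adam' are assumptions)
model1.add(LSTM(5))
model1.add(Dense(nb_classes, activation='softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```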
```WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
```
```model1.fit(Videos,y_trainOHE,epochs=40)
```
```Epoch 1/40
3/3 [==============================] - 10s 2s/step - loss: 1.3965 - accuracy: 0.0000e+00
Epoch 2/40
3/3 [==============================] - 0s 26ms/step - loss: 1.3859 - accuracy: 0.1531
Epoch 3/40
3/3 [==============================] - 0s 24ms/step - loss: 1.3817 - accuracy: 0.5133
Epoch 4/40
3/3 [==============================] - 0s 22ms/step - loss: 1.3776 - accuracy: 0.5391
Epoch 5/40
3/3 [==============================] - 0s 24ms/step - loss: 1.3717 - accuracy: 0.5117
Epoch 6/40
3/3 [==============================] - 0s 23ms/step - loss: 1.3581 - accuracy: 0.6234
Epoch 7/40
3/3 [==============================] - 0s 23ms/step - loss: 1.3255 - accuracy: 0.7617
Epoch 8/40
3/3 [==============================] - 0s 22ms/step - loss: 1.2354 - accuracy: 0.7656
Epoch 9/40
3/3 [==============================] - 0s 29ms/step - loss: 1.0149 - accuracy: 0.7500
Epoch 10/40
3/3 [==============================] - 0s 24ms/step - loss: 0.8141 - accuracy: 0.7203
Epoch 11/40
3/3 [==============================] - 0s 22ms/step - loss: 0.6688 - accuracy: 0.7305
Epoch 12/40
3/3 [==============================] - 0s 22ms/step - loss: 0.5424 - accuracy: 0.7539
Epoch 13/40
3/3 [==============================] - 0s 23ms/step - loss: 0.4381 - accuracy: 0.7695
Epoch 14/40
3/3 [==============================] - 0s 22ms/step - loss: 0.4224 - accuracy: 0.7227
Epoch 15/40
3/3 [==============================] - 0s 22ms/step - loss: 0.3509 - accuracy: 0.7539
Epoch 16/40
3/3 [==============================] - 0s 22ms/step - loss: 0.3967 - accuracy: 0.7148
Epoch 17/40
3/3 [==============================] - 0s 22ms/step - loss: 0.3375 - accuracy: 0.7539
Epoch 18/40
3/3 [==============================] - 0s 23ms/step - loss: 0.4693 - accuracy: 0.7219
Epoch 19/40
3/3 [==============================] - 0s 23ms/step - loss: 0.3175 - accuracy: 0.7656
Epoch 20/40
3/3 [==============================] - 0s 22ms/step - loss: 0.3638 - accuracy: 0.7305
Epoch 21/40
3/3 [==============================] - 0s 22ms/step - loss: 0.3207 - accuracy: 0.7617
Epoch 22/40
3/3 [==============================] - 0s 22ms/step - loss: 0.3099 - accuracy: 0.7695
Epoch 23/40
3/3 [==============================] - 0s 23ms/step - loss: 0.3154 - accuracy: 1.0000
Epoch 24/40
3/3 [==============================] - 0s 22ms/step - loss: 0.3466 - accuracy: 1.0000
Epoch 25/40
3/3 [==============================] - 0s 23ms/step - loss: 0.3359 - accuracy: 1.0000
Epoch 26/40
3/3 [==============================] - 0s 22ms/step - loss: 0.2925 - accuracy: 1.0000
Epoch 27/40
3/3 [==============================] - 0s 23ms/step - loss: 0.3369 - accuracy: 1.0000
Epoch 28/40
3/3 [==============================] - 0s 23ms/step - loss: 0.3298 - accuracy: 1.0000
Epoch 29/40
3/3 [==============================] - 0s 23ms/step - loss: 0.3689 - accuracy: 1.0000
Epoch 30/40
3/3 [==============================] - 0s 22ms/step - loss: 0.3623 - accuracy: 1.0000
Epoch 31/40
3/3 [==============================] - 0s 23ms/step - loss: 0.3201 - accuracy: 1.0000
Epoch 32/40
3/3 [==============================] - 0s 23ms/step - loss: 0.3391 - accuracy: 1.0000
Epoch 33/40
3/3 [==============================] - 0s 22ms/step - loss: 0.3277 - accuracy: 1.0000
Epoch 34/40
3/3 [==============================] - 0s 22ms/step - loss: 0.3064 - accuracy: 1.0000
Epoch 35/40
3/3 [==============================] - 0s 22ms/step - loss: 0.3102 - accuracy: 1.0000
Epoch 36/40
3/3 [==============================] - 0s 21ms/step - loss: 0.3140 - accuracy: 1.0000
Epoch 37/40
3/3 [==============================] - 0s 22ms/step - loss: 0.2978 - accuracy: 1.0000
Epoch 38/40
3/3 [==============================] - 0s 21ms/step - loss: 0.3016 - accuracy: 1.0000
Epoch 39/40
3/3 [==============================] - 0s 21ms/step - loss: 0.3053 - accuracy: 1.0000
Epoch 40/40
3/3 [==============================] - 0s 21ms/step - loss: 0.3286 - accuracy: 1.0000
```
```<tensorflow.python.keras.callbacks.History at 0x7f3e2c2c3150>
```
```model1.summary()
```
```Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
time_distributed (TimeDistri (None, 20, 50, 130, 5)    85
_________________________________________________________________
time_distributed_1 (TimeDist (None, 20, 25, 65, 5)     0
_________________________________________________________________
time_distributed_2 (TimeDist (None, 20, 25, 65, 5)     1605
_________________________________________________________________
time_distributed_3 (TimeDist (None, 20, 12, 32, 5)     0
_________________________________________________________________
time_distributed_4 (TimeDist (None, 20, 12, 32, 5)     1605
_________________________________________________________________
time_distributed_5 (TimeDist (None, 20, 6, 16, 5)      0
_________________________________________________________________
time_distributed_6 (TimeDist (None, 20, 480)           0
_________________________________________________________________
lstm (LSTM)                  (None, 5)                 9720
_________________________________________________________________
dense (Dense)                (None, 4)                 24
=================================================================
Total params: 13,039
Trainable params: 13,039
Non-trainable params: 0
_________________________________________________________________
```
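The parameter counts in this summary can be checked by hand. The sketch below uses the standard parameter formulas for Conv2D, LSTM and Dense layers and assumes square kernels of 4x4 and 8x8; those sizes are inferred from the counts themselves, since the summary does not list kernel sizes. The helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights (kh * kw * in_channels per filter) plus one bias per filter."""
    kh, kw = kernel
    return (kh * kw * in_channels + 1) * filters

def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * units * (input_dim + units + 1)

def dense_params(input_dim, units):
    """One weight per input plus a bias, for every output unit."""
    return (input_dim + 1) * units

conv1 = conv2d_params((4, 4), in_channels=1, filters=5)
conv2 = conv2d_params((8, 8), in_channels=5, filters=5)
conv3 = conv2d_params((8, 8), in_channels=5, filters=5)
flat = 6 * 16 * 5                       # spatial size after three 2x2 poolings, times 5 filters
lstm = lstm_params(input_dim=flat, units=5)
dense = dense_params(input_dim=5, units=4)

print(conv1, conv2, conv3, flat, lstm, dense)  # 85 1605 1605 480 9720 24
print(conv1 + conv2 + conv3 + lstm + dense)    # 13039
```

Most of the parameters sit in the LSTM, because each of its four gates sees the full 480-dimensional feature vector produced for a frame.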

Let’s create a new set of videos to validate the model:

```#Class sine
Videos1 = np.zeros((20,20,50,130,1))
for j in range(20):
    y = np.sin(t) + 0.05*np.random.randn(1, 20)
    y = y.flatten()
    for i in range(20):
        t2 = 6*(i+1)
        y2 = int(np.round(23*(y[i]+1)+4))
        Videos1[j,i,:,:,0] = f(t2,y2)/255
#Class constant
Videos2 = np.zeros((20,20,50,130,1))
for j in range(20):
    for i in range(20):
        # f takes (column, row): keep the point near column 65, row 25 so it stays inside the 50x130 frame
        t2 = int(np.round(65 + np.random.randn(1)))
        y2 = int(np.round(25 + np.random.randn(1)))
        Videos2[j,i,:,:,0] = f(t2,y2)/255
#Class ascending
Videos3 = np.zeros((20,20,50,130,1))
for j in range(20):
    for i in range(20):
        t2 = 6*(i+1)
        y2 = int(np.round(2*(i+1) + 2 + np.random.randn(1)))
        Videos3[j,i,:,:,0] = f(t2,y2)/255
#Class descending
Videos4 = np.zeros((20,20,50,130,1))
for j in range(20):
    for i in range(20):
        t2 = 6*(i+1)
        y2 = int(np.round(2*(20-i)+ 2 + np.random.randn(1)))
        Videos4[j,i,:,:,0] = f(t2,y2)/255
VideosTest = np.concatenate((Videos1,Videos2,Videos3,Videos4),axis=0)
```
```y_est = np.argmax(model1.predict(VideosTest),axis=1)
print('accuracy testing = {}'.format(np.sum(y_est==Y)/80))
```
```accuracy testing = 1.0
```

3) Convolutional LSTM

```from IPython.display import Image
Image(filename='local/imgs/ConvLSTM.png', width=600)
```

Image taken from here

ConvLSTMs are similar to LSTMs, but the internal matrix multiplications are replaced by convolutions. The object that flows through the cell is a 3D tensor instead of just a 1D feature vector, as in ‘peephole’ LSTMs.
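For reference, the ConvLSTM cell is usually written as follows, in the formulation of the original ConvLSTM paper (Shi et al., 2015). Here $*$ denotes convolution and $\circ$ the element-wise product; the inputs $X_t$, hidden states $H_t$ and cell states $C_t$ are all 3D tensors. The symbols are notation introduced for this note, not names from the code in this section. The $W_{c\cdot} \circ C$ terms are the peephole connections; as far as we can tell, the Keras `ConvLSTM2D` layer used below does not include them.

$$
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
$$

Replacing every $*$ with a matrix product, and the tensors with vectors, recovers the fully connected peephole LSTM.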

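```from tensorflow.keras.layers import ConvLSTM2D, BatchNormalization
frames = 20
model2 = Sequential()
# layer sizes are inferred from the model summary shown further down; the optimizer is an assumption
model2.add(ConvLSTM2D(5, (4,4), padding='same',
                      activation='relu',
                      input_shape=(frames,rows, columns, channels),return_sequences=True))
model2.add(TimeDistributed(MaxPooling2D(pool_size=(2,2))))
model2.add(ConvLSTM2D(5, (8,8), padding='same',
                      activation='relu',
                      return_sequences=True))
model2.add(TimeDistributed(MaxPooling2D(pool_size=(2,2))))
model2.add(ConvLSTM2D(5, (8,8), padding='same',
                      activation='relu', return_sequences=False))
model2.add(MaxPooling2D(pool_size=(2,2)))
model2.add(Flatten())
model2.add(Dense(nb_classes, activation='softmax'))
model2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```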
```model2.fit(Videos,y_trainOHE,epochs=80)
```
```Epoch 1/80
3/3 [==============================] - 20s 4s/step - loss: 1.3864 - accuracy: 0.0797
Epoch 2/80
3/3 [==============================] - 1s 332ms/step - loss: 1.3855 - accuracy: 0.3148
Epoch 3/80
3/3 [==============================] - 1s 328ms/step - loss: 1.3829 - accuracy: 0.2500
Epoch 4/80
3/3 [==============================] - 1s 327ms/step - loss: 1.3702 - accuracy: 0.2539
Epoch 5/80
3/3 [==============================] - 1s 326ms/step - loss: 1.2301 - accuracy: 0.2766
Epoch 6/80
3/3 [==============================] - 1s 326ms/step - loss: 2.4703 - accuracy: 0.4383
Epoch 7/80
3/3 [==============================] - 1s 326ms/step - loss: 1.1096 - accuracy: 0.7578
Epoch 8/80
3/3 [==============================] - 1s 326ms/step - loss: 1.3064 - accuracy: 0.7156
Epoch 9/80
3/3 [==============================] - 1s 324ms/step - loss: 1.3381 - accuracy: 0.5000
Epoch 10/80
3/3 [==============================] - 1s 351ms/step - loss: 1.3508 - accuracy: 0.4961
Epoch 11/80
3/3 [==============================] - 1s 342ms/step - loss: 1.3557 - accuracy: 0.5234
Epoch 12/80
3/3 [==============================] - 1s 338ms/step - loss: 1.3569 - accuracy: 0.5273
Epoch 13/80
3/3 [==============================] - 1s 325ms/step - loss: 1.3601 - accuracy: 0.5117
Epoch 14/80
3/3 [==============================] - 1s 324ms/step - loss: 1.3586 - accuracy: 0.5117
Epoch 15/80
3/3 [==============================] - 1s 324ms/step - loss: 1.3593 - accuracy: 0.4883
Epoch 16/80
3/3 [==============================] - 1s 327ms/step - loss: 1.3512 - accuracy: 0.5039
Epoch 17/80
3/3 [==============================] - 1s 324ms/step - loss: 1.3440 - accuracy: 0.4883
Epoch 18/80
3/3 [==============================] - 1s 323ms/step - loss: 1.3392 - accuracy: 0.5000
Epoch 19/80
3/3 [==============================] - 1s 322ms/step - loss: 1.3193 - accuracy: 0.4922
Epoch 20/80
3/3 [==============================] - 1s 324ms/step - loss: 1.2831 - accuracy: 0.4961
Epoch 21/80
3/3 [==============================] - 1s 323ms/step - loss: 1.1252 - accuracy: 0.6234
Epoch 22/80
3/3 [==============================] - 1s 328ms/step - loss: 0.9421 - accuracy: 0.6000
Epoch 23/80
3/3 [==============================] - 1s 332ms/step - loss: 0.6475 - accuracy: 0.7344
Epoch 24/80
3/3 [==============================] - 1s 327ms/step - loss: 1.9005 - accuracy: 0.9500
Epoch 25/80
3/3 [==============================] - 1s 334ms/step - loss: 1.3344 - accuracy: 0.8836
Epoch 26/80
3/3 [==============================] - 1s 322ms/step - loss: 0.9062 - accuracy: 0.7578
Epoch 27/80
3/3 [==============================] - 1s 325ms/step - loss: 1.2068 - accuracy: 0.7500
Epoch 28/80
3/3 [==============================] - 1s 321ms/step - loss: 1.2847 - accuracy: 0.7656
Epoch 29/80
3/3 [==============================] - 1s 324ms/step - loss: 1.3123 - accuracy: 0.7383
Epoch 30/80
3/3 [==============================] - 1s 331ms/step - loss: 1.3144 - accuracy: 0.7383
Epoch 31/80
3/3 [==============================] - 1s 333ms/step - loss: 1.3167 - accuracy: 0.7461
Epoch 32/80
3/3 [==============================] - 1s 326ms/step - loss: 1.3109 - accuracy: 0.7500
Epoch 33/80
3/3 [==============================] - 1s 324ms/step - loss: 1.3071 - accuracy: 0.7305
Epoch 34/80
3/3 [==============================] - 1s 327ms/step - loss: 1.2995 - accuracy: 0.7656
Epoch 35/80
3/3 [==============================] - 1s 329ms/step - loss: 1.2771 - accuracy: 0.7539
Epoch 36/80
3/3 [==============================] - 1s 325ms/step - loss: 1.2568 - accuracy: 0.7578
Epoch 37/80
3/3 [==============================] - 1s 326ms/step - loss: 1.2318 - accuracy: 0.7656
Epoch 38/80
3/3 [==============================] - 1s 329ms/step - loss: 1.1938 - accuracy: 0.7461
Epoch 39/80
3/3 [==============================] - 1s 329ms/step - loss: 1.1027 - accuracy: 0.7266
Epoch 40/80
3/3 [==============================] - 1s 324ms/step - loss: 0.9896 - accuracy: 0.7656
Epoch 41/80
3/3 [==============================] - 1s 326ms/step - loss: 0.9392 - accuracy: 0.7539
Epoch 42/80
3/3 [==============================] - 1s 336ms/step - loss: 0.9569 - accuracy: 0.7578
Epoch 43/80
3/3 [==============================] - 1s 327ms/step - loss: 0.9174 - accuracy: 0.7578
Epoch 44/80
3/3 [==============================] - 1s 329ms/step - loss: 0.9180 - accuracy: 0.7578
Epoch 45/80
3/3 [==============================] - 1s 324ms/step - loss: 0.8685 - accuracy: 0.7500
Epoch 46/80
3/3 [==============================] - 1s 333ms/step - loss: 0.8817 - accuracy: 0.7305
Epoch 47/80
3/3 [==============================] - 1s 328ms/step - loss: 0.8501 - accuracy: 0.7578
Epoch 48/80
3/3 [==============================] - 1s 326ms/step - loss: 0.8742 - accuracy: 0.7578
Epoch 49/80
3/3 [==============================] - 1s 324ms/step - loss: 0.7940 - accuracy: 0.7500
Epoch 50/80
3/3 [==============================] - 1s 325ms/step - loss: 0.7483 - accuracy: 0.7344
Epoch 51/80
3/3 [==============================] - 1s 327ms/step - loss: 0.6650 - accuracy: 0.7773
Epoch 52/80
3/3 [==============================] - 1s 330ms/step - loss: 0.6372 - accuracy: 0.7500
Epoch 53/80
3/3 [==============================] - 1s 328ms/step - loss: 0.6439 - accuracy: 0.7344
Epoch 54/80
3/3 [==============================] - 1s 326ms/step - loss: 0.5533 - accuracy: 0.7461
Epoch 55/80
3/3 [==============================] - 1s 325ms/step - loss: 0.5639 - accuracy: 0.8070
Epoch 56/80
3/3 [==============================] - 1s 325ms/step - loss: 0.4996 - accuracy: 0.9234
Epoch 57/80
3/3 [==============================] - 1s 326ms/step - loss: 0.4788 - accuracy: 1.0000
Epoch 58/80
3/3 [==============================] - 1s 331ms/step - loss: 0.4304 - accuracy: 1.0000
Epoch 59/80
3/3 [==============================] - 1s 326ms/step - loss: 0.3954 - accuracy: 1.0000
Epoch 60/80
3/3 [==============================] - 1s 341ms/step - loss: 0.3930 - accuracy: 1.0000
Epoch 61/80
3/3 [==============================] - 1s 345ms/step - loss: 0.3937 - accuracy: 1.0000
Epoch 62/80
3/3 [==============================] - 1s 326ms/step - loss: 0.3780 - accuracy: 1.0000
Epoch 63/80
3/3 [==============================] - 1s 327ms/step - loss: 0.3185 - accuracy: 1.0000
Epoch 64/80
3/3 [==============================] - 1s 330ms/step - loss: 0.3266 - accuracy: 1.0000
Epoch 65/80
3/3 [==============================] - 1s 329ms/step - loss: 0.3255 - accuracy: 1.0000
Epoch 66/80
3/3 [==============================] - 1s 331ms/step - loss: 0.3503 - accuracy: 1.0000
Epoch 67/80
3/3 [==============================] - 1s 349ms/step - loss: 0.2804 - accuracy: 1.0000
Epoch 68/80
3/3 [==============================] - 1s 343ms/step - loss: 0.2840 - accuracy: 1.0000
Epoch 69/80
3/3 [==============================] - 1s 343ms/step - loss: 0.3019 - accuracy: 1.0000
Epoch 70/80
3/3 [==============================] - 1s 323ms/step - loss: 0.3245 - accuracy: 1.0000
Epoch 71/80
3/3 [==============================] - 1s 330ms/step - loss: 0.2802 - accuracy: 1.0000
Epoch 72/80
3/3 [==============================] - 1s 333ms/step - loss: 0.3074 - accuracy: 1.0000
Epoch 73/80
3/3 [==============================] - 1s 328ms/step - loss: 0.3013 - accuracy: 1.0000
Epoch 74/80
3/3 [==============================] - 1s 344ms/step - loss: 0.3188 - accuracy: 1.0000
Epoch 75/80
3/3 [==============================] - 1s 327ms/step - loss: 0.3596 - accuracy: 1.0000
Epoch 76/80
3/3 [==============================] - 1s 327ms/step - loss: 0.2603 - accuracy: 1.0000
Epoch 77/80
3/3 [==============================] - 1s 328ms/step - loss: 0.3103 - accuracy: 1.0000
Epoch 78/80
3/3 [==============================] - 1s 321ms/step - loss: 0.3414 - accuracy: 1.0000
Epoch 79/80
3/3 [==============================] - 1s 341ms/step - loss: 0.2986 - accuracy: 1.0000
Epoch 80/80
3/3 [==============================] - 1s 327ms/step - loss: 0.3158 - accuracy: 1.0000
```
```<tensorflow.python.keras.callbacks.History at 0x7f3ef6688890>
```
```model2.summary()
```
```Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv_lst_m2d (ConvLSTM2D)    (None, 20, 50, 130, 5)    1940
_________________________________________________________________
time_distributed_7 (TimeDist (None, 20, 25, 65, 5)     0
_________________________________________________________________
conv_lst_m2d_1 (ConvLSTM2D)  (None, 20, 25, 65, 5)     12820
_________________________________________________________________
time_distributed_8 (TimeDist (None, 20, 12, 32, 5)     0
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D)  (None, 12, 32, 5)         12820
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 6, 16, 5)          0
_________________________________________________________________
flatten_1 (Flatten)          (None, 480)               0
_________________________________________________________________
dense_1 (Dense)              (None, 4)                 1924
=================================================================
Total params: 29,504
Trainable params: 29,504
Non-trainable params: 0
_________________________________________________________________
```
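The counts in this summary can also be verified by hand. A ConvLSTM2D layer has four gates, and each gate convolves both the input and the previous hidden state, so with square kernels it has `4 * filters * (k * k * (in_channels + filters) + 1)` parameters. The sketch below assumes 4x4 and 8x8 kernels, sizes inferred from the counts themselves; the helper names are hypothetical.

```python
def convlstm2d_params(kernel, in_channels, filters):
    """Four gates, each with an input kernel, a recurrent kernel and a bias per filter."""
    kh, kw = kernel
    return 4 * filters * (kh * kw * (in_channels + filters) + 1)

def dense_params(input_dim, units):
    """One weight per input plus a bias, for every output unit."""
    return (input_dim + 1) * units

layer1 = convlstm2d_params((4, 4), in_channels=1, filters=5)
layer2 = convlstm2d_params((8, 8), in_channels=5, filters=5)
layer3 = convlstm2d_params((8, 8), in_channels=5, filters=5)
dense = dense_params(input_dim=6 * 16 * 5, units=4)

print(layer1, layer2, layer3, dense)     # 1940 12820 12820 1924
print(layer1 + layer2 + layer3 + dense)  # 29504
```

Compared with the CNN-LSTM summary earlier in the section, the recurrent parameters now scale with the kernel size and the number of filters rather than with the size of a flattened feature vector.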
```y_est = np.argmax(model2.predict(VideosTest),axis=1)
print('accuracy testing = {}'.format(np.sum(y_est==Y)/80))
```
```accuracy testing = 1.0
```