3.1 - Symbolic computing for ML#

!wget -nc --no-cache -O init.py -q https://raw.githubusercontent.com/rramosp/2021.deeplearning/main/content/init.py
import init; init.init(force_download=False); 
import sys
import sympy as sy
%load_ext tensorboard
sy.init_printing(use_latex=True)    
import tensorflow as tf
tf.__version__
'2.4.0'

Recall the machine learning algorithm design process#

from IPython.display import Image
Image(filename='local/imgs/mldesign.jpg', width=800)

And how we worked it out for linear regression using a generic optimization library#

Input and expected output (supervised learning)

  • \(\mathbf{x}^{(i)} \in \mathbb{R}^n\), \(y^{(i)} \in \mathbb{R}\)

Prediction model

  • \(\hat{y}^{(i)} = \overline{\theta} \cdot \mathbf{x}^{(i)}\), \(\;\;\;\;\)with \(\overline{\theta} \in \mathbb{R}^n\) and assuming \(\mathbf{x}^{(i)}_0=1\)

Loss function

  • \(J(\overline{\theta}) = \frac{1}{m} \sum_{i=0}^{m-1}(\overline{\theta} \cdot \mathbf{x}^{(i)} - y^{(i)})^2\)

Gradient of loss function (matrix form)

  • \(\nabla J = \begin{bmatrix} \frac{\partial J}{\partial \theta_0}\\ \frac{\partial J}{\partial \theta_1} \end{bmatrix} = \frac{1}{m}2X^{T}\cdot(X\cdot\theta-Y)\)
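
To see where the matrix form comes from, differentiate the loss with respect to a single component: \(\frac{\partial J}{\partial \theta_j} = \frac{2}{m} \sum_{i=0}^{m-1}(\overline{\theta} \cdot \mathbf{x}^{(i)} - y^{(i)})\, \mathbf{x}^{(i)}_j\). Since row \(i\) of \(X\) is \(\mathbf{x}^{(i)}\), the sum \(\sum_i r_i \mathbf{x}^{(i)}_j\) over the residuals \(r = X\cdot\theta - Y\) is the \(j\)-th entry of \(X^T\cdot r\), so stacking the partials gives exactly \(\nabla J = \frac{2}{m}X^{T}\cdot(X\cdot\theta-Y)\).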

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.optimize import minimize
%matplotlib inline
d = pd.read_csv("local/data/trilotropicos.csv")
print(d.shape)
plt.scatter(d.longitud, d.densidad_escamas)
plt.xlabel(d.columns[0])
plt.ylabel(d.columns[1]);
(150, 2)
[scatter plot: densidad_escamas vs longitud]
y = d.densidad_escamas.values
X = np.r_[[[1]*len(d), d.longitud.values]].T


def n_cost(t):
    # mean squared error of the linear model X @ t against y
    return np.mean((X.dot(t)-y)**2)

def n_grad(t):
    # hand-derived gradient: (2/m) * X.T @ (X @ t - y)
    return 2*X.T.dot(X.dot(t)-y)/len(X)

init_t = np.random.random()*40-5, np.random.random()*20-10
r = minimize(n_cost, init_t, method="BFGS", jac=n_grad)
r
      fun: 2.7447662570806624
 hess_inv: array([[ 5.57455133, -1.09184708],
       [-1.09184708,  0.23443488]])
      jac: array([-1.43724881e-06, -7.18659553e-06])
  message: 'Optimization terminated successfully.'
     nfev: 11
      nit: 10
     njev: 11
   status: 0
  success: True
        x: array([12.68999789, -0.71805919])
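
As a quick visual check, the fitted parameters in r.x can be overlaid on the data; a minimal sketch, reusing the dataframe d loaded above (the names theta and xs are ours):

# overlay the fitted line theta_0 + theta_1*longitud on the scatter plot
theta = r.x                                  # [intercept, slope] found by BFGS
xs = np.linspace(d.longitud.min(), d.longitud.max(), 100)
plt.scatter(d.longitud, d.densidad_escamas, alpha=.5)
plt.plot(xs, theta[0] + theta[1]*xs, color="red")
plt.xlabel(d.columns[0]); plt.ylabel(d.columns[1]);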

Using sympy computer algebra system (CAS)#

x,y = sy.symbols("x y")
z = x**2 + x*sy.cos(y)
z
\(x^2 + x \cos(y)\)

we can evaluate the expression by providing concrete values for the symbolic variables

z.subs({x: 2, y: sy.pi/4})
\(\sqrt{2} + 4\)

and obtain numerical approximations of these values

sy.N(z.subs({x: 2, y: sy.pi/4}))
5.41421356237310
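
sy.N and the equivalent .evalf method also accept the number of significant digits to compute (15 by default); for instance:

# request 30 significant digits instead of the default 15
z.subs({x: 2, y: sy.pi/4}).evalf(30)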

a derivative can be seen as a function that takes an expression as input and outputs another expression

observe how we compute \(\frac{\partial z}{\partial x}\) and \(\frac{\partial z}{\partial y}\)

z.diff(x)
\(2 x + \cos(y)\)
z.diff(y)
\(- x \sin(y)\)
r = z.diff(x).subs({x: 2, y: sy.pi/4})
r, sy.N(r)
\(\left(\frac{\sqrt{2}}{2} + 4,\;\; 4.70710678118655\right)\)
r = z.diff(y).subs({x: 2, y: sy.pi/4})
r, sy.N(r)
\(\left(-\sqrt{2},\;\; -1.41421356237310\right)\)

EXERCISE: draw the computational graph of \(x^2+x\cos(x)\) and show how to differentiate it mechanically using the graph.
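
As a hint, sympy itself stores every expression as a tree, which is essentially the computational graph the exercise asks for; a minimal sketch for inspecting it (w is just a throwaway name):

w = x**2 + x*sy.cos(x)                 # the exercise expression
print(sy.srepr(w))                     # the full tree, e.g. Add(Pow(Symbol('x'), Integer(2)), Mul(Symbol('x'), cos(Symbol('x'))))
for node in sy.preorder_traversal(w):  # visit every node of the graph, parents before children
    print(node)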

More things you can do with sympy (and almost any CAS)

sy.expand((x+2)**2)
\(x^2 + 4 x + 4\)
sy.factor( x**2-2*x-8 )
\(\left(x - 4\right)\left(x + 2\right)\)
sy.solve( x**2 + 2*x - 8, x)
[-4, 2]
a = sy.symbols("alpha")
sy.solve( a*x**2 + 2*x - 8, x)
\(\left[\frac{-\sqrt{8 \alpha + 1} - 1}{\alpha},\;\; \frac{\sqrt{8 \alpha + 1} - 1}{\alpha}\right]\)

differential equations, solving \(\frac{df}{dt}=f(t)+t\)

t, C1 = sy.symbols("t C1")
f = sy.symbols("f", cls=sy.Function)
dydt = f(t)+t
eq = dydt-sy.diff(f(t),t)
yt = sy.dsolve(eq, f(t))
yt
\(f(t) = C_1 e^{t} - t - 1\)
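
One way to double-check the solution is sympy's checkodesol, which substitutes it back into the equation; a minimal sketch:

# should return (True, 0) when the proposed solution satisfies eq == 0
sy.checkodesol(eq, yt)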

systems of equations

sy.solve ([x**2+y, 3*y-x])
[{x: -1/3, y: -1/9}, {x: 0, y: 0}]

Sympy to Python and Numpy#

See Sympy Numeric Computation

f = (sy.sin(x) + x**2)/2
f
\(\frac{x^2}{2} + \frac{\sin(x)}{2}\)
f.subs({x:10})
\(\frac{\sin(10)}{2} + 50\)
sy.N(f.subs({x:10}))
49.7279894445553
f1 = sy.lambdify(x, f)
f1(10)
49.727989444555315

and a vectorized version

f2 = sy.lambdify(x, f, "numpy")
f2(10)
49.727989444555315
f2(np.array([10,2,3]))
array([49.72798944,  2.45464871,  4.57056   ])

the lambdified version is much faster than subs plus sy.N, and the numpy-vectorized version is faster still when evaluating many values at once

%timeit sy.N(f.subs({x:10}))
125 µs ± 8.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit f1(10)
1.33 µs ± 26.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [f1(i) for i in range(1000)]
1.38 ms ± 26.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit f2(np.arange(1000))
17.6 µs ± 109 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using sympy to obtain the gradient#

y = d.densidad_escamas.values
X = np.r_[[[1]*len(d), d.longitud.values]].T

t0,t1 = sy.symbols("theta_0 theta_1")
t0,t1
\((\theta_0,\; \theta_1)\)

we first obtain the cost expression for a few summation terms, so that we can print it out and understand it

expr = 0
for i in range(10):
    expr += (X[i,0]*t0+X[i,1]*t1-y[i])**2
expr = expr/len(X)
expr
[rendered output: a quadratic expression in \(\theta_0\) and \(\theta_1\) built from the first 10 data points]

find X[0] and y[0] in the expression above; beware that sympy may have simplified and reordered the terms

y[:10]

we can now simplify the expression, using sympy mechanics

expr = expr.simplify()
expr
[rendered output: the same cost simplified to a quadratic in \(\theta_0\) and \(\theta_1\) with numeric coefficients]

we now build the full expression

def build_linear_regression_cost_expression(X, y):
    # symbolic mean squared error, built term by term over the whole dataset
    expr_cost = 0
    for i in range(len(X)):
        expr_cost += (X[i,0]*t0+X[i,1]*t1-y[i])**2/len(X)
    expr_cost = expr_cost.simplify()
    return expr_cost
y = d.densidad_escamas.values
X = np.r_[[[1]*len(d), d.longitud.values]].T

expr_cost = build_linear_regression_cost_expression(X,y)
expr_cost
[rendered output: the full cost over all 150 data points, a quadratic in \(\theta_0\) and \(\theta_1\) with numeric coefficients]
%timeit build_linear_regression_cost_expression(X,y)
2.6 s ± 68.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

obtain derivatives symbolically

expr_dt0 = expr_cost.diff(t0)
expr_dt1 = expr_cost.diff(t1)
expr_dt0, expr_dt1
[rendered output: the two partial derivatives, linear expressions in \(\theta_0\) and \(\theta_1\)]

and convert them to regular Python functions so that we can use them in the optimization

s_cost = sy.lambdify([[t0,t1]], expr_cost, "numpy")

d0 = sy.lambdify([[t0,t1]], expr_dt0, "numpy")
d1 = sy.lambdify([[t0,t1]], expr_dt1, "numpy")
s_grad = lambda x: np.array([d0(x), d1(x)])

and now we can minimize

r = minimize(s_cost, [0,0], jac=s_grad, method="BFGS")
r
      fun: 2.7447662570799594
 hess_inv: array([[ 5.57425906, -1.09086733],
       [-1.09086733,  0.23434848]])
      jac: array([-1.73778293e-07, -9.26928138e-07])
  message: 'Optimization terminated successfully.'
     nfev: 10
      nit: 9
     njev: 10
   status: 0
  success: True
        x: array([12.6899981, -0.7180591])

observe that the hand-derived functions and the ones obtained by sympy evaluate to the same values

# note: this overwrites the sympy symbols t0 and t1 with plain numbers (they are no longer needed)
t0 = np.random.random()*5+10
t1 = np.random.random()*4-3
t = np.r_[t0,t1]

print ("theta:",t)
print ("cost analytic:", n_cost(t))
print ("cost symbolic:", s_cost(t))

print ("gradient analytic:", n_grad(t))
print ("gradient symbolic:", s_grad(t))
theta: [13.38981165 -1.04706873]
cost analytic: 3.664362725033482
cost symbolic: 3.6643627250332145
gradient analytic: [-1.66013083 -9.12123711]
gradient symbolic: [-1.66013083 -9.12123711]

and we are still using black box optimization!!!!#