In [3]:
import tensorflow as tf

From "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow", 2nd Ed.

Géron, available from O'Reilly

**So just what is TensorFlow anyway?**

It is a library for numerical computation, much like NumPy, but intended mainly for large-scale computations.

-numerical tools and storage forms like NumPy's

-it supports distributed computing; like Dask and Spark, it uses delayed execution and computational graphs

-it has a just-in-time (JIT) compiler that extracts the computational graph from Python code, optimizes it, and runs it in parallel

-computation graphs can be exported and run in a platform-agnostic way

-it has automatic differentiation (autodiff) and a number of strong optimizers

-Keras is a model-building front end that simplifies construction of large models

Python API pieces

High-level ML API
tf.keras
tf.estimator

Low-level deep learning API
tf.nn
tf.losses
tf.metrics
tf.optimizers
tf.train
tf.initializers

Autodiff
tf.GradientTape
tf.gradients()

I/O and preprocessing
tf.data
tf.feature_column
tf.audio
tf.image
tf.io
tf.queue

Visualization with TensorBoard
tf.summary

Deployment and Optimization
tf.distribute
tf.saved_model
tf.autograph
tf.graph_util
tf.lite
tf.quantization
tf.tpu
tf.xla

Special Data Structures
tf.lookup
tf.nest
tf.ragged
tf.sets
tf.sparse
tf.strings

Mathematics
tf.math
tf.linalg
tf.signal
tf.random
tf.bitwise

Misc
tf.compat
tf.config

Most operations are implemented in C++

Coding Directly in TensorFlow

In [4]:
tf.constant([[1,2,3],[4,5,6]])   # create a matrix,  from python lists



<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[1, 2, 3],
       [4, 5, 6]], dtype=int32)>

In [5]:
tf.constant(42)

<tf.Tensor: shape=(), dtype=int32, numpy=42>

In [6]:
t=tf.constant([[1,2,3],[4,5,6]])
t.shape

TensorShape([2, 3])

In [7]:
t.dtype

tf.int32

In [8]:
# indexing works just like Numpy

t[:,1:]

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[2, 3],
       [5, 6]], dtype=int32)>

In [9]:
t+10

<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[11, 12, 13],
       [14, 15, 16]], dtype=int32)>

In [10]:
tf.square(t)

<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[ 1,  4,  9],
       [16, 25, 36]], dtype=int32)>

In [11]:
# this is matrix multiplication: t times its own transpose

t@tf.transpose(t)

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[14, 32],
       [32, 77]], dtype=int32)>

Many common functions include the word "reduce": they reduce a tensor along one or more axes (all axes by default) to a smaller result, for example a sum or a mean

In [12]:
# most NumPy-style array operations are available: tf.reduce_mean(), tf.reduce_sum(), tf.math.log(), tf.reduce_max()

x=tf.reduce_mean(t)
x

<tf.Tensor: shape=(), dtype=int32, numpy=3>
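
As a small sketch of what "reduce" does in practice, the cell below reduces the same matrix along different axes. It also shows why the mean above came out as 3 rather than 3.5: tf.reduce_mean keeps the input dtype, so an int32 mean is truncated, and casting to a float type first is one way around that. The variable names are illustrative only.

```python
import tensorflow as tf

t = tf.constant([[1, 2, 3], [4, 5, 6]])

col_sums = tf.reduce_sum(t, axis=0)   # collapse the rows: one sum per column
row_max = tf.reduce_max(t, axis=1)    # collapse the columns: one max per row

# reduce_mean keeps the input dtype, so cast to float for a fractional mean
mean_int = tf.reduce_mean(t)
mean_float = tf.reduce_mean(tf.cast(t, tf.float32))

print(col_sums.numpy(), row_max.numpy(), mean_int.numpy(), mean_float.numpy())
```

With no axis argument, every axis is reduced and the result is a scalar tensor.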

**The keras backend K**

Keras mostly has high-level functions to set up and train models, but there are
low-level functions in the Keras "backend", called K

These are portable to other systems that use Keras

(you can use Keras with other ML packages and tools, beyond TensorFlow)

In [13]:
from tensorflow import keras
K=keras.backend
K.square(K.transpose(t))+10

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[11, 26],
       [14, 35],
       [19, 46]], dtype=int32)>

**Tensors and NumPy**

You can convert easily between tensors and NumPy arrays

You can apply NumPy operations to tensors and vice versa

So what is the difference? Tensors can be placed on GPUs, TPUs, or other computers and have operations run on them there. NumPy arrays live in the memory of one machine and are processed on its CPU

In [14]:
# numpy to tensor
import numpy as np


a = np.array([2., 4., 5.])
tf.constant(a)



<tf.Tensor: shape=(3,), dtype=float64, numpy=array([2., 4., 5.])>

In [15]:
t.numpy() # or np.array(t),  converts from tensor to numpy

array([[1, 2, 3],
       [4, 5, 6]], dtype=int32)

In [16]:
tf.square(a)           # tf operation on a numpy array

<tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 4., 16., 25.])>

In [17]:
np.square(t)          #numpy operation on a tensor

array([[ 1,  4,  9],
       [16, 25, 36]], dtype=int32)

Question/Action

What happens if you try a tf operation on a pandas DataFrame?

Experiment and find out...

Type Conversions

Python is pretty relaxed about type conversions; it does them "on the fly"

1+1.5 adds an integer to a float; Python just converts the int to a float and continues

But implicit conversions cost time and can easily go unnoticed, so tf doesn't do them: mixing types raises an error

In [18]:
tf.constant(2.) + tf.constant(40)

InvalidArgumentError: ignored

In [20]:
tf.constant(2.) + tf.constant(40.,dtype=tf.float64)

InvalidArgumentError: ignored

The default float in TensorFlow is 32 bits, rather than the 64 bits of a Python (or NumPy) float.  Why would you do this?

In [21]:
t2 = tf.constant(40., dtype=tf.float64)

tf.constant(2.0) + tf.cast(t2, tf.float32)

<tf.Tensor: shape=(), dtype=float32, numpy=42.0>
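
The two failing cells and the working one can be combined into a single sketch that catches the error instead of stopping the notebook. The exception class caught here is an assumption about what eager TensorFlow raises for a dtype mismatch (InvalidArgumentError in the runs above); TypeError is included as a fallback in case a version reports it differently.

```python
import tensorflow as tf

x32 = tf.constant(2.)                       # float32 by default
x64 = tf.constant(40., dtype=tf.float64)

try:
    x32 + x64                               # mixed float32/float64: not allowed
    mixed_ok = True
except (tf.errors.InvalidArgumentError, TypeError) as err:
    mixed_ok = False
    print(type(err).__name__)

total = x32 + tf.cast(x64, tf.float32)      # explicit conversion works
print(total.dtype, total.numpy())
```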

In [22]:
# tf constants are immutable, variables can be changed

v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])

In [23]:
v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

In [24]:
v.assign(2*v)

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]], dtype=float32)>

In [25]:
v[0,1].assign(42)
v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  6.],
       [ 8., 10., 12.]], dtype=float32)>

In [26]:
v[:, 2].assign([0., 1.])
v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  0.],
       [ 8., 10.,  1.]], dtype=float32)>

In [27]:
v.scatter_nd_update(indices=[[0, 0], [1, 2]], updates=[100., 200.])
v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[100.,  42.,   0.],
       [  8.,  10., 200.]], dtype=float32)>
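
To make the constant/variable contrast concrete, here is a minimal sketch: item assignment on a constant tensor fails, while a variable can be updated in place with assign, assign_add, and friends. The names c and v are illustrative.

```python
import tensorflow as tf

c = tf.constant([1., 2., 3.])
try:
    c[0] = 10.                   # tensors do not support item assignment
    constant_changed = True
except TypeError:
    constant_changed = False

v = tf.Variable([1., 2., 3.])
v.assign_add([1., 1., 1.])       # in-place equivalent of v += 1
v[0].assign(10.)                 # in-place update of a single element
print(constant_changed, v.numpy())
```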

Other tf data structures

Sparse Tensors
-for mostly zero matrices

Tensor Arrays
-lists of tensors, where every tensor has the same shape and data type

Ragged Tensors
-tensors whose rows can have different lengths (for example, a batch of variable-length sequences); all values share one data type

String Tensors

Tensors of type tf.string

Sets

Represented as regular or sparse tensors; there is a tf.sets package to work with them. Open question: in Python, sets are hashed, so testing set membership is very fast, much faster than comparing a value against every element of an array. I don't know whether that holds in tf; worth finding out
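
A short tour of these structures, as a sketch rather than a complete reference. All values are made up for illustration; the calls used (tf.SparseTensor, tf.sparse.to_dense, tf.ragged.constant, tf.strings.length, tf.sets.intersection) are standard TensorFlow 2 APIs.

```python
import tensorflow as tf

# Sparse tensor: store only the nonzero entries of a 3x4 matrix
s = tf.SparseTensor(indices=[[0, 1], [1, 0], [2, 3]],
                    values=[1., 2., 3.],
                    dense_shape=[3, 4])
dense = tf.sparse.to_dense(s)

# Ragged tensor: rows of different lengths
r = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
row_lengths = r.row_lengths()

# String tensor: elements are byte strings of type tf.string
words = tf.constant(["hello", "tf"])
lengths = tf.strings.length(words)

# Sets: the last dimension of each tensor holds the set elements
a = tf.constant([[1, 2, 3]])
b = tf.constant([[2, 3, 4]])
common = tf.sparse.to_dense(tf.sets.intersection(a, b))

print(dense.numpy(), row_lengths.numpy(), lengths.numpy(), common.numpy())
```

Note that tf.sets operations return sparse tensors, which is why the result is converted with tf.sparse.to_dense before printing.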

**Customizing Models,  Loss Functions and Training Algorithms**

In [28]:
# an alternative loss, called the Huber loss: it returns half the squared error for small errors
# and a linear function of the absolute error for larger ones. It is less sensitive to large
# residuals than MSE is, but behaves like (half) the MSE for small ones

# note that this is a vectorized operation: y_true and y_pred are tensors

def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss  = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

This function could now be used in a training operation

model.compile(loss=huber_fn, optimizer="nadam")

model.fit(X_train, y_train, [...])
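
Before using the loss in training, it can help to check it on a few hand-picked values. The sketch below repeats huber_fn so that it runs on its own; the sample values are made up, chosen so that the absolute errors are 0.5, 1 and 3.

```python
import tensorflow as tf

def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

y_true = tf.constant([0., 0., 0.])
y_pred = tf.constant([0.5, 1.0, 3.0])         # absolute errors 0.5, 1 and 3

losses = huber_fn(y_true, y_pred)              # one loss value per instance
half_squared = tf.square(y_true - y_pred) / 2  # half the squared error, for comparison

print(losses.numpy())        # values 0.125, 0.5, 2.5
print(half_squared.numpy())  # values 0.125, 0.5, 4.5
```

The two agree for the small error and at the threshold of 1, but for the error of 3 the Huber loss grows linearly (2.5) while the squared term grows quadratically (4.5).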

Saving and loading a model with a custom function

Saving works as usual, but Keras stores only the name of the custom function. When loading the model, map that name back to the function:

model = keras.models.load_model("my_model_with_a_custom_loss.h5",
                                custom_objects={"huber_fn": huber_fn})

More customizations

-Activation functions
-Initializers
-Regularizers
-Constraints

In [29]:
# this is a custom activation function for neurons

def my_softplus(z): # note: tf.nn.softplus(z) better handles large inputs
    return tf.math.log(tf.exp(z) + 1.0)

# this is a custom initializer, sets up initial weights

def my_glorot_initializer(shape, dtype=tf.float32):
    stddev = tf.sqrt(2. / (shape[0] + shape[1]))
    return tf.random.normal(shape, stddev=stddev, dtype=dtype)

# this is an alternative L1 regularizer

def my_l1_regularizer(weights):
    return tf.reduce_sum(tf.abs(0.01 * weights))

# this is a constraint: it forces weights to be non-negative

def my_positive_weights(weights): # return value is just tf.nn.relu(weights)
    return tf.where(weights < 0., tf.zeros_like(weights), weights)

In [30]:
# here is how we would use these in a Keras model

layer = keras.layers.Dense(30, activation=my_softplus,
                           kernel_initializer=my_glorot_initializer,
                           kernel_regularizer=my_l1_regularizer,
                           kernel_constraint=my_positive_weights)
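
The two comments about tf.nn.softplus and tf.nn.relu in the cell above can be checked directly. This sketch repeats the two functions so that it is self-contained; the test inputs are made up, with 100. chosen because exp(100.) overflows float32.

```python
import tensorflow as tf

def my_softplus(z):
    return tf.math.log(tf.exp(z) + 1.0)

def my_positive_weights(weights):
    return tf.where(weights < 0., tf.zeros_like(weights), weights)

z = tf.constant([-1., 0., 1., 100.])
naive = my_softplus(z)       # exp(100.) overflows float32, so the last entry is inf
stable = tf.nn.softplus(z)   # stays finite for the same input

w = tf.constant([-2., 0., 3.])
clipped = my_positive_weights(w)   # same values as tf.nn.relu(w)

print(naive.numpy(), stable.numpy(), clipped.numpy())
```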

Building Custom Layers

-for functions not found in Keras

-for repeated blocks: if the model repeats layers A,B,C,A,B,C,A,B,C, create a custom layer D that contains A,B,C; the model is then D,D,D



In [31]:
# custom Layer
exponential_layer = keras.layers.Lambda(lambda x: tf.exp(x))

In [32]:
# custom layer with multiple inputs and multiple outputs
class MyMultiLayer(keras.layers.Layer):
    def call(self, X):
        X1, X2 = X
        return [X1 + X2, X1 * X2, X1 / X2]

    def compute_output_shape(self, batch_input_shape):
        b1, b2 = batch_input_shape
        return [b1, b1, b1] # should probably handle broadcasting rules
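
The repeated-block idea (D = A,B,C, then a model D,D,D) is not shown in the cells above, so here is one possible sketch. DenseBlock is a hypothetical name, and using three Dense layers as the A,B,C pattern is an assumption made only for illustration.

```python
import tensorflow as tf
from tensorflow import keras

class DenseBlock(keras.layers.Layer):
    """Hypothetical block 'D' that bundles three Dense layers (A, B, C)."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(units, activation="relu")
                       for _ in range(3)]

    def call(self, inputs):
        z = inputs
        for layer in self.hidden:   # apply A, then B, then C
            z = layer(z)
        return z

# the model is then D, D, plus an output layer
model = keras.models.Sequential([DenseBlock(8), DenseBlock(8),
                                 keras.layers.Dense(1)])
out = model(tf.zeros((2, 4)))       # batch of 2 instances with 4 features each
print(out.shape)
```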

**Computing Gradients with Autodiff**

In [33]:
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

In [34]:
w1=5
w2=3
eps=1e-6


In [35]:
# calculating a derivative with respect to w1

(f(w1 + eps, w2) - f(w1, w2)) / eps

36.000003007075065

In [36]:
# Derivative with respect to w2

(f(w1, w2 + eps) - f(w1, w2)) / eps

10.000000003174137
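
For reference, the exact partial derivatives of f can be worked out by hand: df/dw1 = 6*w1 + 2*w2 and df/dw2 = 2*w1, which give 36 and 10 at (5, 3). The sketch below compares them with a numerical estimate. Note that it uses a central difference rather than the forward difference in the cells above, a swapped-in variant that is usually a little more accurate; grad_f and central_difference are illustrative names.

```python
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

def grad_f(w1, w2):
    # partial derivatives worked out by hand
    return 6 * w1 + 2 * w2, 2 * w1

def central_difference(func, w1, w2, eps=1e-6):
    # step eps on both sides of the point, one argument at a time
    d_w1 = (func(w1 + eps, w2) - func(w1 - eps, w2)) / (2 * eps)
    d_w2 = (func(w1, w2 + eps) - func(w1, w2 - eps)) / (2 * eps)
    return d_w1, d_w2

exact = grad_f(5, 3)                     # (36, 10)
approx = central_difference(f, 5, 3)
print(exact, approx)
```

Either numerical approach needs at least one extra call to f per parameter, which is what makes autodiff attractive for models with many parameters.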

In [37]:
# using Gradient Tape and gradients

w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:
    z = f(w1, w2)

gradients = tape.gradient(z, [w1, w2])

In [38]:
tape

<tensorflow.python.eager.backprop.GradientTape at 0x7fd4766c0e50>

In [39]:
gradients

[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]

A tape is erased as soon as you call its gradient() method, so a second call raises an exception. Make the tape persistent if you need more than one call, but this isn't usually necessary; you want it erased to save memory

In [45]:
with tf.GradientTape() as tape:
    z = f(w1, w2)

dz_dw1 = tape.gradient(z, w1) # => tensor 36.0
# dz_dw2 = tape.gradient(z, w2)  # a second call would raise RuntimeError: the tape is already erased

In [46]:
dz_dw1

<tf.Tensor: shape=(), dtype=float32, numpy=36.0>
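
If both gradients are needed from separate calls, a persistent tape is one option. A minimal sketch, reusing f and the same starting values as above:

```python
import tensorflow as tf

def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

w1, w2 = tf.Variable(5.), tf.Variable(3.)

with tf.GradientTape(persistent=True) as tape:
    z = f(w1, w2)

dz_dw1 = tape.gradient(z, w1)   # first call
dz_dw2 = tape.gradient(z, w2)   # second call works because the tape is persistent
del tape                        # release the tape's resources when done

print(dz_dw1.numpy(), dz_dw2.numpy())
```

Passing both variables in one call, as in tape.gradient(z, [w1, w2]), avoids the need for a persistent tape altogether.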

In [41]:
c1, c2 = tf.constant(5.), tf.constant(3.)
with tf.GradientTape() as tape:
    z = f(c1, c2)

gradients = tape.gradient(z, [c1, c2]) # returns [None, None],  these are constants

In [42]:
gradients

[None, None]

In [43]:
# tapes can watch variables or constants, though

with tf.GradientTape() as tape:
    tape.watch(c1)
    tape.watch(c2)
    z = f(c1, c2)

gradients = tape.gradient(z, [c1, c2])

In [44]:
gradients

[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]

More to come

-custom training loops

-Generating TF functions and graphs,  converting python based functions to tf functions by computing their computational graphs
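
As a preview of the last item, here is a minimal sketch of turning a Python function into a TF function with tf.function. The function cube and the variable names are made up for illustration.

```python
import tensorflow as tf

def cube(x):
    return x ** 3

tf_cube = tf.function(cube)                 # wrap the Python function

eager_result = cube(tf.constant(2.0))       # runs operation by operation
graph_result = tf_cube(tf.constant(2.0))    # traced into a graph on the first call, then reused

# the original Python function is still reachable
plain_result = tf_cube.python_function(2)

print(eager_result.numpy(), graph_result.numpy(), plain_result)
```

tf.function can also be applied as a decorator (@tf.function) directly above the function definition.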