<font color="#de3023"><h1><b>REMINDER: MAKE A COPY OF THIS NOTEBOOK, DO NOT EDIT</b></h1></font>

# Lab 2: TensorFlow Basics

This tutorial is based on original work from the TensorFlow team at Google. For more information, please visit https://www.tensorflow.org/tutorials/quickstart/beginner

This short introduction uses [Keras](https://www.tensorflow.org/guide/keras/overview) to:

1. Load a prebuilt dataset.
2. Build a neural network machine learning model that classifies images.
3. Train this neural network.
4. Evaluate the accuracy of the model.

This tutorial is a [Google Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb) notebook. Python programs are run directly in the browser—a great way to learn and use TensorFlow. To follow this tutorial, run the notebook in Google Colab by clicking the button at the top of this page.

1. In Colab, connect to a Python runtime: At the top-right of the menu bar, select *CONNECT*.
2. Run all the notebook code cells: Select *Runtime* > *Run all*.

## Set up TensorFlow

Import TensorFlow into your program to get started:

In [None]:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.8.2


## Load a dataset

Don't worry about the details in the cells below; we will cover them later. Just run them for now: the goal is to understand the overall "machine learning framework".

Load and prepare the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). Convert the sample data from integers to floating-point numbers by scaling the pixel values from the range 0-255 to the range 0-1:

In [None]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


## Build a machine learning model

Build a `tf.keras.Sequential` model by stacking layers.

In [None]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

For each example, the model returns a vector of [logits](https://developers.google.com/machine-learning/glossary#logits) or [log-odds](https://developers.google.com/machine-learning/glossary#log-odds) scores, one for each class.

In [None]:
predictions = model(x_train[:1]).numpy()
predictions

array([[ 0.21755221, -0.8391819 , -0.02329049,  0.00626514,  0.46226948,
        -0.25543362, -0.34161922,  0.2674954 ,  0.2777203 ,  0.20201778]],
      dtype=float32)

The `tf.nn.softmax` function converts these logits to *probabilities* for each class: 

In [None]:
tf.nn.softmax(predictions).numpy()

array([[0.11746754, 0.04083044, 0.0923254 , 0.09509486, 0.1500366 ,
        0.07319859, 0.06715415, 0.12348321, 0.1247523 , 0.11565685]],
      dtype=float32)

Note: It is possible to bake the `tf.nn.softmax` function into the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is discouraged as it's impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output. 
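The numerical-stability concern in the note above comes from exponentiating large logits. As a minimal NumPy sketch (an illustration of the standard trick, not TensorFlow's actual implementation; `stable_softmax` is a hypothetical helper name), subtracting each row's maximum before `np.exp` leaves the softmax unchanged mathematically but avoids overflow:

```python
import numpy as np

def stable_softmax(logits):
    """Softmax over the last axis, shifted by the row max for numerical stability."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # largest entry becomes 0
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# A naive np.exp(1000.0) overflows to inf in float64; the shifted version does not.
probs = stable_softmax([[1000.0, 1001.0, 1002.0]])
print(probs)        # identical to stable_softmax([[0.0, 1.0, 2.0]])
print(probs.sum())  # each row sums to 1 (up to rounding)
```

Shifting by the maximum works because softmax is unchanged when the same constant is added to every logit in a row.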

Define a loss function for training using `losses.SparseCategoricalCrossentropy`, which takes the index of the true class and a vector of logits, and returns a scalar loss for each example.

In [None]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

This loss is equal to the negative log probability of the true class: The loss is zero if the model is sure of the correct class.

This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to `-tf.math.log(1/10) ~= 2.3`.

In [None]:
loss_fn(y_train[:1], predictions).numpy()

2.6145792
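To check the "close to 2.3" claim by hand, here is a minimal NumPy sketch of the same quantity: the negative log-probability of the true class, computed directly from logits with the log-sum-exp trick. `sparse_ce_from_logits` is a hypothetical helper for illustration, not part of Keras:

```python
import numpy as np

def sparse_ce_from_logits(labels, logits):
    """Per-example loss: -log(softmax(logits)[label]), computed stably."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # log(sum(exp(logits))) with the row max subtracted to avoid overflow
    m = logits.max(axis=-1, keepdims=True)
    log_sum_exp = m[:, 0] + np.log(np.exp(logits - m).sum(axis=-1))
    true_logit = logits[np.arange(len(labels)), labels]
    return log_sum_exp - true_logit

# Equal logits for all 10 classes: every class gets probability 1/10.
uniform = np.zeros((1, 10))
print(sparse_ce_from_logits([3], uniform))  # -log(1/10) = log(10), about 2.3026

# A confident, correct prediction drives the loss toward 0.
confident = np.array([[0.0] * 9 + [20.0]])
print(sparse_ce_from_logits([9], confident))
```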

Before you start training, configure and compile the model using Keras `Model.compile`. Set the [`optimizer`](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers) class to `adam`, set the `loss` to the `loss_fn` function you defined earlier, and specify a metric to be evaluated for the model by setting the `metrics` parameter to `accuracy`.

In [None]:
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

## Train and evaluate your model

Use the `Model.fit` method to adjust your model parameters and minimize the loss: 

In [None]:
model.fit(x_train, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f72696baf50>

The `Model.evaluate` method checks the model's performance, usually on a [validation set](https://developers.google.com/machine-learning/glossary#validation-set) or a [test set](https://developers.google.com/machine-learning/glossary#test-set).

In [None]:
model.evaluate(x_test,  y_test, verbose=2)

313/313 - 0s - loss: 0.0737 - accuracy: 0.9765 - 488ms/epoch - 2ms/step


[0.07366463541984558, 0.9764999747276306]

The image classifier is now trained to ~98% accuracy on this dataset. To learn more, read the [TensorFlow tutorials](https://www.tensorflow.org/tutorials/).

If you want your model to return a probability, you can wrap the trained model, and attach the softmax to it:

In [None]:
probability_model = tf.keras.Sequential([
  model,
  tf.keras.layers.Softmax()
])

In [None]:
probability_model(x_test[:5])
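Because softmax is monotonic (a larger logit always gets a larger probability), attaching a `Softmax` layer changes the scores but not which class wins. A small NumPy check of that property, using made-up logits rather than real model output:

```python
import numpy as np

# Hypothetical logits for 2 examples and 4 classes (not real model output).
logits = np.array([[ 0.2, -0.8, 1.5, 0.3],
                   [-1.0,  2.0, 0.5, 1.9]])

# Row-wise softmax (row max subtracted for numerical stability).
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

# The predicted class is the same whether you take the argmax of logits or of probabilities.
print(np.argmax(logits, axis=1))  # [2 1]
print(np.argmax(probs, axis=1))   # [2 1]
```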

## Conclusion

Congratulations! You have trained a machine learning model on a prebuilt dataset using the [Keras](https://www.tensorflow.org/guide/keras/overview) API.

For more examples of using Keras, check out the [tutorials](https://www.tensorflow.org/tutorials/keras/). To learn more about building models with Keras, read the [guides](https://www.tensorflow.org/guide/keras). If you want to learn more about loading and preparing data, see the tutorials on [image data loading](https://www.tensorflow.org/tutorials/load_data/images) or [CSV data loading](https://www.tensorflow.org/tutorials/load_data/csv).


# Introduction to Tensors

In [None]:
import tensorflow as tf
import numpy as np

Tensors are multi-dimensional arrays with a uniform type (called a `dtype`).  You can see all supported `dtypes` at `tf.dtypes.DType`.

If you're familiar with [NumPy](https://numpy.org/devdocs/user/quickstart.html), tensors are (kind of) like `np.arrays`.

All tensors are immutable like Python numbers and strings: you can never update the contents of a tensor, only create a new one.


## Basics

First, create some basic tensors.

Here is a "scalar" or "rank-0" tensor. A scalar contains a single value, and no "axes".

In [None]:
# This will be an int32 tensor by default; see "dtypes" below.
rank_0_tensor = tf.constant(4)
print(rank_0_tensor)

tf.Tensor(4, shape=(), dtype=int32)


A "vector" or "rank-1" tensor is like a list of values. A vector has one axis:

In [None]:
# Let's make this a float tensor.
rank_1_tensor = tf.constant([2.0, 3.0, 4.0])
print(rank_1_tensor)

In [None]:
#############################################################################
# Task
# • Construct a rank-1 tensor of ones (tf.ones) of length 52.
#############################################################################

# Replace "____" statements with your code
x = tf.ones(52)
print(x.shape)

(52,)


A "matrix" or "rank-2" tensor has two axes:

In [None]:
# If you want to be specific, you can set the dtype (see below) at creation time
rank_2_tensor = tf.constant([[1, 2],
                             [3, 4],
                             [5, 6]], dtype=tf.float16)
print(rank_2_tensor)

<table>
<tr>
  <th>A scalar, shape: <code>[]</code></th>
  <th>A vector, shape: <code>[3]</code></th>
  <th>A matrix, shape: <code>[3, 2]</code></th>
</tr>
<tr>
  <td>
   <img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/scalar.png?raw=1" alt="A scalar, the number 4" />
  </td>

  <td>
   <img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/vector.png?raw=1" alt="The line with 3 sections, each one containing a number."/>
  </td>
  <td>
   <img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/matrix.png?raw=1" alt="A 3x2 grid, with each cell containing a number.">
  </td>
</tr>
</table>


In [None]:
#############################################################################
# Task
# • Construct a rank-2 tensor of random values (tf.random.uniform) of shape 3 x 9.
#   Cast x to dtype float64.
#############################################################################

# Replace "____" statements with your code
x = tf.cast(tf.random.uniform((3, 9)), tf.float64)
print(x.shape)
print(x.dtype)

(3, 9)
<dtype: 'float64'>


Tensors may have more axes; here is a tensor with three axes:

In [None]:
# There can be an arbitrary number of
# axes (sometimes called "dimensions")
rank_3_tensor = tf.constant([
  [[0, 1, 2, 3, 4],
   [5, 6, 7, 8, 9]],
  [[10, 11, 12, 13, 14],
   [15, 16, 17, 18, 19]],
  [[20, 21, 22, 23, 24],
   [25, 26, 27, 28, 29]],])

print(rank_3_tensor)

There are many ways you might visualize a tensor with more than two axes.

<table>
<tr>
  <th colspan=3>A 3-axis tensor, shape: <code>[3, 2, 5]</code></th>
</tr>
<tr>
  <td>
   <img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/3-axis_numpy.png?raw=1"/>
  </td>
  <td>
   <img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/3-axis_front.png?raw=1"/>
  </td>

  <td>
   <img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/3-axis_block.png?raw=1"/>
  </td>
</tr>

</table>

In [None]:
#############################################################################
# Task
# • Construct a rank-3 tensor of your choice of shape 3 x 2 x 7. Cast
#   x to dtype float32.
#############################################################################

# Replace "____" statements with your code
x = tf.random.uniform((3,2,7))
print(x.shape)
print(x.dtype)

(3, 2, 7)
<dtype: 'float32'>


You can convert a tensor to a NumPy array either using `np.array` or the `tensor.numpy` method:

In [None]:
np.array(x)

In [None]:
#############################################################################
# Task
# • Call the numpy() method on x to convert from a tf.Tensor to a np.ndarray
#############################################################################

# Replace "____" statements with your code
print(type(x.numpy()))

<class 'numpy.ndarray'>


Tensors often contain floats and ints, but have many other types, including:

* complex numbers
* strings

The base `tf.Tensor` class requires tensors to be "rectangular"---that is, along each axis, every element is the same size.  However, there are specialized types of tensors that can handle different shapes:

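* Ragged tensors (see `tf.RaggedTensor` and the [ragged tensor guide](https://www.tensorflow.org/guide/ragged_tensor))
* Sparse tensors (see `tf.sparse.SparseTensor` and the [sparse tensor guide](https://www.tensorflow.org/guide/sparse_tensor))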

You can do basic math on tensors, including addition, element-wise multiplication, and matrix multiplication.

In [None]:
a = tf.constant([[1, 2],
                 [3, 4]])
b = tf.constant([[1, 1],
                 [1, 1]]) # Could have also said `tf.ones([2,2])`

print(tf.add(a, b), "\n")
print(tf.multiply(a, b), "\n")
print(tf.matmul(a, b), "\n")

tf.Tensor(
[[2 3]
 [4 5]], shape=(2, 2), dtype=int32) 

tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32) 

tf.Tensor(
[[3 3]
 [7 7]], shape=(2, 2), dtype=int32) 



In [None]:
print(a + b, "\n") # element-wise addition
print(a * b, "\n") # element-wise multiplication
print(a @ b, "\n") # matrix multiplication

tf.Tensor(
[[2 3]
 [4 5]], shape=(2, 2), dtype=int32) 

tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32) 

tf.Tensor(
[[3 3]
 [7 7]], shape=(2, 2), dtype=int32) 



In [None]:
#############################################################################
# Task (numpy -> TF review)
# • Print the shape of the *matrix multiplication* of x and y. What shape do you expect?
# • Print the shape of the *matrix multiplication* of x, y, and z. What shape do you expect?
#############################################################################

# Replace "____" statements with your code
x = tf.convert_to_tensor(np.random.random((12, 19)))
y = tf.convert_to_tensor(np.random.random((19, 21)))
z = tf.convert_to_tensor(np.random.random((21, 23)))

print((x @ y).shape)
print((x @ y @ z).shape)

(12, 21)
(12, 23)


Tensors are used in all kinds of operations (or "Ops").

In [None]:
c = tf.constant([[4.0, 5.0], [10.0, 1.0]])

# Find the largest value
print(tf.reduce_max(c))
# Find the index of the largest value
print(tf.math.argmax(c))
# Compute the softmax
print(tf.nn.softmax(c))
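Note which axis each of these ops uses: `tf.reduce_max` with no `axis` reduces over every element, `tf.math.argmax` defaults to `axis=0` (one index per column), and `tf.nn.softmax` works along the last axis. NumPy's `np.argmax`, by contrast, flattens the array when no axis is given. A NumPy sketch of what each call above computes for `c`, with the axes written out explicitly:

```python
import numpy as np

c = np.array([[4.0, 5.0], [10.0, 1.0]])

# tf.reduce_max(c): maximum over every element.
print(c.max())  # 10.0

# tf.math.argmax(c): argmax along axis 0, i.e. which row wins in each column.
print(np.argmax(c, axis=0))  # [1 0]

# tf.nn.softmax(c): softmax along the last axis, so each row sums to 1.
e = np.exp(c - c.max(axis=-1, keepdims=True))
print(e / e.sum(axis=-1, keepdims=True))
```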

Note: Typically, anywhere a TensorFlow function expects a `Tensor` as input, the function will also accept anything that can be converted to a `Tensor` using `tf.convert_to_tensor`. See below for an example.

In [None]:
tf.convert_to_tensor([1,2,3])

In [None]:
tf.reduce_max([1,2,3])

In [None]:
tf.reduce_max(np.array([1,2,3]))

In [None]:
#############################################################################
# Task
# Search google/documentation to find functions for the following
# • Compute the sum of the elements of x
# • Compute the average of the sum of x and y
# • Compute the min of the matrix multiply of x, y, and z
#############################################################################

# Replace "____" statements with your code
x = tf.convert_to_tensor(np.random.random((12, 12)))
y = tf.convert_to_tensor(np.random.random((12, 12)))
z = tf.convert_to_tensor(np.random.random((12, 19)))

print(tf.reduce_sum(x))
print(tf.reduce_mean(x + y))
print(tf.reduce_min(x @ y @ z))

## About shapes

Tensors have shapes.  Some vocabulary:

* **Shape**: The length (number of elements) of each of the axes of a tensor.
* **Rank**: Number of tensor axes.  A scalar has rank 0, a vector has rank 1, a matrix is rank 2.
* **Axis** or **Dimension**: A particular dimension of a tensor.
* **Size**: The total number of items in the tensor, the product of the shape vector's elements.
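The same vocabulary maps directly onto NumPy attributes, which makes the definitions easy to check: rank is the length of the shape, and size is the product of its entries. A quick NumPy illustration using the same `[3, 2, 4, 5]` shape as the TensorFlow example that follows:

```python
import numpy as np

arr = np.zeros((3, 2, 4, 5))

print("Shape:", arr.shape)             # (3, 2, 4, 5)
print("Rank:", arr.ndim)               # 4, the number of axes
print("Axis 1 length:", arr.shape[1])  # 2
print("Size:", arr.size)               # 120 = 3 * 2 * 4 * 5
```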


Note: Although you may see reference to a "tensor of two dimensions", a rank-2 tensor does not usually describe a 2D space.

Tensors and `tf.TensorShape` objects have convenient properties for accessing these:

In [None]:
rank_4_tensor = tf.zeros([3, 2, 4, 5])

<table>
<tr>
  <th colspan=2>A rank-4 tensor, shape: <code>[3, 2, 4, 5]</code></th>
</tr>
<tr>
  <td>
<img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/shape.png?raw=1" alt="A tensor shape is like a vector.">
  </td>
  <td>
<img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/4-axis_block.png?raw=1" alt="A 4-axis tensor">
  </td>
  </tr>
</table>


In [None]:
print("Type of every element:", rank_4_tensor.dtype)
print("Number of axes:", rank_4_tensor.ndim)
print("Shape of tensor:", rank_4_tensor.shape)
print("Elements along axis 0 of tensor:", rank_4_tensor.shape[0])
print("Elements along the last axis of tensor:", rank_4_tensor.shape[-1])
print("Total number of elements (3*2*4*5): ", tf.size(rank_4_tensor).numpy())

But note that the `Tensor.ndim` and `Tensor.shape` attributes don't return `Tensor` objects. If you need a `Tensor` use the `tf.rank` or `tf.shape` function. This difference is subtle, but it can be important when building graphs (later).

In [None]:
tf.rank(rank_4_tensor)

In [None]:
tf.shape(rank_4_tensor)

While axes are often referred to by their indices, you should always keep track of the meaning of each. Often axes are ordered from global to local: The batch axis first, followed by spatial dimensions, and features for each location last. This way feature vectors are contiguous regions of memory.

<table>
<tr>
<th>Typical axis order</th>
</tr>
<tr>
    <td>
<img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/shape2.png?raw=1" alt="Keep track of what each axis is. A 4-axis tensor might be: Batch, Width, Height, Features">
  </td>
</tr>
</table>
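Why does putting features last make feature vectors contiguous? In row-major (C-style) storage, the last axis has the smallest stride. NumPy exposes strides directly, so here is a sketch with a hypothetical batch of 8 images of 32x32 locations with 3 features each:

```python
import numpy as np

# Hypothetical batch: 8 examples, 32x32 locations, 3 features per location.
images = np.zeros((8, 32, 32, 3), dtype=np.float32)

# Strides are the number of bytes to step to move one position along each axis.
print(images.strides)  # (12288, 384, 12, 4)

# Moving along the feature axis steps by one float32 (4 bytes), so the
# 3 features of a location sit next to each other in memory.
print(images.strides[-1] == images.itemsize)  # True
```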

## Indexing

### Single-axis indexing

TensorFlow follows standard Python indexing rules, similar to [indexing a list or a string in Python](https://docs.python.org/3/tutorial/introduction.html#strings), and the basic rules for NumPy indexing.

* indexes start at `0`
* negative indices count backwards from the end
* colons, `:`, are used for slices: `start:stop:step`


In [None]:
rank_1_tensor = tf.constant([0, 1, 1, 2, 3, 5, 8, 13, 21, 34])
print(rank_1_tensor.numpy())

[ 0  1  1  2  3  5  8 13 21 34]


Indexing with a scalar removes the axis:

In [None]:
print("First:", rank_1_tensor[0].numpy())
print("Second:", rank_1_tensor[1].numpy())
print("Last:", rank_1_tensor[-1].numpy())

First: 0
Second: 1
Last: 34


Indexing with a `:` slice keeps the axis:

In [None]:
print("Everything:", rank_1_tensor[:].numpy())
print("Before 4:", rank_1_tensor[:4].numpy())
print("From 4 to the end:", rank_1_tensor[4:].numpy())
print("From 2, before 7:", rank_1_tensor[2:7].numpy())
print("Every other item:", rank_1_tensor[::2].numpy())
print("Reversed:", rank_1_tensor[::-1].numpy())
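The difference between indexing with a scalar and with a slice is easiest to see in the shapes. Since TensorFlow follows NumPy's basic indexing rules here, a NumPy sketch with the same Fibonacci values shows it:

```python
import numpy as np

fib = np.array([0, 1, 1, 2, 3, 5, 8, 13, 21, 34])

print(fib[4].shape)    # (): a scalar index removes the axis
print(fib[4:5].shape)  # (1,): a slice keeps the axis, even with one element
print(fib[-1])         # 34, the last element
print(fib[::-1][:3])   # [34 21 13]
```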

In [None]:
#############################################################################
# Task
# • Print the shape of x (confirm it's what you'd expect from the initialization)
# • Print the entire range with 0 fixed as the 1st index of x. What should its shape be?
# • Print the entire range with 2 fixed as the 3rd index of x. What should its shape be?
# • Print the (inclusive) ranges 1-3, 4-5, 0-1 for the three axes. What should its shape be?
# • Print the "bottom right" element of x.
#############################################################################

# Replace "____" statements with your code
x = tf.convert_to_tensor(np.random.random((12, 19, 3)))
print(x.shape)  # (12, 19, 3)

first_slice = x[0, :, :]
print(first_slice.shape)  # (19, 3)

second_slice = x[:, :, 2]
print(second_slice.shape)  # (12, 19)

# Inclusive ranges 1-3, 4-5, 0-1 become the half-open slices 1:4, 4:6, 0:2
third_slice = x[1:4, 4:6, 0:2]
print(third_slice.shape)  # (3, 2, 2)

bottom_right = x[-1, -1, -1]  # same element as x[11, 18, 2]
print(bottom_right.numpy())


### Multi-axis indexing

Higher-rank tensors are indexed by passing multiple indices.

The exact same rules as in the single-axis case apply to each axis independently.

In [None]:
rank_2_tensor = tf.convert_to_tensor(np.random.random((12, 3)))
rank_3_tensor = tf.convert_to_tensor(np.random.random((4, 9, 6)))

In [None]:
print(rank_2_tensor.numpy())

If you pass an integer for each index, the result is a scalar.

In [None]:
# Pull out a single value from a 2-rank tensor
print(rank_2_tensor[1, 1].numpy())

You can index using any combination of integers and slices:

In [None]:
# Get row and column tensors
print("Second row:", rank_2_tensor[1, :].numpy())
print("Second column:", rank_2_tensor[:, 1].numpy())
print("Last row:", rank_2_tensor[-1, :].numpy())
print("First item in last column:", rank_2_tensor[0, -1].numpy())
print("Skip the first row:")
print(rank_2_tensor[1:, :].numpy(), "\n")
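A useful rule of thumb when mixing the two: each integer index removes one axis, and each slice keeps one. A NumPy sketch with a small 2x3x4 array (an illustrative shape, not the random tensors above):

```python
import numpy as np

t = np.arange(24).reshape(2, 3, 4)

print(t[1, 2, 3])          # 23: three integers give a scalar
print(t[1, :, :].shape)    # (3, 4): one integer removes axis 0
print(t[:, :, 2].shape)    # (2, 3): one integer removes the last axis
print(t[1:2, :, 2].shape)  # (1, 3): the slice 1:2 keeps axis 0
```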

Here is an example with a 3-axis tensor:

In [None]:
print(rank_3_tensor[:, :, 4])

<table>
<tr>
<th colspan=2>Selecting the last feature across all locations in each example in the batch </th>
</tr>
<tr>
    <td>
<img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/index1.png?raw=1" alt="A 3x2x5 tensor with all the values at the index-4 of the last axis selected.">
  </td>
      <td>
<img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/index2.png?raw=1" alt="The selected values packed into a 2-axis tensor.">
  </td>
</tr>
</table>

Read the [tensor slicing guide](https://tensorflow.org/guide/tensor_slicing) to learn how you can apply indexing to manipulate individual elements in your tensors.

## Manipulating Shapes

Reshaping a tensor, that is, changing its shape without changing its elements, is a very common operation.


In [None]:
# Shape returns a `TensorShape` object that shows the size along each axis
x = tf.constant([[1], [2], [3]])
print(x.shape)

In [None]:
# You can convert this object into a Python list, too
print(x.shape.as_list())

You can reshape a tensor into a new shape. The `tf.reshape` operation is fast and cheap as the underlying data does not need to be duplicated.

In [None]:
# You can reshape a tensor to a new shape.
# Note that you're passing in a list
reshaped = tf.reshape(x, [1, 3])

In [None]:
print(x.shape)
print(reshaped.shape)

The data maintains its layout in memory and a new tensor is created, with the requested shape, pointing to the same data. TensorFlow uses C-style "row-major" memory ordering, where incrementing the rightmost index corresponds to a single step in memory.
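Concretely, row-major order means the element at index `(i, j, k)` of a tensor with shape `(A, B, C)` sits at flat position `i*B*C + j*C + k`. A NumPy check of that formula (NumPy's default order is also C-style row-major; the shape below is arbitrary):

```python
import numpy as np

A, B, C = 2, 3, 4
t = np.arange(A * B * C).reshape(A, B, C)  # each value equals its flat position
flat = t.reshape(-1)

i, j, k = 1, 2, 3
print(t[i, j, k], flat[i * B * C + j * C + k])  # 23 23

# In NumPy, reshaping a contiguous array returns a view: no data is copied.
print(np.shares_memory(t, flat))  # True
```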

In [None]:
print(rank_3_tensor)

If you flatten a tensor you can see what order it is laid out in memory.

In [None]:
# A `-1` passed in the `shape` argument says "Whatever fits".
print(tf.reshape(rank_3_tensor, [-1]))

Typically the only reasonable use of `tf.reshape` is to combine or split adjacent axes (or add/remove `1`s).

For this 4x9x6 tensor, reshaping to (4x9)x6 or 4x(9x6) are both reasonable things to do, as the slices do not mix:

In [None]:
print(tf.reshape(rank_3_tensor, [4*9, 6]), "\n")
print(tf.reshape(rank_3_tensor, [4, -1]))
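One way to convince yourself that merging adjacent axes is safe: after reshaping `(4, 9, 6)` to `(4*9, 6)`, row `i*9 + j` of the result is exactly the feature vector `[i, j, :]` of the original. A NumPy sketch with random data of the same shape:

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.random((4, 9, 6))
merged = t.reshape(4 * 9, 6)

# Every original slice t[i, j, :] survives intact as one row of `merged`.
ok = all(np.array_equal(merged[i * 9 + j], t[i, j])
         for i in range(4) for j in range(9))
print(ok)  # True
```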

<table>
<th colspan=3>
Some good reshapes.
</th>
<tr>
  <td>
<img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/reshape-before.png?raw=1" alt="A 3x2x5 tensor">
  </td>
  <td>
  <img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/reshape-good1.png?raw=1" alt="The same data reshaped to (3x2)x5">
  </td>
  <td>
<img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/reshape-good2.png?raw=1" alt="The same data reshaped to 3x(2x5)">
  </td>
</tr>
</table>


Reshaping will "work" for any new shape with the same total number of elements, but it will not do anything useful if you do not respect the order of the axes.

Swapping axes in `tf.reshape` does not work; you need `tf.transpose` for that. 


In [None]:
# Bad examples: don't do this

# You can't reorder axes with reshape: the (4, 9, 6) tensor is not transposed
# to (6, 9, 4); its values are just regrouped in memory order.
print(tf.reshape(rank_3_tensor, [6, 9, 4]), "\n")

# This is a mess
print(tf.reshape(rank_3_tensor, [9*6, 4]), "\n")

# This doesn't work at all
try:
  tf.reshape(rank_3_tensor, [7, -1])
except Exception as e:
  print(f"{type(e).__name__}: {e}")

<table>
<th colspan=3>
Some bad reshapes.
</th>
<tr>
  <td>
<img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/reshape-bad.png?raw=1" alt="You can't reorder axes, use tf.transpose for that">
  </td>
  <td>
<img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/reshape-bad4.png?raw=1" alt="Anything that mixes the slices of data together is probably wrong.">
  </td>
  <td>
<img src="https://github.com/tensorflow/docs/blob/master/site/en/guide/images/tensor/reshape-bad2.png?raw=1" alt="The new shape must fit exactly.">
  </td>
</tr>
</table>

Note: `tf.transpose` and `tf.reshape` can look similar, and sometimes both produce the result you want, but they work differently. `tf.reshape` effectively flattens the tensor in row-major order (the last axis varies fastest) and then regroups the values according to the new shape. This may reshuffle the values in a way you don't want. `tf.transpose`, by contrast, effectively just changes the order in which you iterate through the values:

![](https://lihan.me/assets/images/numpy-arrays.png)
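A small example makes the difference concrete. Both calls below turn a `(2, 3)` input into a `(3, 2)` result, but only the transpose moves rows into columns; the reshape just re-chops the row-major sequence `0..5`. It is shown with NumPy, whose `reshape`/`transpose` follow the same ordering rules as `tf.reshape`/`tf.transpose`:

```python
import numpy as np

m = np.array([[0, 1, 2],
              [3, 4, 5]])

print(m.reshape(3, 2))
# [[0 1]
#  [2 3]
#  [4 5]]

print(m.transpose())
# [[0 3]
#  [1 4]
#  [2 5]]
```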

`tf.transpose` requires that you specify *which* dimension goes where. This should become clearer after a couple of exercises:

In [None]:
N, T, D = 4, 60, 3
cube = tf.Variable(tf.zeros((N, T, D)))

The interface is:
```
tf.transpose(
    a, perm=[x,y,z]
)
```
where `[x, y, z]` lists, for each axis of the output, the input axis it comes from (the *new* dimension order you want). So, if you specify `perm=[0,2,1]`, this will *swap* axes 1 and 2 (the second and third dimensions), and the output will have shape `[N, D, T]`. Let's try out some examples:
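Put differently, axis `i` of the output is axis `perm[i]` of the input, so the output shape is `[shape[p] for p in perm]`. `np.transpose(a, axes)` uses the same convention, so here is a NumPy sketch with the same `N, T, D` sizes as the cube above:

```python
import numpy as np

N, T, D = 4, 60, 3
cube = np.zeros((N, T, D))

perm = [0, 2, 1]
print(np.transpose(cube, perm).shape)      # (4, 3, 60), i.e. [N, D, T]
print(tuple(cube.shape[p] for p in perm))  # (4, 3, 60), the same rule by hand

# Element-level check: output[a, b, c] == input[a, c, b] for perm [0, 2, 1].
x = np.arange(24).reshape(2, 3, 4)
print(np.transpose(x, perm)[1, 3, 2] == x[1, 2, 3])  # True
```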

In [None]:
#############################################################################
# Task
# • Using transpose, create "rotated_cube" to be D x N x T
#############################################################################

# Replace "____" statements with your code
rotated_cube = tf.transpose(cube, perm=[2, 0, 1])

print(rotated_cube.shape)

In [None]:
#############################################################################
# Task
# • Using transpose, create "rotated_cube" to be T x N x D
#############################################################################

# Replace "____" statements with your code
rotated_cube = tf.transpose(cube, perm=[1, 0, 2])

print(rotated_cube.shape)

# Introduction to Variables

A TensorFlow **variable** is the recommended way to represent shared, persistent state your program manipulates. This guide covers how to create, update, and manage instances of `tf.Variable` in TensorFlow.

Variables are created and tracked via the `tf.Variable` class. A `tf.Variable` represents a tensor whose value can be changed by running ops on it.  Specific ops allow you to read and modify the values of this tensor. Higher level libraries like `tf.keras` use `tf.Variable` to store model parameters. 

## Setup

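If you want to see which device your variables are placed on, uncomment the `tf.debugging.set_log_device_placement(True)` line below.

In [None]:
import tensorflow as tf

# Uncomment to log the device each op and variable is placed on.
# tf.debugging.set_log_device_placement(True)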

## Create a variable

To create a variable, provide an initial value.  The `tf.Variable` will have the same `dtype` as the initialization value.

In [None]:
my_tensor = tf.constant([[1.0, 2.0], [3.0, 4.0]])
my_variable = tf.Variable(my_tensor)

# Variables can be all kinds of types, just like tensors
bool_variable = tf.Variable([False, False, False, True])
complex_variable = tf.Variable([5 + 4j, 6 + 1j])

In [None]:
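# Variables do not support Python item assignment, so this raises a TypeError.
try:
  my_variable[0, 0] = 4
except TypeError as e:
  print(f"{type(e).__name__}: {e}")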

A variable looks and acts like a tensor, and, in fact, is a data structure backed by a `tf.Tensor`.  Like tensors, they have a `dtype` and a shape, and can be exported to NumPy.

In [None]:
print("Shape: ", my_variable.shape)
print("DType: ", my_variable.dtype)
print("As NumPy: ", my_variable.numpy())

Most tensor operations work on variables as expected, although variables cannot be reshaped.

In [None]:
print("A variable:", my_variable)
print("\nViewed as a tensor:", tf.convert_to_tensor(my_variable))
print("\nIndex of highest value:", tf.math.argmax(my_variable))

# This creates a new tensor; it does not reshape the variable.
print("\nCopying and reshaping: ", tf.reshape(my_variable, [1,4]))

As noted above, variables are backed by tensors. You can reassign the tensor using `tf.Variable.assign`.  Calling `assign` does not (usually) allocate a new tensor; instead, the existing tensor's memory is reused.

In [None]:
a = tf.Variable([2.0, 3.0])
print(a)
# This will keep the same dtype, float32
a.assign([1, 2]) 
print(a)
# Not allowed as it resizes the variable: 
# try:
#   a.assign([1.0, 2.0, 3.0])
# except Exception as e:
#   print(f"{type(e).__name__}: {e}")

<tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([2., 3.], dtype=float32)>
<tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([1., 2.], dtype=float32)>


In [None]:
#############################################################################
# Task
# • Try assigning position 1 of a to 6.0 using indexing. What happens?
#############################################################################

# Replace "____" statements with your code
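a = tf.Variable(list(range(10)))

# Item assignment is not supported on variables, so this raises a TypeError.
# (To update one element, use sliced assignment instead, e.g. a[1].assign(6).)
try:
  a[1] = 6.0
except TypeError as e:
  print(f"{type(e).__name__}: {e}")

print(a)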

If you use a variable like a tensor in operations, you will usually operate on the backing tensor.  

Creating new variables from existing variables duplicates the backing tensors. Two variables will not share the same memory.

In [None]:
a = tf.Variable([2.0, 3.0])
# Create b based on the value of a
b = tf.Variable(a)
a.assign([5, 6])

# a and b are different
print(a.numpy())
print(b.numpy())

# There are other versions of assign
print(a.assign_add([2,3]).numpy())  # [7. 9.]
print(a.assign_sub([7,9]).numpy())  # [0. 0.]

In [None]:
#############################################################################
# Task
# • Construct a variable with value [1,2,3]
# • Construct a variable of size 2 x 3 initialized randomly (either use the
#   tf.random.uniform function or the equivalent from numpy)
#############################################################################

# Replace "____" statements with your code
x = tf.Variable([1, 2, 3])
print(x)

y = tf.Variable(tf.random.uniform((2, 3)))
print(y)

## Lifecycles, naming, and watching

In Python-based TensorFlow, `tf.Variable` instances have the same lifecycle as other Python objects. When there are no references to a variable, it is automatically deallocated.

Variables can also be named, which can help you track and debug them. You can give two variables the same name.

In [None]:
# Create a and b; they will have the same name but will be backed by
# different tensors.
a = tf.Variable(my_tensor, name="Mark")
# A new variable with the same name, but different value
# Note that the scalar add is broadcast
b = tf.Variable(my_tensor + 1, name="Mark")

# These are elementwise-unequal, despite having the same name
print(a == b)

Variable names are preserved when saving and loading models. By default, variables in models will acquire unique variable names automatically, so you don't need to assign them yourself unless you want to.

Although variables are important for differentiation, some variables will not need to be differentiated. You can turn off gradients for a variable by setting `trainable` to `False` at creation. An example of a variable that would not need gradients is a training step counter.

In [None]:
step_counter = tf.Variable(1, trainable=False)

# The Sequential model

## Setup

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

## When to use a Sequential model

A `Sequential` model is appropriate for **a plain stack of layers**
where each layer has **exactly one input tensor and one output tensor**.

Schematically, the following `Sequential` model:

In [None]:
# Define Sequential model with 3 layers
model = keras.Sequential(
    [
        layers.Dense(2, activation="relu", name="layer1"),
        layers.Dense(3, activation="relu", name="layer2"),
        layers.Dense(4, name="layer3"),
    ]
)

In [None]:
# Call model on a test input
x = tf.ones((3,8))
y = model(x)

In [None]:
model.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer1 (Dense)              (3, 2)                    18        
                                                                 
 layer2 (Dense)              (3, 3)                    9         
                                                                 
 layer3 (Dense)              (3, 4)                    16        
                                                                 
Total params: 43
Trainable params: 43
Non-trainable params: 0
_________________________________________________________________


is equivalent to this function:

In [None]:
# Create 3 layers
layer1 = layers.Dense(2, activation="relu", name="layer1")
layer2 = layers.Dense(3, activation="relu", name="layer2")
layer3 = layers.Dense(4, name="layer3")

# Call layers on a test input with the same shape as above
x = tf.ones((3, 8))
y = layer3(layer2(layer1(x)))
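The parameter counts in the summary above follow from the layer shapes: a `Dense` layer mapping `n_in` input features to `n_out` units has an `n_in x n_out` kernel plus `n_out` biases. A plain-Python check, assuming the `(3, 8)` test input used to build `model` (so `layer1` sees 8 features); `dense_params` is a hypothetical helper:

```python
def dense_params(n_in, n_out):
    """Kernel weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Feature sizes flowing through the model: 8 inputs -> 2 -> 3 -> 4 units.
sizes = [8, 2, 3, 4]
counts = [dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
print(counts)       # [18, 9, 16]
print(sum(counts))  # 43
```

Note that the batch size (3) does not affect the parameter count; only the feature dimensions do.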

A Sequential model is **not appropriate** when:

- Your model has multiple inputs or multiple outputs
- Any of your layers has multiple inputs or multiple outputs
- You need to do layer sharing
- You want non-linear topology (e.g. a residual connection, a multi-branch
model)

## Creating a Sequential model

You can create a Sequential model by passing a list of layers to the Sequential
constructor:

In [None]:
model = keras.Sequential(
    [
        layers.Dense(2, activation="relu"),
        layers.Dense(3, activation="relu"),
        layers.Dense(4),
    ]
)

Its layers are accessible via the `layers` attribute:

In [None]:
model.layers

In [None]:
#############################################################################
# Task
# • Construct a Sequential model with 2 Dense layers of sizes 3 and 7
#############################################################################

# Replace "____" statements with your code
model = ____

You can also create a Sequential model incrementally via the `add()` method:

In [None]:
model = keras.Sequential()
model.add(layers.Dense(2, activation="relu"))
model.add(layers.Dense(3, activation="relu"))
model.add(layers.Dense(4))

In [None]:
#############################################################################
# Task
# • Construct a Sequential model with 2 Dense layers of sizes 3 and 7 using add
#############################################################################

# Replace "____" statements with your code
model = keras.Sequential()
model.add(layers.Dense(3, activation="relu"))
model.add(layers.Dense(7))

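# Equivalent alternative: collect the layers in a list, then pass it to the constructor
layer_list = []
layer_list.append(layers.Dense(3, activation="relu"))
layer_list.append(layers.Dense(7))
model = keras.Sequential(layer_list)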

Note that there's also a corresponding `pop()` method to remove layers:
a Sequential model behaves very much like a list of layers.

In [None]:
model.pop()
print(len(model.layers))  # 1 (the model from the task cell above had 2 layers)

Also note that the Sequential constructor accepts a `name` argument, just like
any layer or model in Keras. This is useful to annotate TensorBoard graphs
with semantically meaningful names.

In [None]:
model = keras.Sequential(name="my_sequential")
model.add(layers.Dense(2, activation="relu", name="layer1"))
model.add(layers.Dense(3, activation="relu", name="layer2"))
model.add(layers.Dense(4, name="layer3"))
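Names also make layers easy to look up later: `Model.get_layer()` retrieves a layer by its name. The quick sketch below uses the same named model as the cell above.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential(name="my_sequential")
model.add(layers.Dense(2, activation="relu", name="layer1"))
model.add(layers.Dense(3, activation="relu", name="layer2"))
model.add(layers.Dense(4, name="layer3"))

print(model.name)                              # my_sequential
print([layer.name for layer in model.layers])  # ['layer1', 'layer2', 'layer3']
print(model.get_layer("layer2").units)         # 3
```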

In [None]:
#############################################################################
# Task
# • Use a for loop to construct a model whose layer sizes are given by layer_sizes
# • Look at the properties of the model using model.summary(). How many parameters
#   does this model have?
#############################################################################

# Replace "____" statements with your code
layer_sizes = ____

## Specifying the input shape in advance

Generally, all layers in Keras need to know the shape of their inputs
in order to be able to create their weights. So when you create a layer like
this, initially, it has no weights:

In [None]:
layer = layers.Dense(3)
layer.weights  # Empty

[]

It creates its weights the first time it is called on an input, since the shape
of the weights depends on the shape of the inputs:

In [None]:
# Call layer on a test input
x = tf.ones((1, 4))
y = layer(x)
layer.weights  # Now it has weights, of shape (4, 3) and (3,)

[<tf.Variable 'dense_8/kernel:0' shape=(4, 3) dtype=float32, numpy=
 array([[ 0.8912183 , -0.79186845,  0.39720702],
        [-0.16956949,  0.7996043 , -0.06287152],
        [ 0.44419873, -0.34761232, -0.87322545],
        [-0.6336578 , -0.55048776, -0.3201068 ]], dtype=float32)>,
 <tf.Variable 'dense_8/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]
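The sketch below shows two related behaviors, assuming the TF 2.x `tf.keras` API used in this notebook. First, you can create weights without any data by calling `build()` with an input shape, where `None` is the unspecified batch dimension. Second, once a layer is built, it only accepts inputs whose last dimension matches that shape, so a 5-feature input is rejected with a `ValueError`.

```python
import tensorflow as tf
from tensorflow.keras import layers

layer = layers.Dense(3)
print(layer.built)        # False: no weights yet
layer.build((None, 4))    # create weights for 4 input features without any data
print(tuple(layer.kernel.shape), tuple(layer.bias.shape))  # (4, 3) (3,)

# Once built, the layer rejects inputs whose last dimension differs from 4
try:
    layer(tf.ones((1, 5)))
    rejected = False
except ValueError:
    rejected = True
print("5-feature input rejected:", rejected)
```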

Naturally, this also applies to Sequential models. When you instantiate a
Sequential model without an input shape, it isn't "built": it has no weights
(and calling
`model.weights` results in an error stating just this). The weights are created
when the model first sees some input data:

In [None]:
model = keras.Sequential(
    [
        layers.Dense(2, activation="relu"),
        layers.Dense(3, activation="relu"),
        layers.Dense(4),
    ]
)  # No weights at this stage!

# At this point, you can't do this:
# model.weights

# You also can't do this:
# model.summary()

# Call the model on a test input
x = tf.ones((1, 4))
y = model(x)
print("Number of weights after calling the model:", len(model.weights))  # 6

Number of weights after calling the model: 6
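Calling the model on data is not the only way to build it. The sketch below assumes the TF 2.x `tf.keras` API: `Sequential.build()` takes the input shape directly, with `None` for the batch dimension. After that call, the weights exist and `summary()` works without any data.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(2, activation="relu"),
    layers.Dense(3, activation="relu"),
    layers.Dense(4),
])
print(model.built)                  # False
model.build(input_shape=(None, 4))  # create all weights for 4 input features
print(model.built, len(model.weights))  # True 6
```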


Once a model is "built", you can call its `summary()` method to display its
contents:

In [None]:
model.summary()

However, it can be very useful when building a Sequential model incrementally
to be able to display the summary of the model so far, including the current
output shape. In this case, you should start your model by passing an `Input`
object to your model, so that it knows its input shape from the start:

In [None]:
model = keras.Sequential()
model.add(keras.Input(shape=(4,)))
model.add(layers.Dense(2, activation="relu"))

model.summary()

Note that the `Input` object is not displayed as part of `model.layers`, since
it isn't a layer:

In [None]:
model.layers

A simple alternative is to just pass an `input_shape` argument to your first
layer:

In [None]:
model = keras.Sequential()
model.add(layers.Dense(2, activation="relu", input_shape=(4,)))

model.summary()

Models built with a predefined input shape like this always have weights (even
before seeing any data) and always have a defined output shape.

In general, it's a recommended best practice to always specify the input shape
of a Sequential model in advance if you know what it is.
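Here is a quick check of that claim, written as a sketch against the TF 2.x `tf.keras` API. With `input_shape` given, the model has its weights and a defined output shape before it has seen any data; `None` stands for the batch dimension.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.Dense(2, activation="relu", input_shape=(4,)))
model.add(layers.Dense(3))

print(len(model.weights))  # 4: a kernel and a bias per Dense layer
print(model.output_shape)  # (None, 3)
```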

In [None]:
#############################################################################
# Task
# • Using the given input_shape, add the layers one at a time as above and print the summary after each addition
#############################################################################

# Replace "____" statements with your code
layer_sizes = ____

## A common debugging workflow: `add()` + `summary()`

When building a new Sequential architecture, it's useful to incrementally stack
layers with `add()` and frequently print model summaries. For instance, this
enables you to monitor how a stack of `Conv2D` and `MaxPooling2D` layers is
downsampling image feature maps:

In [None]:
model = keras.Sequential()
model.add(keras.Input(shape=(250, 250, 3)))  # 250x250 RGB images
model.add(layers.Conv2D(32, 5, strides=2, activation="relu"))
model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.MaxPooling2D(3))

# Can you guess what the current output shape is at this point? Probably not.
# Let's just print it:
model.summary()

# The answer was: (40, 40, 32), so we can keep downsampling...

model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.MaxPooling2D(3))
model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.MaxPooling2D(2))

# And now?
model.summary()

# Now that we have 4x4 feature maps, time to apply global max pooling.
model.add(layers.GlobalMaxPooling2D())

# Finally, we add a classification layer.
model.add(layers.Dense(10))
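You can also work out the shapes reported by `summary()` by hand. Keras layers use `padding="valid"` by default. Under that setting, a convolution or pooling window of size k with stride s maps a side of length n to floor((n - k) / s) + 1. `MaxPooling2D` uses a stride equal to its pool size by default. The small sketch below replays the spatial layers of the stack above in plain Python.

```python
def valid_output_size(n, kernel, stride=1):
    # Output side length for a "valid" (no padding) convolution or pooling window
    return (n - kernel) // stride + 1

# (kernel, stride) for each spatial layer added above, in order
stack = [(5, 2), (3, 1), (3, 3),   # Conv2D(32, 5, strides=2), Conv2D(32, 3), MaxPooling2D(3)
         (3, 1), (3, 1), (3, 3),   # Conv2D(32, 3), Conv2D(32, 3), MaxPooling2D(3)
         (3, 1), (3, 1), (2, 2)]   # Conv2D(32, 3), Conv2D(32, 3), MaxPooling2D(2)

n = 250
sizes = [n]
for kernel, stride in stack:
    n = valid_output_size(n, kernel, stride)
    sizes.append(n)
print(sizes)  # [250, 123, 121, 40, 38, 36, 12, 10, 8, 4]
```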

# Training Revisited
Let's now revisit the training from the beginning of class. The workflow involves three steps:

- Compiling the model with a loss function (and an optimizer)
- Training on some `x_train`, `y_train`
- Testing on some `x_test`, `y_test`

Let's walk through an example of doing this.

In [None]:
#############################################################################
# Task
# • Construct a model with 3 Dense layers of the following sizes: 3, 8, 1
#############################################################################

# Replace "____" statements with your code
model = keras.Sequential(
    [
        layers.Dense(3, activation="relu"),
        layers.Dense(8, activation="relu"),
        layers.Dense(1),
    ]
)

Two common loss functions are `"mse"` (mean squared error) for regression tasks and `"sparse_categorical_crossentropy"` for classification tasks with integer class labels. There are others, but these are the two primary ones we'll see in this course. Let's try it with `"mse"`.
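To see what the two loss strings compute, the small sketch below compares the Keras loss classes with the same formulas written out in NumPy. The numbers are made up for illustration. `SparseCategoricalCrossentropy` expects integer class labels and, by default, predicted probabilities. Pass `from_logits=True` when the last layer has no softmax.

```python
import numpy as np
import tensorflow as tf

# Regression: mean of squared differences
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
mse_keras = tf.keras.losses.MeanSquaredError()(y_true, y_pred).numpy()
mse_manual = np.mean((y_true - y_pred) ** 2)
print(mse_keras, mse_manual)  # both about 0.4167

# Classification: integer labels and one probability row per example
labels = np.array([0, 2])
probs = np.array([[0.9, 0.05, 0.05],
                  [0.1, 0.2, 0.7]])
ce_keras = tf.keras.losses.SparseCategoricalCrossentropy()(labels, probs).numpy()
ce_manual = -np.mean(np.log(probs[np.arange(2), labels]))  # mean of -log p(correct class)
print(ce_keras, ce_manual)  # both about 0.2310
```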

In [None]:
#############################################################################
# Task
# • Compile the model using Adam as the optimizer and the mse loss
#############################################################################

# Replace "____" statements with your code
model.compile(optimizer='adam', loss="mse")

In [None]:
#############################################################################
# Task
# • Train your model on train_x, train_y
#############################################################################

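import numpy as np

# Random toy regression data: 100 samples with 5 features each, one target per sample
x = np.random.random((100, 5))
y = np.random.random(100)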

train_x, test_x = x[:80, :], x[80:, :]
train_y, test_y = y[:80], y[80:]

# Replace "____" statements with your code
model.fit(train_x, train_y, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f71e973fd90>
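`fit()` returns the `keras.callbacks.History` object echoed above. Its `history` dictionary records the loss after every epoch, which is handy for plotting or for checking that training is making progress. The minimal self-contained sketch below uses the same kind of random data. The seed and `verbose=0` are choices made here, not part of the task.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

rng = np.random.default_rng(0)
train_x = rng.random((80, 5))
train_y = rng.random(80)

model = keras.Sequential([
    layers.Dense(3, activation="relu"),
    layers.Dense(8, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

history = model.fit(train_x, train_y, epochs=5, verbose=0)  # verbose=0 hides the progress bars
losses = history.history["loss"]  # one loss value per epoch
print(len(losses))  # 5
```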

In [None]:
#############################################################################
# Task
# • Evaluate your model on test_x, test_y
#############################################################################
model.evaluate(test_x, test_y)



0.642832338809967
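Because the model was compiled with only a loss, `evaluate()` returns a single number: the mean squared error on the test set. If you add `metrics=[...]` to `compile()`, it returns a list `[loss, metric, ...]` instead. The sketch below reproduces that number by hand from `predict()`, using hypothetical random data with a fixed seed.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

rng = np.random.default_rng(0)
x = rng.random((100, 5))
y = rng.random(100)
train_x, test_x = x[:80], x[80:]
train_y, test_y = y[:80], y[80:]

model = keras.Sequential([
    layers.Dense(3, activation="relu"),
    layers.Dense(8, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(train_x, train_y, epochs=5, verbose=0)

test_loss = model.evaluate(test_x, test_y, verbose=0)
pred = model.predict(test_x, verbose=0).squeeze()  # shape (20, 1) -> (20,)
manual_mse = np.mean((pred - test_y) ** 2)
print(test_loss, manual_mse)  # should agree up to float32 rounding
```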