## ECBM E4040 Neural Networks and Deep Learning
### Columbia University, Fall 2025

# ECBM E4040 - Assignment 0

## Welcome to ECBM E4040 Neural Networks & Deep Learning

This course teaches theory, concepts, modeling and programming techniques. The assignments are focused on practical coding for creating, testing and running deep learning models.

**Assignment 0** introduces our programming environment and tools, as well as basic TensorFlow operations.

The assignment consists of the following components:
* Programming environment setup - Google Compute Engine/local machine, Python, TensorFlow.
* Using Jupyter Notebook
* TensorFlow basics
* A demo of a program written in TensorFlow

<p style='color:red'>Cells marked with "<strong>TODO</strong>" need to be completed by students, as a part of the assignment.</p>

Please consult TAs or post your problems/issues on Ed Discussion.

## Part 1 - Setting Up the Programming Environment

For this course, we use [**Python**](https://www.python.org) as the programming language, and [**TensorFlow**](https://www.tensorflow.org) as the deep learning framework.

Our course website (https://ecbme4040.github.io/) provides a number of tutorials:
* Local Environment Setup
* Google Compute Engine (GCE) Setup on the Google Cloud Platform (GCP)
* Python Tutorial
* TensorFlow Tutorial
* Linux Tutorial
* Git Commands
* Google Colab

<p style='color:red'><strong>TODO:</strong></p>

This list shows an overview of what students need to do to complete the assignment. Later cells in this Jupyter notebook describe the details of the operations that need to be executed as TODO items.

* Create an account on GitHub, and familiarize yourself with both git operations and GitHub.

    * Git is a __version control__ system, and GitHub is a hosting platform for git repositories; both will be helpful in your learning journey.

    * You can get familiar with git and GitHub by following this [guide](https://github.com/git-guides).

* Execute `Assignment0.ipynb`.

  Three platforms are recommended for __executing code__ in this course. You will need to run code for the assignments and the final project.

    1. __Google Cloud__: Primary platform for all assignments. Requires credits 
    
        * You can initially use the $300 that every new account gets from Google Cloud. 

        * After the add/drop period you will be given codes/coupons from the instructors.  

    2. __Google Colab__: Easy setup, free limited compute. Use before codes/coupons are distributed.

    3. __Personal Computer__: Optional. Useful for quick & free debugging locally; final runs should be on Google Cloud.
    
  Use the three platforms flexibly to run code, but __submit the final version from Google Cloud__. You will be required to submit screenshots of your __Google Cloud VM Instance__. Screenshot requirements will be given in the `README.md` accompanying `Assignment0.ipynb` after add/drop.
        
  For example, for `Assignment0.ipynb`:

    * During add/drop: run it in **Colab** (fetch from GitHub or Courseworks, upload to Colab, execute, and save your solutions).  

    * After add/drop: set up a **Google Cloud VM** ([GCE tutorial](https://ecbme4040.github.io/2024_fall/EnvSetup/gcp.html)), re-run the notebook, save it, and take required screenshots (VM dashboard + notebook in VM). Submit the modified notebook (with completed TODOs) via GitHub, along with screenshots as proof of cloud execution.

* **(Optional)** Knowledge of Python, Linux and Git operations is needed for this course. Doing the tutorials is optional, depending on your prior knowledge of these topics.

* **(Optional)** Tutorials on TensorFlow are extensive. As you progress with this course, they are an excellent complement to our class.

## Part 2 - How to Use Jupyter Notebook - Basics

Jupyter Notebook is an interactive Python programming interface. Jupyter Notebook files have the extension `.ipynb`, and each file is made up of several blocks called **cells**. Each cell can be configured as a **coding cell** or a **Markdown text cell**. Google Colab is the easiest installation-free platform on which to run and become familiar with Jupyter Notebook operations.

Basic instructions:
* The menu bars are located on the top of a notebook.
* To execute a cell, select it, and press `ctrl + Enter`. (You may also try `shift + Enter` and `alt + Enter` to see the difference).
* To switch between code and Markdown, select a cell, and select the mode you want in the dropdown menu in the menu bar.

A full guide to Jupyter Notebook is available in the [official documentation](https://jupyter-notebook.readthedocs.io/en/latest/).

In [1]:
# To test that you understand how to use a coding cell, make this cell output a string 'Hello Jupyter!'.
# We've written the code, all you need to do is to execute it.
print('Hello Jupyter!')

Hello Jupyter!


## Part 3 - TensorFlow Basics

TensorFlow is one of the most popular deep learning frameworks. Originally created by Google, it has received a lot of community support. TensorFlow versions before 2.0 (released in September 2019) exposed many low-level functions for constructing deep learning (neural network) models, and relied on a two-step process: first build the model as a computational graph, then execute it in a session.

TensorFlow versions 2.0 and beyond focus on simplicity and ease of use, with updates like eager execution, intuitive Keras-based higher-level APIs, and flexible model building on any platform.

In this part, we look at some basic TensorFlow concepts and operations.

In [2]:
# This cell installs the matplotlib package directly from the Jupyter Notebook.
# To install, remove the '#' sign before the pip command and run this cell once.

#!pip install --upgrade matplotlib
print("matplotlib installed")

matplotlib installed


In [8]:
# Activating TensorFlow in Jupyter Notebook

# Disable some warnings, to simplify the output when running the cells
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# Import the TensorFlow module
import tensorflow as tf

print('TensorFlow Version =', tf.__version__)

# The following modules will be used in later parts of the assignment.
# Make sure that recent versions of numpy and matplotlib are installed in your environment.
# For example, inside your conda environment, run `conda install numpy`
# and `conda install matplotlib`, then restart the Jupyter notebook.
import numpy as np
from matplotlib import pyplot as plt


#### 1. Use of String Constant with TensorFlow

In TensorFlow 2.x, sessions (as used in versions before 2.0) are no longer part of the default workflow. With eager execution, the values of tensors and variables are available immediately rather than only after `sess.run`, which makes the code more "Pythonic" in style.

In [None]:
# Example
# Define a string constant
string = tf.constant('Hello TensorFlow!')
tf.print(string)

<p style='color:red'><strong>TODO:</strong></p>

Follow the example above and use TensorFlow to output the string "YOUR_NAME: YOUR_UNI".

<p style='color:red'><strong>SOLUTION (enter a new cell below):</strong></p>

#### 2. Basic Maths in TensorFlow

In [None]:
# Example

# Define 2 constant nodes. It is a good habit to name your nodes.
# The name of the nodes will appear in the TensorBoard graph.
a = tf.constant(10, dtype=tf.float32, name='a')
b = tf.constant(20, dtype=tf.float32, name='b')

# Addition and subtraction
add = tf.add(a, b, name='add') # same as a+b
sub = tf.subtract(a, b, name='sub') # same as a-b

# Unlike in TensorFlow 1.x, no session is needed to run these operations.
tf.print("a =", a)
tf.print("b =", b)
tf.print("a + b =", add)
tf.print("a - b =", sub)

<p style='color:red'><strong>TODO:</strong></p>

Visit https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/math and find the appropriate operations to calculate and print:
- `a*b` (multiplication)
- `a/b` (division)
- `a^b` (power)
- `log(a)` (natural logarithm)

***Note:*** `a` and `b` are defined in the previous cell; use them directly.

<p style='color:red'><strong>SOLUTION (enter a new cell below):</strong></p>

#### 3. Constant Tensor, Sequences and Random Numbers in TensorFlow

In [None]:
# In TensorFlow, a tensor is an n-dimensional array.
# 0-d tensor is a scalar. 1-d tensor is a vector, and so on.

# We can use TF functions to create all-zero and all-one tensors.
zero_array = tf.zeros(shape=[2,3], dtype=tf.float32, name='zero_array')
one_array = tf.ones(shape=[2,3], dtype=tf.float32, name='one_array')

# Or use a template to infer the shape.
template = tf.constant([[1,2,3],[4,5,6]], dtype=tf.float32, name='template') # Has [2,3] shape
zero_like = tf.zeros_like(template, name='zero_like')
one_like = tf.ones_like(template, name='one_like')

# Some sequence generating functions
lin_seq = tf.linspace(start=0.0, stop=5.0, num=5, name='lin_seq')
lin_range = tf.range(start=0, limit=7, delta=1, name='lin_range')

# A random number function
norm = tf.random.normal(shape=[5], mean=3, stddev=2.0)

# Printing out
tf.print('Zero array:\n', zero_array, '\n')
tf.print('One array:\n', one_array, '\n')
tf.print('Zero-like:\n', zero_like, '\n')
tf.print('One-like:\n', one_like, '\n')
tf.print('Linear sequence:\n', lin_seq, '\n')
tf.print('Range:\n', lin_range, '\n')
tf.print('Random normal:\n', norm, '\n')
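
The endpoint conventions of the two sequence functions above are easy to mix up: `tf.linspace` includes `stop`, while `tf.range` excludes `limit`. The sketch below uses NumPy's `np.linspace` and `np.arange` as stand-ins, on the assumption that a NumPy-only illustration is acceptable here because they follow the same endpoint conventions. The variable names `lin_seq_np` and `range_np` are hypothetical and not part of the assignment code.

```python
import numpy as np

# np.linspace, like tf.linspace, includes both the start and the stop value.
lin_seq_np = np.linspace(0.0, 5.0, num=5)

# np.arange, like tf.range, includes the start but excludes the limit.
range_np = np.arange(0, 7, 1)

print("linspace:", lin_seq_np)  # 5 evenly spaced values, the last one is 5.0
print("arange:  ", range_np)    # the integers 0 through 6
```

Keeping this distinction in mind helps when a task specifies which end of a sequence is inclusive.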

<p style='color:red'><strong>TODO:</strong></p>

1. Generate a $3\times 3$ matrix filled with number `8`.
2. Generate a sequence starting from `7.0` to `-6.0` (left inclusive), with step size of `-1.0`.
3. Generate another $3\times 3$ matrix with normal distribution, choose any mean and stddev you like.

<p style='color:red'><strong>SOLUTION (enter a new cell below):</strong></p>

#### 4. Variables in TensorFlow

In [None]:
# The values of constants (described previously) cannot be changed.
# The values of TensorFlow variables, in contrast, can be updated during the training of a network.

initial_value = tf.Variable([2,3], dtype=tf.float32) # You need to give an initial value to the variable.
tf.print("Initial value:", initial_value)

# Several ops that can be used to change the value of a variable.
# In eager mode they run immediately; inside a tf.function they become "nodes" in the computational graph.
new_value = initial_value.assign([4,5])
tf.print("Assigned value:", new_value)
add = initial_value.assign_add([1,1])
tf.print("Add:", add)

<p style='color:red'><strong>TODO:</strong></p>

1. Create a $3\times 3$ tensor variable (the initial values don't matter).
2. Assign values from 1 to 9 to it and then add 1 to each assigned value in the $3\times 3$ tensor.
3. Print out (i) the initial values (ii) the new values after the assign operation.

<p style='color:red'><strong>SOLUTION (enter a new cell below):</strong></p>

#### 5. Impact of Data Types

In [None]:
# In TensorFlow, the float type of data includes float32 and float64.
# In later assignments, you should ALWAYS consider float32 as the first choice for the sake of efficiency,
# even though the precision is lower than for float64.

# Here we compare the precision difference between these two types.
a32 = tf.Variable([[1,2,3],[4,5,6]], dtype=tf.float32)
b32 = (a32 + 0.2) ** 2
a64 = tf.Variable([[1,2,3],[4,5,6]], dtype=tf.float64)
b64 = (a64 + 0.2) ** 2

tf.print('Results with float32: \n {}'.format(b32), '\n')
tf.print('Results with float64: \n {}'.format(b64), '\n')
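
To put numbers on the precision gap, the following sketch repeats the same computation in NumPy, which is assumed here to behave like TensorFlow for these two IEEE floating-point types. It reports the machine epsilon of each type and the largest discrepancy between the float32 and float64 results. The names `eps32`, `eps64`, and `max_diff` are hypothetical.

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable number.
eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps
print("float32 eps:", eps32)
print("float64 eps:", eps64)

# Same computation as in the cell above, once per precision.
a32 = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
a64 = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
b32 = (a32 + 0.2) ** 2
b64 = (a64 + 0.2) ** 2

# Largest absolute difference between the two results.
max_diff = np.abs(b32.astype(np.float64) - b64).max()
print("max |float32 - float64|:", max_diff)
```

The discrepancy is tiny compared with the values themselves, which is why float32 is usually an acceptable trade-off for speed and memory.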

#### 6. Distance Calculation



The most common way to measure the distance between two points in an $n$-dimensional space is the **Euclidean distance**.  
Given two vectors $x, y \in \mathbb{R}^n$, it is defined as

$$
d_2(x,y) = \|x - y\|_2 = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2 }.
$$

---

**Euclidean norm**

In linear algebra, the corresponding norm is the **Euclidean norm** (the $L^2$ norm).  
For a vector $x=(x_1,\dots,x_n)$,

$$
\|x\|_2 = \sqrt{x_1^2 + \cdots + x_n^2},
$$

and formally the mapping is

$$
\|\cdot\|_2 : \mathbb{R}^n \to [0,\infty).
$$

---

**General $L^p$ norm**

More generally, the Euclidean norm is a special case of the family of **Lebesgue space norms** $L^p$.  
For any real $p \ge 1$, the $L^p$ norm of $x$ is

$$
\|x\|_p = \left(|x_1|^p + \cdots + |x_n|^p\right)^{1/p}.
$$

- When $p=2$, we recover the Euclidean norm.  
- When $p=1$, we obtain the **Manhattan (taxicab) norm**:

$$
\|x\|_1 = |x_1| + \cdots + |x_n|.
$$

---

**Distances from norms**

Using this notation, the $L^p$ distance between two vectors is simply the norm of their difference:

$$
d_p(x,y) = \|x - y\|_p.
$$

In coordinates, if
$$
x = (x_1, x_2, \dots, x_n), \quad y = (y_1, y_2, \dots, y_n),
$$
then

$$
d_p(x,y)
= \left( \, |x_1 - y_1|^p + |x_2 - y_2|^p + \cdots + |x_n - y_n|^p \, \right)^{1/p}.
$$

In particular, the Manhattan ($L^1$) distance is

$$
d_1(x,y) = \sum_{i=1}^n |x_i - y_i|.
$$
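
As a quick check of these formulas, take the hypothetical vectors $x = (1, 2, 3)$ and $y = (4, 0, 3)$, chosen here only for illustration. Their difference is $x - y = (-3, 2, 0)$, so

$$
d_1(x,y) = |-3| + |2| + |0| = 5, \qquad
d_2(x,y) = \sqrt{(-3)^2 + 2^2 + 0^2} = \sqrt{13} \approx 3.606.
$$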


---

**NOTE**  
> - The $p$-norms we introduced (e.g. Euclidean norm, Manhattan norm) are **vector norms**: they are defined only on vectors.  
>
> - In **NumPy**, the function [`np.linalg.norm`](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html) applied to a **matrix** will compute a **matrix norm**:  
>   - By default (`ord=None`), it gives the **Frobenius norm**:  
>     $$
>     \|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}.
>     $$  
>   - With `ord=1`, it gives the **induced 1-norm**, i.e. the **maximum column sum**:  
>     $$
>     \|A\|_1 = \max_j \sum_i |a_{ij}|.
>     $$  
>   These are standard definitions for matrices. The Frobenius norm equals the Euclidean norm of the flattened matrix, but the induced 1-norm is **different** from the $L^1$ norm of the flattened matrix.  
>
> - In **TensorFlow**, the function [`tf.norm`](https://www.tensorflow.org/api_docs/python/tf/norm) behaves differently:  
>   - If `axis=None`, TensorFlow first **flattens the entire tensor into a 1D vector**, then computes the chosen vector norm.  
>   - Example: `tf.norm(A, ord=1, axis=None)` will compute the sum of absolute values of **all entries** in `A`, not the induced matrix norm.  
>
> For the purposes of this assignment, **always interpret norms as vector norms on flattened tensors**.
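
The difference described in the note can be checked numerically. The sketch below uses a hypothetical $2\times 3$ matrix `A`, chosen only for illustration, and compares `np.linalg.norm` applied to the matrix with the same function applied to its flattened form.

```python
import numpy as np

# Hypothetical 2x3 matrix, chosen only for illustration.
A = np.array([[1.0, -2.0, 3.0],
              [-4.0, 5.0, -6.0]])

# Induced matrix 1-norm: the maximum absolute column sum.
induced_1 = np.linalg.norm(A, ord=1)

# Vector 1-norm of the flattened matrix: the sum of all absolute values.
flat_1 = np.linalg.norm(A.ravel(), ord=1)

# Frobenius norm versus the vector 2-norm of the flattened matrix.
frobenius = np.linalg.norm(A)
flat_2 = np.linalg.norm(A.ravel(), ord=2)

print("induced 1-norm:  ", induced_1)
print("flattened 1-norm:", flat_1)
print("Frobenius:", frobenius, "flattened 2-norm:", flat_2)
```

The two 1-norms disagree (9 versus 21 for this matrix), while the Frobenius norm coincides with the 2-norm of the flattened matrix.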


In [None]:
# Distance Calculation
#
# tf.norm expects a tensor of a floating-point or complex type, such as float32, float64, complex64, or complex128.
# The meaning of each argument is described at: https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/norm

# Create two tensors of shape 2x3.
tensor_a = tf.constant([[1,2,3], [3,2,1]], dtype=tf.float32, name='template_a')
tensor_b = tf.constant([[1,1,1], [1,1,1]], dtype=tf.float32, name='template_b')

# 1. Calculating Euclidean distance or the L2 norm.

euclidean_distance = tf.norm(tensor_a - tensor_b, ord='euclidean', axis=None, keepdims=None)
tf.print("Euclidean distance:", euclidean_distance)

# 2. Calculating the Manhattan distance or the L1 norm
manhattan_distance = tf.norm(tensor_a - tensor_b, ord=1, axis=None, keepdims=None)
tf.print("Manhattan distance:", manhattan_distance)

<p style='color:red'><strong>TODO:</strong></p>

Create two functions using `numpy` which will return the Euclidean distance and Manhattan distance between the above tensors. Print their outputs.

<p style='color:red'><strong>SOLUTION (enter a new cell below):</strong></p>

We have introduced only the basic TensorFlow operations and concepts. We recommend that you visit the TensorFlow tutorial link (https://www.tensorflow.org/tutorials) for many more. You will need them to be able to efficiently build and execute deep learning models.

## Part 4 - TensorFlow Demos
Part 4 of this assignment consists of a demo. All you need to do is run the demo and observe the results. This is meant to give you an idea of how TensorFlow is used.

Please run the code and look at the outputs. We do not ask you to fully understand the model at this point. However, it is good practice to search [www.tensorflow.org](https://www.tensorflow.org) for the functions used in the code. They will be really helpful when you start programming by yourself.

### Demo 1: Multiclass Logistic Regression
#### Loading and preparing the MNIST dataset

In [None]:
# Multi-class Logistic Regression

from tensorflow.keras.datasets import mnist

# Load dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Convert to float32.
x_train, x_test = np.array(x_train, np.float32), np.array(x_test, np.float32)

# Flatten images to 1-D vector of 784 features (28*28, which is the size of the images in MNIST).
num_features = 28 * 28
x_train, x_test = x_train.reshape([-1, num_features]), x_test.reshape([-1, num_features])

# Normalize values into [0, 1].
x_train, x_test = x_train / 255., x_test / 255.

print("Cell executed. Move to the next cell.")
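
The effect of the flatten and normalize steps on shapes and value ranges can be seen without downloading MNIST. The sketch below uses a hypothetical stand-in array `fake_images` (random pixel values, not real data) and applies the same three operations in NumPy.

```python
import numpy as np

# Hypothetical stand-in for a batch of MNIST images: 4 images of 28x28
# pixels with integer intensities in [0, 255]. This is not real data.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Convert to float32, flatten each image to 784 features, scale to [0, 1].
flat = fake_images.astype(np.float32).reshape([-1, 28 * 28]) / 255.0

print(flat.shape, flat.dtype, flat.min(), flat.max())
```

Each image becomes one row of 784 features, which matches the first dimension of the weight matrix used later in the demo.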

#### Setting up hyperparameters and dataset parameters

In [None]:
# MNIST dataset parameters
num_classes = 10 # 0 to 9 digits
num_features = 784 # 28*28

# Training parameters
learning_rate = 0.005
training_steps = 500
batch_size = 256
display_step = 100

print("Cell executed. Move to the next cell.")

#### Shuffling and Batching the data

In [None]:
# Using tf.data API to shuffle and batch data.
train_data = tf.data.Dataset.from_tensor_slices((x_train,y_train))

train_data = train_data.repeat().shuffle(5000).batch(batch_size).prefetch(1)
print("Cell executed. Move to the next cell.")

#### Initializing weights and biases

In [None]:
# Weight of shape [784, 10]: the 28*28 image features by the total number of classes.
w = tf.Variable(tf.ones([num_features, num_classes]), name="weight")

# Bias of shape [10], the total number of classes.
b = tf.Variable(tf.zeros([num_classes]), name="bias")

print("Cell executed. Move to the next cell.")

#### Defining Logistic Regression and Cost function

In [None]:
# Logistic regression (x@w + b).
def logistic_regression(x):
    # Apply softmax to normalize the logits to a probability distribution.
    return tf.nn.softmax(tf.matmul(x, w) + b)

# Cross-Entropy loss function.
def cross_entropy(y_pred, y_true):
    # Encode label to a one-hot vector.
    y_true = tf.one_hot(y_true, depth=num_classes)

    # Clip prediction values to avoid log(0) error.
    y_pred = tf.clip_by_value(y_pred, 1e-9, 1.)

    # Compute cross-entropy.
    return tf.reduce_mean(-tf.reduce_sum(y_true * tf.math.log(y_pred), axis=1))

print("Cell executed. Move to the next cell.")
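
For readers who want to see the arithmetic behind these two functions, here is a minimal NumPy sketch of softmax followed by cross-entropy. The helper names `softmax` and `cross_entropy_np` are hypothetical, and the max-subtraction step is a common stability trick assumed here rather than something taken from the demo. The example feeds in identical logits for every class, which is the situation produced by an all-ones weight matrix and a zero bias.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability (an assumption
    # of this sketch; the result is mathematically unchanged).
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy_np(y_pred, y_true, num_classes):
    # One-hot encode the integer labels.
    one_hot = np.eye(num_classes)[y_true]
    # Clip predictions to avoid log(0).
    y_pred = np.clip(y_pred, 1e-9, 1.0)
    # Mean over the batch of the per-sample cross-entropy.
    return np.mean(-np.sum(one_hot * np.log(y_pred), axis=1))

# Identical logits for every class give a uniform prediction,
# so the loss equals ln(10) regardless of the labels.
logits = np.full((3, 10), 5.0)
labels = np.array([0, 4, 9])
probs = softmax(logits)
loss = cross_entropy_np(probs, labels, num_classes=10)
print(probs[0, 0], loss, np.log(10))
```

A loss near $\ln 10 \approx 2.30$ is therefore what a ten-class model with no information about the input produces.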

#### Defining Optimizers and Accuracy Metrics

In [None]:
# Accuracy metric.
def accuracy(y_pred, y_true):
    # Predicted class is the index of the highest score in prediction vector (i.e. argmax).
    correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.cast(y_true, tf.int64))

    return tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Stochastic gradient descent optimizer.
optimizer = tf.optimizers.SGD(learning_rate)

print("Cell executed. Move to the next cell.")

#### Optimization process and Updating Weights and Biases

In [None]:
# Optimization process.

def run_optimization(x, y):
    # Wrap computation inside a GradientTape for automatic differentiation.
    with tf.GradientTape() as g:
        pred = logistic_regression(x)
        loss = cross_entropy(pred, y)

    # Compute gradients.
    gradients = g.gradient(loss, [w, b])

    # Update W and b following gradients.
    optimizer.apply_gradients(zip(gradients, [w, b]))

# Confirm that the cell ran.
print("Cell executed. Move to the next cell.")
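
To make the gradient computation and update less of a black box, the sketch below performs one SGD step for softmax regression by hand in NumPy. It relies on the standard result that the gradient of the cross-entropy with respect to the logits is `probs - one_hot`. The toy data, the helper `loss_and_grads`, and the step size `lr` are all hypothetical and chosen only for illustration; clipping is omitted because the probabilities here stay away from zero.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def loss_and_grads(x, y, w, b):
    # Forward pass: class probabilities and mean cross-entropy.
    probs = softmax(x @ w + b)
    one_hot = np.eye(w.shape[1])[y]
    loss = -np.mean(np.sum(one_hot * np.log(probs), axis=1))
    # Backward pass: gradient with respect to the logits is
    # (probs - one_hot), averaged over the batch.
    d_logits = (probs - one_hot) / x.shape[0]
    return loss, x.T @ d_logits, d_logits.sum(axis=0)

# Hypothetical toy problem: 8 samples, 4 features, 3 classes.
rng = np.random.default_rng(0)
x = rng.random((8, 4))
y = np.array([0, 1, 2, 0, 1, 2, 0, 1])
w, b = np.zeros((4, 3)), np.zeros(3)
lr = 0.1  # illustrative value, not the notebook's learning_rate

loss_before, grad_w, grad_b = loss_and_grads(x, y, w, b)
# Plain SGD update: move each parameter against its gradient.
w -= lr * grad_w
b -= lr * grad_b
loss_after, _, _ = loss_and_grads(x, y, w, b)
print(loss_before, loss_after)
```

In the notebook, `tf.GradientTape` derives these gradients automatically and the optimizer applies the update, so no hand-written derivative is needed.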

#### Training loop

In [None]:
# Run training for the given number of steps.
print("This cell should run for 500 steps and print progress every 100 steps. Running ...")

for step, (batch_x, batch_y) in enumerate(train_data.take(training_steps), 1):
    # Run the optimization to update W and b values.
    run_optimization(batch_x, batch_y)
    if step % display_step == 0:
        pred = logistic_regression(batch_x)
        loss = cross_entropy(pred, batch_y)
        acc = accuracy(pred, batch_y)
        print("step: %i, loss: %f, accuracy: %f" % (step, loss, acc))

print("Cell executed. Move to the next cell.")

#### Testing model accuracy using the testing data

In [None]:
# Evaluate the model on the test set.
pred = logistic_regression(x_test)
print("Test Accuracy: %f" % accuracy(pred, y_test))

## Part 5 - Organizing the Code for Development of Deep Learning Models

This assignment was distributed as a collection of directories and files (either through GitHub or as a collection of files in a `*.zip` package).

The organization of directories and files is described in the corresponding README.md file, and illustrated below as well:

<p style='color:red'><strong>TODO:</strong></p> Study the organization of directories and files for this assignment

In particular, examine how Jupyter notebook files and utility Python files are distributed across directories. Students will be required to follow this (or a very similar) directory structure for their assignments and projects. For every project/assignment, a similar tree structure needs to be generated and appended to the README.md file.

A typical organization of the top directory can be seen at the end of the `README.md` file.

### Steps to create and visualize the tree-like directory structure:

<p style='color:red'><strong>For Linux:</strong></p>

1. Go to the directory for which you want to create and visualize the tree structure.
2. Type `tree ./ >> README.md` in the terminal (or `!tree ./ >> README.md` in a Jupyter Notebook cell). This will append the tree to the `README.md` file in the same directory.
3. After running the above command, go to the `README.md` file and **manually** enclose the appended directory structure text in triple backticks, with three backtick characters on the line before it and three on the line after it (you may refer to the provided example in `README.md`).

<p style='color:red'><strong>For Windows:</strong></p>

1. Replace the above command with `tree ./ /f /a >> README.md`.
2. After running the above command, go to the `README.md` and remove the lines that look like: `Folder PATH listing for volume DATA Volume serial number is 0A8A-1CBE D:\CA_4040\MASTER_BRANCH\e4040-2024fall-assign0`.

**Note:** The two chevrons `>>` are used to append the output of a command to the bottom of a file (instead of one chevron `>` which overwrites the whole file).
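
If the `tree` command is not available on your system, a similar listing can be produced in Python. The sketch below is a minimal alternative and not the course's required method: the helpers `render_tree` and `append_tree` are hypothetical, the output only approximates the format of `tree`, and the demo directory (`utils/helpers.py` and so on) is made up for illustration. Opening the file in mode `"a"` plays the role of `>>`.

```python
import tempfile
from pathlib import Path

def render_tree(root, prefix=""):
    """Return a list of lines describing the directory tree under root."""
    lines = []
    entries = sorted(Path(root).iterdir(), key=lambda p: p.name)
    for index, entry in enumerate(entries):
        last = index == len(entries) - 1
        lines.append(prefix + ("└── " if last else "├── ") + entry.name)
        if entry.is_dir():
            # Indent children; keep the vertical bar while siblings remain.
            child_prefix = prefix + ("    " if last else "│   ")
            lines.extend(render_tree(entry, child_prefix))
    return lines

def append_tree(root, readme_path):
    # Mode "a" appends like `>>`; mode "w" would overwrite like `>`.
    with open(readme_path, "a", encoding="utf-8") as readme:
        readme.write("\n".join(["."] + render_tree(root)) + "\n")

# Demo on a temporary, made-up directory layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "utils").mkdir()
    (root / "utils" / "helpers.py").write_text("")
    (root / "Assignment0.ipynb").write_text("")
    demo_lines = render_tree(root)
    readme = root / "README.md"
    readme.write_text("# Demo\n", encoding="utf-8")
    append_tree(root, readme)
    readme_text = readme.read_text(encoding="utf-8")

print("\n".join(demo_lines))
```

As with the `tree` command, the appended text still needs to be enclosed manually in a code fence inside `README.md`.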

## Submission of the Assignment

The submission method and naming conventions are described in the `README.md` file, distributed on GitHub. For formal submission, additional instructions will be provided a few days before the assignment is due.

## End of the assignment

In [None]:
#!sudo apt-get install tree
!tree ./ >> README.md