In [1]:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.11.0


# Introduction to TensorFlow 2.x and Keras

## 1. Training a linear model (perceptron)

### a. General approach
- **Gather the data**
- **Set aside the test set**
    - It should be as close as possible to what the model will see in production.
    - It is often of the same order of magnitude in size as the validation set.
- **Preprocess the remaining data**
    - Save the preprocessing results if the preprocessing is expensive.
- **Split the data into training and validation sets**
- **Create batches whose size matches the capabilities of the machine**
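
The steps above can be sketched with NumPy alone. This is a minimal illustration on synthetic data, not the pipeline used in the rest of this notebook; the array sizes, the split sizes, `batch_size=32` and the `iter_batches` helper are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for a dataset: 100 samples, 13 features (illustrative sizes)
x = rng.normal(size=(100, 13))
y = rng.normal(size=(100,))

# Shuffle once, then isolate the test set before any preprocessing
indices = rng.permutation(len(x))
test_idx, val_idx, train_idx = indices[:15], indices[15:30], indices[30:]
x_test, y_test = x[test_idx], y[test_idx]
x_val, y_val = x[val_idx], y[val_idx]
x_train, y_train = x[train_idx], y[train_idx]

def iter_batches(features, targets, batch_size):
    """Yield consecutive (features, targets) batches; the last one may be smaller."""
    for start in range(0, len(features), batch_size):
        yield features[start:start + batch_size], targets[start:start + batch_size]

batch_sizes = [len(xb) for xb, _ in iter_batches(x_train, y_train, batch_size=32)]
print(batch_sizes)  # [32, 32, 6]
```

The test indices are drawn first so that nothing computed later (scaling statistics, model selection) can depend on them.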

### b. Importer et préparer le dataset

#### Explore your dataset
Take the time to look at the content and structure of the data.

We will work with the "Boston Housing" dataset to predict house prices from a set of input features.

The data: 13 attributes of houses at different locations around the Boston suburbs in the late 1970s. The targets are the median values of the houses at a location (in k$).

Variables in order:
 
 * CRIM     per capita crime rate by town
 * ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
 * INDUS    proportion of non-retail business acres per town
 * CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
 * NOX      nitric oxides concentration (parts per 10 million)
 * RM       average number of rooms per dwelling
 * AGE      proportion of owner-occupied units built prior to 1940
 * DIS      weighted distances to five Boston employment centres
 * RAD      index of accessibility to radial highways
 * TAX      full-value property-tax rate per 10,000 dollars
 * PTRATIO  pupil-teacher ratio by town
 * B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
 * LSTAT    percentage lower status of the population
 * MEDV     Median value of owner-occupied homes in k$

Warning: This dataset has an ethical problem: the authors of this dataset included a variable, "B", that may appear to assume that racial self-segregation influences house prices. As such, we strongly discourage the use of this dataset, unless in the context of illustrating ethical issues in data science and machine learning.

#### Data preparation

In [1]:
import tensorflow as tf

# Load the Boston housing dataset. It's available in the tf.keras.datasets module: search the documentation to find a way to load the data.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.boston_housing.load_data()

# Prepare your data. Which preprocessing could you apply, and why?

# Option 1 (not used here): scale each feature to [0, 1] using the training-set maxima
# x_train, x_test = x_train / x_train.max(axis=0), x_test / x_train.max(axis=0)

# Using a Scaler from sklearn.preprocessing module
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

# first we fit the scaler on the training dataset
scaler.fit(x_train)

# then we call the transform method to scale both the training and testing data
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)
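
To make explicit what the scaler computes, here is a NumPy sketch of the same standardization on made-up numbers. It assumes the usual definition (subtract the training mean, divide by the training standard deviation with `ddof=0`); the toy arrays are not taken from the dataset.

```python
import numpy as np

# Toy data: 2 features on very different scales (values are made up)
x_train = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
x_test = np.array([[4.0, 200.0]])

# Statistics are computed on the training set only
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)  # population standard deviation (ddof=0)

x_train_scaled = (x_train - mean) / std
x_test_scaled = (x_test - mean) / std  # the test set reuses the training statistics

print(x_train_scaled.mean(axis=0))  # approximately [0, 0]
print(x_train_scaled.std(axis=0))   # approximately [1, 1]
```

Reusing the training statistics on the test set keeps the test data from leaking into the preprocessing.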

### c. Build a linear model

Your data, which is quite simple in this case, is now ready, so you can build a simple model.
Keras offers several ways to build a neural network; two of them are the Sequential API and the Functional API:
* Sequential API: A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor. https://www.tensorflow.org/guide/keras/sequential_model
* Functional API: The Keras functional API is a way to create models that are more flexible than the tf.keras.Sequential API. The functional API can handle models with non-linear topology, shared layers, and even multiple inputs or outputs. https://www.tensorflow.org/guide/keras/functional

In our use case, we will use the **Sequential API**.
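
For comparison, the same one-layer model could be written with the Functional API. This is only a sketch to show the difference in syntax; the name `perceptron_functional` is an assumption and is not used elsewhere in this notebook.

```python
import tensorflow as tf

# Functional API: declare the input tensor, then call layers on it
inputs = tf.keras.Input(shape=(13,))        # 13 input features
outputs = tf.keras.layers.Dense(1)(inputs)  # 1 output value (the predicted price)
perceptron_functional = tf.keras.Model(inputs=inputs, outputs=outputs)

print(perceptron_functional.count_params())  # 13 weights + 1 bias = 14
```

For a plain stack of layers both APIs describe the same network, so the shorter Sequential form is a reasonable choice here.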

In [2]:
# Build a simple perceptron using the Sequential API : you can find code sample in the documentation.
# see tf.keras.models.Sequential() for more details.
perceptron = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(13,)),  # The input has 13 features
    tf.keras.layers.Dense(1) # The output has 1 value (the predicted price)
])

In [3]:
perceptron.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 1)                 14        
                                                                 
Total params: 14
Trainable params: 14
Non-trainable params: 0
_________________________________________________________________


*You now have your architecture; it is time to define your loss function and your optimizer.*

In [4]:
# Define a loss function and the optimizer : you can explore the losses and optimizers modules to see the possibilities.
# Especially, focus here on the MSE and the Adam optimizer.
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam()

*You have defined all the elements required to train the model; you can now write your training loop.*

*How does it work?
At each epoch, your model predicts values, which are compared with the ground truth using your loss function.
You then compute the gradient of the loss with respect to the trainable parameters (weights and biases), and the optimizer uses this gradient to update them.*
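
The four steps described above (predict, compute the loss, compute the gradient, update) can be written out by hand for a linear model. The sketch below uses NumPy and plain gradient descent rather than Adam, on a synthetic problem; the data, the learning rate and the epoch count are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic regression problem (illustrative): y = 3*x0 - 2*x1 + 1
x = rng.normal(size=(200, 2))
y = 3.0 * x[:, 0] - 2.0 * x[:, 1] + 1.0

w = np.zeros(2)      # trainable weights
b = 0.0              # trainable bias
learning_rate = 0.1  # assumed value for this toy problem

losses = []
for epoch in range(100):
    y_pred = x @ w + b                   # 1. predict
    error = y_pred - y
    losses.append(np.mean(error ** 2))   # 2. compare with the ground truth (MSE)
    grad_w = 2.0 * x.T @ error / len(x)  # 3. gradient of the MSE with respect to w
    grad_b = 2.0 * error.mean()          #    and with respect to b
    w -= learning_rate * grad_w          # 4. update the parameters
    b -= learning_rate * grad_b

print(w, b)  # close to [3, -2] and 1
```

With TensorFlow, step 3 is what `tf.GradientTape` automates and step 4 is what the optimizer's `apply_gradients` performs.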



In [None]:
# Train the model for 300 epochs
for epoch in range(300):
    # Compute the loss and the gradients for this epoch (see the tf.GradientTape documentation)
    with tf.GradientTape() as tape:
        y_pred = perceptron(x_train_scaled)
        loss = loss_fn(y_train, y_pred)
    gradients = tape.gradient(loss, perceptron.trainable_variables)
    
    # Update the model's parameters using the gradients
    optimizer.apply_gradients(zip(gradients, perceptron.trainable_variables))
    
    # Print the loss for this epoch
    print(f"Epoch {epoch+1}: Loss = {loss.numpy()}")
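
A shape detail is worth checking in a loop like this one: `load_data()` returns targets of shape `(n,)`, while a `Dense(1)` output has shape `(n, 1)`. Under NumPy-style broadcasting, subtracting the two yields an `(n, n)` matrix instead of `n` residuals. Whether a given Keras loss realigns the shapes for you may depend on the version and on the input types, so the sketch below only illustrates the pitfall and one defensive fix, using made-up numbers.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])        # shape (3,), like y_train
y_pred = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1), like the model output

# Perfect predictions, yet broadcasting compares every prediction with every target
broadcast_mse = np.mean((y_pred - y_true) ** 2)
print((y_pred - y_true).shape, broadcast_mse)  # (3, 3) and about 1.33

# Reshaping the targets to (n, 1) restores the element-wise comparison
aligned_mse = np.mean((y_pred - y_true.reshape(-1, 1)) ** 2)
print(aligned_mse)  # 0.0
```

If the shapes turn out to be misaligned in your setup, one option is to pass `y_train.reshape(-1, 1)` to the loss function.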

In [6]:
# Evaluate the model on the test set
y_pred = perceptron(x_test_scaled)
test_loss = loss_fn(y_test, y_pred)
print(f"Test loss: {test_loss}")

Test loss: 605.9854736328125
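
Because the targets are expressed in k$, taking the square root of the MSE gives an error in the same unit, which can make the number above easier to judge. A small arithmetic sketch using the test loss printed above:

```python
import math

test_mse = 605.9854736328125  # test loss reported by the cell above
test_rmse = math.sqrt(test_mse)
print(f"Test RMSE: {test_rmse:.1f} k$")  # Test RMSE: 24.6 k$
```

For targets that are median home values in k$, an error of this size suggests that the model is still far from a good fit.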


#### What are your conclusions?

## 2. Building the model (neural network architecture)

In [1]:
# Reload the data
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.boston_housing.load_data()

scaler = StandardScaler()
scaler.fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)

In [2]:
#Build a new model with two hidden layers of 8 and 16 neurons. Don't forget to choose an activation function
# Same logic but different syntax
model_nn = tf.keras.models.Sequential()
model_nn.add(tf.keras.layers.Dense(8, activation='relu', input_shape=[x_train_scaled.shape[1]]))
model_nn.add(tf.keras.layers.Dense(16, activation='relu'))
model_nn.add(tf.keras.layers.Dense(1))

In [9]:
model_nn.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_1 (Dense)             (None, 8)                 112       
                                                                 
 dense_2 (Dense)             (None, 16)                144       
                                                                 
 dense_3 (Dense)             (None, 1)                 17        
                                                                 
Total params: 273
Trainable params: 273
Non-trainable params: 0
_________________________________________________________________
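
The parameter counts in the summary can be checked by hand: a Dense layer has one weight per (input, unit) pair plus one bias per unit. The helper `dense_param_counts` below is a hypothetical name introduced for this check, not a Keras function.

```python
def dense_param_counts(layer_sizes):
    """Return the parameter count of each Dense layer in a stack.

    layer_sizes lists the input width followed by each layer's unit count.
    """
    return [
        n_in * n_out + n_out  # one weight per input/unit pair, plus one bias per unit
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

counts = dense_param_counts([13, 8, 16, 1])
print(counts, sum(counts))  # [112, 144, 17] 273
```

The same rule gives 13 * 1 + 1 = 14 parameters for the perceptron of section 1.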


In [3]:
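# Define your loss function and your optimizer.
loss_fn = tf.keras.losses.MeanSquaredError()
# In TF 2.11 the former "experimental" optimizers are the default ones,
# so the experimental namespace is not needed here.
optimizer = tf.keras.optimizers.RMSprop()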

In [4]:
# Train the model for 300 epochs (you can reuse your training loop either by defining a function and using as such or simply copy paste it) 
for epoch in range(300):
    # Compute the loss and the gradients for this epoch (see the tf.GradientTape documentation)
    with tf.GradientTape() as tape:
        y_pred = model_nn(x_train_scaled)
        loss = loss_fn(y_train, y_pred)
    gradients = tape.gradient(loss, model_nn.trainable_variables)
    
    # Update the model's parameters using the gradients
    optimizer.apply_gradients(zip(gradients, model_nn.trainable_variables))
    
    # Print the loss for this epoch
    print(f"Epoch {epoch+1}: Loss = {loss.numpy()}")

Epoch 1: Loss = 606.5911254882812
Epoch 2: Loss = 604.5787353515625
Epoch 3: Loss = 603.1592407226562
Epoch 4: Loss = 601.990478515625
Epoch 5: Loss = 600.9701538085938
Epoch 6: Loss = 600.0491333007812
Epoch 7: Loss = 599.1987915039062
Epoch 8: Loss = 598.3994750976562
Epoch 9: Loss = 597.6405029296875
Epoch 10: Loss = 596.9205322265625
Epoch 11: Loss = 596.22412109375
Epoch 12: Loss = 595.552001953125
Epoch 13: Loss = 594.8994140625
Epoch 14: Loss = 594.2662353515625
Epoch 15: Loss = 593.6448974609375
Epoch 16: Loss = 593.0313110351562
Epoch 17: Loss = 592.425537109375
Epoch 18: Loss = 591.8255004882812
Epoch 19: Loss = 591.2300415039062
Epoch 20: Loss = 590.6417236328125
Epoch 21: Loss = 590.06298828125
Epoch 22: Loss = 589.49267578125
Epoch 23: Loss = 588.923583984375
Epoch 24: Loss = 588.3543701171875
Epoch 25: Loss = 587.7960205078125
Epoch 26: Loss = 587.241943359375
Epoch 27: Loss = 586.6889038085938
Epoch 28: Loss = 586.1356811523438
Epoch 29: Loss = 585.5839233398438
Epoch 30
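
As suggested in the comment of the training cell, the loop can be wrapped in a function and reused across models. The sketch below is one possible way to do it; the `train` function, the synthetic data, and the Adam learning rate of 0.01 are assumptions for the illustration, not the settings used above.

```python
import numpy as np
import tensorflow as tf

tf.keras.utils.set_random_seed(0)  # make the run repeatable

def train(model, x, y, loss_fn, optimizer, epochs):
    """Run full-batch gradient steps and return the loss recorded at each epoch."""
    history = []
    for _ in range(epochs):
        with tf.GradientTape() as tape:
            y_pred = model(x, training=True)
            loss = loss_fn(y, y_pred)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        history.append(float(loss))
    return history

# Synthetic data so that the sketch runs without downloading anything
rng = np.random.default_rng(seed=0)
x_demo = rng.normal(size=(64, 13)).astype("float32")
y_demo = x_demo.sum(axis=1, keepdims=True)  # targets shaped (n, 1), like the model output

demo_model = tf.keras.Sequential([
    tf.keras.Input(shape=(13,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
demo_losses = train(
    demo_model, x_demo, y_demo,
    loss_fn=tf.keras.losses.MeanSquaredError(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    epochs=50,
)
print(demo_losses[0], demo_losses[-1])
```

Returning the per-epoch losses instead of printing them makes it easier to plot or compare runs afterwards.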

#### Predict with your model

In [5]:
# we call the predict method
predictions = model_nn.predict(x_test_scaled)

# print the predictions
print(f"model prediction : {predictions[0][0]}")

# print the real values
print(f"ground truth : {y_test[0]}")


model prediction : 13.06377124786377
ground truth : 7.2



In [6]:
# Alternative: let Keras run the training loop with Model.compile and Model.fit
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.boston_housing.load_data()

# Fit the scaler on the training set only, then scale both sets
scaler = StandardScaler()
scaler.fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)

# Build the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, activation='relu', input_shape=[x_train_scaled.shape[1]]),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1)  # The output has 1 value (the predicted price)
])

# Compile the model with a mean squared error loss function and the Adam optimizer
model.compile(loss="mse", optimizer="adam")

# Train the model for 100 epochs on the scaled features
model.fit(x_train_scaled, y_train, epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x1a4d65c09d0>
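
`Model.fit` can also hold out part of the training data for validation, which ties back to the train/validation split of the general approach. The sketch below runs on synthetic data; `validation_split=0.2`, the layer sizes and the epoch count are assumptions for the illustration.

```python
import numpy as np
import tensorflow as tf

tf.keras.utils.set_random_seed(0)  # make the run repeatable

# Synthetic stand-in for the scaled features and the targets
rng = np.random.default_rng(seed=0)
x_demo = rng.normal(size=(200, 13)).astype("float32")
y_demo = x_demo.sum(axis=1, keepdims=True)

model_demo = tf.keras.Sequential([
    tf.keras.Input(shape=(13,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model_demo.compile(loss="mse", optimizer="adam")

# The last 20% of the samples are held out and evaluated after each epoch
history = model_demo.fit(
    x_demo, y_demo, epochs=5, batch_size=32, validation_split=0.2, verbose=0
)
print(sorted(history.history))  # ['loss', 'val_loss']
```

The returned `History` object stores one value per epoch for each key, so the training and validation curves can be compared after the run.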

In [None]:
# Helper function to get a batch of data from the training set.
# It reads the module-level X_train and y_train arrays defined in the next cell.
def get_batch(index, batch_size):
    # Calculate the start and end indices for the batch
    start = index * batch_size
    end = start + batch_size

    # Get the batch of data (the last batch may be shorter than batch_size)
    x_batch = X_train[start:end]
    y_batch = y_train[start:end]

    return x_batch, y_batch

In [None]:
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

# Load the Boston housing dataset
from tensorflow.keras.datasets import boston_housing
(X_train, y_train), (X_test, y_test) = boston_housing.load_data()

# Standardize the features and give the targets shape (n, 1).
# On the raw feature scales, plain gradient descent with a learning rate of 0.01
# is likely to diverge.
X_train = StandardScaler().fit_transform(X_train).astype("float32")
y_train = y_train.reshape(-1, 1).astype("float32")

# Get the number of features in the dataset
num_features = X_train.shape[1]

# Hyperparameters (the epoch count and batch size are illustrative values,
# they were not defined in the original cell)
num_epochs = 100
batch_size = 32
learning_rate = 0.01
num_batches = (len(X_train) + batch_size - 1) // batch_size

# Weights and biases
W = tf.Variable(tf.zeros([num_features, 1]))
b = tf.Variable(0.0)

# Training loop (TF 2.x eager style: tf.placeholder and tf.Session no longer exist)
for epoch in range(num_epochs):
    # Loop over the batches of the dataset
    for i in range(num_batches):
        # Get batch of data
        x_batch, y_batch = get_batch(i, batch_size)

        with tf.GradientTape() as tape:
            # Output (predicted price)
            y = tf.matmul(x_batch, W) + b
            # Loss function (mean squared error against the true price)
            loss = tf.reduce_mean(tf.square(y - y_batch))

        # Plain gradient descent step on W and b
        grad_W, grad_b = tape.gradient(loss, [W, b])
        W.assign_sub(learning_rate * grad_W)
        b.assign_sub(learning_rate * grad_b)
