## Linear Regression in TensorFlow

In this project we try to predict house prices using linear regression trained with gradient descent.

Model equation:
$h(x) = wx + b$
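A minimal NumPy sketch of this hypothesis, evaluated for several inputs at once. The values of `w`, `b` and `x` are arbitrary and only illustrative; they are not fitted to the project data.

```python
import numpy as np

def h(x, w, b):
    """Linear hypothesis h(x) = w*x + b, applied element-wise to an array x."""
    return w * np.asarray(x, dtype=float) + b

# Arbitrary illustrative values (not fitted to the project data)
x = np.array([5.0, 6.0, 7.0])
print(h(x, w=2.0, b=1.0))  # 2*5+1, 2*6+1, 2*7+1 -> 11, 13, 15
```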

In [1]:

import numpy as np
import tensorflow.compat.v1 as tf
import pandas as pd
import matplotlib.pyplot as plt
import datetime
%load_ext tensorboard


In [2]:
# Read the data set (raw string so the Windows backslashes are not treated as escapes)
data = np.load(r'C:\R_File\Master of Data Science\Python\Project\proyecto_data\proyecto_training_data.npy') 


In [3]:
# Build the DataFrame, express precio and fsqrt in thousands, and fill missing values with 0

casadf = pd.DataFrame(data =data, columns = ["precio", "Calif", "fsqrt","trooms","yearbuilt","LotFrontage"])
casadf['precio']= casadf['precio']/1000
casadf['fsqrt']= casadf['fsqrt']/1000
casadf.fillna(0,inplace=True)
casadf.describe(include='all')

| | precio | Calif | fsqrt | trooms | yearbuilt | LotFrontage |
|---|---|---|---|---|---|---|
| count | 1460.0 | 1460.0 | 1460.0 | 1460.0 | 1460.0 | 1460.0 |
| mean | 180.921196 | 6.099315 | 1.162627 | 6.517808 | 1971.267808 | 57.623288 |
| std | 79.442503 | 1.382997 | 0.386588 | 1.625393 | 30.202904 | 34.664304 |
| min | 34.9 | 1.0 | 0.334 | 2.0 | 1872.0 | 0.0 |
| 25% | 129.975 | 5.0 | 0.882 | 5.0 | 1954.0 | 42.0 |
| 50% | 163.0 | 6.0 | 1.087 | 6.0 | 1973.0 | 63.0 |
| 75% | 214.0 | 7.0 | 1.39125 | 7.0 | 2000.0 | 79.0 |
| max | 755.0 | 10.0 | 4.692 | 14.0 | 2010.0 | 313.0 |


In [4]:
casadf.corr(method='pearson', min_periods=1)

| | precio | Calif | fsqrt | trooms | yearbuilt | LotFrontage |
|---|---|---|---|---|---|---|
| precio | 1.0 | 0.790982 | 0.605852 | 0.533723 | 0.522897 | 0.209624 |
| Calif | 0.790982 | 1.0 | 0.476224 | 0.427452 | 0.572323 | 0.176561 |
| fsqrt | 0.605852 | 0.476224 | 1.0 | 0.409516 | 0.281986 | 0.245181 |
| trooms | 0.533723 | 0.427452 | 0.409516 | 1.0 | 0.095589 | 0.221396 |
| yearbuilt | 0.522897 | 0.572323 | 0.281986 | 0.095589 | 1.0 | 0.036853 |
| LotFrontage | 0.209624 | 0.176561 | 0.245181 | 0.221396 | 0.036853 | 1.0 |


**Variable correlation.**
Based on the correlation matrix, the features most strongly correlated with `precio` are:

* `Calif`, with $r = 0.79$
* `fsqrt`, with $r = 0.61$

These are the most promising single predictors; `Calif` is the input used in the model below.
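The same ranking can be read off programmatically by taking the `precio` column of the correlation matrix and sorting by absolute value. The sketch below uses a small synthetic DataFrame (hypothetical columns `Calif` and `noise`, made-up values) so that it runs on its own; with the project data one would apply the same chain to `casadf`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for casadf (synthetic values, not the project data)
rng = np.random.default_rng(0)
n = 200
calif = rng.integers(1, 11, size=n).astype(float)
noise = rng.normal(size=n)
precio = 40.0 * calif - 80.0 + rng.normal(scale=30.0, size=n)
toy = pd.DataFrame({"precio": precio, "Calif": calif, "noise": noise})

# Pearson correlation of every feature with the target, strongest (in absolute value) first
corr_with_target = (
    toy.corr(method="pearson")["precio"]
    .drop("precio")
    .sort_values(key=np.abs, ascending=False)
)
print(corr_with_target)
```

Sorting by absolute value keeps strongly negative correlations near the top as well, since they are just as predictive for a linear model.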

In [5]:
# Random seed for reproducible results

np.random.seed(24)

# Shuffle the rows
df = casadf.sample(frac = 1)


# Split the data into training (about 80%) and test sets

msk = np.random.rand(len(df)) < 0.8

train = df[msk]

test = df[~msk]

print('training set: ', len(train), ', test set: ', len(test))

training set:  1187 , test set:  273
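The random mask gives only approximately 80% of the rows to training (here 1187 of 1460, about 81%). If an exact 80/20 split is preferred, one option is to shuffle row positions with a seeded generator and cut at a fixed index. This is a sketch of that alternative, not what the notebook does; `split_exact` is a hypothetical helper, and because it uses a different random generator it will not select the same rows as the split above.

```python
import numpy as np
import pandas as pd

def split_exact(df, train_frac=0.8, seed=24):
    """Shuffle row positions with a seeded generator and cut at exactly train_frac of them."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    cut = int(round(train_frac * len(df)))
    return df.iloc[idx[:cut]], df.iloc[idx[cut:]]

# Usage on a hypothetical 1460-row frame (same size as casadf)
toy = pd.DataFrame({"a": np.arange(1460)})
train_part, test_part = split_exact(toy)
print(len(train_part), len(test_part))  # 0.8 * 1460 = 1168 rows for training, 292 for test
```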


### Linear regression model trained with ***gradient descent***

The linear regression model is trained by minimizing the following cost function.

Cost function:

$C(w,b) = \frac{1}{2m} \sum_{i=1}^{m} (y_{i} - h(x_i))^2$

where:

* $y_{i}$: actual value of the $i$-th observation in the dataset
* $h(x_i) = wx_{i}+b$: value predicted by the model
* $m$: number of observations
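A direct NumPy translation of $C(w,b)$, with a small worked example using made-up numbers. With $w = 2$, $b = 0$, inputs $1, 2, 3$ and targets $3, 4, 5$, the predictions are $2, 4, 6$, the residuals are $1, 0, -1$, and the cost is $\frac{1}{2 \cdot 3}(1 + 0 + 1) = \frac{1}{3}$.

```python
import numpy as np

def cost(w, b, x, y):
    """C(w, b) = 1/(2m) * sum((y_i - h(x_i))^2) with h(x) = w*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(x)
    residuals = y - (w * x + b)
    return np.sum(residuals ** 2) / (2 * m)

# Worked example with made-up numbers: residuals are 1, 0, -1, so C = 2 / (2*3) = 1/3
print(cost(2.0, 0.0, [1.0, 2.0, 3.0], [3.0, 4.0, 5.0]))
```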


*Gradient descent* update rules, with learning rate $\alpha$ (`lr` in the code):

$w := w - \alpha \frac{1}{m} \sum_{i=1}^{m} (h(x_i) - y_i)\, x_i$

$b := b - \alpha \frac{1}{m} \sum_{i=1}^{m} (h(x_i) - y_i)$


The parameters $w$ and $b$ are updated iteratively until the cost stops decreasing.
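The update rules can be sketched in plain NumPy as a cross-check on the TensorFlow version. This is a minimal batch gradient-descent loop; the synthetic data lie exactly on $y = 3x + 2$, so the result can be compared with the closed-form fit from `np.polyfit`. The data, learning rate and number of epochs are arbitrary choices for the illustration.

```python
import numpy as np

def gradient_descent(x, y, lr=0.01, epochs=5000):
    """Batch gradient descent for h(x) = w*x + b on C(w, b) = 1/(2m) * sum((y - h(x))^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(x)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        error = (w * x + b) - y            # h(x_i) - y_i
        grad_w = np.sum(error * x) / m     # dC/dw
        grad_b = np.sum(error) / m         # dC/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data lying exactly on y = 3x + 2 (illustrative only)
x = np.linspace(0.0, 2.0, 20)
y = 3.0 * x + 2.0
w, b = gradient_descent(x, y, lr=0.1, epochs=5000)
print(w, b)                  # should approach 3 and 2
print(np.polyfit(x, y, 1))   # closed-form least squares: [slope, intercept]
```

Both parameters are updated from gradients computed at the same point, which is what the TensorFlow class does by packing them into a single vector.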

In [40]:
class regression_lineal:
    def __init__(self):
        # Parameter row vector [w, b]; b multiplies the row of ones in x
        self.w = tf.get_variable("weights", dtype = tf.float32, shape = [1,2], initializer = tf.zeros_initializer())
        
    # Prediction y_hat = [w, b] @ [x; 1], shape [1, m]
    def __call__(self, x):
        with tf.name_scope("model"):
            return tf.matmul(self.w,x)

    # One gradient-descent step: [w, b] <- [w, b] - lr * dC/d[w, b]
    def update(self, x, y, lr):
        with tf.name_scope("error"):
            error = self.error(x,y)
            # Scalar summary of the cost for TensorBoard
            error_summary = tf.summary.scalar("ErrorSummary", error)
        gradient = tf.gradients(error, [self.w])
        updated_w = tf.assign(self.w, self.w - lr * gradient[0])
        return updated_w, error, error_summary
    
    # Cost C(w, b): half the mean squared error
    def error(self, x, y):
        error = 0.5 * tf.reduce_mean(tf.math.square(y - self(x)))
        return error
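The class packs the slope and intercept into a single `[1, 2]` row vector and relies on the input having a second row of ones (built in the next cell with `np.ones_like`). The NumPy check below, with arbitrary numbers, shows that this matrix product equals $wx + b$, and that subtracting it from a target of shape `(m,)` broadcasts to shape `(1, m)`, as in the `error` method.

```python
import numpy as np

w, b = 2.5, -1.0
x = np.array([1.0, 4.0, 6.0])

params = np.array([[w, b]])            # shape (1, 2), like self.w in the class
X = np.array([x, np.ones_like(x)])     # shape (2, m): feature row on top, ones below
y_hat = params @ X                     # shape (1, m), like tf.matmul(self.w, x)
print(y_hat)                           # same values as w*x + b: 1.5, 9.0, 14.0

y = np.array([2.0, 8.0, 15.0])
residual = y - y_hat                   # (m,) - (1, m) broadcasts to (1, m)
print(residual.shape)                  # (1, 3)
```

If the target were shaped `(m, 1)` instead, the same subtraction would broadcast to `(m, m)` and silently produce a wrong cost, which is why the target placeholder is one-dimensional.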

In [41]:
y = train["precio"]
x = train["Calif"]
# Input matrix of shape [2, m]: feature row on top, row of ones (for the bias) below
x = np.array([x, np.ones_like(x)], dtype = "float64")

# Training function
def training(lr, epochs, frecprint):
    # Log directory that identifies the run (timestamp, learning rate, epochs)
    string = './graphs/'+ datetime.datetime.now().strftime("%Y%m%d-%H%M%S") +"_lr="+str(lr)+ "_epochs="+str(epochs)

    g = tf.Graph()
    with g.as_default():
        # Build the model
        modelo = regression_lineal()
        # Placeholders for the input matrix and the target
        tensorflow_x = tf.placeholder(tf.float32, [2,len(train["Calif"])], "tensorflow_x")
        tensorflow_y = tf.placeholder(tf.float32, [len(train["precio"])], "tensorflow_y")
        # Training op: one gradient-descent step
        update_parameters = modelo.update(tensorflow_x, tensorflow_y, lr)
        
        # TensorBoard writer (also logs the graph)
        writer = tf.summary.FileWriter(string, g)
        
        with tf.train.MonitoredSession() as session:
            feed_dict = {tensorflow_x:x, tensorflow_y:y}
            for i in range(epochs+1):
                
                # Run one update; result = (updated weights, cost, cost summary)
                result = session.run(update_parameters, feed_dict = feed_dict)
            
                if i % frecprint == 0:
                    # Read the current parameters [w, b]
                    weights = session.run(modelo.w, feed_dict = feed_dict)
                    # Log the cost summary to TensorBoard
                    writer.add_summary(result[2], i)
                    print("Epoch: ", i, "Weights: ", weights, "Cost: ", result[1])
                    print("-------------------------------------------------------------------------")
                    
            writer.close()
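A useful check on the TensorFlow training is that gradient descent on this convex cost should converge to the ordinary least-squares solution, which NumPy can compute directly with `np.linalg.lstsq`. The sketch below uses the same `[x; 1]` layout; `ols_params` is a hypothetical helper, and synthetic data are used so that it runs standalone. With the project data one would call it on `train["Calif"]` and `train["precio"]`.

```python
import numpy as np

def ols_params(x, y):
    """Closed-form least-squares [w, b] for h(x) = w*x + b, using the notebook's [x; 1] layout."""
    x = np.asarray(x, dtype=float)
    X = np.array([x, np.ones_like(x)]).T       # shape (m, 2): one row per observation
    params, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return params                              # array([w, b])

# Hypothetical data; with the real data: ols_params(train["Calif"], train["precio"])
rng = np.random.default_rng(1)
x = rng.integers(1, 11, size=500).astype(float)
y = 45.0 * x - 90.0 + rng.normal(scale=40.0, size=500)
print(ols_params(x, y))
```

If the weights printed by `training` are still far from these values, the run has not converged yet (for example, the learning rate is too small for the number of epochs).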

In [50]:

import os
filename = r'C:\R_File\Master of Data Science\Python\Grafo_Lineal_regression.png'
print(os.path.abspath(filename))

C:\R_File\Master of Data Science\Python\Grafo_Lineal_regression.png


In [43]:
# test 1
training(0.001, 2000,100)

INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Epoch:  0 Weights:  [[3.5573308 0.5412705]] Cost:  19256.78
-------------------------------------------------------------------------
Epoch:  100 Weights:  [[29.751112   3.2364564]] Cost:  1322.5784
-------------------------------------------------------------------------
Epoch:  200 Weights:  [[29.960613  1.894829]] Cost:  1316.4308
-------------------------------------------------------------------------
Epoch:  300 Weights:  [[30.167131   0.5718509]] Cost:  1310.4531
-------------------------------------------------------------------------
Epoch:  400 Weights:  [[30.370779  -0.7327246]] Cost:  1304.6404
-------------------------------------------------------------------------
Epoch:  500 Weights:  [[30.571594  -2.0191548]] Cost:  1298.9885
-------------------------------------------------------------------------
Epoch:  600 Weights:  [[30.769613 -3.287693]] Cost:  

In [16]:
# test 2
training(0.02, 2000,100)

INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Epoch:  0 Weights:  [[47.43108    7.2169394]] Cost:  19256.78
-------------------------------------------------------------------------
Epoch:  1000 Weights:  [[ 42.45708  -78.158394]] Cost:  1105.4858
-------------------------------------------------------------------------
Epoch:  2000 Weights:  [[ 44.44563 -90.89719]] Cost:  1100.1705
-------------------------------------------------------------------------
Epoch:  3000 Weights:  [[ 44.752357 -92.8621  ]] Cost:  1100.0441
-------------------------------------------------------------------------
Epoch:  4000 Weights:  [[ 44.79967  -93.165215]] Cost:  1100.0413
-------------------------------------------------------------------------
Epoch:  5000 Weights:  [[ 44.806953 -93.21182 ]] Cost:  1100.041
-------------------------------------------------------------------------
Epoch:  6000 Weights:  [[ 44.80797 -93.21834]] 

In [17]:
# test 3
training(0.03, 2000,100)

INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Epoch:  0 Weights:  [[47.43108    7.2169394]] Cost:  19256.78
-------------------------------------------------------------------------
Epoch:  1000 Weights:  [[ 42.45708  -78.158394]] Cost:  1105.4858
-------------------------------------------------------------------------
Epoch:  2000 Weights:  [[ 44.44563 -90.89719]] Cost:  1100.1705
-------------------------------------------------------------------------
Epoch:  3000 Weights:  [[ 44.752357 -92.8621  ]] Cost:  1100.0441
-------------------------------------------------------------------------
Epoch:  4000 Weights:  [[ 44.79967  -93.165215]] Cost:  1100.0413
-------------------------------------------------------------------------
Epoch:  5000 Weights:  [[ 44.806953 -93.21182 ]] Cost:  1100.041
-------------------------------------------------------------------------
Epoch:  6000 Weights:  [[ 44.80797 -93.21834]] 

In [37]:
# test 4
training(0.04, 2000,100)

ERROR: Timed out waiting for TensorBoard to start. It may still be running as pid 13060.

In [None]:
# test 5
training(0.004, 2000,100)
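Because $C(w,b)$ is quadratic, its Hessian with respect to $[w, b]$ is the constant matrix $H = \frac{1}{m} X X^\top$, where $X$ is the $[2, m]$ matrix of features and ones built above. Fixed-step gradient descent converges only when $0 < \alpha < 2/\lambda_{\max}(H)$, and a large ratio $\lambda_{\max}/\lambda_{\min}$ (the condition number) means that some directions, such as the bias when the feature is not centered, converge slowly. The sketch below computes both quantities with NumPy; `max_stable_lr` is a hypothetical helper, and it is run on a hypothetical feature with values 1 to 10. On the project data one would pass `train["Calif"]` instead, which may help explain the slow drift of the bias in test 1 and choose among the learning rates tried above.

```python
import numpy as np

def max_stable_lr(x):
    """Return (2 / lambda_max, lambda_max / lambda_min) for the Hessian of
    C = 1/(2m) * sum((y - [w, b] @ [x; 1])^2); learning rates above the first value diverge."""
    x = np.asarray(x, dtype=float)
    X = np.array([x, np.ones_like(x)])          # shape (2, m), same layout as the notebook
    hessian = X @ X.T / X.shape[1]              # constant Hessian of the quadratic cost
    eigvals = np.linalg.eigvalsh(hessian)       # symmetric matrix, eigenvalues in ascending order
    return 2.0 / eigvals.max(), eigvals.max() / eigvals.min()

# Hypothetical feature with values 1..10; with the real data: max_stable_lr(train["Calif"])
x = np.arange(1.0, 11.0)
lr_max, condition_number = max_stable_lr(x)
print(lr_max, condition_number)
```

For this toy feature the bound is roughly 0.05 and the condition number is in the hundreds; centering or standardizing the feature would shrink the condition number and speed up convergence of the bias.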

In [44]:
%tensorboard --logdir ./graphs  --port 6006

Reusing TensorBoard on port 6006 (pid 5224), started 3 days, 23:16:25 ago. (Use '!kill 5224' to kill it.)