<a href="https://colab.research.google.com/github/vvalcristina/datascience_codenation/blob/master/Codenation_Tensor_Flow_Transform.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **TensorFlow Transform**

Lesson based on the AceleraDev Data Science module.

[Transform](https://www.tensorflow.org/tfx/transform/get_started) documentation

**Installing the libraries**

In [1]:
!pip install tensorflow-transform



**Importing the libraries**

In [0]:
# __future__ imports must come before any other statement in the cell
from __future__ import print_function

import tempfile

import pandas as pd
import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_transform.beam.impl as tft_beam

import apache_beam.io.iobase

from tensorflow_transform.tf_metadata import dataset_metadata, dataset_schema, schema_utils

**Data preprocessing**


In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
dataset = pd.read_csv("/content/drive/My Drive/polution_small.csv")

In [6]:
dataset.head()

| | Date | pm10 | no2 | so2 | soot |
|---|---|---|---|---|---|
| 0 | 1/1/2009 | 98.67 | 14.1 | 44.38 | 34.81 |
| 1 | 1/2/2009 | 52.33 | 14.1 | 29.75 | 33.06 |
| 2 | 1/3/2009 | 74.67 | 20.5 | 36.25 | 39.25 |
| 3 | 1/4/2009 | 72.0 | 17.3 | 46.44 | 34.38 |
| 4 | 1/5/2009 | 81.0 | 25.64 | 56.56 | 45.59 |


**Removing the Date column**

In [0]:
features = dataset.drop("Date", axis = 1)

In [8]:
features.head()

| | pm10 | no2 | so2 | soot |
|---|---|---|---|---|
| 0 | 98.67 | 14.1 | 44.38 | 34.81 |
| 1 | 52.33 | 14.1 | 29.75 | 33.06 |
| 2 | 74.67 | 20.5 | 36.25 | 39.25 |
| 3 | 72.0 | 17.3 | 46.44 | 34.38 |
| 4 | 81.0 | 25.64 | 56.56 | 45.59 |


**Converting to a list of dictionaries**

In [0]:
dict_features = list(features.to_dict("index").values())

In [10]:
dict_features[0:2]

[{'no2': 14.1, 'pm10': 98.67, 'so2': 44.38, 'soot': 34.81},
 {'no2': 14.1, 'pm10': 52.33, 'so2': 29.75, 'soot': 33.06}]
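
The shape that matters here is one dictionary per row, keyed by column name. As a minimal sketch of that structure without pandas, the snippet below builds the same list from raw CSV text using only the standard library. The inline `csv_text` is an assumption for illustration (it copies the first two rows shown above), and `records` is a hypothetical name, not one used elsewhere in the notebook.

```python
import csv
import io

# Assumed sample: the first two data rows shown by dataset.head() above.
csv_text = """Date,pm10,no2,so2,soot
1/1/2009,98.67,14.1,44.38,34.81
1/2/2009,52.33,14.1,29.75,33.06
"""

records = []
for row in csv.DictReader(io.StringIO(csv_text)):
    row.pop("Date")  # drop the date column, as done with DataFrame.drop
    # csv yields strings, so convert every remaining value to float
    records.append({name: float(value) for name, value in row.items()})

print(records[0])
```

With pandas, `features.to_dict("records")` should give the same list of dictionaries in a single call.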

**Defining the metadata**

In [11]:
# schema_utils.schema_from_feature_spec replaces the deprecated
# dataset_schema.from_feature_spec
data_metadata = dataset_metadata.DatasetMetadata(schema_utils.schema_from_feature_spec({
    "no2": tf.io.FixedLenFeature([], tf.float32),
    "pm10": tf.io.FixedLenFeature([], tf.float32),
    "so2": tf.io.FixedLenFeature([], tf.float32),
    "soot": tf.io.FixedLenFeature([], tf.float32),
}))


In [12]:
data_metadata

{'_schema': feature {
  name: "no2"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "pm10"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "so2"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "soot"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
}

**Preprocessing function**

In [0]:
def preprocessing_fn(inputs):
  no2 = inputs["no2"]
  pm10 = inputs["pm10"]
  so2 = inputs["so2"]
  soot = inputs["soot"]

  # Center no2 and so2 by subtracting the mean computed over the whole dataset
  no2_normalized = no2 - tft.mean(no2)
  so2_normalized = so2 - tft.mean(so2)

  # Rescale pm10 and soot to [0, 1] (the default range of scale_by_min_max)
  pm10_normalized = tft.scale_to_0_1(pm10)
  soot_normalized = tft.scale_by_min_max(soot)

  return {
      "no2_normalized": no2_normalized,
      "so2_normalized": so2_normalized,
      "pm10_normalized": pm10_normalized,
      "soot_normalized": soot_normalized,
  }
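
To make the arithmetic behind the function concrete, here is a plain-Python sketch of the two operations it applies: mean centering (`x - tft.mean(x)`) and min-max scaling to [0, 1]. It assumes only the five rows shown by `dataset.head()`, whereas TensorFlow Transform computes the mean, minimum, and maximum over the full dataset, so the actual transformed values will differ. The helper names `center` and `scale_0_1` are hypothetical.

```python
# Five sample values per column, copied from the dataset.head() output above.
no2 = [14.1, 14.1, 20.5, 17.3, 25.64]
pm10 = [98.67, 52.33, 74.67, 72.0, 81.0]


def center(values):
    """Subtract the mean from every value, mirroring `x - tft.mean(x)`."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def scale_0_1(values):
    """Map values linearly onto [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


no2_centered = center(no2)
pm10_scaled = scale_0_1(pm10)

print(no2_centered)
print(pm10_scaled)
```

In this sample the largest pm10 value (98.67) maps to 1.0 and the smallest (52.33) maps to 0.0, while the centered no2 values sum to approximately zero.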

**Coding the transformation**

TensorFlow Transform uses Apache Beam in the background to perform its operations.

Function parameters:

- dict_features: our dataset converted to a list of dictionaries
- data_metadata: the metadata defined above
- preprocessing_fn: the preprocessing function


Apache Beam syntax:

result = data_to_pass | where_to_pass_the_data

Where:

- result -> transformed_dataset, transform_fn
- data_to_pass -> (dict_features, data_metadata)
- where_to_pass_the_data -> tft_beam.AnalyzeAndTransformDataset(preprocessing_fn)

Putting it together:

transformed_dataset, transform_fn = ((dict_features, data_metadata) | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
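
The `|` here is ordinary Python operator overloading rather than special syntax. A plain tuple on the left does not know how to handle `|`, so Python falls back to the `__ror__` method of the object on the right, which is how a Beam transform can receive the data. The sketch below illustrates only that mechanism; `Double` is a hypothetical toy class, not part of Apache Beam or TensorFlow Transform.

```python
class Double:
    """Hypothetical stand-in for a transform; not an Apache Beam class."""

    def __ror__(self, left):
        # Called for `left | self` when `left` (a plain tuple here)
        # does not implement `|` for this pairing.
        data, label = left
        return [2 * x for x in data], label


# Same shape as: result = data_to_pass | where_to_pass_the_data
result, label = ([1, 2, 3], "demo") | Double()

print(result, label)
```

As with `AnalyzeAndTransformDataset`, the object on the right decides what to do with the tuple it is handed and what to return.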

**References**:

[Apache Beam](https://beam.apache.org/documentation/programming-guide/)

In [0]:
def data_transform():
  # Beam needs a temporary directory for intermediate artifacts
  with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
    transformed_dataset, transform_fn = (
        (dict_features, data_metadata)
        | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))

  # The transformed dataset is a (data, metadata) pair
  transformed_data, transformed_metadata = transformed_dataset

  # Print each original record next to its transformed counterpart
  for initial, transformed in zip(dict_features, transformed_data):
    print("Initial: ", initial)
    print("Transformed: ", transformed)

In [16]:
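# Call the function; evaluating the bare name `data_transform` only
# displays the function object (<function __main__.data_transform>)
data_transform()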