# Software Engineering for Data Science - PUC-Rio

### Persisting and Loading Trained Models with Pickle and Joblib

### Deploying a Model to an Endpoint on Google Cloud

Marcos Kalinowski, Tatiana Escovedo, Hugo Villamizar and Hélio Lopes

## Persisting and Loading Trained Models with Pickle and Joblib

https://docs.python.org/2/library/pickle.html

https://joblib.readthedocs.io/en/latest/

Pickle is the standard way to serialize objects in Python. It can serialize machine learning models and save the serialized form to a file. Later, that file can be loaded to deserialize the model and use it to make new predictions.
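As a minimal sketch of that round trip, the block below serializes a small object to bytes and restores it using only the standard library. The dictionary `learned_state` is a hypothetical stand-in for a trained model's parameters, not something defined in this notebook.

```python
import pickle

# Hypothetical stand-in for a trained model's learned state (assumption).
learned_state = {"coef": [0.4, -1.2], "intercept": 0.1}

# Serialize to bytes, then rebuild an equal but independent object.
payload = pickle.dumps(learned_state)
restored = pickle.loads(payload)

print(type(payload).__name__)     # bytes
print(restored == learned_state)  # True
print(restored is learned_state)  # False
```

The restored object compares equal to the original but is a separate copy, which is why a model loaded from disk can be used without the original training session.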




In [None]:
# Import the libraries used to build an ML model
import pandas as pd
import pickle
import joblib
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

Loading a dataset that will be used to train a model

In [None]:
# URL from which the dataset is imported
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"

# Column headers
colunas = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

# Read the file using the column names above
dataset = pd.read_csv(url, names=colunas, skiprows=0, delimiter=',')

# Extract only the values from the dataset into an array
array = dataset.values

# Split the array into predictor variables (X) and the target variable (Y)
X = array[:,0:8]
Y = array[:,8]

Splitting the data into training and test sets to train an ML model with the LogisticRegression algorithm

In [None]:
# Split the data into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=7)

# Create the model
modelo = LogisticRegression(solver='liblinear')

# Train the model
modelo.fit(X_train, Y_train)

LogisticRegression(solver='liblinear')

Saving the model using the Pickle library

In [None]:
artifact_pkl_filename = 'model.pkl'

local_path = artifact_pkl_filename
with open(local_path, 'wb') as model_file:
  pickle.dump(modelo, model_file)

Saving the same model using the Joblib library

In [None]:
artifact_joblib_filename = 'model.joblib'

local_path = artifact_joblib_filename
joblib.dump(modelo, local_path)

['model.joblib']

Loading both models from disk and evaluating them on the test data

In [None]:
# Pickle (the context manager closes the file after loading)
with open(artifact_pkl_filename, 'rb') as model_file:
  loaded_pkl_model = pickle.load(model_file)

# Joblib
loaded_joblib_model = joblib.load(artifact_joblib_filename)

# Evaluate the loaded models on the test data
pkl_results = loaded_pkl_model.score(X_test, Y_test)
print('Pickle: ', pkl_results)

joblib_results = loaded_joblib_model.score(X_test, Y_test)
print('Joblib: ', joblib_results)

Pickle:  0.7559055118110236
Joblib:  0.7559055118110236


Using the loaded model to obtain predictions

In [None]:
loaded_pkl_model.predict(X_test[0:5])

array([0., 1., 1., 0., 0.])

In [None]:
loaded_joblib_model.predict(X_test[0:5])

array([0., 1., 1., 0., 0.])

## Deploying a Model to an Endpoint on Google Cloud

https://github.com/googleapis/python-aiplatform

Using the cloud computing services of Google and other platforms requires an active account. Most storage and processing services are paid. In the case of Google, however, each new user receives a credit of 300 dollars that can be used for up to 3 months.

In [None]:
!pip install google-cloud-aiplatform

In [None]:
# Libraries for connecting to Google Cloud
# Important: install the library first --> pip install google-cloud-aiplatform
from google.cloud import aiplatform
from google.cloud import storage
from google.oauth2 import service_account

Connecting to Google's computing platform

In [None]:
google_cloud_credentials = service_account.Credentials.from_service_account_file('google_cloud_secrets.json')
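Hard-coding the key file name works in a notebook, but the path often differs between machines. One possible alternative, sketched below, is to resolve it from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which Google's client libraries conventionally read, and fall back to the file name used above. The helper name `resolve_key_path` is hypothetical.

```python
import os
from pathlib import Path


def resolve_key_path(default="google_cloud_secrets.json"):
    """Return the service-account key path, preferring the environment variable."""
    # Fall back to the default when the variable is unset or empty.
    candidate = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") or default
    return Path(candidate)


key_path = resolve_key_path()
if not key_path.is_file():
    # Report a missing key early, before any client library is called.
    print(f"Key file not found: {key_path}")
```

The resulting path could then be passed as `str(key_path)` to `service_account.Credentials.from_service_account_file`.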

Using the 'storage' module to connect to the Google Cloud storage service

In [None]:
project_id = 'project_id'
storage_client = storage.Client(project=project_id, credentials=google_cloud_credentials)

Using the 'bucket' method of the 'storage_client' object to store the model that was exported with the Pickle library

In [None]:
# Name of the bucket and of the file that will be stored
bucket_name = 'models-teste'
model_name = 'model.pkl'

# Connect to the specified bucket
bucket = storage_client.bucket(bucket_name)

# Path inside the bucket where the file will be saved
# (built from the file name, so the blob lands at model/model.pkl)
destination_blob_name = 'model/{}'.format(model_name)

# Upload the file to the bucket
blob = bucket.blob(destination_blob_name)
blob.upload_from_filename(model_name)

print(f"File {model_name} uploaded to path {destination_blob_name}")

File model.pkl uploaded to path model/model.pkl
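The blob path chosen here has to line up with the `artifact_uri` given later to `aiplatform.Model.upload`, which points at a directory rather than a file. A small sketch of deriving that directory URI from the bucket and blob names follows. `artifact_dir_uri` is a hypothetical helper, and the file name check rests on the assumption that the prebuilt scikit-learn serving container looks for `model.pkl` or `model.joblib`.

```python
import posixpath

# Assumption: file names the prebuilt scikit-learn container is expected to look for.
EXPECTED_NAMES = {"model.pkl", "model.joblib"}


def artifact_dir_uri(bucket_name, blob_name):
    """Return the gs:// URI of the directory that contains the uploaded blob."""
    directory, file_name = posixpath.split(blob_name)
    if file_name not in EXPECTED_NAMES:
        raise ValueError(f"Unexpected artifact name: {file_name!r}")
    if not directory:
        # The blob sits at the bucket root.
        return f"gs://{bucket_name}/"
    return f"gs://{bucket_name}/{directory}/"


print(artifact_dir_uri("models-teste", "model/model.pkl"))  # gs://models-teste/model/
```

Running a check like this before the upload would catch a blob name that does not end in one of the expected file names.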


Initializing Google Cloud's artificial intelligence workspace (Vertex AI) to register a model and create an endpoint

In [None]:
# Specify the region
region = 'us-east1'

# Initialize Vertex AI
aiplatform.init(project=project_id, location=region, credentials=google_cloud_credentials, staging_bucket='gs://models-teste')

Importing a model, in this case the one exported with the Pickle library, into Google's artificial intelligence workspace

In [None]:
model = aiplatform.Model.upload(display_name = 'logistic-regression-model-v1',
    description = 'Modelo de teste',
    artifact_uri = 'gs://models-teste/model/',
    serving_container_image_uri = 'us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest'
)

Creating Model
Create Model backing LRO: projects/962955980454/locations/us-east1/models/6979400745959292928/operations/8880230950500302848
Model created. Resource name: projects/962955980454/locations/us-east1/models/6979400745959292928@1
To use this Model in another session:
model = aiplatform.Model('projects/962955980454/locations/us-east1/models/6979400745959292928@1')


Creating an endpoint in Google's artificial intelligence workspace, where our model will be hosted

In [None]:
# Create an endpoint
endpoint = aiplatform.Endpoint.create(display_name = 'logistic_regression_endpoint', 
                                      project = project_id, 
                                      location = region)

Creating Endpoint
Create Endpoint backing LRO: projects/962955980454/locations/us-east1/endpoints/1153009465537069056/operations/7170551941959778304
Endpoint created. Resource name: projects/962955980454/locations/us-east1/endpoints/1153009465537069056
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/962955980454/locations/us-east1/endpoints/1153009465537069056')


Deploying the model to the endpoint that was created

In [None]:
endpoint.deploy(model, machine_type='n1-standard-2')

Deploying Model projects/962955980454/locations/us-east1/models/6979400745959292928 to Endpoint : projects/962955980454/locations/us-east1/endpoints/1153009465537069056
Deploy Endpoint model backing LRO: projects/962955980454/locations/us-east1/endpoints/1153009465537069056/operations/2437831683546808320
Endpoint model deployed. Resource name: projects/962955980454/locations/us-east1/endpoints/1153009465537069056


Obtaining predictions from the endpoint that was deployed on Google Cloud. Note that the request uses the same test data that was passed to the models exported as files.

In [None]:
endpoint.predict(X_test.tolist())

Prediction(predictions=[0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0
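The `predictions` field of the returned object is a plain list, so it can be scored against the true labels without scikit-learn. The sketch below computes accuracy on hypothetical stand-in lists. In the notebook, the inputs would be the endpoint's `predictions` and `Y_test`.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the prediction equals the true label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not predicted:
        raise ValueError("cannot score an empty list")
    matches = sum(1 for p, y in zip(predicted, expected) if p == y)
    return matches / len(predicted)


# Hypothetical stand-in values, not results from the notebook.
remote_predictions = [0.0, 1.0, 1.0, 0.0, 0.0]
true_labels = [0.0, 1.0, 0.0, 0.0, 0.0]

print(accuracy(remote_predictions, true_labels))  # 0.8
```

If the endpoint serves the same model that was saved locally, a score computed this way would be expected to match the value reported by `score` on the loaded models.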