<a href="https://colab.research.google.com/github/viniciusrpb/datavis_book/blob/main/cap7_projecoes_multidimensionais.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 7 - Visualizations Based on Multidimensional Projections

Install Plotly and the Orca renderer so that GitHub can display the images generated by Plotly.

In [83]:
#!pip install plotly==4.14.3
#!pip install orca
#!apt-get install xvfb libgtk2.0-0 libgconf-2-4

Import the required libraries

In [84]:
import pandas as pd
from sklearn import preprocessing
#from ipywidgets import interactive, HBox, VBox
import plotly.express as px
#from plotly.offline import init_notebook_mode, iplot
##import plotly.offline as py
import plotly.graph_objects as go
import numpy as np
from sklearn.decomposition import PCA as sklearnPCA
from sklearn.manifold import TSNE
from sklearn.manifold import Isomap
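# Public import path; the private module sklearn.manifold.t_sne is no longer available in recent scikit-learn releases
from sklearn.manifold import trustworthiness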


Open the data file with Pandas (the cell below expects a CSV copy named `auto-mpg.csv`). The Auto MPG dataset can be downloaded here:


https://archive.ics.uci.edu/ml/datasets/Auto+MPG

In [85]:
raw_data = pd.read_csv("auto-mpg.csv")

In [86]:
raw_data.head()

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model year,origin,car name
0,18.0,8,307.0,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140,3449,10.5,70,1,ford torino


Attribute information. The cell below replaces the missing-value markers ('?' and 'NA') with NaN and drops the affected rows.


In [87]:
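# Treat the '?' and 'NA' markers as missing values (np.NaN was removed in NumPy 2.0; np.nan works in every version)
raw_data = raw_data.replace('?', np.nan)
raw_data = raw_data.replace('NA', np.nan)
# Drop every row that still contains a missing value
raw_data = raw_data.dropna(axis=0)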

In [88]:
paises = {1: "USA", 2: "Europe",3: "Japan"}

label = raw_data["origin"]
label_int = raw_data["origin"]

label = label.replace(paises)
info_hover = raw_data["car name"]
raw_data = raw_data.drop(labels=["origin","car name"],axis=1)

raw_data

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model year
0,18.0,8,307.0,130,3504,12.0,70
1,15.0,8,350.0,165,3693,11.5,70
2,18.0,8,318.0,150,3436,11.0,70
3,16.0,8,304.0,150,3433,12.0,70
4,17.0,8,302.0,140,3449,10.5,70
...,...,...,...,...,...,...,...
393,27.0,4,140.0,86,2790,15.6,82
394,44.0,4,97.0,52,2130,24.6,82
395,32.0,4,135.0,84,2295,11.6,82
396,28.0,4,120.0,79,2625,18.6,82


In [89]:
raw_data.dtypes

raw_data["horsepower"] = raw_data["horsepower"].astype(float)
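The cleaning and type-conversion steps above can also be expressed in a single pass. The sketch below is one possible alternative, not the notebook's own method: it uses a small hypothetical DataFrame (`toy`) in place of `auto-mpg.csv`, and relies on `pd.to_numeric` with `errors="coerce"` to turn any non-numeric marker such as `'?'` into NaN before the incomplete rows are dropped.

```python
import pandas as pd

# Hypothetical stand-in for the raw table: horsepower arrives as text
# because one entry uses '?' as a missing-value marker.
toy = pd.DataFrame({
    "mpg": [18.0, 25.0, 16.0],
    "horsepower": ["130", "?", "150"],
})

# Coerce to numbers; anything that cannot be parsed becomes NaN.
toy["horsepower"] = pd.to_numeric(toy["horsepower"], errors="coerce")

# Drop the rows that contain at least one missing value.
clean = toy.dropna(axis=0)

print(clean.dtypes)
print(clean)
```

This keeps the column numeric from the start, so no separate `astype(float)` call is needed afterwards.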

In [90]:
#import plotly.io as pio
#pio.renderers.default = 'png'
import plotly.express as px
df = raw_data.copy()
df["origin"] = label
df["nomecarro"] = info_hover
df["origin_int"] = label_int

fig = px.scatter(df, x="mpg", y="displacement",color="origin")
fig.show()

Scatterplot matrix

## Point-Placement Visualization Using Principal Component Analysis

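PCA reduces the dimensionality of the data by projecting it onto the directions of greatest variance; here we keep the first two principal components.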

In [91]:
raw_data

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model year
0,18.0,8,307.0,130.0,3504,12.0,70
1,15.0,8,350.0,165.0,3693,11.5,70
2,18.0,8,318.0,150.0,3436,11.0,70
3,16.0,8,304.0,150.0,3433,12.0,70
4,17.0,8,302.0,140.0,3449,10.5,70
...,...,...,...,...,...,...,...
393,27.0,4,140.0,86.0,2790,15.6,82
394,44.0,4,97.0,52.0,2130,24.6,82
395,32.0,4,135.0,84.0,2295,11.6,82
396,28.0,4,120.0,79.0,2625,18.6,82


In [92]:
X = raw_data.values
y = label.values

sklearn_pca = sklearnPCA(n_components=2)
X_pca = sklearn_pca.fit_transform(X)

#pca = PCA(n_components=2)
#components = pca.fit_transform(X)

fig = px.scatter(X_pca, x=0, y=1, color=label.astype(object))
fig.show()
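PCA is sensitive to the scale of each attribute: a column with large numeric values, such as `weight`, can dominate the first component. The sketch below illustrates this on synthetic data. The column scales are assumptions chosen to loosely resemble the dataset, and `StandardScaler` is a swapped-in preprocessing step that the cell above does not apply.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic, independent columns on very different scales (assumed values).
X_demo = np.column_stack([
    rng.normal(3000.0, 800.0, size=200),  # weight-like column
    rng.normal(15.0, 3.0, size=200),      # acceleration-like column
    rng.normal(23.0, 8.0, size=200),      # mpg-like column
])

# PCA on the raw values: the large-scale column dominates the variance.
raw_ratio = PCA(n_components=2).fit(X_demo).explained_variance_ratio_

# PCA after standardizing each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X_demo)
scaled_ratio = PCA(n_components=2).fit(X_scaled).explained_variance_ratio_

print("raw:   ", raw_ratio)
print("scaled:", scaled_ratio)
```

If the raw projection looks like a single stretched axis, repeating the PCA cell on standardized values is a reasonable check of whether one attribute is driving the layout.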

## Isometric Mapping

In [93]:
isomap = Isomap(n_components=2,n_neighbors=10)
X_isomap = isomap.fit_transform(X)

fig = px.scatter(X_isomap, x=0, y=1, color=label.astype(object))
fig.show()

## t-Distributed Stochastic Neighbor Embedding (t-SNE)


Let us project the Auto MPG dataset with t-SNE. The cell below uses a perplexity of 12, a learning rate of 50, Euclidean distances, and random initialization, and then plots the resulting two-dimensional embedding with each car colored by its origin:

In [116]:
tsne = TSNE(n_components=2,perplexity=12,learning_rate=50.0,metric='euclidean', init='random')
X_tsne = tsne.fit_transform(X)

fig = px.scatter(X_tsne, x=0, y=1, color=label.astype(object))
fig.show()
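Because the cell above uses `init='random'` without a fixed seed, each run can produce a different layout. A minimal sketch, assuming synthetic data in place of `X`, shows how passing `random_state` makes the embedding repeatable. The seed value and the data shape are illustrative choices only.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in for X: 60 samples with 7 attributes (assumed shape).
X_demo = rng.normal(size=(60, 7))

tsne = TSNE(
    n_components=2,
    perplexity=12,       # must be smaller than the number of samples
    learning_rate=50.0,
    init="random",
    random_state=42,     # fixed seed (arbitrary value) for repeatable runs
)
X_demo_tsne = tsne.fit_transform(X_demo)

print(X_demo_tsne.shape)
```

Fixing the seed matters when projections are compared across parameter settings, since otherwise differences in layout may come from the initialization rather than from the parameters.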


## Evaluating Projection Quality

In [110]:
kneigh = []
y_tsne = []
y_pca = []
y_isomap = []

for k in range(1,51):
  precision = trustworthiness(X, X_tsne, n_neighbors=k, metric='euclidean')
  kneigh.append(k)
  y_tsne.append(precision)
  precision = trustworthiness(X, X_pca, n_neighbors=k, metric='euclidean')
  y_pca.append(precision)
  precision = trustworthiness(X, X_isomap, n_neighbors=k, metric='euclidean')
  y_isomap.append(precision)
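The loop above relies on `trustworthiness`, which returns a score in [0, 1]: it penalizes points that appear among the k nearest neighbors in the projection but are not neighbors in the original space. The sketch below illustrates the two extremes on synthetic data (all names and values are illustrative): an embedding identical to the input scores 1, while an unrelated random scatter scores much lower.

```python
import numpy as np
from sklearn.manifold import trustworthiness

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))    # synthetic "original" data
X_random = rng.normal(size=(100, 2))  # unrelated random layout

# Identical neighborhoods: no false neighbors, so the score is 1.
score_same = trustworthiness(X_demo, X_demo, n_neighbors=5)

# Random layout: most neighbors in the projection are false neighbors.
score_random = trustworthiness(X_demo, X_random, n_neighbors=5)

print(score_same, score_random)
```

Reading the curves for t-SNE, PCA, and Isomap against these two reference points makes it easier to judge how much of the neighborhood structure each projection preserves.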


Next, we can plot the trustworthiness of each projection against the number of neighbors as a line chart:

In [111]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=kneigh, y=y_tsne,
                    mode='lines+markers',
                    name='t-SNE'))
fig.add_trace(go.Scatter(x=kneigh, y=y_pca,
                    mode='lines+markers',
                    name='PCA'))
fig.add_trace(go.Scatter(x=kneigh, y=y_isomap,
                    mode='lines+markers',
                    name='Isomap'))

fig.show()

Let us generate another line chart, now ordering the cars by the engine displacement attribute (`displacement`):


## Parallel Coordinates