# Altair
Documentation: https://altair-viz.github.io/index.html

Introductory videos on data visualization, the visualization libraries currently used in Python, and the use of statistical models that approximate results from data using intuitive Python code.

*  Jake VanderPlas - How to Think about Data Visualization - PyCon 2019: https://www.youtube.com/watch?v=vTingdk_pVM
*  Jake VanderPlas - The Python Visualization Landscape - PyCon 2017: https://www.youtube.com/watch?v=FytuB8nFHPQ
*  Jake VanderPlas - Statistics for Hackers - PyCon 2016: https://www.youtube.com/watch?v=Iq9DzN6mvYA





## Workshop Exploratory Data Visualization with Vega, Vega-Lite, and Altair

Workshop link: https://www.youtube.com/watch?v=ms29ZPUKxbU

Tutorial links:
* https://colab.research.google.com/github/altair-viz/altair-tutorial/blob/master/notebooks/Index.ipynb
* https://github.com/altair-viz/altair-tutorial/tree/master/notebooks

In [None]:
import altair as alt

In [None]:
from vega_datasets import data

cars = data.cars()

### Basics

In [None]:
# Chart object -> emits JSON for Vega-Lite to render
cars1 = cars.iloc[:1]
alt.Chart(cars1).mark_point().to_dict()

{'$schema': 'https://vega.github.io/schema/vega-lite/v4.17.0.json',
 'config': {'view': {'continuousHeight': 300, 'continuousWidth': 400}},
 'data': {'name': 'data-36a712fbaefa4d20aa0b32e160cfd83a'},
 'datasets': {'data-36a712fbaefa4d20aa0b32e160cfd83a': [{'Acceleration': 12.0,
    'Cylinders': 8,
    'Displacement': 307.0,
    'Horsepower': 130.0,
    'Miles_per_Gallon': 18.0,
    'Name': 'chevrolet chevelle malibu',
    'Origin': 'USA',
    'Weight_in_lbs': 3504,
    'Year': '1970-01-01T00:00:00'}]},
 'mark': 'point'}

Marks that can be used:

* ``mark_point()`` 
* ``mark_circle()``
* ``mark_square()``
* ``mark_line()``
* ``mark_area()``
* ``mark_bar()``
* ``mark_tick()``

For the full list, type ``alt.Chart.mark_`` and press Tab.

Encodings:

* x: x-axis value
* y: y-axis value
* color: color of the mark
* opacity: transparency/opacity of the mark
* shape: shape of the mark
* size: size of the mark
* row: row within a grid of facet plots
* column: column within a grid of facet plots

Data types

<table>
  <tr>
    <th>Data Type</th>
    <th>Code</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>quantitative</td>
    <td>Q</td>
    <td>Numerical quantity (real-valued)</td>
  </tr>
  <tr>
    <td>nominal</td>
    <td>N</td>
    <td>Name / Unordered categorical</td>
  </tr>
  <tr>
    <td>ordinal</td>
    <td>O</td>
    <td>Ordered categorical</td>
  </tr>
  <tr>
    <td>temporal</td>
    <td>T</td>
    <td>Date/time</td>
  </tr>
</table>

The types are inferred automatically from the data, but they can be overridden when needed.

### Aggregation

In [None]:
cars.groupby('Origin')['Miles_per_Gallon'].mean()

Origin
Europe    27.891429
Japan     30.450633
USA       20.083534
Name: Miles_per_Gallon, dtype: float64

In [None]:
alt.Chart(cars).mark_bar().encode(
    alt.X('Miles_per_Gallon', bin=True),
    alt.Y('count()'),
    alt.Color('Origin')
)

In [None]:
alt.Chart(cars).mark_bar().encode(
    color=alt.Color('Miles_per_Gallon', bin=True),
    x=alt.X('count()', stack='normalize'),
    y='Origin'
)

Temporal data: see the usage guide in the documentation and on GitHub.

### Demo

In [None]:
# Example dataset for trying out the library
from vega_datasets import data

cars = data.cars()
cars.head()

Unnamed: 0,Name,Miles_per_Gallon,Cylinders,Displacement,Horsepower,Weight_in_lbs,Acceleration,Year,Origin
0,chevrolet chevelle malibu,18.0,8,307.0,130.0,3504,12.0,1970-01-01,USA
1,buick skylark 320,15.0,8,350.0,165.0,3693,11.5,1970-01-01,USA
2,plymouth satellite,18.0,8,318.0,150.0,3436,11.0,1970-01-01,USA
3,amc rebel sst,16.0,8,304.0,150.0,3433,12.0,1970-01-01,USA
4,ford torino,17.0,8,302.0,140.0,3449,10.5,1970-01-01,USA


In [None]:
alt.Chart(cars).mark_point().encode(
    x='Miles_per_Gallon'
) # mark_tick can be used instead of mark_point to draw short lines (ticks) rather than points.

In [None]:
alt.Chart(cars).mark_point().encode(
    x='Miles_per_Gallon',
    y='Horsepower'
)
# Appending .interactive() to the chart makes it possible to pan and zoom with the mouse.

In [None]:
# The color of each (x, y) point can vary according to another variable.
# If that variable is categorical (Origin), the colors are distinct.
# If the variable is continuous (Acceleration), the color is a gradient.
alt.Chart(cars).mark_point().encode(
    x='Miles_per_Gallon',
    y='Horsepower',
    color='Origin'
)
# The data type is detected automatically; to override it, write e.g. color='Origin:O' (:O stands for ordinal data).

In [None]:
# For a histogram, use mark_bar with a binned x and count() as the y variable
alt.Chart(cars).mark_bar().encode(
    x=alt.X('Miles_per_Gallon', bin=True),
    y='count()'
)

In [None]:
alt.Chart(cars).mark_bar().encode(
    x=alt.X('Miles_per_Gallon', bin=alt.Bin(maxbins=30)),
    y='count()',
    color='Origin'
)
# The bars can also be colored by another variable, and the number of bins can be customized

In [None]:
# To draw separate charts split by some variable:
alt.Chart(cars).mark_bar().encode(
    x=alt.X('Miles_per_Gallon', bin=alt.Bin(maxbins=30)),
    y='count()',
    color='Origin',
    column='Origin'
)

In [None]:
alt.Chart(cars).mark_rect().encode(
    x=alt.X('Miles_per_Gallon', bin=True),
    y=alt.Y('Horsepower', bin=True),
    color='mean(Weight_in_lbs)'
)

In [None]:

alt.Chart(cars).mark_point().encode(
    x='Year',
    y='Miles_per_Gallon'
)

In [None]:
alt.Chart(cars).mark_line().encode(
    x='Year',
    y='mean(Miles_per_Gallon)',
)

In [None]:
# ci0 and ci1 add a confidence interval (the range in which the mean of y is likely to lie)
alt.Chart(cars).mark_area().encode(
    x='Year',
    y='ci0(Miles_per_Gallon)',
    y2='ci1(Miles_per_Gallon)'
)
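
Vega-Lite documents `ci0` and `ci1` as the lower and upper bounds of a bootstrapped 95% confidence interval of the mean. The sketch below illustrates that idea with NumPy on the five `Miles_per_Gallon` values from the table above. The function `bootstrap_ci_mean` and its parameters are hypothetical, and this is not Vega's actual implementation.

```python
import numpy as np

def bootstrap_ci_mean(values, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of each resample
    samples = rng.choice(values, size=(n_resamples, values.size), replace=True)
    means = samples.mean(axis=1)
    # Cut off the same probability mass on both tails
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(means, [alpha, 1.0 - alpha])
    return lower, upper

mpg = [18.0, 15.0, 18.0, 16.0, 17.0]  # first five rows of cars.head()
lower, upper = bootstrap_ci_mean(mpg)
print(lower, np.mean(mpg), upper)
```

The interval narrows as the number of observations per group grows, which is why years with few cars show a wider band in the area chart.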

In [None]:
# Full example

spread = alt.Chart(cars).mark_area(opacity=0.3).encode(  # draw the area with some opacity so overlaps stay visible
    x=alt.X('Year', timeUnit='year'),  # x field and its time unit
    y=alt.Y('ci0(Miles_per_Gallon)', axis=alt.Axis(title='Miles per Gallon')),  # lower bound of the confidence interval, with an axis title
    y2='ci1(Miles_per_Gallon)',  # upper bound of the confidence interval
    color='Origin'  # the area color varies by origin
).properties(
    width=800
)

lines = alt.Chart(cars).mark_line().encode(  # add a line chart of the mean
    x=alt.X('Year', timeUnit='year'),
    y='mean(Miles_per_Gallon)',
    color='Origin'
).properties(
    width=800
)

# spread + lines -> the older way of overlaying the charts (it still works)
alt.layer(spread, lines, data=cars)

#### Selection and Interaction

In [None]:
# Interaction: only the selected points keep their color
interval = alt.selection_interval()

alt.Chart(cars).mark_point().encode(
    x='Miles_per_Gallon',
    y='Horsepower',
    color=alt.condition(interval, 'Origin', alt.value('lightgray'))
).add_selection(
    interval
)
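
Conceptually, `alt.condition(interval, 'Origin', alt.value('lightgray'))` picks one of two encodings per point depending on whether the point lies inside the brushed rectangle. The pandas sketch below mimics that decision outside the browser; the brush extents `x_range` and `y_range` are hypothetical stand-ins for what the mouse drag would provide, and the three rows are toy data.

```python
import pandas as pd

# Hypothetical toy data using the same column names as the cars dataset
df = pd.DataFrame({
    "Miles_per_Gallon": [18.0, 30.0, 24.0],
    "Horsepower": [130.0, 70.0, 95.0],
    "Origin": ["USA", "Japan", "Europe"],
})

# Hypothetical brush extents; in the chart these come from the mouse drag
x_range = (20.0, 35.0)
y_range = (60.0, 100.0)

# A point is selected when it falls inside the rectangle on both axes
inside = (
    df["Miles_per_Gallon"].between(*x_range)
    & df["Horsepower"].between(*y_range)
)

# Selected rows are colored by Origin; the rest fall back to light gray
color_key = df["Origin"].where(inside, "lightgray")
print(color_key.tolist())
```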

In [None]:
single = alt.selection_single()

alt.Chart(cars).mark_circle(size=100).encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(single, 'Origin', alt.value('lightgray'))
).add_selection(
    single
)

In [None]:
single = alt.selection_single(on='mouseover', nearest=True)

alt.Chart(cars).mark_circle(size=100).encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(single, 'Origin', alt.value('lightgray'))
).add_selection(
    single
)

In [None]:
interval = alt.selection_interval(encodings=['x'])

alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(interval, 'Origin', alt.value('lightgray'))
).add_selection(
    interval
)

In [None]:
interval = alt.selection_interval()

base = alt.Chart(cars).mark_point().encode(  # define a base chart that is later specialized with different x encodings
    y='Horsepower',
    color=alt.condition(interval, 'Origin', alt.value('lightgray')),
    tooltip='Name'
).add_selection(
    interval
)

hist = alt.Chart(cars).mark_bar().encode(  # bar chart that only counts the points inside the selected interval
    x='count()',
    y='Origin',
    color='Origin'
).properties(
    width=800,
    height=80
).transform_filter(
    interval
)

scatter = base.encode(x='Miles_per_Gallon') | base.encode(x='Acceleration')  # specialize the base chart

scatter & hist

## Other Examples from the Documentation: Interactive Charts

Link: https://altair-viz.github.io/gallery/index.html#interactive-charts

In [None]:
import altair as alt
from vega_datasets import data

source = data.seattle_weather()
brush = alt.selection(type='interval', encodings=['x'])

bars = alt.Chart().mark_bar().encode(
    x='month(date):O',
    y='mean(precipitation):Q',
    opacity=alt.condition(brush, alt.OpacityValue(1), alt.OpacityValue(0.7)),
).add_selection(
    brush
)

line = alt.Chart().mark_rule(color='firebrick').encode(
    y='mean(precipitation):Q',
    size=alt.SizeValue(3)
).transform_filter(
    brush
)

alt.layer(bars, line, data=source)


In [1]:
import altair as alt
import pandas as pd
import numpy as np

# Generate random data with numpy
np.random.seed(42)
source = pd.DataFrame(np.cumsum(np.random.randn(100, 3), 0).round(2),
                    columns=['A', 'B', 'C'], index=pd.RangeIndex(100, name='x'))
source = source.reset_index().melt('x', var_name='category', value_name='y')

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['x'], empty='none')

# The basic line
line = alt.Chart(source).mark_line(interpolate='basis').encode(
    x='x:Q',
    y='y:Q',
    color='category:N'
)

# Transparent selectors across the chart. This is what tells us
# the x-value of the cursor
selectors = alt.Chart(source).mark_point().encode(
    x='x:Q',
    opacity=alt.value(0),
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'y:Q', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(source).mark_rule(color='gray').encode(
    x='x:Q',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=300
)
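
The `reset_index().melt(...)` step in the cell above reshapes the wide table (one column per series) into the long form that Altair encodings expect (one row per observation). A reduced sketch with two rows of made-up values shows the effect:

```python
import pandas as pd

# Hypothetical wide table: one column per category, indexed by x
wide = pd.DataFrame(
    {"A": [0.5, 0.4], "B": [-0.1, 1.4], "C": [0.6, 0.4]},
    index=pd.RangeIndex(2, name="x"),
)

# Long form: one row per (x, category) pair, with the value in column y
long = wide.reset_index().melt("x", var_name="category", value_name="y")
print(long)
```

With the data in long form, a single `color='category:N'` encoding is enough to draw one line per series.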