# Simple examples to demonstrate the Altair library (for visualization)
> Some quick examples to try

- toc:true
- branch: master
- badges: true
- comments: false
- author: John Li
- categories: [Visualization, Altair]

Acknowledgement:
- Examples are from this [page](http://fernandoi.cl/blog/posts/altair/) and Shantam Raj's talk
- A YouTube video by Jake VanderPlas: How to Think about Data Visualization, PyCon 2019 [link](https://www.youtube.com/watch?v=vTingdk_pVM)
- A cool interactive visualization chart [here](https://altair-viz.github.io/gallery/seattle_weather_interactive.html)
- Shantam Raj PyData 2020 talk: [Rapidly emulating professional visualizations from New York Times in Python using Altair](https://www.youtube.com/watch?v=pPHhv7qsQ_8)
    - [Author's slides](https://armsp.github.io/talks/pydataglobal-2020/)
- To install Altair: `conda install -c conda-forge altair vega_datasets`

## Some simple examples

In [2]:
import pandas as pd
import altair as alt

data = pd.DataFrame({'country_id': [1, 2, 3, 4, 5, 6],
                     'population': [1, 100, 200, 300, 400, 500],
                     'income':     [50, 50, 200, 300, 300, 450]})
data

| | country_id | population | income |
|---|---|---|---|
| 0 | 1 | 1 | 50 |
| 1 | 2 | 100 | 50 |
| 2 | 3 | 200 | 200 |
| 3 | 4 | 300 | 300 |
| 4 | 5 | 400 | 300 |
| 5 | 6 | 500 | 450 |


In [4]:
"""As we mentioned before, we need to define 3 parameters:
 1. Mark: the shape drawn for each row; here we use "mark_circle".
 2. Channels: we map population to the x-axis and country_id to color.
 3. Encoding types: we declare both variables as quantitative by adding :Q after the column name"""

categorical_chart = alt.Chart(data).mark_circle(size=200).encode(
                        x='population:Q',
                        color='country_id:Q')

categorical_chart

In [5]:
# We changed color='country_id:Q' to color='country_id:N' to indicate it is a nominal variable
categorical_chart = alt.Chart(data).mark_circle(size=200).encode(
                        x='population:Q',
                        color='country_id:N')
categorical_chart

In [6]:
categorical_chart = alt.Chart(data).mark_circle(size=200).encode(
                        x='population:Q',
                        y='income:Q',
                        color='country_id:N')
categorical_chart

In [8]:
categorical_chart = alt.Chart(data).mark_circle(size=200).encode(
                        x='population:Q',
                        y='income:Q',
                        color='country_id:N',
                        tooltip=['country_id', 'population', 'income'])
categorical_chart

## More concrete example (weather data)

In [9]:
import altair as alt  
import pandas as pd

weather_data = "https://github.com/vega/vega-datasets/blob/master/data/weather.csv?raw=True"
data = pd.read_csv(weather_data)
data['date'] = pd.to_datetime(data['date'])
data.head()

| | location | date | precipitation | temp_max | temp_min | wind | weather |
|---|---|---|---|---|---|---|---|
| 0 | Seattle | 2012-01-01 | 0.0 | 12.8 | 5.0 | 4.7 | drizzle |
| 1 | Seattle | 2012-01-02 | 10.9 | 10.6 | 2.8 | 4.5 | rain |
| 2 | Seattle | 2012-01-03 | 0.8 | 11.7 | 7.2 | 2.3 | rain |
| 3 | Seattle | 2012-01-04 | 20.3 | 12.2 | 5.6 | 4.7 | rain |
| 4 | Seattle | 2012-01-05 | 1.3 | 8.9 | 2.8 | 6.1 | rain |


In [13]:
### locations
data.location.value_counts()

New York    1461
Seattle     1461
Name: location, dtype: int64

In [14]:
### Weather per location
data.groupby(["location", "weather"]).size()

location  weather
New York  drizzle     58
          fog         38
          rain       446
          snow        93
          sun        826
Seattle   drizzle     53
          fog        101
          rain       641
          snow        26
          sun        640
dtype: int64

In [15]:
### Weather per location (another format)
data.groupby(["location", "weather"]).size().reset_index(name="Days")

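| | location | weather | Days |
|---|---|---|---|
| 0 | New York | drizzle | 58 |
| 1 | New York | fog | 38 |
| 2 | New York | rain | 446 |
| 3 | New York | snow | 93 |
| 4 | New York | sun | 826 |
| 5 | Seattle | drizzle | 53 |
| 6 | Seattle | fog | 101 |
| 7 | Seattle | rain | 641 |
| 8 | Seattle | snow | 26 |
| 9 | Seattle | sun | 640 |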


In [16]:
alt.Chart(data).mark_point().encode(
    x = 'date',
    y = 'temp_max',
    column = 'location'
)

In [17]:
### use a different color for each location
alt.Chart(data).mark_point().encode(
    x = 'date',
    y = 'temp_min',
    column = 'location',
    color = 'location'
)

In [18]:
### histogram of temp_max for both cities together
### (bin=True groups temperatures into bins; count() tallies the rows in each bin)
alt.Chart(data).mark_bar().encode(
    x = alt.X('temp_max', bin=True),
    y = 'count()'
)


In [19]:
scatter = alt.Chart(data).mark_point().encode(
    x = 'precipitation',
    y = 'wind'
)

regression = alt.Chart(data).transform_regression('precipitation', 'wind').mark_line().encode(
    x = 'precipitation',
    y = 'wind'
)


In [20]:
scatter | regression

In [21]:
scatter + regression.mark_line(color="red")