
# Interactive Scatter Plot in Altair

Interactive scatter plots built from data stored in a pandas DataFrame.

In [151]:
import pandas as pd
import altair as alt

In [152]:
college_scores_df = pd.read_csv('/content/sample_data/calvinCollegeSeniorScores.csv')
college_scores_df.head()

Unnamed: 0,SATM,SATV,ACT,GPA
0,430,470,15,2.239
1,560,350,16,2.488
2,400,330,17,2.982
3,410,450,17,2.155
4,430,460,17,2.712
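
The `Unnamed: 0` column in the output above is what pandas produces when the first header field of a CSV is empty, which usually means a row index was saved along with the data. A minimal sketch, assuming that is the case for this file, reads that column back as the index with `index_col=0`. The inline CSV text (`csv_text`, a name chosen for this sketch) reproduces only the five rows shown above and stands in for the real file.

```python
import io

import pandas as pd

# Stand-in for the real CSV: only the five rows shown in the head() output.
# The empty first header field is an assumption about how the file was saved.
csv_text = """,SATM,SATV,ACT,GPA
0,430,470,15,2.239
1,560,350,16,2.488
2,400,330,17,2.982
3,410,450,17,2.155
4,430,460,17,2.712
"""

# index_col=0 uses the first column as the row index instead of
# loading it as a data column named "Unnamed: 0".
sample_df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample_df.columns.tolist())
```

With the real file, the same keyword can be passed to the `pd.read_csv` call that takes the file path.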


1. Scatterplot of SATM vs SATV scores.
  - x coordinate encodes SATM scores
  - y coordinate encodes SATV scores
  - size (area) of points encodes ACT scores
  - color encodes GPA scores
  - Features
    - Interactive plot
    - Zoom and pan interactions
    - Tooltips showing SATM, SATV, ACT, and GPA
    - Legends

In [153]:
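# Position encodes SATM/SATV, color encodes GPA, and point size encodes ACT.
# .interactive() enables zooming and panning.
alt.Chart(college_scores_df).mark_point(filled=True, opacity=0.3).encode(
    x='SATM',
    y='SATV',
    color='GPA',
    size='ACT',
    tooltip=['SATM', 'SATV', 'ACT', 'GPA'],
).interactive()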


2. Visualize the dataset as two scatter plots with linked brushing
  - Two scatterplots
    - SATM and SATV
    - ACT and GPA
  - Linked views with brushing techniques

In [154]:
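# Interval selection shared by both plots: dragging in either one updates both.
brush = alt.selection_interval()
# Points inside the brush are colored by GPA; points outside turn light gray.
# The quantitative type (:Q) is spelled out because type inference may not reach into alt.condition.
color_condition = alt.condition(brush, 'GPA:Q', alt.value('lightgray'))

# Note: on Altair 5+, add_selection is deprecated in favor of add_params.
plot_1 = alt.Chart(college_scores_df, title="SATM vs SATV").mark_point(filled=True).encode(
  x='SATM',
  y='SATV',
  color=color_condition,
  tooltip=['SATM', 'SATV']
).add_selection(brush)

plot_2 = alt.Chart(college_scores_df, title="ACT vs GPA").mark_point(filled=True).encode(
  x='ACT',
  y='GPA',
  color=color_condition,
  tooltip=['ACT', 'GPA']
).add_selection(brush)

# Place the two linked views side by side.
plot_1 | plot_2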

In [155]:
import pandas as pd
import altair as alt
import numpy as np

In [92]:
fatalities_df = pd.read_csv('/content/sample_data/ukDriverFatalities.csv')
fatalities_df.head()

Unnamed: 0,month,year,count
0,0,1969,1687
1,1,1969,1508
2,2,1969,1507
3,3,1969,1385
4,4,1969,1632


1. Heatmap
  - Year on one axis
  - Month on the other axis
  - Color encodes the number of deaths
  - Add a legend

In [149]:
min_month, max_month = fatalities_df['month'].min(), fatalities_df['month'].max()
min_year, max_year = fatalities_df['year'].min(), fatalities_df['year'].max()

# meshgrid returns arrays of shape (n_years, n_months):
# months vary along the columns, years along the rows.
x, y = np.meshgrid(range(min_month, max_month + 1), range(min_year, max_year + 1))
# Pivot with years as rows and months as columns so that deaths.ravel()
# lines up element by element with x.ravel() and y.ravel().
deaths = fatalities_df.pivot(index='year', columns='month', values='count').to_numpy()

source = pd.DataFrame({'month': x.ravel(),
                       'year': y.ravel(),
                       'deaths': deaths.ravel()})


# Ordinal (:O) axes give one cell per month/year pair; the color legend is added automatically.
alt.Chart(source, title="Deaths across time").mark_rect().encode(
    x=alt.X('month:O'),
    y=alt.Y('year:O'),
    color=alt.Color('deaths:Q'),
    tooltip=['deaths']
)

2. Line chart showing the total number of deaths over the years

In [165]:
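alt.Chart(fatalities_df, title="Total deaths over the years").mark_line().encode(
  # zero=False keeps the year axis from starting at 0; format='d' drops the thousands separator.
  x=alt.X('year:Q', scale=alt.Scale(zero=False), axis=alt.Axis(format='d')),
  # sum(count) adds up the monthly counts within each year.
  y='sum(count)',
  tooltip=[alt.Tooltip('sum(count)', title='Total Deaths')]
).interactive(bind_y=False)  # zoom and pan along the year axis only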
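
The `sum(count)` aggregate in the chart corresponds to a group-by in pandas, which can be a handy way to check the plotted totals. A minimal sketch, using only the five 1969 rows shown in the earlier `head()` output (so the total here covers those five months, not the full year):

```python
import pandas as pd

# The five rows shown in the head() output (all from 1969).
sample_df = pd.DataFrame({
    'month': [0, 1, 2, 3, 4],
    'year': [1969] * 5,
    'count': [1687, 1508, 1507, 1385, 1632],
})

# Same aggregation as y='sum(count)' grouped by the x field 'year'.
yearly_totals = sample_df.groupby('year')['count'].sum()
print(yearly_totals)
```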