# Part 1 - Exercises - Data Types, Graphical Marks, and Visual Encoding Channels

The goal of these exercises is to gain experience with the concepts from Part 1.

In [1]:
import altair as alt
import pandas as pd
from vega_datasets import data
print("The installed Vega-Altair version is " + alt.__version__)

The installed Vega-Altair version is 5.3.0


## Spotify exercise

The goal of this first exercise is to familiarize ourselves with the Altair syntax.  Load the following CSV file as a pandas DataFrame: [Spotify dataset](../resources/datasets/spotify.csv).  This dataset is a lightly cleaned version of a dataset found on [Kaggle](https://www.kaggle.com/datasets/arnavvvvv/spotify-music).

> **Note:** This dataset will be used again in the Part 2 Exercises.



In [2]:
df = pd.read_csv("../resources/datasets/spotify.csv")

In [5]:
df.columns

Index(['Index', 'Highest Charting Position', 'Number of Times Charted',
       'Week of Highest Charting', 'Song Name', 'Streams', 'Artist',
       'Artist Followers', 'Song ID', 'Genre', 'Release Date', 'Weeks Charted',
       'Popularity', 'Danceability', 'Energy', 'Loudness', 'Speechiness',
       'Acousticness', 'Liveness', 'Tempo', 'Duration (ms)', 'Valence',
       'Chord'],
      dtype='object')

In [6]:
df.Valence

0       0.589
1       0.478
2       0.688
3       0.591
4       0.894
        ...  
1551    0.608
1552    0.714
1553    0.394
1554    0.881
1555    0.422
Name: Valence, Length: 1556, dtype: float64

In [3]:
alt.Chart(df).mark_circle().encode(
    alt.X('Energy'),
    alt.Y('Acousticness'),
    # Valence is quantitative, so a sequential scheme fits better than a
    # categorical one; the scheme is set with the method syntax.
    alt.Color('Valence').scale(scheme='viridis'),
    alt.Tooltip(['Artist', 'Song Name'])
).properties(
    title='Scatter Plot of Song Energy vs Acousticness'
)

### Part a

* Create a scatter plot from this data using the encodings `alt.X`, `alt.Y`, and `alt.Color`.  (Choose any fields you like for these encodings.)
* Include a tooltip that indicates the artist and song name.  
* Choose a visually appealing color scheme from the [Vega documentation](https://vega.github.io/vega/docs/schemes/) and specify this color scheme using the method syntax.
* Include a title for your chart by using the `properties` method of the Altair Chart object.  (Here is a Title Configuration example from the [documentation](https://altair-viz.github.io/user_guide/configuration.html#title-configuration).)

<details>
  <summary>(Show Image)</summary>
  <img src="../resources/images/part1/spotify1.png">
</details>

<details>
  <summary>(Show Answer)</summary>

  ```python
alt.Chart(df).mark_circle().encode(
    alt.X("Energy"),
    alt.Y("Acousticness"),
    alt.Color("Valence").scale(scheme="viridis"),
    alt.Tooltip(["Artist", "Song Name"])
).properties(
    title="Spotify songs"
)
  ```
</details>

### Part b

Which song in the dataset released between 1970 and 1980 had the lowest "Valence" level?  The highest?  Use the "Release Date" field to help answer this question.  Specify that Altair should interpret these Release Date values as dates, not strings, by using the `":T"` encoding type abbreviation.

<details>
  <summary>(Show Image)</summary>
  <img src="../resources/images/part1/spotify2.png">
</details>

<details>
  <summary>(Show Answer)</summary>

  ```python
alt.Chart(df).mark_circle().encode(
    alt.X("Release Date:T"),
    alt.Y("Valence"),
    alt.Tooltip(["Artist", "Song Name"])
)
  ```
</details>

## Gapminder exercise

The following is based on [this chart](https://altair-viz.github.io/gallery/gapminder_bubble_plot.html) from the Altair example gallery, which in turn is based on the following [Lisa Charlotte Muth blogpost](https://lisacharlottemuth.com/2016/05/17/one-chart-code/).

* Load the `gapminder_health_income` dataset from vega_datasets.  (Notice that this is not the same as the `gapminder` dataset.)
* Make a scatter plot from this data.
* Use "income" for the x-axis encoding.
* Use "health" for the y-axis encoding.

In [None]:
source = data.gapminder_health_income()

<details>
  <summary>(Show Image)</summary>
  <img src="../resources/images/part1/gapminder1.png">
</details>

<details>
  <summary>(Show Answer)</summary>

  ```python
alt.Chart(source).mark_circle().encode(
    alt.X('income'),
    alt.Y('health'),
)
  ```
</details>

* By default, Vega-Altair includes zero when defining quantitative scales.  For the y-axis encoding only, change this `zero` argument to `False` within the `scale` method.

<details>
  <summary>(Show Image)</summary>
  <img src="../resources/images/part1/gapminder2.png">
</details>

<details>
  <summary>(Show Answer)</summary>

  ```python
alt.Chart(source).mark_circle().encode(
    alt.X('income'),
    alt.Y('health').scale(zero=False),
)
  ```
</details>

* Add size and color encodings, both using the "population" field.
* Specify a color scheme of your choice.  If desired, you can set the `reverse` keyword argument to `True` to reverse the ordering of the colors.

<details>
  <summary>(Show Image)</summary>
  <img src="../resources/images/part1/gapminder3.png">
</details>

<details>
  <summary>(Show Answer)</summary>

  ```python
alt.Chart(source).mark_circle().encode(
    alt.X('income'),
    alt.Y('health').scale(zero=False),
    alt.Size('population'),
    alt.Color('population').scale(scheme="spectral", reverse=True)
)
  ```
</details>


* Recall that scales in Altair convert data values to visual values.  Specify that, for the x-axis encoding, a log scale should be used to make this conversion.
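
To make the idea of "converting data values to visual values" concrete, here is a small plain-Python sketch of a linear and a log scale. This is a simplified model for intuition only, not Vega's actual implementation; the function names, the income values, and the 300-pixel range are all assumptions chosen for illustration.

```python
import math

def linear_scale(value, domain, pixel_range):
    """Map a data value to a pixel position proportionally to its distance."""
    d0, d1 = domain
    r0, r1 = pixel_range
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

def log_scale(value, domain, pixel_range):
    """Map a data value to a pixel position proportionally to its log10."""
    d0, d1 = domain
    r0, r1 = pixel_range
    t = (math.log10(value) - math.log10(d0)) / (math.log10(d1) - math.log10(d0))
    return r0 + t * (r1 - r0)

# Hypothetical income values spanning two orders of magnitude.
incomes = [1_000, 10_000, 100_000]
domain = (1_000, 100_000)
pixel_range = (0, 300)

linear_px = [linear_scale(v, domain, pixel_range) for v in incomes]
log_px = [log_scale(v, domain, pixel_range) for v in incomes]

print("linear:", [round(p, 1) for p in linear_px])
print("log:   ", [round(p, 1) for p in log_px])
```

On the linear scale the middle value lands close to the left edge, while on the log scale each factor of ten takes up the same width. This is why a log scale spreads out data that is crowded near the low end.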

<details>
  <summary>(Show Image)</summary>
  <img src="../resources/images/part1/gapminder4.png">
</details>

<details>
  <summary>(Show Answer)</summary>

  ```python
alt.Chart(source).mark_circle().encode(
    alt.X('income').scale(type="log"),
    alt.Y('health').scale(zero=False),
    alt.Size('population'),
    alt.Color('population').scale(scheme="spectral", reverse=True)
)
  ```
</details>


## Driving exercise

This exercise is based on the following two examples from the Altair example gallery:  [Link 1](https://altair-viz.github.io/gallery/line_custom_order.html) and [Link 2](https://altair-viz.github.io/gallery/falkensee.html). (But don't click the links unless you want a hint!)

* Load the *driving* dataset from vega_datasets.

In [None]:
df = data.driving()


* Make a line chart using "miles" for the x-encoding and "gas" for the y-encoding.
* Use `point=True` as a keyword argument to `mark_line` to also have the corresponding points plotted.  (Secretly, this is layering a scatter plot on top of our line chart.)
* Specify `zero=False` for the y-axis encoding scale.

<details>
  <summary>(Show Image)</summary>
  <img src="../resources/images/part1/driving1.png">
</details>

<details>
  <summary>(Show Answer)</summary>

  ```python
alt.Chart(df).mark_line(point=True).encode(
    alt.X("miles"),
    alt.Y("gas").scale(zero=False),
)
  ```
</details>


* Notice that the x-axis corresponds to miles driven, and points are connected accordingly.  Use the *order* encoding channel to specify that points should instead be connected in order of "year".

<details>
  <summary>(Show Image)</summary>
  <img src="../resources/images/part1/driving2.png">
</details>

<details>
  <summary>(Show Answer)</summary>

  ```python
alt.Chart(df).mark_line(point=True).encode(
    alt.X("miles"),
    alt.Y("gas").scale(zero=False),
    alt.Order("year")
)
  ```
</details>


We would also like to highlight a certain interesting region of the chart.  We will use a separate rectangle chart to make this highlight.  Rather than being data-based, we will mostly specify the properties of this rectangle chart directly.

Recall that scales describe how to convert data values to visual values.  Sometimes we would like to explicitly specify data values, and sometimes we would like to explicitly specify visual values.  For data values, we use `alt.datum`, and for visual values, we use `alt.value`.

* Make a rectangle chart (`mark_rect`, [reference](https://altair-viz.github.io/user_guide/marks/rect.html)) with the x-axis spanning from `6500` to `7000`, viewed as data values.  No dataset is needed here.  Should we be using `alt.datum` or `alt.value`?

<details>
  <summary>(Show Answer)</summary>

  ```python
alt.Chart().mark_rect().encode(
    x=alt.datum(6500),
    x2=alt.datum(7000)
)
  ```
</details>

* Specify the color as `"orange"` and the opacity as `0.2`.  It's probably most natural to specify these using keyword arguments to `mark_rect`, but for practice with `alt.datum` vs `alt.value`, instead specify these visual values within the `encode` method.

<details>
  <summary>(Show Answer)</summary>

  ```python
alt.Chart().mark_rect().encode(
    x=alt.datum(6500),
    x2=alt.datum(7000),
    color=alt.value("orange"),
    opacity=alt.value(0.2)
)
  ```
</details>

* Assign the above two finished charts (the line chart with the custom order and our explicitly described rectangle chart) to two Python variables.
* Layer these charts, one on top of the other, using either the `+` operator or by passing the two charts as arguments to `alt.layer`.

<details>
  <summary>(Show Answer)</summary>

  ```python
c1 = alt.Chart(df).mark_line(point=True).encode(
    alt.X("miles"),
    alt.Y("gas").scale(zero=False),
    alt.Order("year")
)

c2 = alt.Chart().mark_rect().encode(
    x=alt.datum(6500),
    x2=alt.datum(7000),
    color=alt.value("orange"),
    opacity=alt.value(0.2)
)

alt.layer(c1, c2)
  ```
</details>

## Barley exercise

The *barley* dataset, which is part of `vega_datasets`, contains an error.  One of the six sites represented in the data has had its values for the years `1931` and `1932` swapped.
Try to identify which site contains the swapped data by producing an Altair chart similar to the following.

 Aspects to consider:
 * We have mostly been using `mark_circle` so far in Part 1, but here you should use `mark_point`.  This will allow us to specify a shape encoding below.
 * In our sample image below, there are two charts displayed.  (This sample image is only a portion of the final image.  There will eventually be more charts.)  Do these charts correspond to concatenation or to faceting?
 * We have removed the `title` from many of the axes in this picture to make the exercise more difficult, but that is not an essential aspect of the final image you should produce.
 * The quantitative aspect of the year values is not emphasized here (we are interested in `1931` vs `1932` as a binary choice). Specify an appropriate data type when using the "year" field.
 * With an appropriate data type and the color encoding channel, the year values should be colored as in our sample image (no need to specify them manually).
 * Use the shape encoding channel to emphasize the difference between the years.  This will only work if you use one of the categorical data types.
 * We have explicitly specified the shapes to use as "square" and "triangle-up" (see [here](https://vega.github.io/vega-lite/docs/point.html#properties) for more shape options).  Should we specify these in the `axis` method or the `scale` method?  Should we specify these using `domain` or using `range`?
 * Try to include/exclude gridlines as in our sample image, by specifying the `grid` value (`True` or `False`) to an appropriate method.  Should we use `axis` or `scale` here?
 * Once your image is produced, answer the question: Which site mistakenly had its year values swapped in the dataset?
 * [Reference/Related Answer](https://altair-viz.github.io/gallery/beckers_barley_facet.html)

Sample image:
<img src="../resources/images/part1/barley.png">

<details>
  <summary>(Show Complete Image)</summary>
  <img src="../resources/images/part1/barley1.png">
</details>

<details>
  <summary>(Show Answer)</summary>

  ```python
df = data.barley()
alt.Chart(df).mark_point().encode(
    alt.X('yield:Q').axis(grid=False),
    alt.Y('variety:N').axis(grid=True).title(None),
    alt.Color('year:N').title(None),
    alt.Shape('year:N').scale(range=["square", "triangle-up"]),
    alt.Row('site:N')
)
  ```
</details>