# Midas Tutorial

Hello! Please follow the tutorial to learn the basics of Midas. Be sure to play around until you are comfortable. You will have about 20 minutes. Should you have any questions, please feel free to ask Yifan, who will be present during the entire session.

Midas is a Jupyter notebook library/extension that aids data exploration by providing relevant static visualizations. The key idea of Midas is that **the operations you perform in the interactive visualization space are also reflected in code space**. You will see what this means when you run the code cells below!

### TODO: <font color="green">run the cell below</font>

In [None]:
from midas import Midas
import numpy as np

m = Midas()

## Initiate Midas

Import the `Midas` class from the `midas` library. When you create an instance of `Midas`, a dashboard-like area pops up to the right, composed of **two panels**: one that will be populated with visualizations of the dataframes you create, and one for the "original" dataframes.

**Adjusting the panels**

- Drag the left edge of the panel to change the size.
- Click "toggle midas" and "toggle column shelf" to hide or show the shelves.

<font color='gray'>Note that you can only have one Midas instance per notebook.</font>

## Load data
Midas takes in data through a few APIs, such as `from_file`, used below, which loads data from a CSV file.
Note that you can also use

In [None]:
fires_df = m.from_file("./data/fires.csv")
fires_df

## Querying data

Midas dataframes use the syntax of the [`datascience` module](http://data8.org/datascience/), which we talked about earlier.

In [None]:
# You might be more used to seeing the syntax df['FIRE_SIZE'] > 100
# Our syntax requires that you explicitly call the function.
big_fires = fires_df.where('FIRE_SIZE', m.are.above(100))
big_fires
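
To make the predicate style concrete, here is a minimal plain-Python sketch of what a `where(column, predicate)` call does conceptually. The helpers `above` and `where` below are hypothetical stand-ins written for illustration, not the Midas or `datascience` implementation, and the sample rows are made up.

```python
# Hypothetical stand-ins for illustration; not the Midas implementation.

def above(threshold):
    """Return a predicate that is True for values strictly greater than threshold."""
    return lambda value: value > threshold


def where(rows, column, predicate):
    """Keep only the rows whose value in `column` satisfies `predicate`."""
    return [row for row in rows if predicate(row[column])]


# Made-up sample rows standing in for the fires table.
sample_fires = [
    {"STATE": "CA", "FIRE_SIZE": 250.0},
    {"STATE": "OR", "FIRE_SIZE": 12.5},
    {"STATE": "CA", "FIRE_SIZE": 100.0},
]

big_sample_fires = where(sample_fires, "FIRE_SIZE", above(100))
print(big_sample_fires)
```

Passing the predicate as a function call, rather than writing a comparison inline, is what makes the filter explicit in code.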

## Seeing data

Midas attempts to visualize the dataframe for you directly---to do this, just call the `.vis()` function.

Sometimes you may want to change the kind of chart, which is also easy to do in Midas. To change the chart type from bar to scatter or line, just specify `mark` (`"bar"`, `"line"`, or `"circle"`).

If you click on the circular ellipsis button to the right, you will find a few helpful buttons:
- ⬅️➡️ move the charts around
- 📷 take a snapshot of the current image
- 📊 lets you navigate back to the cell in which the chart is defined
- ➖ minimizes the chart
- ❌ deletes the chart

In [None]:
STATE_distribution = fires_df.group('STATE')
STATE_distribution

In [None]:
STATE_distribution.vis()

In [None]:
# you can use your own aggregators
average_fire_size = fires_df.select(['STATE', 'FIRE_SIZE']).group('STATE', np.average)

In [None]:
average_fire_size.vis()
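
Conceptually, grouping with an aggregator buckets the rows by the grouping column and then reduces each bucket to one value. The sketch below shows this in plain Python. `group_by` is a hypothetical helper and the rows are made up, so treat it as an illustration of the idea rather than how Midas computes `group`.

```python
from collections import defaultdict
from statistics import mean


def group_by(rows, key, value, aggregate):
    """Bucket `value` by `key`, then reduce each bucket with `aggregate`."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row[value])
    return {k: aggregate(values) for k, values in buckets.items()}


# Made-up sample rows standing in for the fires table.
sample_fires = [
    {"STATE": "CA", "FIRE_SIZE": 200.0},
    {"STATE": "CA", "FIRE_SIZE": 100.0},
    {"STATE": "OR", "FIRE_SIZE": 30.0},
]

# Counting rows per state and averaging fire size per state share the same
# bucketing step; only the aggregator differs.
count_per_state = group_by(sample_fires, "STATE", "FIRE_SIZE", len)
average_per_state = group_by(sample_fires, "STATE", "FIRE_SIZE", mean)
print(count_per_state)
print(average_per_state)
```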

In [None]:
# you can also use the vis to change the encoding, when it makes sense
# when in doubt, try "?"
average_fire_size.vis?

In [None]:
average_fire_size.vis(mark="circle")

## Seeing distributions automatically

Go ahead and click on the "STATE" column in the columns pane. After you click, two effects take place:
1. A cell is created that contains the dataframe calls that derive the new filtered values, as well as the visualization calls. These cells are marked with color emoji such as 🟠, which are indicators to help you navigate visually.
2. A chart that visualizes the new data is created in the pane on the right-hand side.

If the chart has the wrong encoding, or if the grouping query is inaccurate, feel free to modify the queries. The results will be reflected in the chart automatically.

### <font color="green">TASK: click on the "CAUSE_DESCR" column</font>

**Seeing Errors**: Sometimes we cannot visualize the data. Here are some common cases:

- When there are too many unique values and we do not know the grouping logic. For instance, in this dataset, `COUNTY` has too many unique values, so no cell will be generated.
- When a numeric field has `NaN`s. We will generate the stub code that you can modify to fix the query.
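
The two cases above amount to simple checks on a column before charting. A rough sketch follows, assuming a made-up cardinality limit (`MAX_CATEGORIES`) and a hypothetical helper `chart_problem`. The actual thresholds and logic in Midas are not described in this tutorial.

```python
import math

MAX_CATEGORIES = 5  # made-up limit, for illustration only


def chart_problem(values):
    """Return a short description of why a column is hard to chart, or None."""
    # NaN values cannot be placed on a numeric axis, so flag them first.
    if any(isinstance(v, float) and math.isnan(v) for v in values):
        return "contains NaN"
    # Too many distinct categories cannot be shown as individual bars.
    if len(set(values)) > MAX_CATEGORIES:
        return "too many unique values"
    return None


# Made-up columns for illustration.
states = ["CA", "OR", "CA", "WA"]
counties = [f"county_{i}" for i in range(50)]
sizes = [1.0, float("nan"), 3.5]

print(chart_problem(states))
print(chart_problem(counties))
print(chart_problem(sizes))
```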

### <font color="green">TASK: click on the "COUNTY" column</font>
Here you will see an error pop up because there are too many `COUNTY` values to visualize.

## Making selections

All the existing visualizations are equipped with the ability to **select**.

* With scatter plots and line charts, you can **brush** select on both the x and y axes. To brush, you must hold **shift** while dragging, because dragging without the shift key moves the chart around.
* With bar charts, you can **click** on the bars, and **shift-click** to add more.

When you perform a selection, you will observe two effects:
1. a cell will be generated with the selections you have made, signaled by the emoji 🔵 in the comment, along with the time the interaction was performed.
2. the other visualizations will be _automatically_ filtered by the selection---this is known as a "cross-filter" interaction technique.
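
The cross-filter idea can be sketched in plain Python: a selection made on one chart (here, a set of states) filters the rows that feed every other chart (here, a count per cause). The helper `count_by`, the cause values, and the rows are hypothetical and only illustrate the technique.

```python
from collections import Counter

# Made-up sample rows standing in for the fires table.
sample_fires = [
    {"STATE": "CA", "CAUSE_DESCR": "Lightning"},
    {"STATE": "CA", "CAUSE_DESCR": "Campfire"},
    {"STATE": "OR", "CAUSE_DESCR": "Lightning"},
    {"STATE": "CA", "CAUSE_DESCR": "Lightning"},
]


def count_by(rows, column):
    """Count rows per distinct value of `column` (the data behind a bar chart)."""
    return Counter(row[column] for row in rows)


# Before any selection, the cause chart reflects all rows.
all_causes = count_by(sample_fires, "CAUSE_DESCR")

# Selecting "CA" on the state chart filters the rows behind the cause chart.
selected_states = {"CA"}
filtered_rows = [r for r in sample_fires if r["STATE"] in selected_states]
filtered_causes = count_by(filtered_rows, "CAUSE_DESCR")

print(all_causes)
print(filtered_causes)
```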

The original data can provide a stable point of reference. However, if you do _not_ wish to see the original data in the background (in a dimmer blue color), just click on the emoji 📌 to "unpin" the original data, and then click 📍 to pin the original data back. For example, when you visualize the `average` or `median`, you might want to remove the original data, since the filtered values may be larger than the original ones rather than smaller.

Note that **selections persist across interactions on different charts** unless you
* **explicitly de-select by clicking on any whitespace region of the chart**
* remove the chart on which the selection was made


In [None]:
m.sel([{"STATE_distribution": {"STATE": ["CA"]}}])

In [None]:
# reset selections
m.sel([])

In [None]:
m.current_selection

## Recording Views/Selections



## Accessing selections programmatically

Access selection in **predicate** form from the Midas runtime variable, `m` (you can assign it other names if you wish).

- the interaction you just had: `m.immediate_selection`
- most recent _overall_ selection: `m.current_selection`. This differs from the immediate selection because you can keep building up your selections.
- all selections made in the past: `m.all_selections`

Access selection results in **data** form: call `<df>.get_filtered_data()` to access the underlying values.

Access your work in **code** form: go to the menu at the top right of the chart and select 📋, which copies the code to the clipboard.
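
One way to picture the difference between the three predicate views is a small log that records each interaction. The `SelectionLog` class below is a hypothetical model based on the descriptions in this section, not the Midas implementation. In particular, the merge rule (a later selection on the same chart replaces the earlier one) and the name `CAUSE_DESCR_distribution` are assumptions.

```python
class SelectionLog:
    """Toy model of immediate, current, and all selections (hypothetical)."""

    def __init__(self):
        self.immediate_selection = {}  # only the most recent interaction
        self.current_selection = {}    # accumulated selection across charts
        self.all_selections = []       # every overall state, oldest first

    def select(self, chart, column, values):
        """Record one interaction on `chart`."""
        self.immediate_selection = {chart: {column: list(values)}}
        # Assumption: a new selection on a chart replaces that chart's old one.
        self.current_selection = {
            **self.current_selection,
            **self.immediate_selection,
        }
        self.all_selections.append(dict(self.current_selection))


log = SelectionLog()
log.select("STATE_distribution", "STATE", ["CA"])
log.select("CAUSE_DESCR_distribution", "CAUSE_DESCR", ["Lightning"])

print(log.immediate_selection)
print(log.current_selection)
print(len(log.all_selections))
```

After the two interactions, the immediate view holds only the second one, while the current view holds both.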

In [None]:
m.immediate_selection

In [None]:
# this is a shorthand to just access the value
m.immediate_value

In [None]:
m.current_selection

In [None]:
m.all_selections

In [None]:
# if you currently have a filter that applies to STATE_distribution,
# you can get the data via the API `get_filtered_data`
STATE_distribution.get_filtered_data()

## Reactive cells and custom visualizations

A reactive cell is a cell that Midas re-runs after interactions.
Reactive cells can be used to inspect state or run computations related to selection events.

In [None]:
%%reactive

print(m.current_selection)

In [None]:
%%reactive

if m.immediate_value:
    discoverytime_firesize_df = fires_df.where('STATE', m.are.contained_in(m.immediate_value)).select(["DISCOVERY_TIME", "FIRE_SIZE"])
    discoverytime_firesize_df.reactive_vis()
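
The reactive behavior resembles the observer pattern: callbacks are registered once and re-run whenever the selection changes. The `ReactiveRuntime` class below is a hypothetical plain-Python model of that idea, not how the `%%reactive` magic is implemented.

```python
class ReactiveRuntime:
    """Toy runtime that re-runs registered callbacks after each selection."""

    def __init__(self):
        self.current_selection = []
        self._callbacks = []

    def reactive(self, callback):
        """Register a callback to be re-run after every selection change."""
        self._callbacks.append(callback)
        return callback

    def sel(self, selection):
        """Update the selection, then re-run every registered callback."""
        self.current_selection = selection
        for callback in self._callbacks:
            callback(self)


runtime = ReactiveRuntime()
seen = []  # records what the reactive callback observed on each run


@runtime.reactive
def record_selection(rt):
    seen.append(rt.current_selection)


runtime.sel([{"STATE_distribution": {"STATE": ["CA"]}}])
runtime.sel([])  # resetting the selection also triggers the callback
print(seen)
```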

## Testing your understanding

Please make sure you have the distribution of the `STATE` column ready (by clicking on `STATE`), as well as the distribution of `CAUSE_DESCR`. Using this interactive visualization, please identify a couple of interesting findings.