# Midas Demo

Welcome to Midas! Midas is a research prototype that helps us explore a *new programming medium*, in which we provide tools to help you move smoothly between coding and **interactions on visualizations**. That sounds rather abstract, so let's get started on the demo, which will give you concrete examples of when you might want to move from code to interactions and back.

Please follow the tutorial to learn the basics of Midas. We will walk through a real example and introduce each feature as it is motivated by a specific need. Until you are done with the tutorial, please do *not* click around the interface at random; do only what the tutorial asks. This will help you follow each step.

Should you have any questions, please feel free to ask Yifan, who will be present during the entire session.

In [None]:
from midas import Midas
import numpy as np

# Initiate Midas environment
m = Midas()

No need to read through the text in the blue pane---we'll walk you through most of it.

In [None]:
# Load data
fires_df = m.from_file("./data/fire_earlier.csv")
fires_df

You will see a blue bar below the cell---this is an indicator that "Midas generated cells" will be placed just below. You will know what we mean in the next step. Be aware that the blue bar will move as we go through the tutorial, to the most recently executed cell.

### <font color="0080FF">Exploratory Analysis: Distribution of Fire Causes</font>

From the above table, the first question that comes to mind is the distribution of causes of fire (`CAUSE_DESCR`). Conveniently, Midas has a shortcut for seeing column distributions---**please find `CAUSE_DESCR` in the pane to the right and click it**.

A cell with a yellow emoji may be created above---that is because Midas always creates cells after the last executed cell. **Please move it below the text for organization** using the arrow keys in the Jupyter menu bar. The cell describes the query used to derive the data for the chart, which is a `group` operation, and by default, the aggregation is a `count`. You also see that `.vis` is called at the end---it's useful for customizing your own visualizations. We'll talk more about this later.
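
To make the `group`-then-`count` query concrete, here is a minimal plain-Python sketch of what such a distribution query computes. It does not use the Midas API: the helper `count_by` and the sample rows are made up for illustration, and only the column names come from the dataset.

```python
from collections import Counter

# Made-up sample rows; only the column names come from the tutorial data.
rows = [
    {"STATE": "CA", "CAUSE_DESCR": "Lightning"},
    {"STATE": "CA", "CAUSE_DESCR": "Equipment Use"},
    {"STATE": "NY", "CAUSE_DESCR": "Debris Burning"},
    {"STATE": "NV", "CAUSE_DESCR": "Lightning"},
]


def count_by(rows, column):
    """Group rows by `column` and count how many rows fall in each group."""
    return Counter(row[column] for row in rows)


cause_counts = count_by(rows, "CAUSE_DESCR")
print(cause_counts.most_common())
```

Each bar in the distribution chart corresponds to one key of such a count table.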

### <font color="0080FF">Exploratory Analysis: Distribution of State</font>

Another thing we can look at is how many fires there are by state (`STATE`). Again you can go ahead and **click on the `STATE` column in the yellow pane**.

Now you might find that you want to see more of the STATE chart---to do this, you can **drag the left edge of the pane to change the size**. You can **click on "toggle midas" and "toggle column pane" in the menu bar to hide/show the shelves**.

### <font color="0080FF">Investigation: Cause Distribution in California</font>

Let's get a sense of what fires happen in California---**please click on CA in the `STATE_fires_df_dist` chart**. You will see shorter blue bars appear in the `CAUSE_DESCR_fires_df_dist` chart---these show the distribution of `CAUSE_DESCR` filtered by `CA`.

You will also see cells created that are annotated with "🔵"---these serve as an executable log of the interactions. You can try executing a different value, e.g. change 'CA' to 'NY'. Or you can programmatically empty the selection using `m.sel([])`.

You will also notice that the chart which you interacted with is highlighted in red---this serves as an additional visual cue of active selections.

Since the filtered values are a smaller portion of the chart, it might be hard to compare. **Please click on the ellipsis icon next to the chart name and then the "show filtered data only" icon**.  This will remove the paler blue bars in the background (fire counts across all states), and help you see better.  To close the dropdown menu, click on the ellipsis button again.

We observe that California gets hit by Lightning a lot. To record this insight, **please click on the 📷 icon in the menu bar**. You will see a cell created with a 🟠 emoji that contains both of the current charts, as well as the code used to derive the data on the charts, which may be helpful for reproducibility. <font color="gray">(The snapshotted charts are stored as SVG, an image format that can be embedded in HTML.)</font>

In some cases you might want to record the facts that you have observed. To see an example, execute the cells below. You will see an array of the top causes for fires in California, which should be `['Miscellaneous', 'Equipment Use']`, corresponding to what you see in the charts.
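
As a rough illustration of how such a "top causes" fact can be computed, here is a plain-Python sketch. It is not the Midas code for this step: the helper `top_causes` is hypothetical and the rows are made up, so its output says nothing about the real dataset.

```python
from collections import Counter

# Made-up (state, cause) pairs for illustration only.
rows = [
    ("CA", "Miscellaneous"), ("CA", "Miscellaneous"), ("CA", "Miscellaneous"),
    ("CA", "Equipment Use"), ("CA", "Equipment Use"),
    ("CA", "Lightning"),
    ("NY", "Debris Burning"),
]


def top_causes(rows, state, n=2):
    """Return the n most frequent causes among the rows for `state`."""
    counts = Counter(cause for row_state, cause in rows if row_state == state)
    return [cause for cause, _ in counts.most_common(n)]


ca_top = top_causes(rows, "CA")
print(ca_top)
```

Recording the result as a list next to the snapshot makes it easy to re-check the observation if the data changes.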

### <font color="0080FF">Investigating Lightning and Land Area</font>
Now let's dig a little further and see what other states are affected by Lightning. **Please click on the Lightning bar**. Perhaps the count of lightning fires is mostly due to land area? In the cell below, we will load data from this [source](https://www.usgs.gov/special-topic/water-science-school/science/how-wet-your-state-water-area-each-state?qt-science_center_objects=0#qt-science_center_objects) to get the area of each state to compare with our data.

In [None]:
# Load data
land_size_df = m.from_file("./data/state_land_sizes.csv")
land_size_df.head(3)

We can modify the dataframe so that its state column uses the same two-letter abbreviations as the `STATE` column in the fires data.

In [None]:
state_dict = {"Alabama":"AL", "Alaska":"AK", "Arizona":"AZ", "Arkansas":"AR", "California":"CA", "Colorado":"CO", "Connecticut":"CT", "Delaware":"DE", "Florida":"FL", "Georgia":"GA", "Hawaii":"HI", "Idaho":"ID", "Illinois":"IL", "Indiana":"IN", "Iowa":"IA", "Kansas":"KS", "Kentucky":"KY", "Louisiana":"LA", "Maine":"ME", "Maryland":"MD", "Massachusetts":"MA", "Michigan":"MI", "Minnesota":"MN", "Mississippi":"MS", "Missouri":"MO", "Montana":"MT", "Nebraska":"NE", "Nevada":"NV", "New Hampshire":"NH", "New Jersey":"NJ", "New Mexico":"NM", "New York":"NY", "North Carolina":"NC", "North Dakota":"ND", "Ohio":"OH", "Oklahoma":"OK", "Oregon":"OR", "Pennsylvania":"PA", "Rhode Island":"RI", "South Carolina":"SC", "South Dakota":"SD", "Tennessee":"TN", "Texas":"TX", "Utah":"UT", "Vermont":"VT", "Virginia":"VA", "Washington":"WA", "West Virginia":"WV", "Wisconsin":"WI", "Wyoming":"WY", "District of Columbia": "DC"}
land_size_df["STATE"] = land_size_df.apply(lambda x: state_dict[x], "state_name")
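
Note that a direct lookup such as `state_dict[x]` raises a `KeyError` for any name that is missing from the dictionary. The sketch below, in plain Python with a hypothetical helper `to_abbreviation` and a deliberately small mapping, shows one way to tolerate stray whitespace and to list the names that could not be matched.

```python
# A small subset of the name-to-abbreviation mapping, for illustration only.
state_abbrev = {"California": "CA", "New York": "NY", "District of Columbia": "DC"}


def to_abbreviation(name, mapping=state_abbrev):
    """Return the two-letter code for a state name, or None if it is unknown."""
    return mapping.get(name.strip())


names = ["California", " New York ", "Puerto Rico"]
codes = [to_abbreviation(name) for name in names]
# Collect the names that have no abbreviation so they can be inspected by hand.
unmatched = [name for name, code in zip(names, codes) if code is None]
print(codes)
print(unmatched)
```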

We can also access the data programmatically---in the cell below, we can **use `get_filtered_data` to directly access the filtered result**.

In [None]:
# now let's get the data filtered by lightning
lightning_df = STATE_fires_df_dist.get_filtered_data()
lightning_df

<font color="gray">A general Jupyter tip: type "get_", then press the "Tab" key for auto-complete!</font>

**You can also use `static_vis` to quickly look at one-off visualizations.** The static vis is generated right below the cell, and _not_ in the midas chart area---this is because static visualizations do not reactively update based on the interactions, and are one-off.

In [None]:
# now we are going to make a scatter plot of the lightning counts against the land areas
count_and_area_df = lightning_df.join("STATE", land_size_df.select(["STATE", "area_sq_miles"]), "STATE")
count_area_scatter = count_and_area_df.select(["count", "area_sq_miles"])
count_area_scatter.static_vis()
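
For readers unfamiliar with joins, the following plain-Python sketch shows the idea of matching rows on `STATE`. It assumes inner-join semantics (unmatched rows are dropped), which may or may not be what the Midas `join` does; the helper `inner_join` and all numbers are made up.

```python
# Made-up numbers for illustration; not taken from the tutorial's CSV files.
lightning_counts = [
    {"STATE": "CA", "count": 12},
    {"STATE": "NV", "count": 7},
    {"STATE": "XX", "count": 1},  # no matching row on the right side
]
land_sizes = [
    {"STATE": "CA", "area_sq_miles": 100.0},
    {"STATE": "NV", "area_sq_miles": 50.0},
]


def inner_join(left, right, key):
    """Combine rows from `left` and `right` that share the same value for `key`."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**left_row, **right_by_key[left_row[key]]}
        for left_row in left
        if left_row[key] in right_by_key
    ]


joined = inner_join(lightning_counts, land_sizes, "STATE")
# Keep only the two columns needed for the scatter plot.
pairs = [(row["count"], row["area_sq_miles"]) for row in joined]
print(pairs)
```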

In [None]:
count_area_scatter.corr()

Looking at the plot, there doesn't appear to be a correlation, so let's move on. We don't need to look at `land_size_df` anymore, so **please click on the text in the yellow pane to hide the columns**.
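
If you want to check a "no correlation" impression numerically, a correlation coefficient helps. The sketch below computes the Pearson coefficient by hand in plain Python, on the assumption that `.corr()` reports a Pearson-style coefficient (the tutorial does not say which one it uses). The helper `pearson` and the numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


counts = [1, 2, 3, 4]
areas_linear = [10, 20, 30, 40]  # grows in lockstep with counts
areas_mixed = [30, 10, 40, 20]   # no linear relationship with counts

r_linear = pearson(counts, areas_linear)
r_mixed = pearson(counts, areas_mixed)
print(r_linear, r_mixed)
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates none.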

### <font color="0080FF">Exploratory Analysis: Distribution of discovery time</font>

**Please go ahead and click on the `DISCOVERY_TIME` column.** For numeric values, the default chart selection behavior is a brush (via dragging). If you want to make it click-based, pass `selection_type="multiclick"` when calling `.vis`; the following cell shows the help text for `.vis`.

In [None]:
m.help(DISCOVERY_TIME_fires_df_dist.vis)

### Data "Cleaning"
Here, the `DISCOVERY_DATE` column is in scientific format, e.g., `2.45246e+06`---we are going to create a new column to record the year of the incident, by running a custom function that maps the value to a `datetime` object, from which we can derive the year.

In [None]:
from datetime import datetime 
fires_df["year"] = fires_df.apply(lambda s: datetime.fromtimestamp(s).year, "DISCOVERY_DATE")

In [None]:
# we can invoke show_profile to reflect on the change
m.show_profile(fires_df)

We find that all the data appears to be from 1970, so we cannot find trends over time. You can now delete the chart by selecting the option in the menu.
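
One possible explanation for the 1970 result: values around `2.45e+06` look like Julian day numbers rather than Unix timestamps. This is an assumption, not something the dataset description here confirms. Under that assumption, the sketch below contrasts the two interpretations; `julian_day_to_datetime` is a hypothetical helper, and 2440587.5 is the Julian day number of the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01T00:00:00 UTC (the Unix epoch).
UNIX_EPOCH_JULIAN_DAY = 2440587.5


def julian_day_to_datetime(julian_day):
    """Convert a Julian day number to a timezone-aware UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(days=julian_day - UNIX_EPOCH_JULIAN_DAY)


value = 2452460.0  # same magnitude as the 2.45246e+06 example above

# Read as seconds since the epoch, the value is only about 28 days after 1970-01-01.
as_unix_seconds = datetime.fromtimestamp(value, tz=timezone.utc)
# Read as a Julian day number (an assumption), it lands in a much later year.
as_julian_day = julian_day_to_datetime(value)

print(as_unix_seconds.year)
print(as_julian_day.year)
```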

### <font color="0080FF">Exploratory Analysis: Distribution of fire size and discovery time</font>

In the cell below, we can create our own **custom visualizations**, and then add them to Midas interactions via `vis`.

In [None]:
size_time_df = fires_df.select(["DISCOVERY_TIME", "FIRE_SIZE"])
size_time_df.vis()

In [None]:
fires_df.get_filtered_data()

Now you can use **reactive cells to inspect the values dynamically**.

In [None]:
%%reactive

fires_df.get_filtered_data()['STATE']

Now that we have created a good number of charts, we can make the interactive visualizations more prominent and rearrange the charts by clicking on the dropdown menu and then on the left and right arrows.

### <font color="0080FF">Understanding Geo</font>

To get a sense of where the fires are distributed, here is a helper mapping function we provide for you.

In [None]:
from midas.util.utils import plot_heatmap

locs = fires_df.select(["LATITUDE", "LONGITUDE"])
plot_heatmap(locs, zoom_start=3)

We can use **reactive cells** to drive the filtering, similar to that above, but adapted to mapping.

In [None]:
%%reactive

filtered = fires_df.get_filtered_data().select(["LATITUDE", "LONGITUDE"])
plot_heatmap(filtered, zoom_start=3)

You can also use `%%reactive -disable` if you want to keep the cell, but no longer have it trigger reactively.
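
To build intuition for what "reactive" means here, the sketch below models the idea in plain Python: registered functions are re-run whenever the selection changes, and a disabled function is kept but no longer triggered. This is a conceptual illustration only, not how Midas implements `%%reactive`; the class `SelectionStore` and its methods are hypothetical.

```python
class SelectionStore:
    """Holds the current selection and re-runs registered callbacks when it changes."""

    def __init__(self):
        self.selection = []
        self._callbacks = []  # list of [function, enabled] pairs

    def reactive(self, fn):
        """Register `fn` to be called with the selection after every change."""
        self._callbacks.append([fn, True])
        return fn

    def disable(self, fn):
        """Keep `fn` registered but stop triggering it."""
        for entry in self._callbacks:
            if entry[0] is fn:
                entry[1] = False

    def select(self, values):
        """Update the selection and notify every enabled callback."""
        self.selection = list(values)
        for fn, enabled in self._callbacks:
            if enabled:
                fn(self.selection)


store = SelectionStore()
log = []


def show_states(selection):
    log.append(tuple(selection))


store.reactive(show_states)
store.select(["CA"])       # triggers show_states
store.select(["NY"])       # triggers show_states again
store.disable(show_states)
store.select(["TX"])       # selection changes, but show_states is not run
print(log)
```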

### <font color="0080FF">More fires data</font>

After the initial analysis, we now find another data source with the same schema but different data. To **reuse** the visual analysis you have performed, you can look at the cells generated by Midas, and you can also directly copy the code used to derive the charts you found interesting. To reproduce your work, click the ellipsis button and then the "get code to clipboard" button.

In [None]:
later_fires_df = m.from_file("./data/fire_later.csv")
later_fires_df

### <font color="0080FF">Taking note of what you have looked at</font>

Often in data analysis, it's important to understand what data you have looked at and what data you have not. `m.all_selections` holds all the interactions you have made. The cell below limits the output to the most recent 10, which you should feel free to change. An example observation based on the history is _"I have looked at `Lightning` and `Debris Burning` in more detail; it might also be interesting to look at fire sizes next time"_.

In [None]:
m.all_selections[-10:]
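
As a sketch of how such a history could be turned into a "what have I not looked at yet" summary, consider the plain-Python example below. The record format is hypothetical (each entry maps a column name to the selected values) and may differ from what `m.all_selections` actually returns; the helper `explored_values` is also made up.

```python
# Hypothetical history: each entry maps a column name to the values selected in it.
history = [
    {"STATE": ["CA"]},
    {"CAUSE_DESCR": ["Lightning"]},
    {"CAUSE_DESCR": ["Debris Burning"]},
]
all_columns = ["STATE", "CAUSE_DESCR", "FIRE_SIZE", "DISCOVERY_TIME"]


def explored_values(history):
    """Collect, per column, the set of values that appear in any selection."""
    seen = {}
    for selection in history:
        for column, values in selection.items():
            seen.setdefault(column, set()).update(values)
    return seen


seen = explored_values(history[-10:])  # only the 10 most recent selections
unexplored = [column for column in all_columns if column not in seen]
print(seen)
print(unexplored)
```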

In contrast, if you just want to take a look at non-selection cells, you can click on the "Toggle 🔵" button to hide the selection cells.

One last feature to help you contextualize your visual analysis is navigating to the cells where you generated the charts, by clicking on the dropdown menu and clicking "find defining cell".