# Tutorial: Data Science

In this tutorial, we will introduce Solara from the perspective of a data scientist, or of anyone thinking of using Solara for a data science app.
It therefore focuses on data (pandas), visualizations (plotly) and how to add interactivity.

## You should know
This tutorial will assume:

  * You have successfully installed Solara
  * You know how to display a Solara component in a notebook or script

If not, please follow the [Quick start](/docs/quickstart).

## Extra packages you need to install

For this tutorial, you need plotly and pandas. You can install them using pip:

```bash
$ pip install plotly pandas
```

## You will learn

In this tutorial, you will learn:

   * [How to create a scatter plot using plotly.express](#our-first-scatter-plot)
   * [How to display your plot in a Solara component](#our-first-scatter-plot)
   * [How to build a UI to configure the X and Y axes](#configuring-the-x-axis)
   * [How to handle a click event and record which point was clicked on](#interactive-plot)
   * [How to refactor your code to build a reusable Solara component](#make-a-reusable-component)
   * [How to compose your newly built component into a larger application](#make-a-reusable-component)

## The dataset

For this tutorial, we will use the [Iris flower data set](https://en.wikipedia.org/wiki/Iris_flower_data_set), which contains the lengths and widths of the petals and sepals of three species of Iris (setosa, virginica and versicolor).

This dataset comes with many packages, but since we are going to use plotly.express for this tutorial, we will use:

```python
import plotly.express as px
df = px.data.iris()
```

In [None]:
## solara: skip
import plotly.express as px


df = px.data.iris()
df


## Our first scatter plot

We use plotly express to create our scatter plot with just a single line.

```python
fig = px.scatter(df, "sepal_length", "sepal_width", color="species")
```

To display this figure in a Solara component, we create an element that can render the plotly figure. [FigurePlotly](/api/plotly) will do the job for us.

Putting this together:

In [None]:
import plotly.express as px
import solara

df = px.data.iris()


@solara.component
def Page():
    fig = px.scatter(df, "sepal_length", "sepal_width", color="species")
    solara.FigurePlotly(fig)

In [None]:
## solara: skip
Page()

## Configuring the X-axis

To configure the X-axis, first create some global application state:

```python
x_axis = solara.reactive("sepal_length")
```

This code creates a reactive variable. You can use this reactive variable in your component and pass it to a [`Select`](/api/select) component to control the selected column.


```python
columns = list(df.columns)
solara.Select(label="X-axis", values=columns, value=x_axis)
```

Now, when the Select component's value changes, it will also update the reactive variable `x_axis`.

If your component uses the reactive value to create the plot, for example:


```python
fig = px.scatter(df, x_axis.value, "sepal_width", color="species")
```

The component will automatically re-execute the render function when the `x_axis` value changes, updating the figure accordingly.
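The mechanism can be pictured with a small, pure-Python toy. This sketch is NOT how Solara is implemented: `ToyReactive`, `subscribe` and `render` are hypothetical names chosen for illustration. The main difference is that Solara registers the dependency automatically when a component reads `.value`, whereas the toy subscribes explicitly. Setting `toy_x_axis.value` plays the role of the Select updating the reactive variable.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ToyReactive(Generic[T]):
    """Hypothetical, simplified reactive variable (NOT Solara's implementation)."""

    def __init__(self, value: T) -> None:
        self._value = value
        self._listeners: List[Callable[[T], None]] = []

    @property
    def value(self) -> T:
        return self._value

    @value.setter
    def value(self, new_value: T) -> None:
        # In this toy, listeners are only notified when the value actually changes
        if new_value != self._value:
            self._value = new_value
            for listener in list(self._listeners):
                listener(new_value)

    def subscribe(self, listener: Callable[[T], None]) -> None:
        self._listeners.append(listener)


toy_x_axis = ToyReactive("sepal_length")
rendered: List[str] = []


def render() -> None:
    # Stand-in for a component's render function that reads toy_x_axis.value
    rendered.append(f"scatter plot with x={toy_x_axis.value}")


toy_x_axis.subscribe(lambda _new: render())  # explicit here; Solara tracks the read for you
render()                                      # first render
toy_x_axis.value = "petal_length"             # like choosing a new column in the Select
toy_x_axis.value = "petal_length"             # unchanged value: no re-render in this toy
print(rendered)
# ['scatter plot with x=sepal_length', 'scatter plot with x=petal_length']
```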

In [None]:
columns = list(df.columns)
x_axis = solara.reactive("sepal_length")

@solara.component
def Page():
    # Create a scatter plot by passing "x_axis.value" to px.scatter
    # This will automatically make the component listen to changes in x_axis
    # and re-execute this function when x_axis value changes
    fig = px.scatter(df, x_axis.value, "sepal_width", color="species")
    solara.FigurePlotly(fig)
    
    # Pass x_axis to Select component
    # The select will control the x_axis reactive variable
    solara.Select(label="X-axis", value=x_axis, values=columns)


In [None]:
## solara: skip
Page()

### Understanding (optional)

#### State

Understanding state management and how Solara re-renders components is crucial for building larger applications. If you don't fully grasp it yet, that is OK. First get used to the pattern, and consider reading [About state management](/docs/fundamentals/state-management) later on to get a deeper understanding.



## Configure the Y-axis

Now that we can configure the X-axis, we can do the same for the Y-axis. As an exercise, try doing this yourself before looking at the code.

In [None]:
y_axis = solara.reactive("sepal_width")

@solara.component
def Page():
    fig = px.scatter(df, x_axis.value, y_axis.value, color="species")
    solara.FigurePlotly(fig)
    solara.Select(label="X-axis", value=x_axis, values=columns)
    solara.Select(label="Y-axis", value=y_axis, values=columns)       

In [None]:
## solara: skip
Page()

## Interactive plot

We have now built a small UI to control a scatter plot. Often, however, we also want to interact with the data, for instance by selecting a point in the scatter plot.

We could look up in the plotly documentation how to extract exactly the data we need, but let's take a different approach: we simply store the data we get from `on_click` in a new reactive variable (`click_data`) and display the raw data in a Markdown component.

In [None]:
click_data = solara.reactive(None)

@solara.component
def Page():
    fig = px.scatter(df, x_axis.value, y_axis.value, color="species")
    solara.FigurePlotly(fig, on_click=click_data.set)
    solara.Select(label="X-axis", value=x_axis, values=columns)
    solara.Select(label="Y-axis", value=y_axis, values=columns)
    # display the raw data pre-formatted, by wrapping it in backticks in Markdown
    solara.Markdown(f"`{click_data.value}`")
        

In [None]:
## solara: skip
Page()

### Inspecting the on_click data

Click a point and you should see the data printed out like:

```python
{'event_type': 'plotly_click', 'points': {'trace_indexes': [1], 'point_indexes': [34], 'xs': [5.4], 'ys': [3]}, 'device_state': {'alt': False, 'ctrl': False, 'meta': False, 'shift': False, 'button': 0, 'buttons': 1}, 'selector': None}
```

The raw data shows that we have access to the index of the trace we clicked on (there are 3 traces, one each for setosa, versicolor and virginica) and to the point index (the position of the point within that trace). Together, these two numbers identify the point we clicked on, and they are our starting point for finding its row.
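As a quick check of this structure, the snippet below pulls the trace index and point index out of the sample payload shown above using plain dictionary access. The `click` dictionary is copied from that example output; `extract_click_indices` is a hypothetical helper name, and the exact payload may differ between plotly and Solara versions.

```python
from typing import Any, Dict, Tuple


def extract_click_indices(click_data: Dict[str, Any]) -> Tuple[int, int]:
    """Hypothetical helper: return (trace_index, point_index) of the first clicked point."""
    if click_data.get("event_type") != "plotly_click":
        raise ValueError(f"unexpected event type: {click_data.get('event_type')!r}")
    points = click_data["points"]
    # The lists can hold several points; we only use the first one
    return points["trace_indexes"][0], points["point_indexes"][0]


# Payload copied from the example output above
click = {
    'event_type': 'plotly_click',
    'points': {'trace_indexes': [1], 'point_indexes': [34], 'xs': [5.4], 'ys': [3]},
    'device_state': {'alt': False, 'ctrl': False, 'meta': False, 'shift': False, 'button': 0, 'buttons': 1},
    'selector': None,
}

trace_index, point_index = extract_click_indices(click)
print(trace_index, point_index)  # 1 34
```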

### Finding row number (optional)

It is slightly annoying that plotly express splits our dataframe into 3 traces, because the trace index and point index alone are not enough to recover the row number.

There is a trick to get the row index: if we pass `df.index` to the `custom_data` argument, plotly express will also distribute the index across the traces. We can then use this to reconstruct the row index from the trace index and point index.
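To see why this works, here is a pure-Python sketch that mimics the bookkeeping: rows are grouped into one "trace" per species, and each point carries its original row index as custom data. The `split_into_traces` helper and the short `species` list are hypothetical, and we assume traces are created in order of first appearance of each label; plotly express's real grouping code does much more.

```python
from typing import Any, Dict, List


def split_into_traces(labels: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical sketch: one 'trace' per label, keeping each point's row index.

    The 'customdata' entries mirror what custom_data=[df.index] gives us:
    one [row_index] list per point in the trace.
    """
    groups: Dict[str, List[List[int]]] = {}
    for row_index, label in enumerate(labels):
        # dicts keep insertion order, so traces follow first appearance of each label
        groups.setdefault(label, []).append([row_index])
    return [{"name": name, "customdata": customdata} for name, customdata in groups.items()]


species = ["setosa", "versicolor", "setosa", "virginica", "versicolor", "setosa"]
traces = split_into_traces(species)

# Clicking the 3rd point (point index 2) of the first trace (setosa)...
trace_index, point_index = 0, 2
row_index = traces[trace_index]["customdata"][point_index][0]
print(row_index)  # 5, the last setosa row (not row 2)
```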


### Displaying the row number

Now that we know how to get the row number, let's simply display it to check that our code works.

In [None]:
def find_row_index(fig, click_data):
    # goes from trace index and point index to row index in a dataframe
    # requires passing df.index as custom_data
    trace_index = click_data['points']['trace_indexes'][0]
    point_index = click_data['points']['point_indexes'][0]
    trace = fig.data[trace_index]
    return trace.customdata[point_index][0]
    

clicked_row = solara.reactive(None)

@solara.component
def Page():
    # Instead of passing FigurePlotly the clicked_row.set function directly,
    # we need to do some data manipulation first.
    # We do this in a local function, so that we can access the local
    # variable we need (fig, created below).
    def on_click(click_data):
        # sanity checks
        assert click_data['event_type'] == "plotly_click"        
        row_index = find_row_index(fig, click_data)
        clicked_row.value = row_index

    fig = px.scatter(df, x_axis.value, y_axis.value, color="species", custom_data=[df.index])
    solara.FigurePlotly(fig, on_click=on_click)
    solara.Select(label="X-axis", value=x_axis, values=columns)
    solara.Select(label="Y-axis", value=y_axis, values=columns)
    if clicked_row.value is not None:
        solara.Markdown(f"Clicked on `index={clicked_row.value}`")
    else:
        solara.Info("Click to select a point")


In [None]:
## solara: skip
Page()
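You can also check `find_row_index` without clicking anything by giving it a stand-in for the figure. The sketch below copies the function from the cell above and uses `types.SimpleNamespace` to build a hypothetical fake figure exposing only `data[i].customdata`, which is all the function reads; a real plotly figure has far more attributes.

```python
from types import SimpleNamespace


def find_row_index(fig, click_data):
    # Same logic as above: go from (trace index, point index) to the dataframe row index
    trace_index = click_data['points']['trace_indexes'][0]
    point_index = click_data['points']['point_indexes'][0]
    trace = fig.data[trace_index]
    return trace.customdata[point_index][0]


# Fake figure with two traces; customdata holds the original row index of each point
fake_fig = SimpleNamespace(data=[
    SimpleNamespace(customdata=[[0], [2], [5]]),  # e.g. the setosa rows
    SimpleNamespace(customdata=[[1], [3], [4]]),  # e.g. the versicolor rows
])
fake_click = {'event_type': 'plotly_click',
              'points': {'trace_indexes': [1], 'point_indexes': [2]}}

print(find_row_index(fake_fig, fake_click))  # 4
```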

## Displaying the nearest neighbours

Now that we have the row index of the point we clicked on, we will use it to improve our component. We will:

   1. Add an indicator in the scatter plot to highlight which point we clicked on.
   2. Find the nearest neighbours and display them in a table.
  
For the first item, we simply use plotly express again and add the single trace it generates to the existing figure (instead of displaying two separate figures).

We add a function to find the `n` nearest neighbours:

```python
def find_nearest_neighbours(df, xcol, ycol, x, y, n=10):
    df = df.copy()
    df["distance"] = ((df[xcol] - x)**2 + (df[ycol] - y)**2)**0.5
    return df.sort_values('distance')[1:n+1]
```
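The distance in `find_nearest_neighbours` is the ordinary Euclidean distance in the (x, y) plane, and the slice `[1:n+1]` drops the first row after sorting, assumed to be the clicked point itself at distance 0. A stdlib-only sketch of the same idea, using `math.hypot` and a few made-up points (not Iris rows), makes this easy to check by hand. The `nearest_neighbours` name is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest_neighbours(points: List[Point], x: float, y: float, n: int = 3) -> List[Tuple[int, float]]:
    """Return (row index, distance) for the n points closest to (x, y), skipping the closest.

    The closest point is assumed to be the clicked point itself (distance 0),
    mirroring the [1:n+1] slice in find_nearest_neighbours.
    """
    distances = [(i, math.hypot(xi - x, yi - y)) for i, (xi, yi) in enumerate(points)]
    distances.sort(key=lambda item: item[1])  # stable sort: ties keep row order
    return distances[1:n + 1]


# Made-up (x, y) points; row 0 is the one we "clicked"
points = [(5.0, 3.0), (5.3, 3.4), (4.0, 3.0), (5.0, 3.1), (6.0, 2.0)]
result = nearest_neighbours(points, 5.0, 3.0, n=2)
print(result)  # rows 3 and 1, at distances of about 0.1 and 0.5
```

One caveat applies to both versions: if several rows share the clicked coordinates (which happens in the Iris data), one of the distance-0 rows is dropped, not necessarily the clicked one.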

We now only look up the nearest neighbours when a point has been clicked (`clicked_row.value` is not `None`), and display them using the [`DataFrame`](/api/dataframe) component.


In [None]:
def find_nearest_neighbours(df, xcol, ycol, x, y, n=10):
    df = df.copy()
    df["distance"] = ((df[xcol] - x)**2 + (df[ycol] - y)**2)**0.5
    return df.sort_values('distance')[1:n+1]


clicked_row = solara.reactive(None)


@solara.component
def Page():
    fig = px.scatter(df, x_axis.value, y_axis.value, color="species", custom_data=[df.index])

    if clicked_row.value is not None:
        # add an indicator at the clicked point, using the currently selected columns
        click_x = df.loc[clicked_row.value, x_axis.value]
        click_y = df.loc[clicked_row.value, y_axis.value]
        fig.add_trace(px.scatter(x=[click_x], y=[click_y], text=["⭐️"]).data[0])
        df_nearest = find_nearest_neighbours(df, x_axis.value, y_axis.value, click_x, click_y, n=3)

    def on_click(click_data):
        # sanity checks
        assert click_data['event_type'] == "plotly_click"
        row_index = find_row_index(fig, click_data)
        clicked_row.value = row_index

    solara.FigurePlotly(fig, on_click=on_click)
    solara.Select(label="X-axis", value=x_axis, values=columns)
    solara.Select(label="Y-axis", value=y_axis, values=columns)
    if clicked_row.value is not None:
        solara.Markdown("## Nearest 3 neighbours")
        solara.DataFrame(df_nearest)
    else:
        solara.Info("Click to select a point")

In [None]:
## solara: skip
Page()