# Introduction: IPython Widgets

In this notebook, we will get an introduction to IPython widgets. These are tools that allow us to build interactivity into our notebooks often with a single line of code. These widgets are very useful for data exploration and analysis, for example, selecting certain data or updating charts. In effect, Widgets allow you to make Jupyter Notebooks into an interactive dashboard instead of a static document.

Run the below cell if needed. You can also do this from the command line. If in Jupyter lab, [check out the instructions for that environment](https://ipywidgets.readthedocs.io/en/stable/user_install.html). 

In [2]:
!pip install -U -q ipywidgets
!jupyter nbextension enable --py widgetsnbextension

Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: [32mOK[0m


These are the other imports will use. 

In [3]:
# Standard Data Science Helpers
import numpy as np
import pandas as pd
import scipy

import plotly.plotly as py
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)

import cufflinks as cf
cf.go_offline(connected=True)
cf.set_config_file(colorscale='plotly', world_readable=True)

# Extra options
pd.options.display.max_rows = 30
pd.options.display.max_columns = 25

# Show all code cells outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

ModuleNotFoundError: No module named 'plotly'

In [4]:
import os
from IPython.display import Image, display, HTML

## Data

For this project, we'll work with my medium stats data. You can grab your own data or just use mine! 

In [5]:
df = pd.read_parquet('https://github.com/WillKoehrsen/Data-Analysis/blob/master/medium/data/medium_data_2019_01_26?raw=true')
df.head()

ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
A suitable version of pyarrow or fastparquet is required for parquet support.
Trying to import the above resulted in these errors:
 - Missing optional dependency 'pyarrow'. pyarrow is required for parquet support. Use pip or conda to install pyarrow.
 - Missing optional dependency 'fastparquet'. fastparquet is required for parquet support. Use pip or conda to install fastparquet.

In [5]:
df.describe()

Unnamed: 0,claps,days_since_publication,fans,num_responses,read_ratio,read_time,reads,title_word_count,views,word_count,claps_per_word,editing_days,<tag>Education,<tag>Data Science,<tag>Towards Data Science,<tag>Machine Learning,<tag>Python
count,133.0,133.0,133.0,133.0,133.0,133.0,133.0,133.0,133.0,133.0,133.0,133.0,133.0,133.0,133.0,133.0,133.0
mean,1815.263158,248.407273,352.052632,7.045113,29.074662,12.917293,6336.300752,7.12782,23404.030075,3029.120301,0.957638,20.330827,0.729323,0.609023,0.43609,0.383459,0.315789
std,2449.074661,179.370879,479.060117,9.056108,12.41767,9.510795,9007.284726,3.158475,33995.636496,2393.414456,1.846756,74.111579,0.445989,0.489814,0.497774,0.488067,0.466587
min,0.0,1.218629,0.0,0.0,8.11,1.0,1.0,2.0,3.0,163.0,0.0,-13.0,0.0,0.0,0.0,0.0,0.0
25%,121.0,74.543822,23.0,0.0,20.02,8.0,363.0,5.0,1375.0,1653.0,0.052115,0.0,0.0,0.0,0.0,0.0,0.0
50%,815.0,245.41613,136.0,4.0,27.06,10.0,2049.0,7.0,7608.0,2456.0,0.421525,1.0,1.0,1.0,0.0,0.0,0.0
75%,2700.0,376.080598,528.0,12.0,34.91,14.0,7815.0,8.0,30141.0,3553.0,1.099366,5.0,1.0,1.0,1.0,1.0,1.0
max,13600.0,597.301123,2588.0,59.0,74.37,54.0,41978.0,16.0,173714.0,15063.0,17.891817,349.0,1.0,1.0,1.0,1.0,1.0


# Simple Widgets

Let's get started using some widgets! We'll start off pretty simple just to see how the interface works.

In [6]:
import ipywidgets as widgets
from ipywidgets import interact, interact_manual

To make a function interactive, all we have to do is use the `interact` decorator. This will automatically infer the input types for us! 

In [7]:
@interact
def show_articles_more_than(column='claps', x=5000):
    display(HTML(f'<h2>Showing articles with more than {x} {column}<h2>'))
    display(df.loc[df[column] > x, ['title', 'published_date', 'read_time', 'tags', 'views', 'reads']])

interactive(children=(Text(value='claps', description='column'), IntSlider(value=5000, description='x', max=15…

The `interact` decorator automatically inferred we want a `text` box for the `column` and an `int` slider for `x`! This makes it incredibly simple to add interactivity. We can also set the options how we want.

In [8]:
@interact
def show_titles_more_than(x=(1000, 5000, 100),
                          column=list(df.select_dtypes('number').columns), 
                          ):
    # display(HTML(f'<h2>Showing articles with more than {x} {column}<h2>'))
    display(df.loc[df[column] > x, ['title', 'published_date', 'read_time', 'tags', 'views', 'reads']])

NameError: name 'df' is not defined

This now gives us a `dropdown` for the `column` selection and still an `int` slider for `x`, but with limits. This can be useful when we need to enforce certains constraints on the interaction.

# Image Explorer

Let's see another quick example of creating an interactive function. This one allows us to display images from a folder.

In [9]:
fdir = 'nature/'

@interact
def show_images(file=os.listdir(fdir)):
    display(Image(fdir+file))

FileNotFoundError: [Errno 2] No such file or directory: 'nature/'

You could use this for example if you have a training set of images that you'd quickly like to run through.

# File Browser

We can do a similar operation to create a very basic file browser. Instead of having to manually run the command every time, we just can use this function to look through our files.

In [10]:
!ls -a -t -r -l

total 6152
-rwxrwxrwx  1 win7 win7       1 Apr  9  2021 __init__.py
-rwxrwxrwx  1 win7 win7    2751 May  6 23:12 test.py
-rwxrwxrwx  1 win7 win7    1344 May 18 01:20 example2.py
-rwxrwxrwx  1 win7 win7   10027 May 20 17:22 selector.py
-rwxrwxrwx  1 win7 win7       0 May 29 17:06 obs.txt
drwxr-xr-x  2 win7 win7    4096 Sep 14 07:55 test
drwxr-xr-x  3 win7 win7    4096 Sep 14 07:55 utils
drwxr-xr-x  3 win7 win7    4096 Sep 14 07:55 parallel_mp_optimizers
drwxr-xr-x  3 win7 win7    4096 Sep 14 07:55 datasets
drwxr-xr-x  3 win7 win7    4096 Sep 14 07:55 serial_optimizers
drwxr-xr-x  3 win7 win7    4096 Sep 14 07:55 parallel_mpi_optimizers
-rwxrwxrwx  1 win7 win7    8412 Oct  9 09:44 README.md
-rwxrwxrwx  1 win7 win7   18887 Oct  9 12:27 optimizer.py
drwxr-xr-x  2 win7 win7    4096 Oct  9 12:29 __pycache__
drwxr-xr-x  3 win7 win7    4096 Oct  9 12:30 results
-rwxrwxrwx  1 win7 win7    4050 Oct  9 14:42 example.py
drwxr-xr-x 11 win7 win7    4096 Oct 11 14:07 ..
-rwxrwxrwx  

In [11]:
import subprocess
import pprint

root_dir = '../'
dirs = [d for d in os.listdir(root_dir) if not '.' in d]

@interact
def show_dir(dir=dirs):
    x = subprocess.check_output(f"cd {root_dir}{dir} && ls -a -t -r -l -h", shell=True).decode()
    print(x)

interactive(children=(Dropdown(description='dir', options=('additive_models', 'statistics', 'recall_precision'…

# Dataframe Explorer

Let's look at a few more examples of using widgets to explore data. Here we create a widget that quickly lets us find correlations between columns.

In [12]:
@interact
def correlations(column1=list(df.select_dtypes('number').columns), 
                 column2=list(df.select_dtypes('number').columns)):
    print(f"Correlation: {df[column1].corr(df[column2])}")

interactive(children=(Dropdown(description='column1', options=('claps', 'days_since_publication', 'fans', 'num…

Here's one to describe a specific column.

In [13]:
@interact
def describe(column=list(df.columns)):
    print(df[column].describe())

interactive(children=(Dropdown(description='column', options=('claps', 'days_since_publication', 'fans', 'link…

# Interactive Widgets for Plots

We can use the same basic approach to create interactive widgets for plots. This expands the capabilities of the already powerful plotly visualization library.

In [14]:
@interact
def scatter_plot(x=list(df.select_dtypes('number').columns), 
                 y=list(df.select_dtypes('number').columns)[1:]):
    df.iplot(kind='scatter', x=x, y=y, mode='markers', 
             xTitle=x.title(), yTitle=y.title(), title=f'{y.title()} vs {x.title()}')

interactive(children=(Dropdown(description='x', options=('claps', 'days_since_publication', 'fans', 'num_respo…

Let's add some options to control the column scheme.

In [15]:
@interact
def scatter_plot(x=list(df.select_dtypes('number').columns), 
                 y=list(df.select_dtypes('number').columns)[1:],
                 theme=list(cf.themes.THEMES.keys()), 
                 colorscale=list(cf.colors._scales_names.keys())):
    
    df.iplot(kind='scatter', x=x, y=y, mode='markers', 
             xTitle=x.title(), yTitle=y.title(), 
             text='title',
             title=f'{y.title()} vs {x.title()}',
            theme=theme, colorscale=colorscale)

interactive(children=(Dropdown(description='x', options=('claps', 'days_since_publication', 'fans', 'num_respo…

The next plot lets us choose the grouping category for the plot. 

In [16]:
df['binned_read_time'] = pd.cut(df['read_time'], bins=range(0, 56, 5))
df['binned_read_time'] = df['binned_read_time'].astype(str)

df['binned_word_count'] = pd.cut(df['word_count'], bins=range(0, 100001, 1000))
df['binned_word_count'] = df['binned_word_count'].astype(str)

@interact
def scatter_plot(x=list(df.select_dtypes('number').columns), 
                 y=list(df.select_dtypes('number').columns)[1:],
                 categories=['binned_read_time', 'binned_word_count', 'publication', 'type'],
                 theme=list(cf.themes.THEMES.keys()), 
                 colorscale=list(cf.colors._scales_names.keys())):
    
    df.iplot(kind='scatter', x=x, y=y, mode='markers', 
             categories=categories, 
             xTitle=x.title(), yTitle=y.title(), 
             text='title',
             title=f'{y.title()} vs {x.title()}',
             theme=theme, colorscale=colorscale)

interactive(children=(Dropdown(description='x', options=('claps', 'days_since_publication', 'fans', 'num_respo…

You may have noticed this plot was a little slow to update. When that is the case, we can use `interact_manual` which only updates the function when the button is pressed.

In [17]:
from ipywidgets import interact_manual

In [18]:
@interact_manual
def scatter_plot(x=list(df.select_dtypes('number').columns), 
                 y=list(df.select_dtypes('number').columns)[1:],
                 categories=['binned_read_time', 'binned_word_count', 'publication', 'type'],
                 theme=list(cf.themes.THEMES.keys()), 
                 colorscale=list(cf.colors._scales_names.keys())):
    
    df.iplot(kind='scatter', x=x, y=y, mode='markers', 
             categories=categories, 
             xTitle=x.title(), yTitle=y.title(), 
             text='title',
             title=f'{y.title()} vs {x.title()}',
             theme=theme, colorscale=colorscale)

interactive(children=(Dropdown(description='x', options=('claps', 'days_since_publication', 'fans', 'num_respo…

# Making Our Own Widgets

The decorator `interact` (or `interact_manual`) is not the only way to use widgets. We can also explicity create our own. One of the most useful I've found is the `DataPicker`.

In [19]:
df.set_index('published_date', inplace=True)

In [20]:
def print_articles_published(start_date, end_date):
    start_date = pd.Timestamp(start_date)
    end_date = pd.Timestamp(end_date)
    stat_df = df.loc[(df.index >= start_date) & (df.index <= end_date)].copy()
    total_words = stat_df['word_count'].sum()
    total_read_time = stat_df['read_time'].sum()
    num_articles = len(stat_df)
    print(f'You published {num_articles} articles between {start_date.date()} and {end_date.date()}.')
    print(f'These articles totalled {total_words:,} words and {total_read_time/60:.2f} hours to read.')
    
_ = interact(print_articles_published,
             start_date=widgets.DatePicker(value=pd.to_datetime('2018-01-01')),
             end_date=widgets.DatePicker(value=pd.to_datetime('2019-01-01')))

interactive(children=(DatePicker(value=Timestamp('2018-01-01 00:00:00'), description='start_date'), DatePicker…

For this function, we use a `Dropdown` and a `DatePicker` to plot one column cumulatively up to a certain time. Instead of having to write this ourselves, we can just let `ipywidgets` do all the work!

In [21]:
def plot_up_to(column, date):
    date = pd.Timestamp(date)
    plot_df = df.loc[df.index <= date].copy()
    plot_df[column].cumsum().iplot(mode='markers+lines', 
                                   xTitle='published date',
                                   yTitle=column, 
                                  title=f'Cumulative {column.title()} Until {date.date()}')
    
_ = interact(plot_up_to, column=widgets.Dropdown(options=list(df.select_dtypes('number').columns)), 
             date = widgets.DatePicker(value=pd.to_datetime('2019-01-01')))

interactive(children=(Dropdown(description='column', options=('claps', 'days_since_publication', 'fans', 'num_…

# Dependent Widgets

How do we get a value of a widget to depend on that of another? Using the `observe` method.

Going back to the Image Browser earlier, let's make a function that allows us to change the directory for the images to list.

In [22]:
directory = widgets.Dropdown(options=['images', 'nature', 'assorted'])
images = widgets.Dropdown(options=os.listdir(directory.value))

def update_images(*args):
    images.options = os.listdir(directory.value)

directory.observe(update_images, 'value')

def show_images(fdir, file):
    display(Image(f'{fdir}/{file}'))

_ = interact(show_images, fdir=directory, file=images)

interactive(children=(Dropdown(description='fdir', options=('images', 'nature', 'assorted'), value='images'), …

We can also assign to the `interact` call and then reuse the widget. This has unintended affects though! 

In [23]:
def show_stats_by_tag(tag):
    display(df.groupby(f'<tag>{tag}').describe()[['views', 'reads', 'claps', 'read_ratio']])
    
stats = interact(show_stats_by_tag,
                tag=widgets.Dropdown(options=['Towards Data Science', 'Education', 'Machine Learning', 'Python', 'Data Science']))

interactive(children=(Dropdown(description='tag', options=('Towards Data Science', 'Education', 'Machine Learn…

In [24]:
stats.widget

interactive(children=(Dropdown(description='tag', options=('Towards Data Science', 'Education', 'Machine Learn…

Now changing the value in one location changes it in both places! This can be a slight inconvenience, but on the plus side, now we can reuse the interactive element.

# Linked Values

We can link the value of two widgets to each other using the `jslink` function. This ties the values to be the same.

In [25]:
def show_less_than(column1_value, column2_value):
    display(df.loc[(df['views'] < column1_value) & 
                    (df['reads'] < column2_value), 
                   ['title', 'read_time', 'tags', 'views', 'reads']])
        
column1_value=widgets.IntText(value=100, label='First')
column2_value=widgets.IntSlider(value=100, label='Second')

linked = widgets.jslink((column1_value, 'value'),
                        (column2_value, 'value'))

less_than = interact(show_less_than, column1_value=column1_value,
                 column2_value=column2_value)

interactive(children=(IntText(value=100, description='column1_value'), IntSlider(value=100, description='colum…

I'm not exactly sure why you would want to link two widgets, but there you go! We can unlink them using the `unlink` command (sometimes syntax does make sense).

In [26]:
linked.unlink()

In [27]:
less_than.widget

interactive(children=(IntText(value=100, description='column1_value'), IntSlider(value=100, description='colum…

# Conclusions

These widgets are not going to change your life, but they do make notebooks closer to interactive dashboards. I've only shown you some of the capabilities so be sure to look at the [documentation for the full details]. The Jupyter Notebook is useful by itself, but with additional tools, it can be an even better data exploration and analysis technology. Thanks to the efforts of many developers and contributors to open-source, we have these great technologies, so we might as well get the most from these libraries! 

In [28]:
cscales = ['Greys', 'YlGnBu', 'Greens', 'YlOrRd', 'Bluered', 'RdBu',
            'Reds', 'Blues', 'Picnic', 'Rainbow', 'Portland', 'Jet',
            'Hot', 'Blackbody', 'Earth', 'Electric', 'Viridis', 'Cividis']

In [29]:
import plotly.figure_factory as ff

corrs = df.corr()

@interact_manual
def plot_corrs(colorscale=cscales):
    figure = ff.create_annotated_heatmap(z = corrs.round(2).values, 
                                     x =list(corrs.columns), 
                                     y=list(corrs.index), 
                                     colorscale=colorscale,
                                     annotation_text=corrs.round(2).values)
    iplot(figure)

interactive(children=(Dropdown(description='colorscale', options=('Greys', 'YlGnBu', 'Greens', 'YlOrRd', 'Blue…

In [30]:
@interact
def plot_spread(column1=['claps', 'views', 'read', 'word_count'], 
                 column2=['views', 'claps', 'read', 'word_count']):
    df.iplot(kind='ratio',
                                                   y=column1,
                                                   secondary_y=column2,
                                                   title=f'{column1.title()} and {column2.title()} Spread Plot',
                                 xTitle='Published Date')

interactive(children=(Dropdown(description='column1', options=('claps', 'views', 'read', 'word_count'), value=…