# Making Jupyter Notebooks Work for You
> A short worksheet

In this repo, we briefly explore two of the three topics presented on making Jupyter notebooks work for you.

# Software development practice
## Example 1: Find the error
Our objective here is to normalize some of the columns of our dataframe. We wouldn't normally take this approach in practice; it is for demonstration purposes only. The plan is to test the general steps on a single column and then apply them to multiple columns. Let's see how that works.

### Normalize a single column

In [1]:
import pandas as pd

In [2]:
# Read the data and drop rows with missing values
data = pd.read_csv('https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv')
data = data.dropna().reset_index(drop=True)
data

Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007
3,Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007
4,Adelie,Torgersen,39.3,20.6,190.0,3650.0,male,2007
...,...,...,...,...,...,...,...,...
328,Chinstrap,Dream,55.8,19.8,207.0,4000.0,male,2009
329,Chinstrap,Dream,43.5,18.1,202.0,3400.0,female,2009
330,Chinstrap,Dream,49.6,18.2,193.0,3775.0,male,2009
331,Chinstrap,Dream,50.8,19.0,210.0,4100.0,male,2009


In [3]:
# Get the mean and standard deviation of the column
col_mean = data['bill_length_mm'].mean()
col_std = data['bill_length_mm'].std()

In [4]:
data['bill_length_normed'] = (data['bill_length_mm'] - col_mean)/col_std
data.head()

Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year,bill_length_normed
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007,-0.894695
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007,-0.821552
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007,-0.675264
3,Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007,-1.333559
4,Adelie,Torgersen,39.3,20.6,190.0,3650.0,male,2007,-0.858123
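
Written out, the transformation in In [4] is the usual z-score. In the notation below (introduced here for illustration), $x_i$ is a value in the column, $\bar{x}$ corresponds to `col_mean`, and $s$ corresponds to `col_std`. pandas' `Series.std` uses the sample standard deviation by default (`ddof=1`), which is why $n-1$ appears in the denominator.

$$
z_i = \frac{x_i - \bar{x}}{s}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
$$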


### Generalize to other columns

In [5]:
def normalize_cols(input_data, col_name):
    
    new_col_name = col_name + '_normed'
    
    input_data[new_col_name] = (input_data[col_name] - col_mean)/col_std
    
    return input_data

In [6]:
out_data = normalize_cols(data, 'flipper_length_mm')
out_data

Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year,bill_length_normed,flipper_length_mm_normed
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007,-0.894695,25.053121
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007,-0.821552,25.967420
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007,-0.675264,27.613159
3,Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007,-1.333559,27.247439
4,Adelie,Torgersen,39.3,20.6,190.0,3650.0,male,2007,-0.858123,26.698859
...,...,...,...,...,...,...,...,...,...,...
328,Chinstrap,Dream,55.8,19.8,207.0,4000.0,male,2009,2.159064,29.807477
329,Chinstrap,Dream,43.5,18.1,202.0,3400.0,female,2009,-0.090112,28.893178
330,Chinstrap,Dream,49.6,18.2,193.0,3775.0,male,2009,1.025333,27.247439
331,Chinstrap,Dream,50.8,19.0,210.0,4100.0,male,2009,1.244765,30.356057


## On your own
1.  Identify potential problems with this setup.
2.  How would you fix them?

## Problems
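### 1.  The function uses variables defined in the kernel that are not passed in.
`col_mean` and `col_std` were computed from `bill_length_mm` in In [3]. The function in In [5] picks them up from the kernel's global state, so `flipper_length_mm` is silently normalized with the wrong statistics. Computing them inside the function fixes this.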

In [7]:
def normalize_cols(input_data, col_name):
    
    new_col_name = col_name + '_normed'
    col_mean = input_data[col_name].mean()
    col_std = input_data[col_name].std()
    
    input_data[new_col_name] = (input_data[col_name] - col_mean)/col_std
    
    return input_data

In [8]:
out_data = normalize_cols(data, 'flipper_length_mm')
out_data

Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year,bill_length_normed,flipper_length_mm_normed
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007,-0.894695,-1.424608
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007,-0.821552,-1.067867
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007,-0.675264,-0.425733
3,Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007,-1.333559,-0.568429
4,Adelie,Torgersen,39.3,20.6,190.0,3650.0,male,2007,-0.858123,-0.782474
...,...,...,...,...,...,...,...,...,...,...
328,Chinstrap,Dream,55.8,19.8,207.0,4000.0,male,2009,2.159064,0.430446
329,Chinstrap,Dream,43.5,18.1,202.0,3400.0,female,2009,-0.090112,0.073705
330,Chinstrap,Dream,49.6,18.2,193.0,3775.0,male,2009,1.025333,-0.568429
331,Chinstrap,Dream,50.8,19.0,210.0,4100.0,male,2009,1.244765,0.644491
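
One way to catch this kind of bug early is a quick sanity check: a correctly z-scored column should have a mean close to 0 and a sample standard deviation close to 1. The sketch below uses a small toy dataframe with illustrative values rather than the full penguins data, and `check_normalized` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Toy stand-in for the penguins data; the values are illustrative only.
toy = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 40.3, 36.7, 39.3],
    "flipper_length_mm": [181.0, 186.0, 195.0, 193.0, 190.0],
})


def check_normalized(series, tol=1e-9):
    """Hypothetical helper: True if the series has mean ~0 and sample std ~1."""
    return bool(abs(series.mean()) < tol and abs(series.std() - 1.0) < tol)


# Correct: the statistics come from the column being normalized.
flipper = toy["flipper_length_mm"]
fixed = (flipper - flipper.mean()) / flipper.std()

# Buggy: the statistics are left over from a different column, as in In [5].
bill = toy["bill_length_mm"]
stale = (flipper - bill.mean()) / bill.std()

fixed_ok = check_normalized(fixed)
stale_ok = check_normalized(stale)

print(fixed_ok)  # True
print(stale_ok)  # False
```

A check like this could sit in the cell right after the function call, so that a stale kernel variable shows up immediately instead of several cells later.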


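### 2.  The function modifies the underlying dataframe.
This may or may not be a problem, depending on whether the caller expects `data` to change. The checks below show that `out_data` and `data` are the same object.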

In [9]:
out_data is data

True

In [10]:
id(out_data)

2718348294280

In [11]:
id(data)

2718348294280
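
The practical consequence of the identity check is that the caller's dataframe gains a column as a side effect of the call. Here is a minimal sketch of that behavior on a toy dataframe; the function name `normalize_in_place` and the data are assumptions made for illustration, while the body follows the logic of In [7].

```python
import pandas as pd


def normalize_in_place(input_data, col_name):
    """Same logic as In [7]; the name is hypothetical. Mutates input_data."""
    col_mean = input_data[col_name].mean()
    col_std = input_data[col_name].std()
    input_data[col_name + "_normed"] = (input_data[col_name] - col_mean) / col_std
    return input_data


toy = pd.DataFrame({"x": [1.0, 2.0, 3.0]})  # illustrative values
cols_before = list(toy.columns)

out = normalize_in_place(toy, "x")
cols_after = list(toy.columns)

print(cols_before)  # ['x']
print(cols_after)   # ['x', 'x_normed']
print(out is toy)   # True
```

In a notebook this matters more than in a script, because re-running cells out of order can leave the shared dataframe in a state that no single cell describes.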

### Approach 1
Copy the dataframe in the function and return a modified copy.

In [12]:
def normalize_cols(input_data, col_name):
    
    new_data = input_data.copy()
    
    new_col_name = col_name + '_normed'
    col_mean = new_data[col_name].mean()
    col_std = new_data[col_name].std()
    
    new_data[new_col_name] = (new_data[col_name] - col_mean)/col_std
    
    return new_data

In [13]:
out_data = normalize_cols(data, 'flipper_length_mm')
out_data

Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year,bill_length_normed,flipper_length_mm_normed
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007,-0.894695,-1.424608
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007,-0.821552,-1.067867
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007,-0.675264,-0.425733
3,Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007,-1.333559,-0.568429
4,Adelie,Torgersen,39.3,20.6,190.0,3650.0,male,2007,-0.858123,-0.782474
...,...,...,...,...,...,...,...,...,...,...
328,Chinstrap,Dream,55.8,19.8,207.0,4000.0,male,2009,2.159064,0.430446
329,Chinstrap,Dream,43.5,18.1,202.0,3400.0,female,2009,-0.090112,0.073705
330,Chinstrap,Dream,49.6,18.2,193.0,3775.0,male,2009,1.025333,-0.568429
331,Chinstrap,Dream,50.8,19.0,210.0,4100.0,male,2009,1.244765,0.644491


In [14]:
out_data is data

False

In [15]:
id(out_data)

2718385203912

In [16]:
id(data)

2718348294280
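
As a possible alternative to an explicit `.copy()`, `DataFrame.assign` returns a new dataframe with the extra column and leaves the input unchanged. The sketch below is one way to write Approach 1 in that style; `normalize_cols_assign` is a hypothetical name and the toy data is illustrative.

```python
import pandas as pd


def normalize_cols_assign(input_data, col_name):
    """Hypothetical variant of Approach 1 built on DataFrame.assign."""
    col = input_data[col_name]
    normed = (col - col.mean()) / col.std()
    # assign returns a new DataFrame; input_data keeps its original columns.
    return input_data.assign(**{col_name + "_normed": normed})


toy = pd.DataFrame({"x": [1.0, 2.0, 3.0]})  # illustrative values
out = normalize_cols_assign(toy, "x")

print(out is toy)                # False
print(list(toy.columns))         # ['x']
print(out["x_normed"].tolist())  # [-1.0, 0.0, 1.0]
```

Either style keeps the caller's dataframe intact; which one reads better is largely a matter of taste.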

### Approach 2
Return only the normalized column as a new Series and let the caller decide what to do with it.

In [17]:
data

Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year,bill_length_normed,flipper_length_mm_normed
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007,-0.894695,-1.424608
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007,-0.821552,-1.067867
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007,-0.675264,-0.425733
3,Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007,-1.333559,-0.568429
4,Adelie,Torgersen,39.3,20.6,190.0,3650.0,male,2007,-0.858123,-0.782474
...,...,...,...,...,...,...,...,...,...,...
328,Chinstrap,Dream,55.8,19.8,207.0,4000.0,male,2009,2.159064,0.430446
329,Chinstrap,Dream,43.5,18.1,202.0,3400.0,female,2009,-0.090112,0.073705
330,Chinstrap,Dream,49.6,18.2,193.0,3775.0,male,2009,1.025333,-0.568429
331,Chinstrap,Dream,50.8,19.0,210.0,4100.0,male,2009,1.244765,0.644491


In [18]:
def normalize_cols(input_data, col_name):
    
    # Compute the statistics from the column itself, not from kernel globals
    col_mean = input_data[col_name].mean()
    col_std = input_data[col_name].std()
    
    # Build a new Series; input_data is left untouched
    out_ser = (input_data[col_name] - col_mean)/col_std
    
    return out_ser

In [19]:
out_data = normalize_cols(data, 'flipper_length_mm')
out_data

0     -1.424608
1     -1.067867
2     -0.425733
3     -0.568429
4     -0.782474
         ...   
328    0.430446
329    0.073705
330   -0.568429
331    0.644491
332   -0.211688
Name: flipper_length_mm, Length: 333, dtype: float64

In [20]:
out_data is data['flipper_length_mm']

False

In [21]:
id(out_data)

2718385260040

In [22]:
id(data['flipper_length_mm'])

2718385259528
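
Because Approach 2 returns a plain Series, it extends naturally to the original goal of normalizing several columns. The sketch below is one way to do that, assuming a toy dataframe with illustrative values; `normalize_col`, `cols`, `normed`, and `combined` are names introduced here for illustration.

```python
import pandas as pd


def normalize_col(input_data, col_name):
    """Return the z-scored column as a new Series (same logic as In [18])."""
    col = input_data[col_name]
    return (col - col.mean()) / col.std()


# Toy stand-in for the penguins data; the values are illustrative only.
toy = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 40.3, 36.7, 39.3],
    "flipper_length_mm": [181.0, 186.0, 195.0, 193.0, 190.0],
})
cols = ["bill_length_mm", "flipper_length_mm"]

# Collect the normalized columns in a separate dataframe.
normed = pd.DataFrame({c + "_normed": normalize_col(toy, c) for c in cols})

# The caller decides whether and how to combine them with the original data.
combined = toy.join(normed)

print(list(toy.columns))     # original columns only
print(list(normed.columns))  # the two *_normed columns
print(combined.shape)        # (5, 4)
```

Keeping the normalized values separate until the final `join` means the original dataframe is never modified by the helper.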

# nbdev practice
nbdev is relatively involved and not something we can really try on Colab yet. If time allows, we'll do a brief walkthrough at the end for interested parties and discuss possible next steps. Otherwise, see this nbdev tutorial for more information on how to get started: https://nbdev.fast.ai/tutorial.html

# xeus-python debugging kernel
You can learn all about the debugging kernel here: https://blog.jupyter.org/a-visual-debugger-for-jupyter-914e61716559 (a Jupyter blog post). In the meantime, let's try the default Binder environment to see how the debugger can help us. Here's a direct link to the Binder demo: https://mybinder.org/v2/gh/jupyterlab/debugger/stable?urlpath=/lab/tree/examples/index.ipynb