# Who I am
My name is Wolf.  I'm a long-time software engineer.  I'm curious.  I'm passionate about the things I find useful.  In my day-job, I write software that helps teach kids to read.  In my spare time I use my favorite tools like Vim, tmux, Jupyter, sqlite, and lots of other things I really like.  I don't use Jupyter Notebooks in my professional career, but they're fun to play with.  I don't know a lot, but I can show you a few interesting things to get you excited about them.

I'm running this on my M1 Mac mini using Orion as my browser.  You can run Jupyter on any platform.

# What this presentation is
It's a demonstration of the big features and issues with [Jupyter](https://jupyter.org/).  It's enough to get you started and investigating on your own.  There's a lot to talk about so I won't be able to dive deep on any one thing.  Feel free to ask questions as I speak.

I will give this notebook and an HTML version of it to Craig so he can put it on the MUG website.  And I'll create a GitHub project with this presentation, including all outputs.

# What this presentation is **not**
This is not a data-science lecture.  I'm a software engineer, not a data-scientist.  As such, I don't know _all_ the ins and outs of the data-science features of Jupyter.  I'm more student than expert, but I can get you moving on your notebook adventure.  This isn't a programming how-to.  This isn't SQL training.  Etc.  I'll touch on these subjects so you know where to look for more.

# What I'm going to talk about
* What a Jupyter notebook _is_
* Cells are code, Markdown, or raw
    * Code cells can be executed
    * Magics
* `pandas` and `matplotlib`
* `nbconvert`
* Building GUIs with `ipywidgets`
* `jupytext`
* `nbviewer`
* SQL
* `nbdev`
* How to install and run Jupyter

# What _is_ a Jupyter Notebook?
A "notebook" interface is an interactive, live-programming, cell-based, interface to computation.

Think about Mathematica back in the day.  Code, data, tests, and documentation are presented in hunks called "cells" that can be individually executed, reordered, and typed (code, markdown, or just raw, uninterpreted data).

## Why should I care
Jupyter makes it easy to _play_ with your code in an interactive environment.  Run just small pieces of it.  Make a change, run it again.  Use the source-level debugger.  Plot output with graphing packages, and add UI with ready-made widgets.

If you're working with data, the same applies to _it_.  Play with your data.  Run some computations on it, or on just a fraction of it.  Change some parameters.  Run it again.  Rich interaction with your data and ready access to a data-scientist's toolkit for analysis.

But most of all, Jupyter is just plain _fun_.  It will bring back a little of the joy of programming to your work.

## What is the difference between Jupyter Lab and Jupyter Notebooks?
"Classic" Jupyter Notebooks is the original notebook interface.  This is slowly being replaced by the more capable and complete Jupyter _Lab_.  Jupyter Lab has tabs, debugging, shows multiple documents and more.  Development continues on Jupyter Lab.  We still call the thing we see within Jupyter Lab "a notebook".  It still has the same format and extension it has always had: json and `".ipynb"`, respectively.

I'm using Jupyter Lab in this presentation.
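
To make the "it's just JSON" point concrete, here is a minimal sketch that builds a tiny notebook using only the standard library.  The field names follow version 4 of the notebook format as I understand it; the cell contents and the name `tiny_notebook` are made up for illustration.  In real life the `nbformat` package is the supported way to read and write these files.

```python
import json

# A hand-built, minimal notebook: one Markdown cell and one code cell.
# (Hypothetical content, just enough structure to show the shape.)
tiny_notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Hello"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # not yet run, hence "In [None]"
            "outputs": [],
            "source": ["2 + 2"],
        },
    ],
}

# Round-trip through text, the way a notebook goes to disk and back
text = json.dumps(tiny_notebook, indent=1)
reloaded = json.loads(text)

cell_types = [cell["cell_type"] for cell in reloaded["cells"]]
print(cell_types)
```

Saving `text` to a file ending in `.ipynb` should give you something Jupyter can open, though I'd treat that as an experiment rather than a promise.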

## Where did they come from?  What is their history?
Jupyter was inspired by Sage, which was inspired by Mathematica.  Jupyter is built on top of IPython, but has grown to be so much more.  Jupyter began its life around 2014.

## The relationship between Jupyter and IPython
Jupyter is built on IPython, and when you're using Python as your language, IPython is your kernel.  Special IPython settings that you make in startup files apply to your notebook as well.  IPython is the terminal interface.  Jupyter is the GUI interface.

## What are you looking at right now
Jupyter is a specific implementation of the notebook concept that runs as a client-server pair where the notebook _interface_ is the client, and the language _kernel_ is the server.  There are many different kernels.  The most popular is Python, but your favorite language is almost certainly available.  In Mathematica days, there was only the Mathematica language.  Now you can have anything.

What you're looking at now is a notebook using a Python 3 kernel.  The kernel is shown in the upper right-hand corner of the notebook.  Or else you're looking at a slideshow I generated from a notebook and we'll go right back to the notebook in a minute.

## Before I start...
I'll use Shift-Command-C to activate the Command Palette and clear all the outputs.

### Cells

This is a cell.  Each gray box you see below is a separate cell.  This cell is specifically a Markdown cell.  If I "execute" it, it renders the Markdown as prose.  There are also code cells and raw cells.

Using a particular kernel implies that code cells are to be interpreted as _that_ language.  Each notebook uses exactly one kernel.  So, not including _magics_, the entire notebook is in that one language.

Markdown is a lightweight markup language.  Here are some examples of how you mark up text and what the rendered result looks like.  **Bold** inside double asterisks.  ~Strikethrough~ between tildes.  _Italic_ between underscores.  There are special markers for `code`, and blocks of code, and quotes, and links.

You can tell what's in a cell by the drop-down menu in the toolbar.  Also, code-cells have square brackets next to them that get filled in with the execution order.  Markdown cells do not.

You can execute a cell using the 'play' button above.  I will be hitting Shift-Enter, a short-cut for execute and move to the next cell.

In [None]:
2 + 2  # code-cells do calculations

In [None]:
# Note that when cells have a result, that result is printed in an
# _output cell_.  The result of this cell is just the expression
# that ends the cell

"Hello, World!"

In [None]:
# Here's a code cell containing Python
#  That makes sense since this notebook is using a Python kernel
#  Python is the native language of Jupyter notebooks

person: str = "Wolf"
print(f"Hello, {person}!  This output is not the _result_ of the cell, so it is printed outside of any cell.")

# The two cells above, 2 + 2 and "Hello, World!" worked because
# they were valid Python

#### Another Markdown cell
Note the syntax-highlighting in the Python cell.  Let's try auto-complete.  I'll type a period after the `person` variable, then hit tab to engage auto-complete.  This requires that some cell **defining** `person` has been run.

In [None]:
# person.lower().count('w')
person

# Note that `person` was defined above in a completely separate cell.
# It's a string, so Jupyter knows to auto-complete with the methods
# of string.

#### Back to a cell again.  This is yet another Markdown cell.
Note that `stdout` (and also `stderr`) from the code cell above appears between the cells, not **in** them.  Results appear in result cells.  Output appears between cells.
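
Here is a rough sketch of the difference between a cell's _result_ and its printed _output_.  The helper `run_cell` is hypothetical and is **not** how IPython is actually implemented; it just runs a string of Python, captures anything printed, and separately returns the value of the final expression (if there is one).

```python
import ast
import contextlib
import io


def run_cell(source: str):
    """Return (result, printed_output) for a string of Python code."""
    tree = ast.parse(source)
    if not tree.body:
        return None, ""

    namespace = {}
    buffer = io.StringIO()
    result = None
    *body, last = tree.body

    with contextlib.redirect_stdout(buffer):
        if isinstance(last, ast.Expr):
            # Run everything before the final expression...
            setup = ast.Module(body=body, type_ignores=[])
            exec(compile(setup, "<cell>", "exec"), namespace)
            # ...then evaluate the final expression to get the "result"
            final = ast.Expression(body=last.value)
            result = eval(compile(final, "<cell>", "eval"), namespace)
        else:
            # The cell ends with a statement, so there is no result
            exec(compile(tree, "<cell>", "exec"), namespace)

    return result, buffer.getvalue()


result, output = run_cell('print("Hello")\n2 + 2')
print(repr(result), repr(output))  # 4 'Hello\n'
```

A cell that ends in an assignment, like `x = 1`, has neither a result nor any output, which matches what you see in the notebook.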

In [None]:
# What version of Python are we running?

import sys
print(sys.version)

# again, note that printed output is not, itself, in a cell

### Magics
`'%'` or `'%%'` in a code cell means _magic_.  A magic command means something special to Jupyter.  There are many magics.  A double-% means it's a "cell-magic", that is, it applies to the entire cell.  A single-% is a "line-magic".  Here are some cell-magics that let you use alternative languages (with many limitations).

In [None]:
%%javascript

// before running: open the console; make it big;
// and show only "Logs"

console.log("Look!  I can mix-in JavaScript!")

In [None]:
%%bash
MY_VARIABLE='Bash'
echo "Here is some ${MY_VARIABLE}!"

In [None]:
%%bash

# Note a limitation of language magics is that each
# cell stands alone.  If we were using a Bash kernel
# this would not be a problem.

echo ${MY_VARIABLE}
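
Why doesn't the variable survive?  As far as I understand it, each `%%bash` cell is handed to a brand-new `bash` process.  Here is a small Python sketch of the same effect using `subprocess`.  The helper name `run_bash` is mine, and it assumes `bash` is on your `PATH` and that `MY_VARIABLE` isn't already set in your environment.

```python
import subprocess


def run_bash(script: str) -> str:
    """Run a script in a fresh bash process and return its stdout."""
    completed = subprocess.run(
        ["bash", "-c", script],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


# First "cell": set the variable and use it in the same process
first = run_bash("MY_VARIABLE='Bash'; echo \"Here is some ${MY_VARIABLE}!\"")

# Second "cell": a new process, so the variable is gone
second = run_bash('echo "[${MY_VARIABLE}]"')

print(first, second, sep="")
```

The brackets in the second script are only there so the empty value is visible.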

In [None]:
%%perl
map { print "$_\n" } ( 1 .. 10 )

Since we're playing around, let's try to exactly reproduce the output of the Perl fragment with some Python.  First using a `for` loop.

In [None]:
for i in range(1, 11): print(i)

Next using `map`.

In [None]:
_ = list(map(print, range(1, 11)))

And finally with a list comprehension.

In [None]:
_ = [print(i) for i in range(1, 11)]
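
If you want to convince yourself that the three versions above really do print exactly the same thing, here is one way to check.  This is just a sketch; the helper `captured` and the three function names are made up for the comparison.

```python
import contextlib
import io


def captured(action) -> str:
    """Call action() and return everything it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        action()
    return buffer.getvalue()


def with_for_loop():
    for i in range(1, 11):
        print(i)


def with_map():
    list(map(print, range(1, 11)))


def with_comprehension():
    [print(i) for i in range(1, 11)]


# A set collapses identical strings, so one element means "all the same"
outputs = {captured(f) for f in (with_for_loop, with_map, with_comprehension)}
print(len(outputs) == 1)  # True
```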

And let's ask Python how `map` works.

In [None]:
help(map)

Here is a [full list of magics](https://ipython.readthedocs.io/en/stable/interactive/magics.html) but you can also see them with a bit of documentation by actually using the magics `%magic` and `%quickref`.  Note that magics are implemented by the underlying kernel, so they will be different for different kernels.

In [None]:
%quickref

# `pandas` and `matplotlib`
These are table-stakes for data-science professionals.

In [None]:
import pandas

# Let's make up some random data from scratch (deprecated).
# It's the easiest way to get some fake data.

# Dataframes and Series are the main datatypes of Pandas.
# Dataframes are 2-dimensional.  Series 1-dimensional.
# (If you need more dimensions, use `xarray`.)

dataframe = pandas.util.testing.makeDataFrame()
dataframe  # Note: printing the cell's _result_

# Note where stderr went...
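
A caveat on the cell above: `pandas.util.testing` is deprecated, and I believe it is gone entirely in pandas 2.0 and later.  If it isn't available to you, here is a sketch that builds similar fake data by hand.  The shape (30 rows, columns `A` through `D`) is chosen to match what the rest of this notebook expects; the seed is arbitrary, and the index will be plain integers rather than random strings.

```python
import numpy as np
import pandas

# A seeded generator so the "random" data is the same on every run
rng = np.random.default_rng(seed=0)

# 30 rows by 4 columns of normally distributed nonsense
dataframe = pandas.DataFrame(
    rng.standard_normal((30, 4)),
    columns=list("ABCD"),
)

print(dataframe.shape)  # (30, 4)
```

No deprecation warning this time, so nothing goes to stderr.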

In [None]:
# Show the first seven rows of the dataframe (default is five)

dataframe.head(n=7)

In [None]:
# Show all 30 rows, sorted by the values in column 'C'

with pandas.option_context('display.max_rows', 30):
    display(dataframe.sort_values(by='C'))

In [None]:
# Extract a single column

display(dataframe['B'])

In [None]:
# Get the sum of that column

dataframe['B'].sum()

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

# Of course I'm plotting nonsense here...

plt.rcParams['figure.figsize'] = [20, 8]  # Make the graph big

dataframe.plot.bar()

All kinds of other plots are available and there are additional charting packages as well.  Here is a scatter plot and a line plot.  Every kind of chart you could want is available through `matplotlib` or others.

In [None]:
plt.rcParams['figure.figsize'] = [20, 8]  # Make the graph big

plt.scatter(dataframe['B'], dataframe['D'])

In [None]:
plt.rcParams['figure.figsize'] = [20, 8]  # Make the graph big

# This is nonsense upon nonsense...

dataframe.plot(x='A', y='C')

I have more to say about Pandas (it's super powerful) below.  One thing I'm not going to show is that Pandas can reach out, read a web-page, find a data table on that web-page, and read the table into a dataframe.

# `nbconvert`
Note that `nb` in this context usually stands for "notebook".

`nbconvert` converts notebooks into other things using `jupyter nbconvert <notebook> --to <format>`.

## To slides

In [None]:
%%bash
# Show the metadata that directs slide creation

jupyter nbconvert Jupyter-Presentation.ipynb --to slides

Let's take a look at the result

## To HTML

In [None]:
%%bash
jupyter nbconvert Jupyter-Presentation.ipynb --to html

## To LaTeX
I could generate the LaTeX, but I had trouble actually running TeX on it to produce a PDF.

## To PDF
Same here.  The backend of this goes from notebook to LaTeX to PDF, so it's not surprising I couldn't get this to work.

The complete set of `nbconvert` output types is 'asciidoc', 'custom', 'html', 'latex', 'markdown', 'notebook', 'pdf', 'python', 'rst', 'script', 'slides', and 'webpdf'.

# `ipywidgets`

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import ipywidgets as widgets

In [None]:
my_int_slider = widgets.IntSlider(
    value=7,
    min=0,
    max=10,
    step=1,
    description='Test:',
    disabled=False,
    continuous_update=True,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)
display(my_int_slider)

In [None]:
display(my_int_slider)

In [None]:
my_text_value = widgets.BoundedIntText(
    value=7,
    min=0,
    max=10,
    step=1,
    description='Text:',
    disabled=False
)
display(my_text_value)
link = widgets.jslink((my_int_slider, 'value'), (my_text_value, 'value'))

In [None]:
button = widgets.Button(
    description='Click me',
    disabled=False,
    button_style='warning', # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Click me',
)

def on_button_clicked(b):
    my_int_slider.value = 9

button.on_click(on_button_clicked)
button

In [None]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import mpl_interactions.ipyplot as iplt

x = np.linspace(0, 2 * np.pi, 1000)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

def f(x, w):
    return np.sin(w * x)

controls = iplt.plot(x, f, w=(1, 10))


There are [many, many widgets](https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20List.html) which you can stack, arrange, connect, and use to build exactly the UI you want.

Some resources:
* [The ipywidgets documentation](https://ipywidgets.readthedocs.io/en/latest/)
* A [simple YouTube video](https://www.youtube.com/watch?v=wb6k_T4rKBQ) showing them in action
* A [deeper YouTube video](https://www.youtube.com/watch?v=f0WmLo8AVxo)
* A YouTube video about [Dashboarding with Python Widgets](https://www.youtube.com/watch?v=SDy7aBahFuQ)

# `jupytext`

## Some things are better in an IDE or external editor
* Look at this notebook in PyCharm
* Look at this notebook in VS Code

Some editors will directly open notebooks, but if that's not your favorite, or if you prefer a different mechanism you can use `jupytext`.  This is how you do it manually:

In [None]:
%%bash
jupytext --to md Jupyter-Presentation.ipynb
jupytext --to ipynb --update Jupyter-Presentation.md

But `jupytext` is automatic.  In the normal case, use the "Pair with Markdown" (for example) command, and then when you make a change in the Markdown version of the notebook, it's quickly reflected in the `ipynb` version and vice versa.  This is the main reason it's better than `nbconvert`.

Why would you use `jupytext`?  Some things are just better in your editor, diffing text is easier than diffing JSON, and instant automatic conversion is nifty.  Those are the prime reasons for me, anyway.

# `nbviewer`
Built-in to GitHub and other source repositories, but usable separately as well.  Here's an example on GitHub: [My `re` presentation](https://github.com/wolf/re-presentation/blob/main/%60re%60%20Presentation.ipynb) (open in Firefox to avoid scrollbar bug)

`nbviewer` provides a read-only (non-executing) view of an existing notebook that has been saved with output cells so you can see everything.

See [`nbviewer`'s repo](https://github.com/jupyter/nbviewer).

# SQL

In [None]:
import sqlalchemy
engine = sqlalchemy.create_engine('postgresql://wolf:@localhost/usda')

# Yes, I'm bad.  I have no password on my local
# PostgreSQL database

In [None]:
%load_ext sql
%sql postgresql://wolf:@localhost/usda

In [None]:
%%sql

SELECT * FROM fd_group

-- note this is _just_ a result, not otherwise
-- saved

In [None]:
# This time, let's save it

fd_groups_simple = %sql SELECT * FROM fd_group
fd_groups_simple

In [None]:
type(fd_groups_simple)

## SQL in Pandas

In [None]:
# Let's make a new dataframe

fd_group = pandas.read_sql('SELECT * FROM fd_group', engine)
fd_group
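
You probably don't have my `usda` PostgreSQL database lying around, so here is a stand-in sketch using an in-memory SQLite database instead.  The table name is borrowed from above, but the column names and the two rows are invented purely for illustration; they are not the real USDA data.

```python
import sqlite3

import pandas

# An in-memory database that vanishes when the connection closes
connection = sqlite3.connect(":memory:")

# Hypothetical schema and rows, NOT the real USDA table
connection.execute(
    "CREATE TABLE fd_group (code TEXT PRIMARY KEY, description TEXT)"
)
connection.executemany(
    "INSERT INTO fd_group VALUES (?, ?)",
    [("0100", "Example group one"), ("0200", "Example group two")],
)
connection.commit()

# pandas accepts a plain sqlite3 connection as well as an SQLAlchemy engine
stand_in = pandas.read_sql("SELECT * FROM fd_group ORDER BY code", connection)
connection.close()

print(type(stand_in).__name__)  # DataFrame
```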

In [None]:
# Let's make a new dataframe the easy way
# and time how long it takes to read those rows

%timeit pandas.read_sql_table('nut_data', engine)
nut_data = pandas.read_sql_table('nut_data', engine)
nut_data

In [None]:
type(nut_data)
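
The `%timeit` line-magic used above is, roughly speaking, a friendlier front end to the standard library's `timeit` module.  Here is a sketch of doing something similar in plain Python.  The function `work` is a made-up stand-in for the database read, and unlike `%timeit` (which chooses its own loop counts) I've fixed the counts by hand.

```python
import timeit


def work():
    """A stand-in for whatever you actually want to time."""
    return sum(range(1_000))


calls_per_trial = 1_000

# Five trials; each entry is the total seconds for all calls in that trial
trials = timeit.repeat(work, number=calls_per_trial, repeat=5)

best_per_call = min(trials) / calls_per_trial
print(f"best of 5: {best_per_call * 1e6:.2f} microseconds per call")
```

Taking the minimum rather than the average is the usual advice, since the slower trials mostly measure whatever else your machine was doing.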

# `nbdev`
* Exploratory programming
* Build full packages
* Publish them to PyPI, et al
* Create beautiful documentation (Quarto-based), automatically published
* Integrated tests
* Automatic CI (continuous integration -- running the tests and getting things published with every commit)

Helpful to think of `nbdev` as a _dialect_ of Python according to [this YouTube video](https://www.youtube.com/watch?v=l7zS8Ld4_iA).  I struggled with what to show you for `nbdev` as it's a complete literate programming _system_ built on top of Jupyter notebooks, but you have to agree with their choices.  It's very powerful, and very opinionated.

[Further details](https://nbdev.fast.ai) and a [tutorial here](https://nbdev.fast.ai/tutorial.html) and [YouTube video introduction here](https://www.youtube.com/watch?v=l7zS8Ld4_iA) (I preferred watching this video at 1.25x).

## Literate programming
Look at [an example of literate programming](https://github.com/fastai/execnb/blob/master/nbs/01_nbio.ipynb) that is part of `nbdev` itself.

## Creating books
With a little bit of configuration and appropriate use of metadata, you can turn your notebook into an actual book or article.

See also [Jupyter Book](https://jupyterbook.org/en/stable/intro.html) for an alternative _not_ using `nbdev`.  I actually prefer Jupyter Book over `nbdev`.  Jupyter Book is built on Sphinx; [Quarto](https://quarto.org), which `nbdev` uses for its documentation, can also make books.

Many output formats are available.  Most used among them is HTML for publishing directly on the web.  Most interesting to me is ePub.  Jupyter Book will publish directly to GitHub Pages, netlify or ReadTheDocs.

# How do you install and run Jupyter?

## Installing into a local virtual environment
For this demonstration, I used `pyenv` to build my virtual environments; but it would be similar if I used `venv` or `virtualenv`.  This is what I did:

In [None]:
%%bash
mkdir Notebooks
cd Notebooks

In [None]:
# ...some git stuff...

In [None]:
%%bash
# make a new virtual environment, available from any directory
pyenv virtualenv 3.10.5 jupyter-notebooks
# activate this environment automatically whenever the shell is in this directory
pyenv local jupyter-notebooks

In [None]:
%%bash
# Now, from that directory with the virtual environment active

pip install --upgrade pip setuptools wheel jupyter \
    'ipython[all]' nbdev jupyterlab-git ipywidgets \
    jupytext ipython-sql psycopg2

# ...etc...

# psycopg2 because I'm using PostgreSQL; you'll install
# whatever is appropriate for your database

## Installing globally (no demo)
Not recommended.  Do everything above _except_ the `pyenv` steps.  This will install Jupyter into your global Python.  But take my advice: in Python, you should **always** use a virtual environment, and not just for Jupyter; for **everything**.
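
If you don't use `pyenv`, the standard library's `venv` module will make a virtual environment for you.  Most people run it from the shell as `python -m venv <directory>`, but here is a sketch of the same thing from Python.  The helper `make_environment` is hypothetical, and I build the environment in a temporary directory only so the example cleans up after itself.

```python
import tempfile
import venv
from pathlib import Path


def make_environment(parent: Path, name: str = "jupyter-notebooks") -> Path:
    """Create a virtual environment at parent/name and return its path."""
    environment = parent / name
    # with_pip=False keeps this sketch quick; for real use you would
    # want with_pip=True so you can install Jupyter into it
    venv.create(environment, with_pip=False)
    return environment


with tempfile.TemporaryDirectory() as scratch:
    created = make_environment(Path(scratch))
    # Every virtual environment has a pyvenv.cfg at its root
    has_config = (created / "pyvenv.cfg").is_file()

print(has_config)  # True
```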

## Installing with Anaconda (no demo)
Nothing to do.  Jupyter Notebooks install automatically as a part of Anaconda.  I do not have Anaconda installed on this machine, but it's the data-scientist's Python of choice because it comes with everything a data-scientist needs.  That's a _lot_ of stuff.  An Anaconda installation is large.

## Using through Docker (no demo)
You just need the right image.  Here's a [tutorial on installing Docker and running Jupyter](https://towardsdatascience.com/how-to-run-jupyter-notebook-on-docker-7c9748ed209f).  At this moment I do not have Docker installed on this machine because when I **did** have it installed, it was eating up too much CPU while doing nothing.  So I uninstalled it.

## Installing additional kernels for other languages

In [None]:
%%bash
pip install bash_kernel
python -m bash_kernel.install

Each language has its own scheme for installation, but in general, they are all this easy (except, in my experience, for Perl).  Note that you do **not** need kernels installed for languages you invoke through "magics", e.g., `%%bash`, `%%javascript`, `%%perl`.  These work because you have e.g., `bash`, `node`, and `perl`, respectively, installed natively on your machine.

## Running Jupyter
Jupyter has two parts, the client (running in your web browser) and the server (running in your terminal).  I like to run Jupyter in a `tmux` session, because hell, I like to do everything in a `tmux` session.  Given the very first setup I described using `pyenv` virtual environments, here is what I do:
* `cd` into the directory I made that will contain the notebook, and is set to automatically start up the virtual environment
* `tmux new-session -s jupyter`
* I typically split my `tmux` session into two panes so I can run the server in one, and do `git` commands in the other
* In one of the panes: `jupyter-lab`; this automatically opens the lab in a new window of my default browser, or a new tab of the last window I touched

# What did I talk about / questions / more live demos --- what would you like to see?
* What a Jupyter notebook _is_
* Cells are code, Markdown, or raw
    * Code cells can be executed
    * Magics
* `pandas` and `matplotlib`
* `nbconvert`
* Building GUIs with `ipywidgets`
* `jupytext`
* `nbviewer`
* SQL
* `nbdev`
* How to install and run Jupyter