In [None]:
%%javascript
// Run this cell to hide current output, but reserve space for it.
// Re-running those cells will cause the output to appear.

$('.output_area').css('opacity', 0)

In [None]:
import grader

# Teaching from Jupyter Notebooks

&nbsp;

## Christian Moscardi and Robert Schroll

&nbsp;

### The Data Incubator

## About Us

Teach a variety of classes
- Fellowship
- Foundations
- Corporate Training
- Workshops

Different Content, Different Timescales, Same Platform

## Outline

- Notebooks for Teaching
- Notebooks for Exercises
- Deploying Notebooks
- The Future

## Outline

- **Notebooks for Teaching**
- Notebooks for Exercises
- Deploying Notebooks
- The Future

## Many Languages; One Interface

- IPython for Python
- Toree for Spark/Scala
  - http://blog.thedataincubator.com/2017/04/spark-2-0-on-jupyter-with-toree
- R
  - https://irkernel.github.io/

- Web stack: HTML/JS/CSS

## Existing IPython Magics

In [None]:
%%html

<h2>Hello World!</h2>

In [None]:
%%javascript

alert("Hello World!")

## Magics Run in Notebook Context

In [None]:
%%html
<style id='annoying'>
* {
    font-family: "Comic Sans MS";
    background: url('https://media.giphy.com/media/87tkMovdHMRk4/giphy.gif') black;
    
    animation: 3s infinite linear hue;
}

@keyframes hue {
    0% { color: hsl(0, 100%, 50%); }
    16% { color: hsl(60, 100%, 50%); }
    33% { color: hsl(120, 100%, 50%); }
    50% { color: hsl(180, 100%, 50%); }
    67% { color: hsl(240, 100%, 50%); }
    84% { color: hsl(300, 100%, 50%); }
    100% { color: hsl(360, 100%, 50%); }
}

</style>
<button onclick="$('#annoying').remove()">Please Stop!</button>

## A Pure JS Solution
https://github.com/cmoscardi/embedded_d3_example

Doesn't help students understand how to build similar things in a stand-alone manner.

## `<iframe>` for Isolation

In [None]:
from IPython.display import IFrame, display_html

IFrame('http://jupyter.org', '100%', 400)

## Data URLs

iframes take URLs, not raw HTML

We could try to spin up a webserver to display the content, but...

Data URLs encode content directly in the URL

In [None]:
%%html
<img src="data:image/png;base64,iVBORw0KGgoAAA
ANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU
5ErkJggg==" width="100" height="100" alt="Red dot" />
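
A data URL like the one above can also be built in code. The following is a minimal sketch, written for Python 3; the helper names `make_data_url` and `read_data_url` are made up for illustration and are not part of any library.

```python
import base64


def make_data_url(content, mime_type="text/html"):
    """Return a base64 data URL that embeds `content` directly in the URL."""
    if isinstance(content, str):
        content = content.encode("utf-8")  # base64 operates on bytes
    payload = base64.b64encode(content).decode("ascii")
    return "data:{};base64,{}".format(mime_type, payload)


def read_data_url(url):
    """Split a base64 data URL back into (mime_type, raw_bytes)."""
    header, payload = url.split(",", 1)
    mime_type = header[len("data:"):].split(";")[0]
    return mime_type, base64.b64decode(payload)


url = make_data_url("<h1>Hello World!</h1>")
print(url)
```

Decoding the URL again with `read_data_url` gives back the original bytes, which is a quick way to confirm nothing was lost in the encoding.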

## Data URLs and iframes

In [None]:
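import base64

html = """
<html>
    <body bgcolor='#eee'>
        <h1>Hello World!</h1>
    </body>
</html>
"""

# Encode to bytes before base64, then back to text, so this runs on Python 2 and 3
IFrame('data:text/html;base64,' + base64.b64encode(html.encode('utf-8')).decode('ascii'), "100%", 100)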

## Introducing the ihtml package

Wraps iframe + Data URLs in a magic syntax

In [None]:
import ihtml

In [None]:
%%ihtml
<html>
    <body bgcolor='#eee'>
        <h1>Hello World!</h1>
    </body>
</html>
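
To give a feel for how a magic like this could be put together, here is a rough sketch (Python 3). It is not the ihtml source: the magic name `sketch_ihtml` and the helper `build_frame_args` are hypothetical, and it assumes an optional number after the magic name is a frame height in pixels. `load_ipython_extension` and `register_magic_function` are the standard IPython extension hooks.

```python
import base64


def build_frame_args(line, cell, default_height=100):
    """Turn a magic's argument line and cell body into (src, width, height)."""
    # Assumption: an optional number after the magic name is the frame height.
    height = int(line) if line.strip() else default_height
    payload = base64.b64encode(cell.encode("utf-8")).decode("ascii")
    return "data:text/html;base64," + payload, "100%", height


def load_ipython_extension(ipython):
    """Called by %load_ext; registers the hypothetical %%sketch_ihtml magic."""
    from IPython.display import IFrame  # imported lazily; only needed in IPython

    def sketch_ihtml(line, cell):
        # The returned IFrame is displayed as the cell's result.
        return IFrame(*build_frame_args(line, cell))

    ipython.register_magic_function(
        sketch_ihtml, magic_kind="cell", magic_name="sketch_ihtml"
    )
```

Saved as a module, this would be loaded with `%load_ext` and used as `%%sketch_ihtml 200` at the top of a cell.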

## Templating Inspired by Jinja

In [None]:
message = "Hello World!"

In [None]:
%%ihtml 200
<html>
    <body bgcolor='#eee'>
        <h1>{{ message }}</h1>
        <pre>var message = {{ message | json }}</pre>
    </body>
</html>
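
The `{{ name }}` and `{{ name | json }}` substitutions can be approximated in a few lines. This is only a sketch of the idea (Python 3), not how ihtml does it; `render` and `FILTERS` are hypothetical names, and only the plain and `json` forms are handled.

```python
import json
import re

# Matches {{ name }} or {{ name | filter }}.
_TAG = re.compile(r"\{\{\s*(\w+)\s*(?:\|\s*(\w+)\s*)?\}\}")

# Filter name -> function applied to the looked-up value.
FILTERS = {
    None: str,           # no filter: insert the value as text
    "json": json.dumps,  # json filter: insert a JSON (and JS) literal
}


def render(template, namespace):
    """Replace each tag in `template` with the matching value from `namespace`."""
    def substitute(match):
        name, filter_name = match.group(1), match.group(2)
        return FILTERS[filter_name](namespace[name])
    return _TAG.sub(substitute, template)


message = "Hello World!"
template = "<h1>{{ message }}</h1>\n<pre>var message = {{ message | json }}</pre>"
print(render(template, {"message": message}))
```

The `json` filter matters when the value ends up inside JavaScript: it adds the quotes and escapes that a bare substitution would leave out.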

## Sub-documents for CSS and JS

In [None]:
%%cssdoc graybg
body {
    background: #eee;
}

In [None]:
%%jsdoc clicker
document.addEventListener("DOMContentLoaded", function (e) {
    document.querySelector("h1").addEventListener("click", function (ev) {
        var div = document.createElement("div");
        div.textContent = {{ message | json }};
        document.body.appendChild(div);
    })
})

## Referencing Sub-documents

In [None]:
%%ihtml
<html>
    <head>
        {{ graybg | cssdoc }}
        {{ clicker | jsdoc }}
    </head>
    <body>
        <h1>Click me!</h1>
    </body>
</html>

## ihtml

Not going to replace your web-development platform

Good for teaching from a single document

Available on Github: https://github.com/thedataincubator/ihtml

BSD license

## Magics Customize Kernel Behavior

You can make the notebooks behave how you need them to

For example, **exception handling**

In [None]:
def cause_problems():
    return 1/0

cause_problems()

That's great for interactive use, but
- We want to test
- This halts the execution of multiple cells

We used to use `try`/`except` blocks

In [None]:
try:
    cause_problems()
except ZeroDivisionError:
    print "Problems were caused!"

But students don't learn to read tracebacks

We can include the traceback in Markdown...
```
>>> cause_problems()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in cause_problems
ZeroDivisionError: integer division or modulo by zero
```
...at the cost of interactivity

## IPython Magic for Custom Execution

Our first attempt was a magic at the syntax level

It would rewrite a cell like

In [None]:
%%exception_magic

cause_problems()

to wrap it in a `try`/`except` block

In [None]:
try:
    cause_problems()
except Exception as e:
    from IPython.core import ultratb
    ultratb.AutoFormattedTB(mode='Context')()

But the traceback contains the altered source

## Magics can Interact with Python Shell

Magics classes get a reference to the current shell in the `shell` attribute

The shell supports custom exception handling through `set_custom_exc`

We build a custom handler that prints the exception but lets execution continue

## ExpectException

In [None]:
import expectexception

In [None]:
%%expect_exception ZeroDivisionError

1 / 0

## `Exception` is default type

In [None]:
%%expect_exception

1 / 0

## `ExceptionExpected`

In [None]:
%%expect_exception ZeroDivisionError

0 / 1
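
Outside a notebook, the same behavior can be imitated with a context manager. The sketch below (Python 3) is not the expectexception implementation; it only borrows the names `expect_exception` and `ExceptionExpected` to show the three cases: the expected error is printed and swallowed, a missing error raises, and any other error passes through.

```python
import contextlib
import traceback


class ExceptionExpected(Exception):
    """Raised when the guarded block finishes without the expected error."""


@contextlib.contextmanager
def expect_exception(exc_type=Exception):
    try:
        yield
    except exc_type:
        # Show the real traceback so it can still be read, then carry on.
        traceback.print_exc()
    else:
        raise ExceptionExpected("expected {}".format(exc_type.__name__))


with expect_exception(ZeroDivisionError):
    1 / 0
print("still running")
```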

## `ignore_exception` Magic

In [None]:
%%ignore_exception AssertionError

import random
assert random.randint(0, 1)

## Debugger can Inspect Tracebacks

In [None]:
%debug

## ExpectException

Available on Github: https://github.com/thedataincubator/expectexception

BSD license

## Outline

- Notebooks for Teaching
- **Notebooks for Exercises**
- Deploying Notebooks
- The Future

## Miniprojects

We believe in *learning-by-doing*

Most of students' time is spent on "miniprojects":  
Sample analysis of real data

Worried about results, not method

## Each in a Notebook

Mix description, helper code, outline of solution in same document

Start with fill-in-the-blank questions

In [None]:
star_sum = defaultdict(int)
count = defaultdict(int)

for row, stars in zip(data, star_ratings):
    # increment the running sum in star_sum
    # increment the running count in count

Move to less-directed questions

> Build a custom transformer that flattens the attributes dictionary. Place this in a pipeline with a DictVectorizer and a regressor.
> 
> You may find it difficult to find a single regressor that does well enough. A common solution is to use a linear model to fit the linear part of some data, and use a non-linear model to fit the residual that the linear model can't fit. Build a residual estimator that takes as an argument two other estimators. It should use the first to fit the raw data and the second to fit the residuals of the first.

## Checkpoints with `assert`

Allow students to check their progress towards an answer

In [None]:
assert len(avg_stars) == 167

**But:** Students will solve problems in ways that invalidate your assumptions!

## Automated Grader

Instant feedback for students, based on the results of their analysis

Students write a function that returns the answer

This function is passed to the `grader.score` function

## Fixed Answers

Some questions ask for a single fixed answer

> Return a list of the first five prime numbers.

In [None]:
def give_primes():
    return [2, 3, 5, 7, 11]

grader.score("assignment1__five_primes", give_primes)

Used for most data processing questions

Grades based on overlap (possibly fuzzy) of student's answer and reference solution
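
As a rough illustration of overlap-based grading, the sketch below (Python 3) scores an answer by the fraction of reference entries it matches, position by position. The function `overlap_score` and its `rel_tol` parameter are hypothetical; the real grader's scoring rules are not shown in this talk.

```python
import math


def overlap_score(answer, reference, rel_tol=0.0):
    """Fraction of reference entries matched by the answer at the same position."""
    if not reference:
        return 0.0
    matches = 0
    for got, want in zip(answer, reference):
        numeric = isinstance(got, (int, float)) and isinstance(want, (int, float))
        if numeric:
            # Fuzzy comparison: a non-zero rel_tol forgives small numeric error.
            matches += math.isclose(got, want, rel_tol=rel_tol)
        else:
            matches += got == want
    return matches / len(reference)


print(overlap_score([2, 3, 5, 7, 9], [2, 3, 5, 7, 11]))
```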

## Dynamic Answers

Other questions require a function that processes input to calculate a response

> Build a function that takes a list of numbers and returns a list of their squares.

In [None]:
def square(x):
    return [i**2 for i in x]

grader.score("assignment1__square", square)

Used for most machine learning questions

Grades based on some machine learning metric, normalized by reference solution score on same metric
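
One way to picture the normalization: score the student's function and the reference solution on the same held-out data, then take the ratio. The sketch below (Python 3) assumes accuracy as the metric and caps the grade at 1.0; both choices, and the names `accuracy` and `normalized_score`, are assumptions made for illustration.

```python
def accuracy(predicted, truth):
    """Fraction of predictions that equal the true value."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def normalized_score(student_predict, reference_predict, inputs, truth,
                     metric=accuracy):
    """Student's metric divided by the reference solution's, capped at 1.0."""
    student = metric(student_predict(inputs), truth)
    reference = metric(reference_predict(inputs), truth)
    if reference <= 0:
        raise ValueError("reference solution must score above zero")
    return min(1.0, student / reference)


truth = [0, 1, 1, 0]
reference = lambda xs: [0, 1, 1, 1]   # stand-in reference model
student = lambda xs: [0, 1, 0, 1]     # stand-in student model
print(normalized_score(student, reference, [None] * 4, truth))
```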

## Architectural Diagram

## Templating System

Want solution version of notebooks to hand out after course

Want to know that partial code in prompt notebook is correct

$\Rightarrow$ Generate both from same document

Use simple templating commands in code cells

In [None]:
l = [1, 2, 3]  #REMOVERHS
for i in l:
    #INSERT print ...
    #REMOVE{
    print i + 1
    #REMOVE}
print "Done" #REMOVE

## Generating Prompt Notebook

In [None]:
l = [1, 2, 3]  #REMOVERHS
for i in l:
    #INSERT print ...
    #REMOVE{
    print i + 1
    #REMOVE}
print "Done" #REMOVE

Preprocessor script removes some lines, inserts others, to yield:

In [None]:
l = ...
for i in l:
    print ...

## Generating Solution Notebook

In [None]:
l = [1, 2, 3]  #REMOVERHS
for i in l:
    #INSERT print ...
    #REMOVE{
    print i + 1
    #REMOVE}
print "Done" #REMOVE

Preprocessor script removes comments, leaving functional code the same:

In [None]:
l = [1, 2, 3]
for i in l:
    print i + 1
print "Done"
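
The two transformations above can be sketched as a pair of small functions. This is a guess at the rules based only on the example shown (Python 3), not the actual preprocessor; `make_prompt` and `make_solution` are hypothetical names.

```python
def make_prompt(source):
    """Apply the template markers to produce the student-facing cell."""
    out = []
    skipping = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == "#REMOVE{":
            skipping = True                      # start of a removed block
        elif stripped == "#REMOVE}":
            skipping = False                     # end of a removed block
        elif skipping or stripped.endswith("#REMOVE"):
            continue                             # drop solution-only lines
        elif stripped.startswith("#INSERT "):
            indent = line[:len(line) - len(line.lstrip())]
            out.append(indent + stripped[len("#INSERT "):])
        elif stripped.endswith("#REMOVERHS"):
            out.append(line.split("=", 1)[0].rstrip() + " = ...")
        else:
            out.append(line)
    return "\n".join(out)


def make_solution(source):
    """Strip the template markers, leaving the working code untouched."""
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped in ("#REMOVE{", "#REMOVE}") or stripped.startswith("#INSERT "):
            continue
        for marker in ("#REMOVERHS", "#REMOVE"):
            if stripped.endswith(marker):
                line = line.rstrip()[:-len(marker)].rstrip()
                break
        out.append(line)
    return "\n".join(out)


template = '''l = [1, 2, 3]  #REMOVERHS
for i in l:
    #INSERT print ...
    #REMOVE{
    print i + 1
    #REMOVE}
print "Done" #REMOVE'''

print(make_prompt(template))
print(make_solution(template))
```

Because both notebooks come from one template, the solution code can be executed in tests while the prompt is what students receive.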

## In-Notebook Exercises

Miniprojects aren't appropriate to every type of class

Shorter trainings and workshops don't leave time for students to get into miniprojects

Instead, integrate exercises in between lecture topics  
$\Rightarrow$ Short bits of coding to test understanding

## Outline

- Notebooks for Teaching
- Notebooks for Exercises
- **Deploying Notebooks**
- The Future

## Using git for SCM

Notebook files store all sorts of metadata and output alongside the source.
We don't want students to see that, and we don't want to clutter our git history whenever output changes.


```json
...
...
...
  {
   "cell_type": "code",
   "execution_count": 4, // NOPE
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "\n",
       "           window.headwayVsRidership={\"day\":{\"0\":\"Monday\",\"1\":\"Tuesday\",\"2\":\"Wednes
       ...
       ...
       ...
```

**Only content should be included in Git, not outputs.** \*

\*This is our opinionated stance.

## Cleaning Scripts as git Hooks

Solution - a really simple git pre-commit hook.

https://gist.github.com/cmoscardi/75a4cf2362c49deb36cdd6b991c25786


```python
#!/usr/bin/env ipython
import io
import sys

from nbformat import read, write

# Strip outputs and execution counts from every notebook named on the command line
if __name__ == '__main__':
  for filename in sys.argv[1:]:
    ipynb = read(filename, 4)

    ipynb.metadata.signature = ''

    for cell in ipynb.cells:
      if "outputs" in cell:
        cell["outputs"] = []
      if "execution_count" in cell:
        cell["execution_count"] = None

    with io.open(filename, mode="w", encoding='utf-8') as fh:
      write(ipynb, fh)
```

## Smoke Testing

Make sure all cells run

```
jupyter nbconvert --to notebook --execute \
  --ExecutePreprocessor.timeout=60 \
  --output my_notebook_executed.ipynb my_notebook.ipynb
```

http://www.christianmoscardi.com/blog/2016/01/20/jupyter-testing.html

As long as you have the appropriate kernel installed, this Just Works (even in parallel!)
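
The same command can be driven from Python to run several notebooks side by side. This is a sketch only (Python 3): the helper names and the `_executed` naming scheme are assumptions, and it relies on nbconvert exiting with a non-zero status when a cell raises.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def executed_name(notebook):
    """my_notebook.ipynb -> my_notebook_executed.ipynb (assumed naming scheme)."""
    root, ext = os.path.splitext(notebook)
    return root + "_executed" + ext


def nbconvert_command(notebook, output, timeout=60):
    """Build the argument list for the nbconvert call shown above."""
    return [
        "jupyter", "nbconvert", "--to", "notebook", "--execute",
        "--ExecutePreprocessor.timeout={}".format(timeout),
        "--output", output, notebook,
    ]


def smoke_test(notebook, timeout=60):
    """Return True when nbconvert exits cleanly, i.e. every cell ran."""
    command = nbconvert_command(notebook, executed_name(notebook), timeout)
    return subprocess.call(command) == 0


def smoke_test_all(notebooks, workers=4):
    """Run the smoke test for several notebooks in parallel."""
    notebooks = list(notebooks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(notebooks, pool.map(smoke_test, notebooks)))
```

Threads are enough here because each test spends its time waiting on a separate `jupyter` process.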

## Link Testing

External sites go down all the time

Internal links $\Rightarrow$ Did we deploy the right resources?
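
Before links can be tested they have to be found. A minimal sketch (Python 3) that pulls link targets out of a notebook's markdown cells using only the standard library; the function names and the regular expressions are illustrative and will miss unusual link syntax.

```python
import json
import re

_MD_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")   # [text](target), ![alt](target)
_BARE_URL = re.compile(r"https?://[^\s)>\"']+")     # URLs pasted as plain text


def extract_links(notebook_json):
    """Return the sorted, de-duplicated link targets found in markdown cells."""
    notebook = json.loads(notebook_json)
    links = set()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        links.update(_MD_LINK.findall(text))
        links.update(_BARE_URL.findall(text))
    return sorted(links)


def is_external(link):
    """External links need an HTTP check; internal ones a deployed-file check."""
    return link.startswith(("http://", "https://"))
```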

## Efficiency vs. Completeness

On PRs, test only what changed  
Unless a dependency changed; then test everything


![Testing](Testing%20Pipeline.svg)
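
The selection rule (test what changed, unless a dependency changed) fits in a few lines. A sketch in Python 3 with hypothetical names; which files count as dependencies is an assumption, shown here as a requirements file.

```python
def notebooks_to_test(changed_files, all_notebooks, dependency_files):
    """Pick the notebooks a pull request needs to re-run."""
    changed = set(changed_files)
    if changed & set(dependency_files):
        # A shared dependency changed, so anything could break: test everything.
        return sorted(all_notebooks)
    return sorted(nb for nb in all_notebooks if nb in changed)


all_notebooks = ["intro.ipynb", "pandas.ipynb", "spark.ipynb"]
dependencies = ["requirements.txt"]  # hypothetical dependency file

print(notebooks_to_test(["pandas.ipynb", "README.md"], all_notebooks, dependencies))
print(notebooks_to_test(["requirements.txt"], all_notebooks, dependencies))
```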

## Now, Parallelize that

Use `nose-parallel` [link](http://nose.readthedocs.io/en/latest/doc_tests/test_multiprocess/multiprocess.html)

But we dynamically add our test cases based on what notebooks get passed in as command-line arguments.

`nose-parallel` doesn't like this... so we have to hack some internals.

```python
  @classmethod
  def add_func(cls, ipynb):
    
    def func(self):
      self.check_ipynb(ipynb) # this is the nbconvert code

    _, nbname = os.path.split(ipynb)
    func.__name__ = 'test_{}_{}'.format(prefix, nbname.split('.')[0])
    func.__doc__ = 'Test {}'.format(nbname)
    setattr(cls, func.__name__, func)

  def add_tests(self):
    for ipynb in self.ipynbs:
      self.add_func(ipynb)
```




We also parallelize our remote link checking with [requests-futures](https://github.com/ross/requests-futures) (and some retry logic).
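
The idea of parallel checks with retries can be sketched with the standard library alone (Python 3). Note the swap: this uses `concurrent.futures` and `urllib` instead of requests-futures, and the retry count, delay, and function names are assumptions.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url, timeout=10):
    """Return the HTTP status for `url`; urllib raises on 4xx/5xx and network errors."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def check_with_retry(url, fetch=fetch_status, retries=2, delay=1.0):
    """True if any of `retries + 1` attempts gets a status below 400."""
    for attempt in range(retries + 1):
        try:
            if fetch(url) < 400:
                return True
        except Exception:
            pass  # any error counts as a failed attempt
        if attempt < retries:
            time.sleep(delay)
    return False


def check_links(urls, fetch=fetch_status, retries=2, delay=1.0, workers=8):
    """Check many links at once; returns {url: is_alive}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda url: check_with_retry(url, fetch, retries, delay), urls
        )
        return dict(zip(urls, results))
```

Passing `fetch` as a parameter keeps the retry logic testable without touching the network.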


## Deployment

Individual Digital Ocean boxes for each student

Image contains all software and libraries

Curriculum loaded from S3

Autolaunch the Jupyter server; use iptables to expose it on port 443

## Outline

- Notebooks for Teaching
- Notebooks for Exercises
- Deploying Notebooks
- **The Future**

## The Future

MOOCs $\Rightarrow$ This won't scale

JupyterHub + Kubernetes (this was our alpha test!)

Goal: All trainings, Foundations on JupyterHub

https://zero-to-jupyterhub.readthedocs.io/en/latest/ is AWESOME

## Conclusion

Links

Emails

Link to Presentation

&c