# Teaching Data Science Effectively

### Robert Schroll
### The Data Incubator

Follow along at [bit.ly/tdi-odsc](https://bit.ly/tdi-odsc)

## Data Scientists Needed

Quanthub estimates a shortage of 250,000 data scientists in 2020.

Few are coming directly from academia.

[Quanthub's survey](https://quanthub.com/data-scientist-shortage-2020/) is based on data from 2019 and 2020, so it probably does not fully reflect the effects of the pandemic.  But hiring remains strong, and this estimate is consistent in order of magnitude with other research.

## Data Science Skills Needed

Many who don't consider themselves data scientists benefit from data science skills.

Pandas can be a game-changer by automating spreadsheet tasks.
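Here is a minimal sketch. The snippet below does in a few lines what would be a pivot table in a spreadsheet: total revenue by region and month. The column names and figures are made up for illustration. Real data would typically be loaded with `pd.read_excel` or `pd.read_csv`.

```python
import pandas as pd

# Hypothetical sales records; in practice, load them from a file
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "month": ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "revenue": [100, 80, 120, 90, 30],
})

# The equivalent of a spreadsheet pivot table: regions as rows, months as columns
summary = sales.pivot_table(index="region", columns="month",
                            values="revenue", aggfunc="sum")
print(summary)
```

Because the steps live in code, rerunning them on next month's file takes one command rather than a repeat of the manual clicks.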

## Data Science Skills

Data scientists need to combine skills in:
1. Data analysis
2. Programming
3. Communication

## About the Data Incubator

The Data Incubator trains recent graduates and experienced workers.

Our Data Science Fellowship is an 8-week bootcamp.
- 1000+ students trained in the basics of data science.
- Students placed at 100+ companies, including Genentech, Squarespace, and Freddie Mac.

We run private, customized trainings for dozens of companies.

## About Me

I have been programming and conducting data analysis for most of this century.

First as a physicist, now as a data scientist.

I have been teaching data science for 5 years.

## Teaching Data Science Effectively

1. Jupyter as a pedagogical tool
2. Learning by doing
3. The importance of failure
4. Where we're still learning

# Jupyter as a Pedagogical Tool

Jupyter notebooks are a great tool for data science.

They also work well for teaching.

## Mixing Languages

Data scientists often move between multiple languages and tools.

Jupyter and the IPython kernel allow you to mix many of these tools in a single interface.

## Python

```python
for i in range(10):
    print('Hi! ' * i)
```

## Bash

Shell code can be run with the `!` shortcut.

```
! grep "Bash" ODSC.ipynb
```

## Bash

There is also a `%%bash` cell magic for longer code sections.

```
%%bash

for f in *; do
    echo "Hi $f"
done
```
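To make the comparison concrete, here is a rough pure-Python equivalent of that bash loop. It is written as a function (the name `greet_files` is hypothetical) so that it can be checked. It mimics bash's `*` glob by skipping hidden files and sorting names. Bash's ordering depends on the locale, however, while Python's `sorted` compares code points, so the two may not always agree.

```python
from pathlib import Path

def greet_files(directory="."):
    """Return one greeting per non-hidden entry, roughly like `for f in *` in bash."""
    # Bash's * skips dotfiles and expands in sorted order
    names = sorted(p.name for p in Path(directory).iterdir()
                   if not p.name.startswith("."))
    return [f"Hi {name}" for name in names]

for line in greet_files():
    print(line)
```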

## SQL

The [IPython SQL magic](https://github.com/catherinedevlin/ipython-sql) lets you run SQL queries directly in notebook cells.

```
%load_ext sql
%sql sqlite:///small_data/customers.sqlite
```

```
%%sql

SELECT * FROM customers LIMIT 5;
```
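For comparison, here is approximately what the magic saves you from typing, using Python's built-in `sqlite3` module. The columns and rows of this `customers` table are hypothetical, since the actual `small_data/customers.sqlite` schema isn't shown here.

```python
import sqlite3

# An in-memory database with a hypothetical customers table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Grace", "Arlington"), ("Alan", "Manchester")],
)

# The same query as the %%sql cell, issued from plain Python
rows = conn.execute("SELECT * FROM customers LIMIT 5").fetchall()
for row in rows:
    print(row)
```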

## HTML/JS

Jupyter already has tools to display HTML documents, but our [iHTML](https://github.com/thedataincubator/ihtml) package lets you demo HTML code directly in a notebook.

First, we'll import the package and create a JavaScript document:

```python
import ihtml
```

```
%%jsdoc clicker
document.addEventListener("DOMContentLoaded", function(e) {
    document.querySelector("h1").addEventListener("click", function(event) {
        var div = document.createElement("div");
        div.textContent = "Hi!";
        document.body.appendChild(div);
    })
})
```

## HTML/JS

Then we can use it inside an HTML document:

```
%%ihtml 200
<html>
    <head>
        <style>
            body { background: #eee; }
        </style>
        {{ clicker | jsdoc }}
    </head>
    <body>
        <h1>Click me!</h1>
    </body>
</html>
```

## Benefits of Jupyter Notebooks

**Interactivity:** Cause and effect are tightly coupled.

**Modifiability:** Easy to make small changes and see effects.

Together, these encourage experimentation.

## Deploying Jupyter for Everyone

[Jupyter](https://jupyter.org/) provides a server for a single user.

[JupyterHub](https://jupyter.org/hub) provides many users their own servers.

[Zero-to-JupyterHub](https://zero-to-jupyterhub.readthedocs.io/en/latest/) runs JupyterHub on a Kubernetes (K8s) cluster.

All for free, with a wonderful community!

For details on our setup, see [our DigitalOcean Tech Talk](https://www.digitalocean.com/community/tech_talks/scaling-a-school-bringing-data-science-curriculum-to-20-000-students-in-the-cloud).
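To show what configuring JupyterHub looks like, here is a minimal `jupyterhub_config.py` sketch for a single machine. This is not our Kubernetes deployment, because Zero-to-JupyterHub is configured through Helm chart values instead. The user names are hypothetical. It is a config fragment that JupyterHub loads itself, not a standalone script.

```python
# jupyterhub_config.py: JupyterHub supplies get_config() when it loads this file
c = get_config()  # noqa: F821

# Serve the hub on all interfaces, port 8000
c.JupyterHub.bind_url = "http://:8000"

# Open JupyterLab rather than the classic notebook interface
c.Spawner.default_url = "/lab"

# Hypothetical class roster; only these users may log in
c.Authenticator.allowed_users = {"student1", "student2", "instructor"}

# Request a per-user memory cap (enforced only by spawners that support it)
c.Spawner.mem_limit = "2G"
```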

## Learning by Doing

Promote action at four levels:
1. Prepared interactive elements
2. Small inline coding exercises
3. Stand-alone "miniprojects"
4. Capstone project

## Prepared Interactive Elements

## Inline Coding Exercises
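Here is a hypothetical example of a small inline exercise, shown with the blank already filled in. Students write the single marked line, and the assertion at the end of the cell gives immediate feedback. The function and test string are illustrative and are not taken from our curriculum.

```python
def word_counts(text):
    """Count how often each lowercase word appears in text."""
    counts = {}
    for word in text.lower().split():
        # Exercise: students write this line themselves
        counts[word] = counts.get(word, 0) + 1
    return counts

# Built-in check, so students know right away whether their line works
assert word_counts("the cat and the hat") == {"the": 2, "cat": 1, "and": 1, "hat": 1}
```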

## Miniprojects

## Capstone Project

Data scientists must solve the *right* problems and communicate the results.

Students design and execute a full data science project.

The usual sticking points are not technical; they involve defining a use case.

Some recent projects:
- Finding broadband opportunities for under-served communities in Tennessee.
- Identifying useful product reviews.
- Finding shelter pets for adoption by image search.

## The Importance of Failure

You need both successes and failures to train an ML model.

Students learn as much from failures as from successes.

## The Benefits of Failure

Failure is the default state of programming.  Most time is spent debugging.

The marginal cost of copying code is zero. If a problem had a known solution, it would already be in a library.

Students need to learn how to find their way out of problems.

Hand-holding does not promote self-sufficiency.

## Google-fu is a Skill

<img src="https://i.redd.it/7lfrc6p5xna21.png" width="70%">

## Designing for Failure

Exercises should be more than fill-in-the-blank.

Problems should involve aspects not covered previously.

Projects should have multiple routes to success.

## Helping Students Overcome Failure

Provide sketches or outlines of potential solutions.

Checkpoints let students check their own work.
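Here is one way to build such a checkpoint. It is sketched as an assumption and is not our actual tooling. The instructor stores a hash of each expected answer, so students can verify a result without the answer appearing in the notebook. The names `checkpoint` and `ANSWER_KEY` are hypothetical.

```python
import hashlib

def _fingerprint(value):
    """Hash an answer's repr so the expected value isn't visible in the notebook."""
    if isinstance(value, (int, float)):
        # Treat 84 and 84.0 alike and ignore tiny floating-point noise
        value = round(float(value), 6)
    return hashlib.sha256(repr(value).encode()).hexdigest()

# Hypothetical answer key; a real notebook would ship only the hex digests
ANSWER_KEY = {"mean_revenue": _fingerprint(84.0)}

def checkpoint(question, answer):
    """Report whether a student's answer matches the stored fingerprint."""
    ok = _fingerprint(answer) == ANSWER_KEY[question]
    print(f"{question}: {'correct!' if ok else 'not yet, keep trying'}")
    return ok

# A student's attempt: the mean of five revenue figures
checkpoint("mean_revenue", sum([100, 80, 120, 90, 30]) / 5)
```

Rounding before hashing tolerates small floating-point differences. However, two nearly equal answers that straddle a rounding boundary could still be judged differently.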

Make instructors available:
- Lecture
- Office hours
- One-on-one meetings
- Slack

## Instructor Failures are Teaching Moments

Students learn the most when the instructor gets stuck.

Yes, some _schadenfreude_.

But also, an invaluable chance to teach debugging skills.

$\Rightarrow$ Get into trouble while teaching!

## Still Learning: Demoing Failure

Prepared failures don't resonate as well.

It's hard to fake the panic of facing a bewildering bug.

Part of bug hunting is going down blind alleys and then abandoning them.

## Still Learning: Teaching Generalization

Some students want a flow chart to follow.

They repeat examples step by step without understanding the logic.

These students struggle mightily on miniprojects.

## Still Learning: Magical Thinking

Some students treat code as a magical incantation.

They rearrange syntax until it runs.

They seem uninterested in understanding *why* code failed.

## Find Out More

- Data Science Fellowship
- Data Engineering Fellowship
- Private Training

[thedataincubator.com](https://www.thedataincubator.com)

Robert Schroll &mdash; robert@thedataincubator.com

Booth #9