In [None]:
#hide
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

In [None]:
#hide
from fastbook import *

# Your Deep Learning Journey

## Deep Learning Is for Everyone

### Myths
1. Lots of math
2. Lots of data
3. Lots of expensive computers

### Reality
1. Just high school math is sufficient
2. Record-breaking results with <50 items of data
3. Can get what you need for state-of-the-art work for free

### Real-world applications
- Natural language processing (NLP): Answering questions; speech recognition; summarizing documents; classifying documents; finding names, dates, etc. in documents; searching for articles mentioning a concept
- Computer vision: Satellite and drone imagery interpretation (e.g., for disaster resilience); face recognition; image captioning; reading traffic signs; locating pedestrians and vehicles in autonomous vehicles
- Medicine: Finding anomalies in radiology images, including CT, MRI, and X-ray images; counting features in pathology slides; measuring features in ultrasounds; diagnosing diabetic retinopathy
- Biology: Folding proteins; classifying proteins; many genomics tasks, such as tumor-normal sequencing and classifying clinically actionable genetic mutations; cell classification; analyzing protein/protein interactions
- Image generation: Colorizing images; increasing image resolution; removing noise from images; converting images to art in the style of famous artists
- Recommendation systems: Web search; product recommendations; home page layout
- Playing games: Chess, Go, most Atari video games, and many real-time strategy games
- Robotics: Handling objects that are challenging to locate (e.g., transparent, shiny, lacking texture) or hard to pick up
- Other applications: Financial and logistical forecasting, text to speech, and much more...

## Neural Networks: A Brief History

- 1943: Neurophysiologist Warren McCulloch and logician Walter Pitts developed a mathematical model of an artificial neuron: 

> "Because of the 'all-or-none' character of nervous activity, neural events, and the relations among them can be treated by means of propositional logic. It is found that the behavior of every net can be described in these terms."

- 1950s: Frank Rosenblatt made some subtle changes to the McCulloch and Pitts model, which he argued would give rise to a "machine capable of perceiving, recognizing, and identifying its surroundings without any human training or control."

- AI Winter: MIT professor Marvin Minsky and Seymour Papert wrote a book called _Perceptrons_. They demonstrated that a single layer of Rosenblatt's devices was unable to learn some simple but critical mathematical functions (such as XOR), but that using multiple layers of the devices would fix the problem. Unfortunately, only the former insight was widely recognized, which led to neural networks being abandoned.

- 1986: MIT Press released a book in two volumes called _Parallel Distributed Processing_ (PDP) describing a series of requirements that, if all were met, could lead to amazing work being done using distributed parallel processing. (Modern neural networks handle each of these requirements.)

    1. Processing units
    2. State of activation
    3. Output function
    4. Pattern of connectivity
    5. Propagation rule
    6. Activation rule
    7. Learning rule
    8. Environment: Data scientists often ignore this part
    
- 1980s: Most neural network models were built with a second layer of neurons, thus avoiding the problem identified by Minsky and Papert. However, a misunderstanding of the theoretical issues held back the field. In theory, adding one extra layer of neurons was enough to allow any mathematical function to be approximated, but in practice, such neural networks were too big and slow to be useful. Researchers later demonstrated that to get good practical performance, you need to use even more layers of neurons.

## How to Learn Deep Learning

Based on David Perkins's theory of education, which reverses the conventional approach to teaching (building things up from principles and foundations first):

1. Play the whole game
2. Make the game worth playing
3. Work on the hard parts

Proponents of this approach to education argue that most people learn better with the full context first, and that people who prefer a principles/foundations-first approach are overrepresented in academia.

What this means:

1. _Play the whole game_: Ensure there is a context and a purpose to be understood intuitively. Teach everything through real examples first, and then go deeper and deeper into theoretical understanding.
2. _Make the game worth playing_: Learn to use a complete, working, usable, state-of-the-art deep learning network to solve real-world problems with simple, expressive tools.
3. _Work on the hard parts_: Deliberate practice without dumbing things down. This includes tackling calculus, linear algebra, the software engineering of the code, etc.

### Your Projects and Your Mindset

What sort of tasks make for good test cases? Deep learning can be set to work on almost any problem. It helps to focus on your hobbies and passions, setting yourself four or five little projects rather than striving to solve one big, grand problem. Being too ambitious too early can often backfire.

Common character traits in people who do well at deep learning include playfulness and curiosity. "The late physicist Richard Feynman is an example of someone who we'd expect to be great at deep learning: his development of an understanding of the movement of subatomic particles came from his amusement at how plates wobble when they spin in the air."

## The Software: PyTorch, fastai, and Jupyter

...and why it doesn't matter.

Software stack:


**Python**

**PyTorch**: Super flexible, expressive, and designed for developers, but not necessarily beginner-friendly. Works best as a low-level foundation library, providing the basic operations for higher-level functionality. Provides both speed and simplicity rather than trading one for the other. Cons: no high-level APIs that make it easy to build things quickly.

Easier to use than TensorFlow and much more useful for researchers. Everyone used to use TensorFlow, as did this course a few years ago, but it got "bogged down." Within the last 12 months, the percentage of researchers using PyTorch has increased from 20% to 80%, whereas usage of TensorFlow has gone down from 80% to 20%. Industry moves more slowly, but this may change within the next year or so.

**fastai**: A popular high-level API that sits on top of PyTorch and provides higher-level functionality. It is designed not only for beginners and teaching, but also for industry and practitioners. Uses standard software engineering practices such as a layered API, which was previously absent from deep learning libraries.

**Jupyter Notebook**: Platform for experimenting with code. It will be used to train and experiment with models, as well as to introspect every stage of the data pre-processing and model development pipeline. Powerful, flexible, and easy to use.

## Your First Model

Train an image classifier to recognize dogs and cats with almost 100% accuracy. What we will use:

1. A dataset called the [Oxford-IIIT Pet Dataset](http://www.robots.ox.ac.uk/~vgg/data/pets/), which contains 7,349 images of cats and dogs from 37 different breeds. It will be downloaded from the fast.ai datasets collection to the GPU server you are using and then extracted.
2. A *pretrained model*, based on a competition-winning model that has already been trained on 1.3 million images. It will be downloaded from the internet.
3. The pretrained model will be *fine-tuned* using the latest advances in transfer learning, to create a model that is specially customized for recognizing dogs and cats.

### Getting a GPU Deep Learning Server

You need access to a computer with an NVIDIA GPU, but there is no need to buy one. Just rent access to one through a service such as Google Colab or Paperspace Gradient. The recommended OS is Linux.

### Running Your First Notebook

In [None]:
# CLICK ME
from fastai.vision.all import *
path = untar_data(URLs.PETS)/'images'

def is_cat(x): return x[0].isupper()
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
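
The labeling rule in the cell above can be tried on its own, without downloading anything. The sketch below assumes the dataset's naming convention, in which cat breeds start with an uppercase letter and dog breeds with a lowercase one. The file names are made up for illustration and are not read from the dataset.

```python
# Same rule as the notebook cell: the label is derived from the file name alone.
def is_cat(filename: str) -> bool:
    return filename[0].isupper()

# Hypothetical file names written in the dataset's breed_number.jpg style.
examples = ["Abyssinian_1.jpg", "beagle_32.jpg", "Bengal_105.jpg", "pug_7.jpg"]

labels = {name: is_cat(name) for name in examples}
for name, label in labels.items():
    print(f"{name:20s} -> is_cat={label}")
```

Because the label comes from the name rather than from a separate annotation file, no extra labeling step is needed for this dataset.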

### Sidebar: This Book Was Written in Jupyter Notebooks

In [None]:
1+1

In [None]:
img = PILImage.create(image_cat())
img.to_thumb(192)

### End sidebar

In [None]:
uploader = widgets.FileUpload()
uploader

In [None]:
#hide
# For the book, we can't actually click an upload button, so we fake it
uploader = SimpleNamespace(data = ['images/chapter1_cat_example.jpg'])

In [None]:
img = PILImage.create(uploader.data[0])
is_cat,_,probs = learn.predict(img)
print(f"Is this a cat?: {is_cat}.")
print(f"Probability it's a cat: {probs[1].item():.6f}")

### What Is Machine Learning?
Machine learning is the training of programs developed by allowing a computer to learn from its experience, rather than manually coding the individual steps.

#### History
In 1949, researcher Arthur Samuel started working on a different way to get computers to complete tasks like image recognition, which he called machine learning.

In his 1962 essay _Artificial Intelligence: A Frontier of Automation_, he wrote:

> Programming a computer for such computations is, at best, a difficult task, not primarily because of any inherent complexity in the computer itself but, rather, because of the need to spell out every minute step of the process in the most exasperating level of detail. Computers, as any programmer will tell you, are giant morons, not giant brains.

Instead of telling the computer the exact steps to solve a problem, Samuel proposed showing the computer examples of the problem and letting it figure out how to solve it itself. This turned out to be very effective: by 1961, his checkers-playing program had learned so much that it beat the Connecticut state champion.

> Suppose we arrange for some automatic means of testing the effectiveness of any current weight assignment in terms of actual performance and provide a mechanism for altering the weight assignment so as to maximize the performance. We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programmed would "learn" from its experience.

Learning becomes entirely automatic when the adjustment of the weights is also automatic. Instead of improving a model by adjusting its weights manually, we rely on an automated mechanism that produces adjustments based on performance.

Once the model is trained - that is, once we've chosen our final, best, favorite weight assignment - then we can think of the weights as being part of the model since we're not varying them anymore. Therefore, actually using a model looks like regular programming (inputs > model > results).
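
The loop Samuel describes (test the current weight assignment, then alter it to improve performance) can be sketched in a few lines of plain Python. This is only an illustration: the linear `model`, the `performance` measure, and the random-perturbation update in `train` are all assumptions chosen for brevity. None of them is Samuel's checkers program or what fastai does internally.

```python
import random

def model(x, weights):
    # A deliberately tiny "model": a straight line with two weights.
    w, b = weights
    return w * x + b

def performance(weights, inputs, targets):
    # Higher is better: negative mean squared error over the examples.
    errors = [(model(x, weights) - y) ** 2 for x, y in zip(inputs, targets)]
    return -sum(errors) / len(errors)

def train(inputs, targets, steps=2000, seed=0):
    rng = random.Random(seed)
    weights = [0.0, 0.0]
    best = performance(weights, inputs, targets)
    for _ in range(steps):
        # Propose a small random change to the weight assignment...
        candidate = [w + rng.gauss(0, 0.1) for w in weights]
        score = performance(candidate, inputs, targets)
        # ...and keep it only if the measured performance improves.
        if score > best:
            weights, best = candidate, score
    return weights, best

inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
targets = [2 * x + 1 for x in inputs]  # examples generated by y = 2x + 1

initial = performance([0.0, 0.0], inputs, targets)
weights, final = train(inputs, targets)
print("initial performance:", initial)
print("final performance:  ", final)
print("learned weights:    ", weights)

# Once training stops, the weights are fixed and the model is used
# like an ordinary program: inputs -> model -> results.
print("prediction for x=10:", model(10.0, weights))
```

Nothing in `train` knows the rule `y = 2x + 1`; it only sees examples and a performance score, which is the point of Samuel's description.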

In [None]:
gv('''program[shape=box3d width=1 height=0.7]
inputs->program->results''')

In [None]:
gv('''model[shape=box3d width=1 height=0.7]
inputs->model->results; weights->model''')

In [None]:
gv('''ordering=in
model[shape=box3d width=1 height=0.7]
inputs->model->results; weights->model; results->performance
performance->weights[constraint=false label=update]''')

In [None]:
gv('''model[shape=box3d width=1 height=0.7]
inputs->model->results''')

If no images are being displayed, go to File > Trust Notebook.

### What Is a Neural Network?
A neural network is a kind of function that is so flexible that it can be used to solve any given problem, just by varying its weights.

A mathematical proof called the *universal approximation theorem* shows that this function can solve any problem to any level of accuracy, in theory.
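
To make "any level of accuracy" a little more concrete, here is a minimal sketch for the one-dimensional case. It builds a network with a single hidden layer of ReLU units whose weights are set by hand (not learned) so that the network linearly interpolates a target function, then checks how the worst-case error changes as hidden units are added. The helper names (`build_network`, `predict`, `max_error`) and the choice of `sin` as the target are assumptions made for illustration. This is not a proof of the theorem, and it says nothing about how easy such weights are to find by training.

```python
import math

def relu(z):
    return max(0.0, z)

def build_network(f, a, b, n_hidden):
    """Return (knots, coeffs, bias) for a one-hidden-layer ReLU network
    that linearly interpolates f at n_hidden + 1 equally spaced points."""
    step = (b - a) / n_hidden
    knots = [a + i * step for i in range(n_hidden)]
    values = [f(a + i * step) for i in range(n_hidden + 1)]
    slopes = [(values[i + 1] - values[i]) / step for i in range(n_hidden)]
    # Each hidden unit changes the slope of the output at its knot.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    return knots, coeffs, values[0]

def predict(network, x):
    knots, coeffs, bias = network
    return bias + sum(c * relu(x - k) for c, k in zip(coeffs, knots))

def max_error(f, network, a, b, samples=1000):
    # Worst absolute error on a fine grid of test points.
    xs = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - predict(network, x)) for x in xs)

errors = {}
for n_hidden in (4, 16, 64):
    net = build_network(math.sin, 0.0, 2 * math.pi, n_hidden)
    errors[n_hidden] = max_error(math.sin, net, 0.0, 2 * math.pi)
    print(f"{n_hidden:3d} hidden units -> max error {errors[n_hidden]:.5f}")
```

The error shrinks as units are added, which matches the "in theory" caveat: the approximation can be made as good as you like, but possibly only with a very large network.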

### A Bit of Deep Learning Jargon

In [None]:
gv('''ordering=in
model[shape=box3d width=1 height=0.7 label=architecture]
inputs->model->predictions; parameters->model; labels->loss; predictions->loss
loss->parameters[constraint=false label=update]''')

### Limitations Inherent to Machine Learning

From this picture we can now see some fundamental things about training a deep learning model:

- A model cannot be created without data.
- A model can only learn to operate on the patterns seen in the input data used to train it.
- This learning approach only creates *predictions*, not recommended *actions*.
- It's not enough to just have examples of input data; we need *labels* for that data too (e.g., pictures of dogs and cats aren't enough to train a model; we need a label for each one, saying which ones are dogs, and which are cats).

Generally speaking, we've seen that most organizations that say they don't have enough data, actually mean they don't have enough *labeled* data. If any organization is interested in doing something in practice with a model, then presumably they have some inputs they plan to run their model against. And presumably they've been doing that some other way for a while (e.g., manually, or with some heuristic program), so they have data from those processes! For instance, a radiology practice will almost certainly have an archive of medical scans (since they need to be able to check how their patients are progressing over time), but those scans may not have structured labels containing a list of diagnoses or interventions (since radiologists generally create free-text natural language reports, not structured data). We'll be discussing labeling approaches a lot in this book, because it's such an important issue in practice.

Since these kinds of machine learning models can only make *predictions* (i.e., attempt to replicate labels), this can result in a significant gap between organizational goals and model capabilities. For instance, in this book you'll learn how to create a *recommendation system* that can predict what products a user might purchase. This is often used in e-commerce, such as to customize products shown on a home page by showing the highest-ranked items. But such a model is generally created by looking at a user and their buying history (*inputs*) and what they went on to buy or look at (*labels*), which means that the model is likely to tell you about products the user already has or already knows about, rather than new products that they are most likely to be interested in hearing about. That's very different to what, say, an expert at your local bookseller might do, where they ask questions to figure out your taste, and then tell you about authors or series that you've never heard of before.

### How Our Image Recognizer Works

### What Our Image Recognizer Learned

### Image Recognizers Can Tackle Non-Image Tasks

### Jargon Recap

## Deep Learning Is Not Just for Image Classification

In [None]:
path = untar_data(URLs.CAMVID_TINY)
dls = SegmentationDataLoaders.from_label_func(
    path, bs=8, fnames = get_image_files(path/"images"),
    label_func = lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
    codes = np.loadtxt(path/'codes.txt', dtype=str)
)

learn = unet_learner(dls, resnet34)
learn.fine_tune(8)

In [None]:
learn.show_results(max_n=6, figsize=(7,8))

In [None]:
from fastai.text.all import *

dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)

If you hit a "CUDA out of memory error" after running this cell, click on the menu Kernel, then restart. Instead of executing the cell above, copy and paste the following code in it:

```
from fastai.text.all import *

dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test', bs=32)
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)
```

This reduces the batch size to 32 (we will explain this later). If you keep hitting the same error, change 32 to 16.
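
As a rough preview of what the batch size controls: the model does not see the whole dataset at once but a few items at a time, and `bs` sets how many. The plain-Python sketch below uses a hypothetical `batches` helper and a made-up list of 100 items. It is not how fastai's data loaders are implemented, only an illustration of why a smaller `bs` means less data held in GPU memory per step, at the cost of more steps.

```python
def batches(items, bs):
    """Yield consecutive slices of at most bs items."""
    for start in range(0, len(items), bs):
        yield items[start:start + bs]

reviews = [f"review_{i}" for i in range(100)]  # stand-in for a dataset

sizes_64 = [len(batch) for batch in batches(reviews, 64)]
sizes_32 = [len(batch) for batch in batches(reviews, 32)]

print("bs=64:", sizes_64)
print("bs=32:", sizes_32)
```

Halving the batch size roughly doubles the number of batches per pass over the data, while each batch needs about half as much memory.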

In [None]:
learn.predict("I really liked that movie!")

### Sidebar: The Order Matters

### End sidebar

In [None]:
from fastai.tabular.all import *
path = untar_data(URLs.ADULT_SAMPLE)

dls = TabularDataLoaders.from_csv(path/'adult.csv', path=path, y_names="salary",
    cat_names = ['workclass', 'education', 'marital-status', 'occupation',
                 'relationship', 'race'],
    cont_names = ['age', 'fnlwgt', 'education-num'],
    procs = [Categorify, FillMissing, Normalize])

learn = tabular_learner(dls, metrics=accuracy)

In [None]:
learn.fit_one_cycle(3)

In [None]:
from fastai.collab import *
path = untar_data(URLs.ML_SAMPLE)
dls = CollabDataLoaders.from_csv(path/'ratings.csv')
learn = collab_learner(dls, y_range=(0.5,5.5))
learn.fine_tune(10)

In [None]:
learn.show_results()

### Sidebar: Datasets: Food for Models

### End sidebar

## Validation Sets and Test Sets

### Use Judgment in Defining Test Sets

## A _Choose Your Own Adventure_ moment

## Questionnaire

It can be hard to know in pages and pages of prose what the key things are that you really need to focus on and remember. So, we've prepared a list of questions and suggested steps to complete at the end of each chapter. All the answers are in the text of the chapter, so if you're not sure about anything here, reread that part of the text and make sure you understand it. Answers to all these questions are also available on the [book's website](https://book.fast.ai). You can also visit [the forums](https://forums.fast.ai) if you get stuck to get help from other folks studying this material.

For more questions, including detailed answers and links to the video timeline, have a look at Radek Osmulski's [aiquizzes](http://aiquizzes.com/howto).

1. Do you need these for deep learning?

   - Lots of math T / F
   - Lots of data T / F
   - Lots of expensive computers T / F
   - A PhD T / F
   
1. Name five areas where deep learning is now the best in the world.
1. What was the name of the first device that was based on the principle of the artificial neuron?
1. Based on the book of the same name, what are the requirements for parallel distributed processing (PDP)?
1. What were the two theoretical misunderstandings that held back the field of neural networks?
1. What is a GPU?
1. Open a notebook and execute a cell containing: `1+1`. What happens?
1. Follow through each cell of the stripped version of the notebook for this chapter. Before executing each cell, guess what will happen.
1. Complete the Jupyter Notebook online appendix.
1. Why is it hard to use a traditional computer program to recognize images in a photo?
1. What did Samuel mean by "weight assignment"?
1. What term do we normally use in deep learning for what Samuel called "weights"?
1. Draw a picture that summarizes Samuel's view of a machine learning model.
1. Why is it hard to understand why a deep learning model makes a particular prediction?
1. What is the name of the theorem that shows that a neural network can solve any mathematical problem to any level of accuracy?
1. What do you need in order to train a model?
1. How could a feedback loop impact the rollout of a predictive policing model?
1. Do we always have to use 224×224-pixel images with the cat recognition model?
1. What is the difference between classification and regression?
1. What is a validation set? What is a test set? Why do we need them?
1. What will fastai do if you don't provide a validation set?
1. Can we always use a random sample for a validation set? Why or why not?
1. What is overfitting? Provide an example.
1. What is a metric? How does it differ from "loss"?
1. How can pretrained models help?
1. What is the "head" of a model?
1. What kinds of features do the early layers of a CNN find? How about the later layers?
1. Are image models only useful for photos?
1. What is an "architecture"?
1. What is segmentation?
1. What is `y_range` used for? When do we need it?
1. What are "hyperparameters"?
1. What's the best way to avoid failures when using AI in an organization?

### Further Research

Each chapter also has a "Further Research" section that poses questions that aren't fully answered in the text, or gives more advanced assignments. Answers to these questions aren't on the book's website; you'll need to do your own research!

1. Why is a GPU useful for deep learning? How is a CPU different, and why is it less effective for deep learning?
1. Try to think of three areas where feedback loops might impact the use of machine learning. See if you can find documented examples of that happening in practice.