# A quickstart introduction

## An example ...
As part of a [conservation effort](http://burrowingowlconservation.org/sightings/), Ann would like to report sightings of Burrowing Owls as she is hiking. Unfortunately, Ann doesn't know what a Burrowing Owl looks like, so she goes to the web to look at pictures. What she has, then, is a set of labeled images. By examining these labeled images she is **training** herself to recognize Burrowing Owls.

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/owls2.png)


More generally, we can call this set of labeled images used for training the **labeled training dataset**. Let's dive into this idea of a labeled training dataset a bit more. Suppose Clara is given the task of distinguishing between pictures of telecaster-style and stratocaster-style guitars. Not to worry: her boss has given her thousands of pictures of guitars. When looking at a picture, the only thing Clara knows is that it is of either a stratocaster or a telecaster. For example, here are some pictures of stratocasters and telecasters.

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/guitars2.png)


Again, when looking at a picture all she knows is that it is a picture of a stratocaster or telecaster but she doesn't know which of the two it is. How long will it take for Clara to learn how to distinguish between these two guitar styles? Would the time be significantly shorter if her boss gave her 10,000 pictures or 100,000? If this is all the information she gets, she will never learn. What she needs is a **labeled** dataset. When presented with a picture she needs to know whether it is a picture of a stratocaster or a telecaster. 


**Now back to Ann learning to recognize Burrowing Owls**

When Ann is learning to recognize Burrowing Owls from her labeled training set, she is developing a model of what features make it a Burrowing Owl. Once she is done learning she can be on a hike, see an animal, and classify it as a Burrowing Owl or something else. This is an **inference** process---based on the evidence of different features of the animal she can infer what it is. And to throw more jargon at you, this type of problem is called a **classification problem**. In classification problems the system is given the features of an object and it needs to classify that object. For example,

* The features might be the words of a Twitter post (e.g., *Everything Everywhere All At Once was a f-ing masterpiece. I can't emphasize how great this movie is, it's just that great.*) and based on those features the system classifies the post as positive, negative, or neutral.
* The features might be the pixels of an image and based on those pixels the system classifies the image into one of 1,000 categories (it's an image of an owl, a bicycle).



## Machine learning
In machine learning, classification systems have a similar two-step process. First is the training phase, where the system uses a labeled training dataset to build a model. (We will learn about the architecture of these models and how they learn a bit later.)

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/quickDiagram2.png)



> **Supervised vs. Unsupervised learning**. When a machine learning system trains on labeled data this is called supervised learning. When a system learns with unlabeled data this is called unsupervised learning. A very common example of unsupervised learning is clustering where we might give our system a million unlabeled pictures and ask it to divide the images into 10 groups. 
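
To make the clustering idea concrete, here is a minimal sketch of a toy k-means procedure in plain Python. It assumes each "picture" has already been reduced to a single made-up number (a hypothetical feature such as average brightness); real image clustering works with many features and far more data, and the function name `kmeans_1d` is ours.

```python
def kmeans_1d(values, k, iterations=10):
    """Group unlabeled numbers into k clusters (a toy k-means)."""
    ordered = sorted(values)
    # Spread the initial cluster centers evenly across the sorted values.
    step = (len(ordered) - 1) / max(k - 1, 1)
    centers = [ordered[round(i * step)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assign every value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Move each center to the average of the values assigned to it.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters


# Made-up, unlabeled feature values: note that no labels are supplied.
groups = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 8.8, 9.0], k=3)
print(groups)
```

The procedure never sees a label; it only groups values that are close to one another.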

Once we have a trained model, we can use that model for inference---we can give it pictures and the system can classify them as burrowing owl or something else. Again, the two phases are:

1. training
2. inference
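
The two phases can be sketched with a deliberately simple classifier. This is not how AlexNet works; it is a nearest-average classifier in plain Python, and the features (height and wingspan in centimeters) and their values are invented for illustration.

```python
def train(labeled_examples):
    """Training phase: average the feature vectors seen for each label."""
    sums, counts = {}, {}
    for features, label in labeled_examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def infer(model, features):
    """Inference phase: pick the label whose average is closest."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical labeled training dataset: ((height cm, wingspan cm), label).
training_data = [
    ((22.0, 55.0), "burrowing owl"),
    ((25.0, 60.0), "burrowing owl"),
    ((60.0, 140.0), "great grey owl"),
    ((70.0, 150.0), "great grey owl"),
]
model = train(training_data)          # phase 1: training
print(infer(model, (24.0, 58.0)))     # phase 2: inference
```

Once `train` has produced the model, `infer` can be called as often as we like without touching the training data again.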

In this introduction we are going to ignore the training phase and learn a bit about inference. To do that we are going to use a pretrained model. *Pretrained* simply means that someone else designed the architecture of the model and trained it.

### AlexNet

The pretrained model we will use is AlexNet. AlexNet was designed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. In 2012, AlexNet won a competition where the competing systems had to classify images into one of 1,000 categories. AlexNet had a 15% error rate, while the runner-up's error rate was over 25%. Since then, dozens of systems have performed better, but we will use AlexNet because of its historic significance. The pretrained AlexNet was trained on a labeled training dataset of over one million images.

## Let's get started

#### First, a note ...
The intent of this notebook is for you to learn a little bit about data mining and have a bit of fun. The idea is not for you to understand every line of code. That will come later.

**Note:**

**First let's set the runtime to GPU (Graphics Processing Unit) -- click on 'runtime' in the menu above, select 'Change runtime type' and pick 'GPU'.**

Let's check to see if we set the runtime to GPU:

In [None]:
!nvidia-smi

You should see something like:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   64C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

This shows, for example, that we are using a Tesla T4 GPU.

Now we will set up a variable to allow us to use the GPU.

In [1]:
import torch
if torch.cuda.is_available():  
  dev = "cuda:0" 
else:  
  dev = "cpu"  
device = torch.device(dev) 
dev

'cpu'

### 1. Install PyTorch Lightning.
First, let's install the PyTorch Lightning library on our virtual machine:

In [None]:
!pip install pytorch-lightning


The exclamation point (aka *bang*) at the beginning of the line instructs the system to interpret the rest of the line as a Unix command (something you might type in a Unix terminal).

For example

```!ls```

will list the contents of the current directory

`pip` is the **p**ackage **i**nstaller for **P**ython. As the name suggests, it installs Python libraries (packages) that are not already present on the system. In the case above, we are installing the `pytorch-lightning` library.


### 2. Import the computer vision library


In [None]:
from torchvision import models
import torch

Without the bang, Jupyter interprets the lines of a code cell as Python commands, which is what ```from torchvision import models``` is.

`torchvision` is a library containing models and other components for computer vision. `torch` is the basic PyTorch deep learning library.

There are many pretrained models available to use. Let's take a look at the possibilities:

In [None]:
dir(models)

That is a lot of pretrained models we can use!

### 3. Load  the pretrained AlexNet model.

In [None]:
alexnet = models.alexnet(weights='AlexNet_Weights.DEFAULT')
# run the model on the GPU if one is available (otherwise stay on the CPU)
alexnet.to(device)

And let's check to see if the model is using the GPU ...

In [None]:
next(alexnet.parameters()).is_cuda

Excellent! CUDA is the API for NVIDIA GPUs that allows us to do parallel programming.

Of course we could call this model anything we want:

```
alexnet_model = models.alexnet(weights='AlexNet_Weights.DEFAULT')
myModel = models.alexnet(weights='AlexNet_Weights.DEFAULT')
```

Now we have a pretrained model loaded into our system. We would like to use the model to classify our own images. 

## Inference

We would like to give AlexNet an image like:

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/poodle.jpg)

and have AlexNet classify it. First, let's download the image file from the web using the Unix command `curl`:

In [None]:
!curl http://zacharski.org/files/courses/dmpics/poodle.jpg -o poodle.jpg

Next, let's load that image into Python using PIL (Python Imaging Library):

In [None]:
from PIL import Image
img = Image.open("poodle.jpg")

Now let's display that image:

In [None]:
img

(If that doesn't display an image, change the code to `img.show()`.)


The size of this particular image is 4032x3024 which is slightly over 12 million pixels. That is a lot of pixels! AlexNet was designed to work with an image size of 224x224. So we need to transform the original image to AlexNet specifications by using some methods from the `torchvision` library. 

First we will use `transforms.Resize(256)` which transforms the image so that the smaller dimension of the original image will be resized to 256.  ([PyTorch documentation](https://pytorch.org/vision/main/generated/torchvision.transforms.Resize.html)). The resultant image will be 341x256.

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/poodleSmall.jpg)
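
The 341x256 figure follows from keeping the aspect ratio while shrinking the shorter side to 256. Here is a small sketch of that arithmetic in plain Python; the helper name `resize_shorter_side` is ours, and truncating the longer side to a whole number is an assumption of this sketch rather than a statement about the library's internals.

```python
def resize_shorter_side(width, height, target=256):
    """Scale (width, height) so the shorter side equals target,
    keeping the aspect ratio (the longer side is truncated to an int)."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


print(resize_shorter_side(4032, 3024))  # (341, 256)
```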


To get the image to the final 224x224 size we are going to use `transforms.CenterCrop(224)` which as the name suggests crops the image at the center to a 224x224 square. The result will look something like:

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/poodleCropped.jpg)
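
To see which part of the 341x256 image survives the crop, here is a rough sketch of the arithmetic. The helper `center_crop_box` is hypothetical, and the way it rounds an odd margin (integer division) is an assumption; the box is written as (left, upper, right, lower), the same order PIL's `Image.crop` expects.

```python
def center_crop_box(width, height, size=224):
    """Return (left, upper, right, lower) for a centered size x size crop."""
    left = (width - size) // 2
    upper = (height - size) // 2
    return left, upper, left + size, upper + size


print(center_crop_box(341, 256))  # (58, 16, 282, 240)
```

In other words, roughly 58 columns are dropped from each side and 16 rows from the top and bottom.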



Then we will convert the image to a tensor (a multi-dimensional array) using `ToTensor`. Since each pixel has values for red, green, and blue (RGB), the resulting tensor will be 3x224x224 (PyTorch puts the color channels first). Finally, we normalize the tensor using `transforms.Normalize` ([PyTorch Documentation](https://pytorch.org/vision/main/generated/torchvision.transforms.Normalize.html?highlight=normalize#torchvision.transforms.Normalize)).



Here are those transformations put together:



In [None]:
from torchvision import transforms
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229,0.224,0.225])
])

You may wonder where the numbers come from in 

```
        mean=[0.485, 0.456, 0.406],
        std=[0.229,0.224,0.225])
```
These are the mean and standard deviation of the RGB values for all the pixels in all the images in the ImageNet dataset.
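
As a rough illustration of what the last two steps do to a single pixel, the sketch below first scales the 0-255 RGB values to the range 0 to 1 (as `ToTensor` does) and then subtracts the channel mean and divides by the channel standard deviation (as `Normalize` does). The pixel value and the helper name `normalize_pixel` are made up for this example.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Scale 0-255 RGB values to [0, 1], then standardize each channel."""
    scaled = [channel / 255 for channel in rgb]
    return [(value - m) / s for value, m, s in zip(scaled, MEAN, STD)]


# A hypothetical orange pixel: full red, half green, no blue.
normalized = normalize_pixel((255, 128, 0))
print([round(v, 3) for v in normalized])
```

Channels brighter than the dataset average come out positive and darker ones come out negative.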

Let's use the `transform` pipeline we defined to transform the image:


In [None]:
img_t = transform(img)

batch_t = torch.unsqueeze(img_t, 0)

# put the tensor on the GPU
batch_t = batch_t.to(device)
batch_t.is_cuda

Image classification models typically classify a batch (an array) of images at once rather than a single image.

```
    batch_t = torch.unsqueeze(img_t, 0)
```
creates a tensor with one element, `img_t`, which is itself a tensor. In a sense, it is an array of image arrays: the 3x224x224 image tensor becomes a 1x3x224x224 batch.
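
The effect on the shape can be mimicked with ordinary nested lists. This is only an analogy in plain Python (the `shape` helper and the tiny 3x2x2 stand-in image are ours), but it shows how wrapping one image in an outer container adds a leading dimension of size 1.

```python
def shape(nested):
    """Return the dimensions of a regular nested list, outermost first."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# A tiny stand-in "image": 3 channels, each 2 rows by 2 columns.
image = [[[0.0, 0.1], [0.2, 0.3]] for _ in range(3)]
batch = [image]  # wrap the single image in an outer list of length 1

print(shape(image))  # (3, 2, 2)
print(shape(batch))  # (1, 3, 2, 2)
```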

Next,

```
batch_t = batch_t.to(device)
```
puts that tensor on the selected device (the GPU, if one is available) and

```
batch_t.is_cuda
```
checks whether the tensor is actually on the GPU.

Now we have prepared the image and are ready to pass it to AlexNet for inference.

### Model Inference

In PyTorch, models can be in two modes and we can toggle between them.

* `alexnet.eval()` puts the model in inference mode so it can make predictions.
* `alexnet.train()` puts the model in training mode.


Let's get the model ready for inference:

In [None]:
alexnet.eval()

As you can see, `alexnet.eval()` displays a lot of information about the architecture of the model. We will learn about the architecture of deep learning models about midway through the course.

Now let's pass the tensor of our image to alexnet and get the output. Plus, let's examine the shape of the output.

In [None]:
out = alexnet(batch_t)
print(out.shape)

As we can see, `out` has shape `[1, 1000]`: one row for our single image, containing 1,000 values. We get 1,000 values because the ImageNet competition dataset has 1,000 labels for its images. The larger the value, the more likely the image is of that class.

In [None]:
out

Let's convert those values to probabilities:

In [None]:
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
percentage
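
For intuition, here is a minimal sketch of what softmax computes, written in plain Python on three made-up scores instead of the model's 1,000. Each score is exponentiated and divided by the sum of all the exponentials, so the results are positive and add up to 100 percent.

```python
import math


def softmax_percentages(scores):
    """Turn raw scores into percentages that sum to 100."""
    largest = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [100 * e / total for e in exps]


# Hypothetical raw scores for three classes.
percentages = softmax_percentages([2.0, 1.0, 0.1])
print([round(p, 1) for p in percentages])
```

The ordering of the scores is preserved: the largest raw score always receives the largest percentage.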

As the `percentage` output shows, the image is not very likely to be one of the first five labels. What are the actual names of these labels? First, let's download the label name file:


In [None]:
!curl http://zacharski.org/files/courses/dmpics/imagenet_classes.txt -o imagenet_classes.txt


Let's load in those labels:

In [None]:
with open('imagenet_classes.txt') as f:
  labels = [line.strip() for line in f.readlines()]
print(labels[:5])

The first five labels are types of fish. It is good to know our model didn't think our image was of a fish. What does our model think?


In [None]:
_, index = torch.max(out, 1)
print(index)

Okay, the label at index 267 is the most likely. Let's print that out:

In [None]:
print(labels[index[0]], percentage[index[0]].item())


Fortunately, alexnet correctly thinks the image is of a standard poodle with 38.65% likelihood. 

The ImageNet competition evaluated systems based on the top-5 error rate meaning the system was judged correct if the correct label was among the top 5 the system predicted.  Let's look at the top 5 our model predicted for this image.

In [None]:
_, indices = torch.sort(out, descending=True)
[(labels[idx], percentage[idx].item()) for idx in indices[0][:5]]
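
To show how a top-5 error rate is tallied over several images, here is a small sketch in plain Python. The scores, the six classes, and the helper names `top_k` and `top_k_error` are invented for illustration; the real competition used 1,000 classes and many thousands of test images.

```python
def top_k(scores, k=5):
    """Indices of the k highest scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def top_k_error(all_scores, true_labels, k=5):
    """Fraction of examples whose true label is missing from the top k."""
    misses = sum(1 for scores, truth in zip(all_scores, true_labels)
                 if truth not in top_k(scores, k))
    return misses / len(true_labels)


# Made-up scores for 2 images over 6 classes.
scores = [
    [0.1, 0.9, 0.3, 0.2, 0.5, 0.4],  # true class 4 is ranked second
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.0],  # true class 5 is ranked last
]
true_labels = [4, 5]
print(top_k_error(scores, true_labels, k=5))  # 0.5
print(top_k_error(scores, true_labels, k=1))  # 1.0
```

The first image counts as correct under the top-5 rule even though its true class was not the single best guess, which is why top-5 error is always at most the top-1 error.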


Let's turn what we learned into a function and try predicting the class of other images.

In [None]:
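import requests

def predict(url):
    # first download the image from the web
    r = requests.get(url)
    r.raise_for_status()  # stop early if the download failed
    with open('tmp.jpg', 'wb') as f:
        f.write(r.content)
    # convert to RGB so grayscale or transparent images still have 3 channels
    img = Image.open('tmp.jpg').convert('RGB')
    img.show()
    # apply the same preprocessing as before and make a batch of one
    img_t = transform(img)
    batch_t = torch.unsqueeze(img_t, 0)
    batch_t = batch_t.to(device)
    # gradients are not needed for inference
    with torch.no_grad():
        out = alexnet(batch_t)
    _, indices = torch.sort(out, descending=True)
    percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
    # return the five most likely labels with their percentages
    return [(labels[idx], percentage[idx].item()) for idx in indices[0][:5]]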


Let's find out what the 1,000 labels are:

In [None]:

line = ''
for i in range(len(labels)):
  line += '%-25s' %labels[i]
  if (i + 1) % 4 == 0:
    print(line)
    line = ''

Electric Guitar is one of the labels. Let's see if our model correctly identifies a picture of one:

In [None]:
predict('https://raw.githubusercontent.com/zacharski/ml-class/master/labs/pics/Fender_Stratocaster.jpeg')


Our system is 99% sure that this image is an electric guitar.

While AlexNet doesn't know about burrowing owls, it does know about great grey owls. Let's give it a picture of a burrowing owl and see what it does:

In [None]:
predict('https://raw.githubusercontent.com/zacharski/ml-class/master/labs/pics/greyOwl.jpeg')

That is a reasonable response!

Let's try a cello.

In [None]:
predict('https://raw.githubusercontent.com/zacharski/ml-class/master/labs/pics/cello12.jpeg')

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/torchdivide.png)


# <font color='#EE4C2C'>You Try ...</font> 
Ok, it is time for you to try out what you just learned.

## <font color='#EE4C2C'>1. Your own images</font> 
Try this out on three images of your own.

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/PyDivideTwo.png)
## <font color='#EE4C2C'>2. Squeezenet</font> 

Let's try a different pretrained model, `squeezenet1_1`. Load the model, construct a function that makes predictions with it, and try it out on the images we provided above plus your three.

In [None]:
#

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/PyDivideTwo.png)
## <font color='#EE4C2C'>3. Summary</font> 

Please answer the following questions by editing this markdown cell.

1. Classification machine learning models have two modes. What are they?  <font color='#EE4C2C'>your answer here</font> 
2. What is a pretrained model? 
3. What is supervised learning? What is unsupervised learning?
4. Describe in a few sentences what SqueezeNet is. (requires a bit of googling)
5. Compare the performance of AlexNet and SqueezeNet. Was one more accurate than the other? Did you notice any other differences? 
    
