# A quickstart introduction

## An example ...
As part of a [conservation effort](http://burrowingowlconservation.org/sightings/), Ann would like to report sightings of Burrowing Owls while she is hiking. Unfortunately, Ann doesn't know what a Burrowing Owl looks like, so she goes to the web to look at pictures. What she has, then, is a set of labeled images: she knows what each image is a picture of. By examining these labeled images she is **training** herself to recognize Burrowing Owls.

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/owls2.png)


More generally, we call this set of labeled images used for training the **labeled training dataset**. Let's dive into this idea of a labeled training dataset a bit more. Suppose Clara is given the task of distinguishing between pictures of telecaster-style guitars and stratocaster-style guitars. But not to worry, because her boss has given her thousands of pictures of guitars. When looking at a picture, the only thing Clara knows is that it shows either a stratocaster or a telecaster. For example, here are some pictures of stratocasters and telecasters.

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/guitars2.png)


Again, when looking at a picture, all she knows is that it is a picture of a stratocaster or a telecaster; she doesn't know which of the two it is. How long will it take Clara to learn to distinguish between these two guitar styles? Would the time be significantly shorter if her boss gave her 10,000 pictures, or 100,000? If this is all the information she gets, she will never learn. What she needs is a **labeled** dataset: when presented with a picture, she needs to know whether it is a picture of a stratocaster or a telecaster.


**Now back to Ann learning to recognize Burrowing Owls**

When Ann is learning to recognize Burrowing Owls from her labeled training set, she is developing a model of which features make an animal a Burrowing Owl. One such feature is the bright yellow eyes. Once she is done learning, she can be on a hike, see an animal, and classify it as a Burrowing Owl or something else. This is an **inference** process: based on the evidence of the animal's different features, she can infer what it is. And to throw more jargon at you, this type of problem is called a **classification problem**. In classification problems the system is given the features of an object and needs to classify that object. For example,

* The features might be the words of a Twitter post (e.g., *Everything Everywhere All At Once was a f-ing masterpiece. I can't emphasize how great this movie is, it's just that great.*), and based on those features the system classifies the post as positive, negative, or neutral.
* The features might be the pixels of an image, and based on those pixels the system classifies the image into one of 1,000 categories (for example, an owl or a bicycle).
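
To make "features" concrete, here is a deliberately simplistic sketch of the first bullet: the features are the words of a post, and a hand-written rule maps them to a class. The word lists and function names are hypothetical, and this is not how real sentiment systems work; a real classifier learns which words matter from a labeled training dataset instead of using hard-coded lists.

```python
# Hypothetical word lists: a real system would learn these from labeled data.
POSITIVE_WORDS = {"great", "masterpiece", "love", "amazing"}
NEGATIVE_WORDS = {"boring", "awful", "hate", "terrible"}


def extract_features(post: str) -> list[str]:
    """Turn a post into its features: lowercase words with surrounding punctuation stripped."""
    words = []
    for token in post.lower().split():
        word = token.strip(".,!?;:'\"*()")
        if word:
            words.append(word)
    return words


def classify_post(post: str) -> str:
    """Classify a post as 'positive', 'negative', or 'neutral' based on its word features."""
    words = extract_features(post)
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_post("Everything Everywhere All At Once was a masterpiece, just great!"))  # prints positive
```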



## Machine learning
In machine learning, classification systems follow a similar two-step process. First is the training phase, where the system uses a labeled training dataset to build a model. (We will learn about the architecture of these models and how they learn a bit later.)

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/quickDiagram2.png)



> **Supervised vs. Unsupervised learning**. When a machine learning system trains on labeled data, this is called supervised learning. When a system learns from unlabeled data, this is called unsupervised learning. A very common example of unsupervised learning is clustering, where we might give our system a million unlabeled pictures and ask it to divide the images into 10 groups.
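
Clustering is easiest to see on a handful of numbers. Below is a minimal sketch of k-means, a standard clustering algorithm chosen here for illustration (the text above does not name a specific one). It splits unlabeled values into k groups without ever seeing a label. The function name `kmeans_1d` is hypothetical.

```python
def kmeans_1d(values, k, iterations=10):
    """Split unlabeled numbers into k clusters with a basic k-means loop (needs len(values) >= k)."""
    ordered = sorted(values)
    # Deterministic start: spread the k initial centers evenly across the sorted values.
    if k == 1:
        centers = [ordered[0]]
    else:
        step = (len(ordered) - 1) / (k - 1)
        centers = [ordered[round(i * step)] for i in range(k)]

    for _ in range(iterations):
        # Assignment step: each value joins the cluster whose center is nearest.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


print(kmeans_1d([1, 2, 3, 50, 51, 52], k=2))  # prints [[1, 2, 3], [50, 51, 52]]
```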

Once we have a trained model, we can use it for inference: we can give it pictures, and the system can classify each one as a Burrowing Owl or something else. Again, the two phases are:

1. training
2. inference
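
The two phases can be sketched with a tiny nearest-centroid classifier, a simple technique chosen here purely for illustration (VGG16 works very differently). The features (eye yellowness on a 0 to 1 scale and height in centimeters), the data, and the function names are all made up.

```python
import math


def train(examples):
    """Training phase: build a model (the average feature vector per label) from labeled data."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def infer(model, features):
    """Inference phase: predict the label whose average vector is closest to the features."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Made-up labeled training data: ((eye yellowness, height in cm), label)
labeled_examples = [
    ((0.90, 23.0), "burrowing owl"),
    ((0.95, 25.0), "burrowing owl"),
    ((0.10, 30.0), "something else"),
    ((0.20, 60.0), "something else"),
]
owl_model = train(labeled_examples)          # training phase
print(infer(owl_model, (0.85, 24.0)))        # inference phase; prints burrowing owl
```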

In this introduction we are going to skip the training phase and learn a bit about inference. To do that we are going to use a pretrained model. A *pretrained model* simply means that someone else designed the architecture of the model and trained it.

### VGG16

The pretrained model we will use is VGG16, which was designed by Karen Simonyan and Andrew Zisserman. In 2014, it won the localization task of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in which competing systems had to classify the object in an image into one of 1,000 categories and then locate that object by drawing a bounding box around it, as shown below.

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/boundingbox.png)

VGG16 beat out GoogLeNet in that competition. Since then, dozens of systems have surpassed it, but we will use VGG16 because it runs well on a Colab instance. The pretrained VGG16 was trained on ImageNet, a labeled training dataset of over one million images.

## Let's get started

#### First, a note ...
The intent of this notebook is for you to learn a little bit about data mining and have a bit of fun. The idea is not for you to understand every line of code. That will come later.

**Note:**

**First, let's set the runtime to GPU (Graphics Processing Unit): click on 'Runtime' in the menu above, select 'Change runtime type', and pick one of the 'GPU' options.**

### 0. TensorFlow and Keras
TensorFlow and Keras are already available in Google Colab so there is nothing special to install.

### 1. Load the VGG16 model.

```python
from keras.applications import VGG16

model = VGG16(weights='imagenet', include_top=True)
```

The `import` command loads a library module into Python. For example,

```import tensorflow```

will import the Python tensorflow library, meaning that all the code in that module will be available to your program. Sometimes you only need part of that module. To import part of the code from a module one can use the `from ... import` command. For example,

```from keras.models import Sequential```

will import the `Sequential` class from `models`, which itself is a submodule of `keras`.
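
The same two forms work with any Python module. This sketch uses the standard library's `math` and `os.path` modules (our choice, not part of the notebook) so it runs without installing anything:

```python
import math                  # import the whole module; refer to names as math.sqrt
from math import sqrt        # import one name; refer to it directly as sqrt
from os.path import join     # import one function from the path submodule of os

print(math.sqrt(16))         # prints 4.0
print(sqrt(16))              # prints 4.0
print(join("labs", "pics"))  # prints labs/pics on Unix-like systems such as Colab
```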

Now back to our code above. The first line:

```from keras.applications import VGG16```

imports the code associated with VGG16.

The next line:

```model = VGG16(weights='imagenet', include_top=True)```

creates an instance of the VGG16 model. We are going to use the model that has been pretrained on the ImageNet dataset (`weights='imagenet'`), and we want to keep the model's top layers, which classify an image into one of the 1,000 ImageNet categories (`include_top=True`).



#### Pretrained models
There are many pretrained models available in Keras ([around 40 at the time of this writing](https://keras.io/api/applications/#available-models)).


Of course we could call this model anything we want:

```
vgg = VGG16(weights='imagenet', include_top=True)
myModel = VGG16(weights='imagenet', include_top=True)
```

Now we have a pretrained model loaded into our system. We would like to use the model to classify our own images.

## Inference

We would like to give VGG16 an image like:

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/poodle.jpg)

and have VGG16 classify it. First, let's download the image file from the web using the Unix command `curl`:

```
!curl http://zacharski.org/files/courses/dmpics/poodle.jpg -o poodle.jpg
```

The exclamation point (aka *bang*) at the beginning of the line instructs the system to interpret the rest of the line as a Unix command (something you might type in a Unix terminal).

For example

```!ls```

will list the contents of the current directory. If you want more information about the `curl` command you can type

```!man curl```
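
The `!` prefix is a notebook (IPython) feature, not part of the Python language. In a plain Python script, one rough equivalent is the standard library's `subprocess.run`, sketched here with a harmless `echo` command standing in for `curl` or `ls`:

```python
import subprocess

# Run a Unix command from Python; the command and its arguments go in a list.
result = subprocess.run(["echo", "hello"], capture_output=True, text=True, check=True)
print(result.stdout.strip())  # prints hello
```

`check=True` raises an exception if the command fails, which plays the same role as noticing an error message after a `!` command.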

Next, let's load that image into Python using PIL (Python Imaging Library). This step isn't necessary for our model to recognize the image, but it is useful for us to see the image.

```python
from PIL import Image
img = Image.open("poodle.jpg")
```


Now let's display that image:

```python
img
```

(If that doesn't display an image, change the code to `display(img)`.)


The size of this particular image is 4032x3024, which is slightly over 12 million pixels. That is a lot of pixels! VGG16 was designed to work with images of size 224x224, so we need to transform the original image to VGG16's specifications. Since the image isn't square (it is wider than it is tall), if we simply resized it to 224x224 the image would look squished:
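
A quick calculation shows how much the image has to shrink, and why a straight resize distorts it:

```python
original_width, original_height = 4032, 3024
vgg16_side = 224

original_pixels = original_width * original_height
vgg16_pixels = vgg16_side * vgg16_side

print(f"{original_pixels:,}")               # prints 12,192,768
print(f"{vgg16_pixels:,}")                  # prints 50,176
print(original_pixels / vgg16_pixels)       # prints 243.0: the input has 243 times fewer pixels
print(original_width / original_height)     # prints 1.3333333333333333: a 4:3 image, not square
```

Because the width-to-height ratio is 4:3 rather than 1:1, squeezing both sides to 224 stretches everything horizontally by a factor of 3/4.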


![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/ghost103.png)

To avoid this let's do two things:

First, let's resize the image, keeping its proportions, so that its smaller dimension becomes 224 pixels. For this image the smaller dimension is the height. That will look like:

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/ghost101.png)

Then we will crop the image, removing the left and right sections of the photo:

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/ghost102.png)
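
The geometry of these two steps can be worked out by hand. This sketch (the function name `resize_then_crop` is hypothetical) computes the resized size and a centered crop box in the `(left, top, right, bottom)` form that Pillow's `Image.crop` accepts; the resized size could be passed to `Image.resize`. It illustrates the idea described above and is not necessarily how Keras computes its crop internally (its documentation describes center-cropping to the target aspect ratio before resizing, which gives nearly the same result).

```python
def resize_then_crop(width, height, target=224):
    """Return the resized (width, height) and the centered crop box (left, top, right, bottom)."""
    # Step 1: scale so the smaller side becomes `target`, preserving the aspect ratio.
    scale = target / min(width, height)
    new_width, new_height = round(width * scale), round(height * scale)
    # Step 2: trim equal amounts from both ends of the longer side.
    left = (new_width - target) // 2
    top = (new_height - target) // 2
    return (new_width, new_height), (left, top, left + target, top + target)


print(resize_then_crop(4032, 3024))  # prints ((299, 224), (37, 0, 261, 224))
```

For the poodle photo, the width shrinks to 299 pixels and about 37 pixels are trimmed from each side.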


All of this can be done with one Keras command:


```python
from keras.utils import load_img
img = load_img("poodle.jpg", target_size=(224, 224), keep_aspect_ratio=True)
```


The resulting image looks like:

```python
img
```

The resulting size of the image is

```python
img.size
```

That looks fantastic!



Finally let's convert the image to an array and preprocess the array into a form expected by VGG16:

```python
import numpy as np
from keras.applications.vgg16 import preprocess_input
from keras.preprocessing import image

input_img = image.img_to_array(img)            # convert the image into an array;
                                               # the array represents a single image
input_img = np.expand_dims(input_img, axis=0)  # our model processes an array of images, not just
                                               # a single one, so add a dimension to the array
input_img = preprocess_input(input_img)        # finally, transform the values to those needed by VGG16
input_img
```
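
`preprocess_input` is a black box in the code above. The Keras documentation describes VGG16's version as converting images from RGB to BGR and then zero-centering each color channel with respect to the ImageNet dataset, without scaling. The sketch below applies that idea to a single pixel. The function name is hypothetical, and the per-channel means are the values Keras uses for this style of preprocessing; treat it as an illustration, not a replacement for `preprocess_input`.

```python
# Assumption: ImageNet per-channel means in blue, green, red order, as used by Keras for VGG16.
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)


def vgg16_preprocess_pixel(rgb):
    """Preprocess one (R, G, B) pixel with values in 0..255 the way VGG16 expects."""
    r, g, b = rgb
    bgr = (b, g, r)  # reorder the channels to blue, green, red
    # Zero-center each channel by subtracting its ImageNet mean (no scaling).
    return tuple(value - mean for value, mean in zip(bgr, IMAGENET_MEAN_BGR))


print(vgg16_preprocess_pixel((255, 0, 0)))  # a pure red pixel
```

An "average" ImageNet pixel maps to roughly `(0, 0, 0)`, so the values the network sees are centered around zero.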


The image is now prepared, and we can pass it to our VGG16 model for inference.

### Model Inference

In Keras, we classify an image by calling the model's `predict` method:

```python
predictions = model.predict(input_img)
predictions
```


```python
predictions.shape
```

As we can see, `predictions` is a two-dimensional array of shape (1, 1000): one row for the single image we used as input, and 1,000 values for that image. We get 1,000 values because ImageNet contains 1,000 classes. The larger the number, the more likely the image is of that class.
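
Why do the 1,000 values behave like probabilities? By default, the last layer of Keras's VGG16 uses a softmax activation (`classifier_activation="softmax"`), which turns raw scores into positive numbers that add up to 1. Here is a minimal softmax sketch on three made-up scores:

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that are positive and sum to 1."""
    largest = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])        # three made-up scores
print(sum(probs))                       # approximately 1.0
print(probs.index(max(probs)))          # prints 0: the largest score gets the largest probability
```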



As you can see, the image is very unlikely to belong to any of the first five labels, since their values are tiny:

```1.86004166e-14, 5.63008062e-14, 1.81835883e-13, 3.05401185e-14, 1.17410248e-11```

Those represent the probabilities of

```['tench', 'goldfish', 'great white shark', 'tiger shark', 'hammerhead']```

It is good to know our model didn't think our image was a fish. So what does our model think?


Let's find out the top 5 predictions:

```python
from keras.applications.vgg16 import decode_predictions
top5 = decode_predictions(predictions, top=5)
top5
```

VGG16 is over 86% certain that the image is of a standard poodle.

Pretty impressive!
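
Conceptually, finding the top predictions means pairing each probability with its label and sorting. The hypothetical `top_k` below is a simplified sketch of that idea, not Keras's implementation: the real `decode_predictions` returns one list per image in the batch, each containing tuples of (ImageNet class ID, class name, score). The labels and probabilities here are made up.

```python
def top_k(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probabilities."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


sample_labels = ["tench", "goldfish", "standard poodle", "toy poodle"]  # a tiny made-up subset
sample_probs = [0.01, 0.02, 0.86, 0.11]                                 # made-up values
print(top_k(sample_probs, sample_labels, k=2))
# prints [('standard poodle', 0.86), ('toy poodle', 0.11)]
```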

## Putting it all together

Let's combine all the code above into a few contiguous code blocks without the intervening explanations.

First we will put all the import statements at the start of the code, much like we would do in other programming languages.

Next we will create an instance of the VGG16 model.

```python
from keras.applications import VGG16
from PIL import Image
import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import decode_predictions
from keras.applications.vgg16 import preprocess_input
from keras.utils import load_img
import requests

model = VGG16(weights='imagenet', include_top=True)
```

Let's turn what we learned into a function and try predicting the class of other images.

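```python
def transform(img_path):
    """Load an image, resize and center-crop it, and turn it into a VGG16-ready 1 x 224 x 224 x 3 array."""
    targetdim = 224
    # load the image, resizing and cropping it to targetdim x targetdim
    img = load_img(img_path, target_size=(targetdim, targetdim), keep_aspect_ratio=True)

    # convert to an array, add the batch dimension, and apply VGG16 preprocessing
    input_img = image.img_to_array(img)
    input_img = np.expand_dims(input_img, axis=0)
    input_img = preprocess_input(input_img)
    return input_img


def predict(url):
    """Download the image at `url` and return VGG16's top 5 predictions for it."""
    # first download the image from the web
    r = requests.get(url)
    r.raise_for_status()  # stop with an error if the download failed
    with open('tmp.jpg', 'wb') as f:
        f.write(r.content)
    img_t = transform('tmp.jpg')
    predictions = model.predict(img_t)
    return decode_predictions(predictions, top=5)
```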



Let's see if our model correctly recognizes a violin.

![](https://cdn.shoplightspeed.com/shops/629006/files/23585410/430x510x3/yamaha-yamaha-4-4-braviol-av10-intermediate-violin.jpg)

```python
predict('https://cdn.shoplightspeed.com/shops/629006/files/23585410/430x510x3/yamaha-yamaha-4-4-braviol-av10-intermediate-violin.jpg')
```

The model is around 93% certain it is an image of a violin.
So far so good!

Let's find out what the 1,000 labels are:

```python
!curl https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/imagenet_classes.txt -o imagenet_classes.txt

# read the 1,000 labels, one per line
with open('imagenet_classes.txt') as labelfile:
    labels = labelfile.readlines()

# print the labels in four left-aligned columns
line = ''
for i, label in enumerate(labels):
    line += '%-25s' % label.strip()
    if (i + 1) % 4 == 0:
        print(line)
        line = ''
if line:  # print any leftover labels if the count isn't a multiple of 4
    print(line)
```
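
Rather than scanning the printed grid by eye, you can search the list of labels. The helper `find_labels` is hypothetical; in the notebook you could call it as `find_labels([l.strip() for l in labels], "guitar")`. The sample list below stands in for the downloaded file so the sketch runs on its own.

```python
def find_labels(labels, query):
    """Return (index, label) pairs whose label contains `query`, ignoring case."""
    query = query.lower()
    return [(i, label) for i, label in enumerate(labels) if query in label.lower()]


# A few made-up lines standing in for the contents of imagenet_classes.txt.
sample = ["acoustic guitar", "electric guitar", "violin", "cello"]
print(find_labels(sample, "guitar"))  # prints [(0, 'acoustic guitar'), (1, 'electric guitar')]
```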

Electric Guitar is one of the labels. Let's see if our model correctly identifies a picture of one:

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/telecaster.jpg)

```python
predict('https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/telecaster.jpg')
```


Our system is nearly 75% sure that this image is an electric guitar.



![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/torchdivide.png)


# <font color='#EE4C2C'>You Try ...</font>
Ok, it is time for you to try out what you just learned.

## <font color='#EE4C2C'>1. Your own images</font>
Try this out on three images of your own.

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/PyDivideTwo.png)
## <font color='#EE4C2C'>2. Xception</font>

Let's try a different pretrained model, Xception. [Here is the documentation page for it.](https://keras.io/api/applications/xception/) Load the model, construct a function that will make predictions based on the model and try it out on the images above that we provided plus your three.

**Here are a few things to note:**

1. VGG16 has an input size of 224x224x3. Xception has a different input size, and you will need to make that change in `transform`.
2. VGG16 used `keras.applications.vgg16.preprocess_input`. Xception uses a different `preprocess_input`, so you will need to make that change as well.

![](https://raw.githubusercontent.com/zacharski/datamining-guide/master/labs/pics/PyDivideTwo.png)
# <font color='SlateBlue'>Summary</font>
We learned some critical terms in machine learning, including:
1. supervised and unsupervised learning systems
2. labeled data
3. inference
4. pretrained models

We also gained some experience working with Colab and Keras.
