# Using deep features to train an image classifier

### Load data from Google Drive

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


#### Installing Turi Create

In [6]:
!pip install turicreate



#### Importing Turi Create

In [3]:
import turicreate
from PIL import Image

# Load some data
#### Here we load the data from Google Drive

In [7]:
image_train = turicreate.SFrame('/content/drive/My Drive/Coursera/image_train_data/')
image_test = turicreate.SFrame('/content/drive/My Drive/Coursera/image_test_data/')

In [8]:
image_train

id,image,label,deep_features,image_array
24,Height: 32 Width: 32,bird,"[0.242871761322, 1.09545373917, 0.0, ...","[73.0, 77.0, 58.0, 71.0, 68.0, 50.0, 77.0, 69.0, ..."
33,Height: 32 Width: 32,cat,"[0.525087952614, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[7.0, 5.0, 8.0, 7.0, 5.0, 8.0, 5.0, 4.0, 6.0, 7.0, ..."
36,Height: 32 Width: 32,cat,"[0.566015958786, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[169.0, 122.0, 65.0, 131.0, 108.0, 75.0, ..."
70,Height: 32 Width: 32,dog,"[1.12979578972, 0.0, 0.0, 0.778194487095, 0.0, ...","[154.0, 179.0, 152.0, 159.0, 183.0, 157.0, ..."
90,Height: 32 Width: 32,bird,"[1.71786928177, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[216.0, 195.0, 180.0, 201.0, 178.0, 160.0, ..."
97,Height: 32 Width: 32,automobile,"[1.57818555832, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[33.0, 44.0, 27.0, 29.0, 44.0, 31.0, 32.0, 45.0, ..."
107,Height: 32 Width: 32,dog,"[0.0, 0.0, 0.220677852631, 0.0, ...","[97.0, 51.0, 31.0, 104.0, 58.0, 38.0, 107.0, 61.0, ..."
121,Height: 32 Width: 32,bird,"[0.0, 0.23753464222, 0.0, 0.0, 0.0, 0.0, ...","[93.0, 96.0, 88.0, 102.0, 106.0, 97.0, 117.0, ..."
136,Height: 32 Width: 32,automobile,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 7.5737862587, 0.0, ...","[35.0, 59.0, 53.0, 36.0, 56.0, 56.0, 42.0, 62.0, ..."
138,Height: 32 Width: 32,bird,"[0.658935725689, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[205.0, 193.0, 195.0, 200.0, 187.0, 193.0, ..."


# Train an image classifier on raw image pixels

## Model we are creating
* We want to label the images correctly.
* We use logistic regression as the classification model; Turi Create calls it `logistic_classifier`.
* The only feature is `image_array`, which holds the raw pixel values of each image.
* The model predicts each image's `label` from that pixel information.
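As a rough illustration of what a logistic classifier does with a feature vector, the sketch below scores each class with a linear function and turns the scores into probabilities with a softmax. The weights, biases, three-value feature vector, and the helper names `softmax` and `predict_label` are all made up for illustration; Turi Create learns its own weights from `image_array`, and its internals may differ.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(features, weights, biases, labels):
    """Pick the label whose linear score w.x + b is highest."""
    scores = [
        sum(w * x for w, x in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, biases)
    ]
    probabilities = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities

# Hypothetical values. A 32x32 image with three color channels
# would have 32 * 32 * 3 = 3072 pixel values instead of three.
labels = ['automobile', 'bird', 'cat', 'dog']
weights = [
    [0.02, -0.01, 0.00],
    [-0.01, 0.03, 0.01],
    [0.01, 0.01, -0.02],
    [0.00, -0.02, 0.02],
]
biases = [0.0, 0.1, -0.1, 0.0]
pixels = [73.0, 77.0, 58.0]

label, probabilities = predict_label(pixels, weights, biases, labels)
print(label, [round(p, 3) for p in probabilities])
```

Training adjusts the weights so that the correct label gets the highest probability on as many training images as possible.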

In [9]:
raw_pixel_model = turicreate.logistic_classifier.create(image_train,
                                                       target = 'label',
                                                       features = ['image_array'])

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



# Make predictions using simple raw pixel model

### Let's compare the model's predictions with the actual labels

##### Actual data

In [10]:
image_test[0:3]['label']

dtype: str
Rows: 3
['cat', 'automobile', 'cat']

##### Predictions made by the raw pixel model

In [12]:
raw_pixel_model.predict(image_test[0:3])

dtype: str
Rows: 3
['bird', 'cat', 'bird']

### **Note:** All three predictions above are wrong, so the raw pixel model is not performing well on these examples.

# Evaluate the raw pixel model on the test data

### Evaluating the model gives us general information such as:
* What is the accuracy of the model?
* What are its precision and recall?
* What is the area under its ROC curve (AUC)?
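To make these metrics concrete, here is a minimal sketch that computes accuracy and per-class precision and recall from two label lists. The labels are made up rather than taken from the test set, the helper names are hypothetical, and the way `evaluate` aggregates per-class values into a single number is not shown here. AUC is left out because it needs predicted probabilities, not just labels.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def precision_recall(actual, predicted, label):
    """Precision and recall for one class, treated as the positive class."""
    true_pos = sum(a == label and p == label for a, p in zip(actual, predicted))
    predicted_pos = sum(p == label for p in predicted)
    actual_pos = sum(a == label for a in actual)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall

# Made-up labels, not taken from the notebook's test set
actual = ['cat', 'cat', 'dog', 'dog', 'bird', 'bird']
predicted = ['cat', 'dog', 'dog', 'dog', 'cat', 'bird']

print('accuracy:', accuracy(actual, predicted))
for name in ['bird', 'cat', 'dog']:
    print(name, precision_recall(actual, predicted, name))
```

Precision asks "of the images predicted as this class, how many really are?", while recall asks "of the images that really are this class, how many did the model find?".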

In [13]:
raw_pixel_model.evaluate(image_test)

{'accuracy': 0.47725, 'auc': 0.7259967916666666, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 16
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |     dog      |    automobile   |  156  |
 |  automobile  |       cat       |  148  |
 |     cat      |       bird      |  175  |
 |     dog      |       bird      |  220  |
 |     cat      |    automobile   |  218  |
 |     cat      |       cat       |  373  |
 |     dog      |       dog       |  359  |
 |     cat      |       dog       |  234  |
 |     bird     |       cat       |  195  |
 |  automobile  |       bird      |   95  |
 +--------------+-----------------+-------+
 [16 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'f1_score': 0.4704582115541531, 'log_loss': 1.1975419460755192, 'precision': 0

# Train image classifier using deep features

### Our model is not performing well, so what can we do?
* We can check how many data points (here, images) the model is learning from.
* If that number is small, we can take the following steps.
* We can use a pretrained model.
* We can take the features learned in one domain and transfer them to the current task.

In [14]:
len(image_train)

2005

### How to use the pretrained model?
* We can load the model and use it directly.
* The pretrained model we use was trained on ImageNet, a data set of about 1.5 million images in over a thousand categories.
* We can use that pretrained model to extract features for our training data.
* **Note:** The `extract_features` method runs each image in `image_train` through the deep learning model and returns the resulting features, which we store in the `deep_features` column. The cell below is commented out; the loaded data already includes a precomputed `deep_features` column.

In [17]:
#deep_learning_model = turicreate.load_model('imagenet_model_iter45')
#image_train['deep_features'] = deep_learning_model.extract_features(image_train)
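The data flow of feature extraction can be sketched without any deep learning library: a fixed function, never trained on our data, maps each image's pixels to a feature vector that is stored as a new column. In this toy version `frozen_extractor` is a hypothetical stand-in that computes two hand-picked statistics; a real pretrained network produces far richer features, and the rows are plain dictionaries, not an SFrame.

```python
def frozen_extractor(pixels):
    """Stand-in for a pretrained network: fixed, never trained here.

    Hypothetical features: mean brightness and brightness range.
    """
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]

# Toy rows shaped like the notebook's table (pixel lists shortened)
rows = [
    {'id': 24, 'label': 'bird', 'image_array': [73.0, 77.0, 58.0, 71.0]},
    {'id': 33, 'label': 'cat', 'image_array': [7.0, 5.0, 8.0, 7.0]},
]

# Add a new column; the raw pixels stay untouched
for row in rows:
    row['deep_features'] = frozen_extractor(row['image_array'])

print(rows[0]['deep_features'])
print(rows[1]['deep_features'])
```

Only the small classifier trained on top of these features has to learn from our 2005 images; the extractor itself is reused as-is.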

In [18]:
image_train

id,image,label,deep_features,image_array
24,Height: 32 Width: 32,bird,"[0.242871761322, 1.09545373917, 0.0, ...","[73.0, 77.0, 58.0, 71.0, 68.0, 50.0, 77.0, 69.0, ..."
33,Height: 32 Width: 32,cat,"[0.525087952614, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[7.0, 5.0, 8.0, 7.0, 5.0, 8.0, 5.0, 4.0, 6.0, 7.0, ..."
36,Height: 32 Width: 32,cat,"[0.566015958786, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[169.0, 122.0, 65.0, 131.0, 108.0, 75.0, ..."
70,Height: 32 Width: 32,dog,"[1.12979578972, 0.0, 0.0, 0.778194487095, 0.0, ...","[154.0, 179.0, 152.0, 159.0, 183.0, 157.0, ..."
90,Height: 32 Width: 32,bird,"[1.71786928177, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[216.0, 195.0, 180.0, 201.0, 178.0, 160.0, ..."
97,Height: 32 Width: 32,automobile,"[1.57818555832, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[33.0, 44.0, 27.0, 29.0, 44.0, 31.0, 32.0, 45.0, ..."
107,Height: 32 Width: 32,dog,"[0.0, 0.0, 0.220677852631, 0.0, ...","[97.0, 51.0, 31.0, 104.0, 58.0, 38.0, 107.0, 61.0, ..."
121,Height: 32 Width: 32,bird,"[0.0, 0.23753464222, 0.0, 0.0, 0.0, 0.0, ...","[93.0, 96.0, 88.0, 102.0, 106.0, 97.0, 117.0, ..."
136,Height: 32 Width: 32,automobile,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 7.5737862587, 0.0, ...","[35.0, 59.0, 53.0, 36.0, 56.0, 56.0, 42.0, 62.0, ..."
138,Height: 32 Width: 32,bird,"[0.658935725689, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[205.0, 193.0, 195.0, 200.0, 187.0, 193.0, ..."


### As we can see above, our data set has deep features produced by a pretrained model.

# Given the deep features, train a logistic classifier

In [19]:
deep_features_model = turicreate.logistic_classifier.create(image_train,
                                                           target='label',
                                                           features = ['deep_features'])

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



# Apply the deep features classifier on the first few images

### Now, again, we want to see how our new model predicts
* We check the first three labels and then compare them with the predictions made by the new model.

In [21]:
image_test[0:3]['label'].explore()

Unnamed: 0,SArray
0,cat
1,automobile
2,cat


In [22]:
deep_features_model.predict(image_test[0:3])

dtype: str
Rows: 3
['cat', 'automobile', 'cat']

#### From the above results, the model performs better when we use the deep features extracted by the other model: all three predictions match the actual labels.
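The comparison can also be done programmatically. This small sketch copies the three actual labels and both models' predictions from the cell outputs in this notebook and counts the matches; `count_matches` is a hypothetical helper, not part of Turi Create.

```python
# Values copied from the cell outputs in this notebook
actual = ['cat', 'automobile', 'cat']
raw_pixel_predictions = ['bird', 'cat', 'bird']
deep_features_predictions = ['cat', 'automobile', 'cat']

def count_matches(predicted, expected):
    """Count positions where the predicted label equals the expected one."""
    return sum(p == e for p, e in zip(predicted, expected))

raw_correct = count_matches(raw_pixel_predictions, actual)
deep_correct = count_matches(deep_features_predictions, actual)
print(f"raw pixels: {raw_correct}/3, deep features: {deep_correct}/3")
# prints "raw pixels: 0/3, deep features: 3/3"
```

Three images are far too few to judge a model, which is why the next step evaluates on the full test set.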

# Quantitatively evaluate deep features classifier on test data

In [23]:
deep_features_model.evaluate(image_test)

{'accuracy': 0.79275, 'auc': 0.9397962083333339, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 16
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |     bird     |       dog       |   59  |
 |     dog      |       cat       |  179  |
 |     cat      |       dog       |  245  |
 |     dog      |       dog       |  759  |
 |     cat      |       bird      |   77  |
 |  automobile  |       dog       |   11  |
 |     cat      |    automobile   |   20  |
 |     dog      |       bird      |   53  |
 |     dog      |    automobile   |   9   |
 |  automobile  |       cat       |   18  |
 +--------------+-----------------+-------+
 [16 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'f1_score': 0.7933394857645091, 'log_loss': 0.6294464301028291, 'precision': 0

## Evaluation stats 
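* Our new model has an accuracy of about 79.3%, which is much better than the previous 47.7%.
* The confusion matrix above shows that 759 dog images were correctly labeled as dog. Only the first 10 of its 16 rows are printed, so the counts for the other classes are not all visible.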
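All four rows with target label `dog` appear in the printed part of the deep features confusion matrix, so the recall for that one class can be worked out directly. The sketch below copies those four counts from the output above; the other classes cannot be checked the same way because some of their rows are hidden.

```python
# Rows with target_label 'dog' from the deep features confusion matrix,
# keyed by predicted_label
dog_rows = {'cat': 179, 'dog': 759, 'bird': 53, 'automobile': 9}

dog_total = sum(dog_rows.values())        # all test images that are dogs
dog_recall = dog_rows['dog'] / dog_total  # share of dogs labeled as dog
print(dog_total, dog_recall)
# prints "1000 0.759"
```

The most common mistake for dogs is a `cat` prediction (179 images), which is the kind of detail a single accuracy number hides.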