## Image Averaging and 3D Histograms Notebook

In this notebook, you will see that color is sometimes, but not always, a good feature for scene categorization. In the first part of this exercise, you will work with three categories (deserts, forests, and oceans) of 50 images each.

Let's first begin by importing the libraries that we will need for this exercise.

In [None]:
import numpy as np
import cv2
import glob
import matplotlib.pyplot as plt
import warnings

# suppress silly warnings
warnings.filterwarnings("ignore")

Let's now create a list of images in each of these categories.

In [None]:
desertList = glob.glob('./Data/Images/desert' + '/' + '*.jpg')
# folder path + '/' + '*.jpg' matches every file in that folder whose name ends with .jpg
forestList = glob.glob('./Data/Images/forest' + '/' + '*.jpg')
oceanList = glob.glob('./Data/Images/ocean' + '/' + '*.jpg')
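
The loop later in this notebook indexes all three lists with the same counter, so it helps to confirm that each folder really yields the same number of files. The following is a minimal sketch of such a check; `list_images` is a hypothetical helper (not part of the original notebook) that also sorts the paths, because `glob.glob` does not guarantee any particular order.

```python
import glob
import os


def list_images(folder, pattern="*.jpg"):
    """Hypothetical helper: return matching file paths in `folder`, sorted by name."""
    return sorted(glob.glob(os.path.join(folder, pattern)))


# Count the images found in each category folder used in this notebook.
counts = {
    name: len(list_images(os.path.join("./Data/Images", name)))
    for name in ("desert", "forest", "ocean")
}
print(counts)

# The averaging loop assumes equal counts; warn if that assumption fails.
if len(set(counts.values())) != 1:
    print("Warning: the categories contain different numbers of images.")
```

If the counts differ, indexing the shorter lists with the desert counter would raise an `IndexError`.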

In the first animation, we will keep a running average of the images in each category, updating it as each image is read in. To set up the three containers for these averages (one per category), we first need to know the image size.

In [None]:
N = len(desertList) # this will be the same for all three categories == 50 images
# open the first image to peek at its width and height
testIm = cv2.imread(desertList[0])
print(testIm.shape)

Looks like the shape is height, then width, then colors. Use the following command to assign height and width to variables.

In [None]:
h, w = testIm.shape[:-1] # [:-1] keeps every dimension except the last one (the color channels)
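
To see what the slice is doing, here is a small sketch on synthetic arrays (the 480 x 640 size is made up for illustration). It also shows why `shape[:2]` can be the safer spelling: on a two-dimensional array, such as a grayscale image, `shape[:-1]` would drop the width.

```python
import numpy as np

# Stand-in arrays with the same layout as an image: (height, width[, channels]).
color = np.zeros((480, 640, 3), dtype=np.uint8)
gray = np.zeros((480, 640), dtype=np.uint8)

print(color.shape[:-1])  # (480, 640)
print(color.shape[:2])   # (480, 640)
print(gray.shape[:-1])   # (480,)  -> the width is lost
print(gray.shape[:2])    # (480, 640)
```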

Now we are ready to initialize an image that will have shape height x width x 3. We'll initialize it with zeros just as a placeholder.

In [None]:
# np.float was removed in NumPy 1.24; np.float64 is the equivalent explicit type
averageDesert = np.zeros((h, w, 3), np.float64)
averageForest = np.zeros((h, w, 3), np.float64)
averageOcean = np.zeros((h, w, 3), np.float64)
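
The averages are stored as floating point rather than in the 8-bit integer type that `cv2.imread` returns. The short sketch below, using made-up pixel values, illustrates one reason: arithmetic on `uint8` arrays wraps around at 256, while a float array can hold the intermediate sums and fractional averages.

```python
import numpy as np

a = np.array([200], dtype=np.uint8)
b = np.array([100], dtype=np.uint8)

# uint8 + uint8 stays uint8, so 300 wraps around modulo 256.
wrapped = a + b
print(wrapped)  # [44]

# Converting to float first keeps the true sum and allows a fractional mean.
exact = a.astype(np.float64) + b
print(exact)      # [300.]
print(exact / 2)  # [150.]
```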

We will now loop over the images. On each pass, we read the next image from each category, fold it into that category's running average, and display the three newly read images (top row) above the three current averages (bottom row).

In [None]:
for i in range(len(desertList)):
    # read in the ith images
    desert = cv2.imread(desertList[i])
    forest = cv2.imread(forestList[i])
    ocean = cv2.imread(oceanList[i])
    
    # Convert from BGR to RGB
    desert = cv2.cvtColor(desert, cv2.COLOR_BGR2RGB)
    forest = cv2.cvtColor(forest, cv2.COLOR_BGR2RGB)
    ocean = cv2.cvtColor(ocean, cv2.COLOR_BGR2RGB)
    
    # update the average images
    averageDesert = (averageDesert*i + desert) / (i+1)
    averageForest = (averageForest*i + forest) / (i+1)
    averageOcean = (averageOcean*i + ocean) / (i+1)
    
    # Plot it!
    plt.subplot(231)
    plt.imshow(desert)
    plt.axis('off')
    plt.subplot(232)
    plt.imshow(forest)
    plt.axis('off')
    plt.subplot(233)
    plt.imshow(ocean)
    plt.axis('off')
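    # imshow expects float images in [0, 1], so cast the 0-255 averages to uint8
    plt.subplot(234)
    plt.imshow(averageDesert.astype(np.uint8))
    plt.axis('off')
    plt.subplot(235)
    plt.imshow(averageForest.astype(np.uint8))
    plt.axis('off')
    plt.subplot(236)
    plt.imshow(averageOcean.astype(np.uint8))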
    plt.axis('off')
    plt.draw()
    plt.pause(.1)
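
The update rule in the loop, `average = (average*i + image) / (i+1)`, rescales the old average back into a sum over the first `i` images, adds the new image, and divides by the new count. As a quick check, the sketch below applies the same rule to a small stack of random stand-in images (not the notebook's data) and compares the result with the mean computed in one step.

```python
import numpy as np

# Five random 4 x 6 RGB "images" standing in for one category.
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(5, 4, 6, 3), dtype=np.uint8)

# Running average, updated one image at a time as in the loop above.
running = np.zeros(frames.shape[1:], dtype=np.float64)
for i, frame in enumerate(frames):
    running = (running * i + frame) / (i + 1)

# Mean over the whole stack, computed in a single call.
batch = frames.mean(axis=0)

print(np.allclose(running, batch))  # True
```

The running form is convenient here because each intermediate average can be displayed while the images are still being read.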

Now, let's plot the red, green, and blue values of every pixel in each category's average image as points in a 3D color space. This scatter plot serves as our 3D color histogram.

In [None]:
plt.ioff()
plt.figure()
ax = plt.axes(projection='3d')

red = averageDesert[:,:,0].ravel()
green = averageDesert[:,:,1].ravel()
blue = averageDesert[:,:,2].ravel()
desertFig = ax.scatter(red, green, blue, c="#D95F02", label="Desert")

red = averageForest[:,:,0].ravel()
green = averageForest[:,:,1].ravel()
blue = averageForest[:,:,2].ravel()
forestFig = ax.scatter(red, green, blue, c="green", label="Forest")

red = averageOcean[:,:,0].ravel()
green = averageOcean[:,:,1].ravel()
blue = averageOcean[:,:,2].ravel()
oceanFig = ax.scatter(red, green, blue, c="blue", label="Ocean")

ax.set_xlabel("Red Channel")
ax.set_ylabel("Green Channel")
ax.set_zlabel("Blue Channel")
plt.legend(handles=[desertFig, forestFig, oceanFig])
plt.show()
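
The plot above draws one point per pixel. If you want actual bin counts instead, one option is NumPy's `np.histogramdd`. The sketch below is an illustration rather than part of the original exercise: it uses a random stand-in image and an arbitrary choice of 8 bins per channel, and you could pass one of the average images in place of `image`.

```python
import numpy as np

# Random 32 x 32 RGB stand-in image (replace with an average image if desired).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Flatten to one row per pixel with columns (red, green, blue).
pixels = image.reshape(-1, 3).astype(np.float64)

# Count pixels in an 8 x 8 x 8 grid of color bins covering the range 0..256.
bins = 8
hist, edges = np.histogramdd(
    pixels,
    bins=(bins, bins, bins),
    range=((0, 256), (0, 256), (0, 256)),
)

print(hist.shape)       # (8, 8, 8)
print(int(hist.sum()))  # 1024, one count per pixel
```

Each entry `hist[r, g, b]` is the number of pixels whose color falls in that bin, so two categories with well separated colors would place their counts in different regions of the grid.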

This shows that deserts, forests, and oceans can be easily separated by their colors. Does this work equally well for all types of images? In ./Data/Images2 you will find three more folders, this time of indoor images. Repeat the steps above to check whether 