# Image classification with Python
...or how I won a hackathon

## About me

Thomas Werner
    http://twitter.com/toms_rocket

* Software Craftsman (Check out the local Software Craftsmanship Community https://www.softwerkskammer.org/groups/socramob)
* Local Open Data Group / OK Lab Münster http://codeformuenster.org
* Github: https://github.com/tomsrocket (code & slides)
* Not an expert on the topic of image classification


## Image Classification

Let a computer answer questions such as:
* "Is this a day or night photo?"
* "Is this a company logo?"
* "Does this picture show an animal?"
* "Which animal is in this picture?"

How is this done?
* The best choice right now would be a deep convolutional neural network (CNN)
  * All recent ImageNet Challenge (ILSVRC) winners used CNNs
    * Yearly challenge from Stanford University
    * All the big players participate: Google, Microsoft, ..
    * Results 2015: http://image-net.org/challenges/LSVRC/2015/results
  * A great explanation of how CNNs work (by Andrej Karpathy): http://cs231n.github.io/convolutional-networks/


* We will be using a support vector machine (SVM)
  * SVMs were state of the art for image classification problems before CNNs came along
  * Relatively easy to understand
  * Easy to use
  * Fast
  * Very good results for the chosen classification problem
  * Drawback: you have to define the feature vector manually

* Experiments with combining SVM / CNN have also been done: http://deeplearning.net/wp-content/uploads/2013/03/dlsvm.pdf


## Scikit-learn - "Machine Learning in Python"

[Introduction to the scikit-learn library](IC02_intro.ipynb)


## SVM - Support vector machine

http://scikit-learn.org/stable/modules/svm.html

From a programmer's point of view:
* Define the classes you want to divide the images into
* You need a LOT of images from each of the classes
* You define a suitable feature vector
* You calculate feature vectors for all your images
* You feed the feature vectors into the SVM
* You tell the SVM which class each feature vector belongs to
* You let the SVM train on the feature vectors
* You get a "trained SVM", which we call a "classifier"
* Now you can feed the feature vector of an "unknown"/unclassified image into the classifier, and it will tell you which class the image belongs to
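
The steps above can be sketched with scikit-learn's `svm.SVC`. This is only a minimal illustration, not the code from the talk: the two-element feature vectors (share of dark pixels, share of bright pixels), the class names, and the choice of a linear kernel with `C=100.0` are assumptions made up for this toy example.

```python
from sklearn import svm

# Hypothetical feature vectors: [share of dark pixels, share of bright pixels].
# Real feature vectors would be calculated from a LOT of images per class.
night_features = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]
day_features = [[0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]

# Feed the feature vectors into the SVM, together with the class of each one.
X = night_features + day_features
y = ["night"] * len(night_features) + ["day"] * len(day_features)

# kernel and C are tuning parameters; these values are arbitrary toy choices.
classifier = svm.SVC(kernel="linear", C=100.0)
classifier.fit(X, y)  # training: the result is the "classifier"

# Classify the feature vectors of two "unknown" images.
predictions = classifier.predict([[0.75, 0.25], [0.25, 0.75]])
print(list(predictions))
```

With real photos, `X` would hold one feature vector per image and `y` the matching class labels in the same order.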



## Our Feature Vector

*Very important: the feature vectors are the only thing the SVM gets to see, so they must contain enough information to tell the images apart.*

* I chose a feature vector that counts the "number of pixels per color"
* Implementation details:
  * Problem 1:
    * Just counting the pixels of each color would make the feature vectors depend on the size of each image, so they would not be comparable
    * Solution: the feature vector contains the ratio of pixels of each color to the total number of pixels (= normalized)
  * Problem 2:
    * The RGB color space has about 16.7 million colors (256 x 256 x 256), so the feature vector would be too large
    * Solution: map each of the 256 possible values of each color channel to a smaller set of values, for example 4 (which leaves 4 x 4 x 4 = 64 colors)
  * Problem 3:
    * The feature vector is expected to be 1-dimensional
    * Solution: combine the three reduced channel values into a single index
* Detailed description: http://www.ippatsuman.com/2014/08/13/day-and-night-an-image-classifier-with-scikit-learn/
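
As a rough sketch of the three solutions above (not the notebook code from the talk), the function below builds such a color histogram in plain Python. It assumes the image is already available as a list of `(r, g, b)` tuples with values from 0 to 255; the function name `color_histogram` and the parameter `bins_per_channel` are made up for this example.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized, 1-dimensional color histogram.

    pixels: iterable of (r, g, b) tuples with channel values 0..255
    (assumed input format; a real script would load these from an image file).
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")

    counts = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Problem 2: map each channel from 256 values down to a few bins.
        r_bin = r * bins_per_channel // 256
        g_bin = g * bins_per_channel // 256
        b_bin = b * bins_per_channel // 256
        # Problem 3: combine the three bin numbers into one index.
        index = (r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin
        counts[index] += 1

    # Problem 1: divide by the pixel count so image size does not matter.
    total = len(pixels)
    return [count / total for count in counts]


# Tiny made-up "image": two very dark and two very bright pixels.
tiny_image = [(0, 0, 0), (10, 20, 30), (255, 255, 255), (250, 250, 250)]
features = color_histogram(tiny_image)
print(len(features))   # 64 entries for 4 bins per channel
print(features[0])     # 0.5, share of pixels in the darkest bin
print(features[63])    # 0.5, share of pixels in the brightest bin
```

Because the entries are ratios, a 100x100 image and a 1000x1000 image with the same color distribution give the same feature vector.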



## Now let's see some code!

### How to calculate the feature vector
* [The image class that we want to match](IC03b_feature_vector_logos.ipynb)
* [The other images](IC03c_feature_vector_images.ipynb)

Many other feature vectors are conceivable, e.g. scale every image down to 10x10 pixels and use the resulting pixel values as the feature vector.
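
The downscaling idea could look roughly like the following. This is a hedged sketch, not code from the talk: it assumes a grayscale image given as a list of rows with values from 0 to 255, and it shrinks the image by averaging blocks of pixels. The name `downscale_features` is hypothetical.

```python
def downscale_features(gray, size=10):
    """Shrink a grayscale image to size x size and flatten it.

    gray: list of rows, each a list of brightness values 0..255
    (assumed input format). Returns size * size values between 0 and 1.
    """
    height, width = len(gray), len(gray[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the target grid")

    features = []
    for i in range(size):
        top, bottom = i * height // size, (i + 1) * height // size
        for j in range(size):
            left, right = j * width // size, (j + 1) * width // size
            # Average all pixels that fall into this grid cell.
            block = [gray[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            features.append(sum(block) / len(block) / 255)
    return features


# Made-up 20x20 image: left half black, right half white.
half_and_half = [[0] * 10 + [255] * 10 for _ in range(20)]
grid = downscale_features(half_and_half)
print(len(grid))  # 100 values for a 10x10 grid
```

Unlike the color histogram, this feature vector keeps information about where in the image the bright and dark areas are.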


## Implementation: Main Method

[More code ahead](IC04_svm_version1.ipynb)


That's it, thanks for listening!
