Skip to content

thekevinscott/dataset-tutorial-for-image-classification

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

A Sample Dataset for practicing Image Classification

This repo is a companion for the article Image Classification in the Browser with Javascript.

This repo contains data appropriate for training. You can test image classification in your browser here.

Included in the data folder is:

  • data/pretrained-model-data - A single image for testing for a valid prediction from MobileNet
  • data/colors - A simple dataset to be used as a sanity check that training is working correctly
  • data/pexel-images - A set of stock photography from the top three most popular categories on Pexels

An example of using the datasets

Running

Clone this repo:

git clone https://github.com/thekevinscott/dataset-tutorial-for-image-classification.git
cd dataset-tutorial-for-image-classification

Install the packages with yarn:

yarn

Run the code with:

yarn develop

Picking a Dataset

It's important that the training images you use are as similar as possible to the images you will be eventually predicting.

For instance, if you're training an image classifier to recognize pictures of fruit and your training data consists of well let brightly colored fruit but your users upload grainy, low-light shots, your accuracy will suffer.

Organizing your Dataset

To put together a dataset for training an image classifier, you need to organize your data like so:

- /folder-containing-your-images
  - /training
    - /class-one
    - /class-two
  - /validation
    - /class-one
    - /class-two

Populating your dataset

A good rule of thumb is 80% training data to 20% validation data.

Some reference links:

Example

Let's say you want to recognize cats vs. dogs. Your folder structure would look like:

- /folder-containing-your-images
  - /training
    - /cats
    - /dogs
  - /validation
    - /cats
    - /dogs

About

A tutorial on organizing datasets for image classification

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published