Image dataset generator for Deep learning projects

Get a large image dataset with minimal effort

This tool automatically collects images from Google or Bing and optionally resizes them.

python "funny cats" -limit=100 -dest=folder_name -resize=250x250

Then you can generate new images by applying random image augmentation to an existing folder. It adds noise, rotates, transforms, flips, and blurs randomly chosen images.

python -folder=my_folder/funny_cats -limit=10000

TADA! In a few seconds you will get 10,000 different images of funny cats to train your favorite deep learning algorithm!


This project is tested with Python 3.6.4 and later.


  • chromium-browser package (sudo apt-get install chromium-browser)



Git clone the project

Get the python dependencies

pip install -r requirements.txt

Run unit tests

python -m unittest discover


Download images from the web

python "red car" -limit=150 -dest=folder_name -resize=250x250

After running this command, you will have 150 images of red cars (resized to 250px by 250px) in the /folder_name/red_car folder.
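The folder naming seen above ("red car" ends up in folder_name/red_car) can be sketched as follows. This is only an illustration: keyword_to_folder is a hypothetical helper, not a function from this repo, and the rule "spaces become underscores" is an assumption inferred from the examples in this README.

```python
import os


def keyword_to_folder(dest: str, keyword: str) -> str:
    """Return the folder where images for `keyword` would be saved.

    Hypothetical helper: the real script may build the path differently.
    """
    # Assumption: spaces in the keyword are replaced by underscores,
    # as suggested by "red car" -> red_car in the example above.
    folder_name = keyword.strip().replace(" ", "_")
    return os.path.join(dest, folder_name)


print(keyword_to_folder("folder_name", "red car"))
```
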

You can find all possible parameters in the table below (also available with the --help parameter):

Parameters Description
Keyword (required) The first parameter should be a keyword describing the images to search for.

python "red car"
Destination folder
-dest or -d
Specify the destination folder to save files (default: images/)

python "red car" -dest=your_folder
Limit number
-limit or -l
Specify the number of files to download (default: 50). See the note below for the maximum limit.

python "red car" -limit=200
Thumbnail only
-thumbnail or -thumb
Download the thumbnail instead of the full original image

python "red car" -thumbnail
Resize image
Resize downloaded images on the fly to get a dataset where every image has the same size (default: no resizing). The parameter should be a pair of numbers giving the width and height (32x32 will output 32px x 32px image files).

python "red car" -resize=32x32
Grab source
-source, -src or -allsources
Choose the website to grab images from: Google and/or Bing (default: Google). The -allsources parameter can also be used; it mixes image files equally from all available sources.

python "red car" -source=Google (single source)
python "red car" -source=Google -source=Bing (multi source)
python "red car" -allsources (all sources)
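
The -resize value in the table above is a width and a height joined by an "x". A minimal sketch of how such a value could be parsed and validated is shown here; parse_size is a hypothetical helper, and the repo's own parsing may differ.

```python
from typing import Optional, Tuple


def parse_size(value: Optional[str]) -> Optional[Tuple[int, int]]:
    """Parse a "WIDTHxHEIGHT" string such as "32x32" (hypothetical helper).

    Returns None when no value is given, which matches the documented
    default of "no resizing".
    """
    if value is None:
        return None
    width_text, separator, height_text = value.lower().partition("x")
    if not separator:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    # int() raises ValueError on empty or non-numeric parts.
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width, height


print(parse_size("250x250"))
```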

Note: There are known limitations on the total number of images you can download in a single run of the script. Bing and Google won't let you download more than 800 images each, so the maximum for one download is around 1600 images if you use the -allsources parameter.
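
To make the arithmetic in the note concrete, here is a small sketch of splitting a requested limit equally across sources and capping each one at 800. split_limit and PER_SOURCE_CAP are hypothetical names used for illustration; the script itself may distribute the limit differently.

```python
from typing import Dict, List

# Cap taken from the note above: roughly 800 images per source.
PER_SOURCE_CAP = 800


def split_limit(limit: int, sources: List[str],
                cap: int = PER_SOURCE_CAP) -> Dict[str, int]:
    """Split `limit` equally across `sources`, capping each share.

    Hypothetical helper, not part of this repo.
    """
    if not sources:
        raise ValueError("at least one source is required")
    base, remainder = divmod(limit, len(sources))
    plan = {}
    for index, source in enumerate(sources):
        # The first `remainder` sources take one extra image each.
        wanted = base + (1 if index < remainder else 0)
        plan[source] = min(wanted, cap)
    return plan


print(split_limit(150, ["Google", "Bing"]))
print(sum(split_limit(5000, ["Google", "Bing"]).values()))
```

With two sources, any request above 1600 is reduced to 800 per source, which is where the "around 1600 images" figure comes from.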

Image augmentation

python -folder=your_folder -limit=10000

With this command, 10,000 augmented images are written to the "output" folder inside your image folder, which is the default destination.

By default, this command will randomly apply these image transformations :

  • Blur image (with a probability of 10%)

  • Add Random noise (with a probability of 50%)

  • Horizontal flip (with a probability of 30%)

  • Left or right rotation between 0 and 25 degrees (with a probability of 50%)

  • ... to be completed
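
The idea behind these probabilities can be sketched in plain Python: each transformation is applied to an image only when a random draw falls below its probability. This is an illustration of the technique, not the repo's implementation. The Image type, horizontal_flip, add_noise, and augment are hypothetical names, and images are modeled as nested lists of grayscale values to keep the example dependency-free.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[int]]  # rows of grayscale pixels in the range 0..255
Step = Tuple[float, Callable[[Image], Image]]


def horizontal_flip(image: Image) -> Image:
    """Mirror every row left to right."""
    return [list(reversed(row)) for row in image]


def add_noise(image: Image, rng: random.Random, amount: int = 20) -> Image:
    """Shift each pixel by a random offset, clamped to 0..255."""
    return [[max(0, min(255, px + rng.randint(-amount, amount))) for px in row]
            for row in image]


def augment(image: Image, steps: List[Step], rng: random.Random) -> Image:
    """Apply each step only if a random draw is below its probability."""
    for probability, operation in steps:
        if rng.random() < probability:
            image = operation(image)
    return image


rng = random.Random(0)  # seeded so repeated runs behave the same
steps = [
    (0.5, lambda img: add_noise(img, rng)),  # noise: 50%
    (0.3, horizontal_flip),                  # flip: 30%
]
sample = [[0, 64, 128], [255, 192, 32]]
print(augment(sample, steps, rng))
```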

You can customize these default values by editing the file or by making your own image augmentation pipeline

You can find all possible parameters in the table below (also available with the --help parameter):

Parameters Description
Folder (required) Input folder path containing the images to augment, passed with -folder.
Destination folder
-dest or -d
Specify the destination folder to save augmented files (default: /your_folder/output)

python -folder=your_folder -limit=50 -dest=other_folder
Limit number
-limit or -l
Number of images to generate by augmentation (default: 50)

Create a custom image augmentation pipeline

from augmentation.augmentation import DatasetGenerator

pipeline = DatasetGenerator(
pipeline.rotate(probability=0.5, max_left_degree=25, max_right_degree=25)
pipeline.resize(probability=1, width=20, height=20)

That's it!

Common issues

WebDriverException: Message: unknown error: cannot find Chrome binary

Make sure chromedriver is correctly installed and available on your PATH (on Linux, run which chromedriver and then echo $PATH to check). Chrome must also be installed on your machine (or the chromium-browser package on Linux).
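
The same check can be done from Python with the standard library's shutil.which, which works on Windows as well. This is a small sketch: find_executables is a hypothetical helper, and the browser binary names below are assumptions that vary by platform and distribution.

```python
import shutil
from typing import Dict, List, Optional


def find_executables(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


# Assumed binary names; adjust them for your system.
found = find_executables(["chromedriver", "chromium-browser", "google-chrome"])
for name, path in found.items():
    print(f"{name}: {path or 'not found on PATH'}")
```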

You can install the chromedriver with this command (more information here): pip install chromedriver_installer --install-option="--chromedriver-version=2.35"

error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools":

This repo uses scikit-image for image processing, so on Windows you need the Microsoft Visual C++ Build Tools, which are provided with Visual Studio (remember to select the C++ options during installation). You can install them with the link below.


  • This repo is largely inspired by the work of Marcus Bloice on his Augmentor project. Many thanks for the great work and the useful documentation.

  • I also picked up some ideas from this great series of articles for the part that grabs images automatically.

The goal of this repo is mainly to provide the smallest possible Python library for generating an image dataset, without a big framework like Keras or Tflearn, which can be hard to install and configure for people new to Data Science / AI.

