# DuckDuckGo Image Scraper

This is a slightly modified version of the notebook by Joe Dockrill. Many thanks to him for the notebook and the package he wrote. The following intro is from the original.

> This was originally an image scraper notebook for creating deep learning datasets.
> It has since been turned into an installable library and is much easier to use as you can simply drop a few lines of code into your own notebook as you're experimenting. 
> This notebook now shows you how to use the library.

> Docs are at [joedockrill.github.io/jmd_imagescraper/](https://joedockrill.github.io/jmd_imagescraper/)

> Hugs & kisses, Joe Dockrill. 

## Install



In [None]:
!pip install -q jmd_imagescraper

## Download images

In [None]:
from pathlib import Path
from jmd_imagescraper.core import * # don't worry, it's designed to work with import *

path = Path.cwd()/"images"
number_images_to_download = 15

duckduckgo_search(path, "Cats", "cute kittens", max_results=number_images_to_download)
duckduckgo_search(path, "Dogs", "cute puppies", max_results=number_images_to_download)
duckduckgo_search(path, "Birds", "cute baby ducks and chickens", max_results=number_images_to_download)

# duckduckgo_search returns the downloaded file paths, so to collect a list of them as you go, do this:

# images = []
# images.extend(duckduckgo_search(path, "Cats", "cute kittens", max_results=10))
# images.extend(duckduckgo_search(path, "Dogs", "cute puppies", max_results=10))
# images.extend(duckduckgo_search(path, "Birds", "cute baby ducks and chickens", max_results=10))
# images

Duckduckgo search: cute kittens
Downloading results into /content/images/Cats


Duckduckgo search: cute puppies
Downloading results into /content/images/Dogs


Duckduckgo search: cute baby ducks and chickens
Downloading results into /content/images/Birds


[PosixPath('/content/images/Birds/001_e5f914dc.jpg'),
 PosixPath('/content/images/Birds/002_d79eaf3a.jpg'),
 PosixPath('/content/images/Birds/003_b9245132.jpg'),
 PosixPath('/content/images/Birds/004_c759b4a5.jpg'),
 PosixPath('/content/images/Birds/005_08a1c1f8.jpg'),
 PosixPath('/content/images/Birds/006_9df78958.jpg'),
 PosixPath('/content/images/Birds/007_0f07f684.jpg'),
 PosixPath('/content/images/Birds/008_f225fec4.jpg'),
 PosixPath('/content/images/Birds/009_6db61513.jpg'),
 PosixPath('/content/images/Birds/010_9ba4e3b1.jpg'),
 PosixPath('/content/images/Birds/011_1be9dc53.jpg'),
 PosixPath('/content/images/Birds/012_1fbfeeb0.jpg'),
 PosixPath('/content/images/Birds/013_cf3ca8d1.jpg'),
 PosixPath('/content/images/Birds/014_e810c251.jpg'),
 PosixPath('/content/images/Birds/015_36f32fb6.jpg')]
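Each label is downloaded into its own subfolder of `path` (the output above shows `/content/images/Cats`, `/content/images/Dogs` and `/content/images/Birds`), so a quick sanity check after downloading is to count the files per label. `count_images_per_label` below is a hypothetical helper, not part of jmd_imagescraper, written with the standard library only. It counts regular files in each immediate subfolder and does not check that they are valid images.

```python
from __future__ import annotations

from pathlib import Path


def count_images_per_label(root: Path) -> dict[str, int]:
    """Count files in each immediate subfolder of root (one subfolder per label)."""
    counts = {}
    for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Only regular files are counted; nested folders and stray files in root are ignored.
        counts[label_dir.name] = sum(1 for f in label_dir.iterdir() if f.is_file())
    return counts
```

Calling `count_images_per_label(path)` should report roughly `number_images_to_download` files per label if every download succeeded; a noticeably lower count suggests some results failed to download.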

## Changing params across multiple searches

In [None]:
# To override the default params for several searches, put them in a dictionary
# like this, so the search settings for the whole dataset live in one place.

params = {
    "max_results": 10,             # this can go up to 477 at the time of writing
    "img_size":    ImgSize.Cached, 
    "img_type":    ImgType.Photo,
    "img_layout":  ImgLayout.Square,
    "img_color":   ImgColor.Purple
}

duckduckgo_search(path, "Nice", "nice clowns", **params)
duckduckgo_search(path, "Scary", "scary clowns", **params)

Duckduckgo search: nice clowns
Downloading results into /content/images/Nice


Duckduckgo search: scary clowns
Downloading results into /content/images/Scary


[PosixPath('/content/images/Scary/001_5fe1f77e.jpg'),
 PosixPath('/content/images/Scary/002_f7380d92.jpg'),
 PosixPath('/content/images/Scary/003_da44fc77.jpg'),
 PosixPath('/content/images/Scary/004_7411ce75.jpg'),
 PosixPath('/content/images/Scary/005_52be002a.jpg'),
 PosixPath('/content/images/Scary/006_d681dae8.jpg'),
 PosixPath('/content/images/Scary/007_705ac773.jpg'),
 PosixPath('/content/images/Scary/008_9f462fa7.jpg'),
 PosixPath('/content/images/Scary/009_de552ff0.jpg'),
 PosixPath('/content/images/Scary/010_7bea15a8.jpg')]
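With more than a couple of labels, it can be tidier to keep the label-to-keywords pairs in one dictionary and loop over them with the shared `params`. `run_searches` below is a hypothetical helper, not part of jmd_imagescraper. It takes the search function as an argument (in the notebook you would pass `duckduckgo_search`) and forwards `root`, the label, the keywords and the shared keyword arguments in the same order as the calls above.

```python
from pathlib import Path
from typing import Callable, Dict, List


def run_searches(search_fn: Callable, root: Path, queries: Dict[str, str], **params) -> List[Path]:
    """Call search_fn(root, label, keywords, **params) once per label and collect the returned paths."""
    downloaded = []
    for label, keywords in queries.items():
        # search_fn is assumed to return a list of file paths, as duckduckgo_search does above.
        downloaded.extend(search_fn(root, label, keywords, **params))
    return downloaded
```

In the notebook this would be `run_searches(duckduckgo_search, path, {"Nice": "nice clowns", "Scary": "scary clowns"}, **params)`. To change one setting for a single search while keeping the rest, merge dictionaries, e.g. `duckduckgo_search(path, "Scary", "scary clowns", **{**params, "max_results": 20})`.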

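## Deleting all images

This deletes everything under `path`, including the images downloaded above, so skip it if you still want to clean, zip or copy them.
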
In [None]:
rmtree(path)
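`rmtree(path)` removes every label at once. If only one search came back poor, you could instead remove just that label's folder and re-run that single search. The sketch below uses the standard library's `shutil.rmtree`; `remove_label` is a hypothetical helper name, and it refuses to delete anything that is not a direct subfolder of `root`.

```python
import shutil
from pathlib import Path


def remove_label(root: Path, label: str) -> bool:
    """Delete root/label if it exists; return True if something was removed."""
    target = (root / label).resolve()
    # Guard against labels such as ".." or "/" escaping the dataset folder.
    if target.parent != root.resolve():
        raise ValueError(f"{label!r} is not a direct subfolder of {root}")
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

For example, `remove_label(path, "Scary")` followed by `duckduckgo_search(path, "Scary", "scary clowns", **params)` would redo only that label.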

## Displaying the image cleaner

Use this to remove unsuitable images without leaving your notebook.

In [None]:
from jmd_imagescraper.imagecleaner import *

display_image_cleaner(path)

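The cleaner is for judging images by eye. Some failed downloads can also be spotted automatically, for example an HTML error page saved with a `.jpg` name, or an empty file. The sketch below is not part of jmd_imagescraper: it is a hypothetical standard-library check that flags files whose first bytes do not match the JPEG, PNG, GIF or WebP signatures, so you can review or delete them.

```python
from pathlib import Path
from typing import List

# Leading "magic bytes" of common image formats (WebP is handled separately below).
_SIGNATURES = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a")


def looks_like_image(file: Path) -> bool:
    """Return True if the file starts with a known JPEG, PNG, GIF or WebP signature."""
    with open(file, "rb") as f:
        head = f.read(12)
    if head.startswith(_SIGNATURES):
        return True
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    return head[:4] == b"RIFF" and head[8:12] == b"WEBP"


def find_suspect_files(root: Path) -> List[Path]:
    """List files under root (recursively) that don't look like supported images."""
    return sorted(p for p in root.rglob("*") if p.is_file() and not looks_like_image(p))
```

Running `find_suspect_files(path)` lists candidates; once you have checked them, `f.unlink()` deletes each one. A file that passes this check can still be corrupt further in, so this complements the visual cleaner rather than replacing it.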

## Create a zip to download or transfer to Google Drive

In [None]:
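# create zip

ZIP_NAME = "images.zip"  # change the archive name if you like

# zip updates an existing archive rather than replacing it, so delete any old one first
!rm -f {ZIP_NAME}
!zip -q -r {ZIP_NAME} {path}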
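In Colab, where `path` is `/content/images`, the `zip` command above stores entries as `content/images/...`. If you would rather have the archive unpack to a top-level `images` folder, the standard library's `shutil.make_archive` can build it from Python, with `root_dir` and `base_dir` controlling the stored paths. A minimal sketch; `zip_folder` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def zip_folder(folder: Path, zip_name: str) -> Path:
    """Zip folder so archive entries start at folder.name/ (e.g. images/Cats/001.jpg)."""
    folder = folder.resolve()
    base_name = Path(zip_name).with_suffix("")  # make_archive appends ".zip" itself
    archive = shutil.make_archive(
        str(base_name), "zip", root_dir=folder.parent, base_dir=folder.name
    )
    return Path(archive)
```

In the notebook, `zip_folder(path, ZIP_NAME)` writes `images.zip` to the current directory, where the download and Google Drive cells below expect it.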

In [None]:
# download to your local system

from google.colab import files
files.download(ZIP_NAME)


In [None]:
# copy the zip to Google Drive

from google.colab import drive
import shutil

DESTINATION_FOLDER = "Datasets" # where would you like this in Google Drive?

drive.mount("/content/drive") 
folder = Path("/content/drive/My Drive")/DESTINATION_FOLDER

folder.mkdir(parents=True, exist_ok=True)

shutil.copyfile(ZIP_NAME, str(folder/ZIP_NAME))
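If you want to be sure the copy on Drive matches the local zip before closing the session, you can compare the two files byte for byte. A minimal sketch using the standard library's `shutil.copy2` and `filecmp.cmp(..., shallow=False)`; `copy_and_verify` is a hypothetical helper name.

```python
import filecmp
import shutil
from pathlib import Path


def copy_and_verify(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir (creating it if needed) and check the bytes match."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = Path(shutil.copy2(src, dest_dir / Path(src).name))
    # shallow=False compares file contents, not just size and modification time.
    if not filecmp.cmp(src, dest, shallow=False):
        raise OSError(f"copy of {src} to {dest} does not match the original")
    return dest
```

In the notebook this would replace the `shutil.copyfile` line with `copy_and_verify(Path(ZIP_NAME), folder)`.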

## Create a CSV file of URLs

If you'd rather distribute a file containing the image URLs and labels, and let people download the images themselves, you can create one here.

In [None]:
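CSV_NAME = "images.csv"  # change the file name if you like

# start from an empty file: both save_urls_to_csv calls below write into the same CSV
!rm -f {CSV_NAME}

csv_path = Path.cwd()/CSV_NAME  # not named `csv`, which would shadow Python's csv module
save_urls_to_csv(csv_path, "Nice", "nice clowns", max_results=5)
save_urls_to_csv(csv_path, "Scary", "scary clowns", max_results=5)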
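Whoever receives the CSV will need to turn it back into labelled downloads. The sketch below only uses Python's `csv` module to group URLs by label. It assumes each row is `label,url` with no header row, which is a guess about the file layout rather than something this notebook confirms, so open the generated file and adjust `label_col` and `url_col` if it differs. `read_url_csv` is a hypothetical helper name.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def read_url_csv(csv_path: Path, label_col: int = 0, url_col: int = 1) -> Dict[str, List[str]]:
    """Group URLs by label. ASSUMPTION: label in column 0, URL in column 1, no header row."""
    urls = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) > max(label_col, url_col):  # skip blank or short rows
                urls[row[label_col]].append(row[url_col])
    return dict(urls)
```

The result maps each label to its list of URLs, which the recipient can then download into one folder per label with whatever tool they prefer.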