#### Using tools/im2rec.py
You can also convert raw images into *RecordIO* format using the [__im2rec.py__](https://github.com/apache/incubator-mxnet/blob/master/tools/im2rec.py) utility script provided in the MXNet [tools](https://github.com/dmlc/mxnet/tree/master/tools) folder.
An example of how to use the script for converting to *RecordIO* format is shown in the `Image IO` section below.

* Note that there is also a C++ implementation of [im2rec](https://github.com/dmlc/mxnet/blob/master/tools/im2rec.cc). Please refer to the [RecordIO FAQ](https://mxnet.incubator.apache.org/faq/recordio.html) for more information.

## Image IO

In this section, we will learn how to preprocess and load image data in MXNet.

There are four ways to load image data in MXNet:
   1. Using [__mx.image.imdecode__](http://mxnet.io/api/python/io/io.html#mxnet.image.imdecode) to load raw image files.
   2. Using [__`mx.image.ImageIter`__](http://mxnet.io/api/python/io/io.html#mxnet.image.ImageIter), which is implemented in Python and is easy to customize. It can read from both .rec (`RecordIO`) files and raw image files.
   3. Using [__`mx.io.ImageRecordIter`__](http://mxnet.io/api/python/io/io.html#mxnet.io.ImageRecordIter), which is implemented in C++ in the MXNet backend. It is harder to customize but is available through various language bindings.
   4. Creating a custom iterator that inherits from `mx.io.DataIter`.


### Preprocessing Images
Images can be preprocessed in different ways. We list some of them below:
- `mx.io.ImageRecordIter` is fast but not very flexible. It works well for simple tasks like image recognition but won't work for more complex tasks like detection and segmentation.
- `mx.recordio.unpack_img` (or `cv2.imread`, `skimage`, etc.) combined with `numpy` is flexible but slow due to the Python Global Interpreter Lock (GIL).
- The `mx.image` package provided by MXNet stores images in [__`NDArray`__](http://mxnet.io/tutorials/basic/ndarray.html) format and leverages MXNet's [dependency engine](http://mxnet.io/architecture/note_engine.html) to automatically parallelize processing and circumvent the GIL.

Below, we demonstrate some of the frequently used preprocessing routines provided by the `mx.image` package.

Let's start by importing the packages used in the rest of this section.

In [1]:
import mxnet as mx
%matplotlib inline
import os
import sys
import subprocess
import numpy as np
import matplotlib.pyplot as plt
import tarfile
import boto3
import botocore

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

### Loading Data using Image Iterators

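Before we see how to read data using the two built-in image iterators, let's get a sample image dataset and convert it into RecordIO format.
The cells below download `dataset-original.zip` from an S3 bucket and unzip it, in place of the __Caltech 101__ dataset (101 classes of objects) whose download is commented out.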

In [2]:
BUCKET_NAME = 'reinvent2018-builder-fair-recycle-arm-us-east-1'
dataset = 'imagenet_trashnet'
project = 'imagenet_trashnet'
KEY = 'data/{}/dataset-original.zip'.format(dataset)

s3 = boto3.resource('s3')
# my_bucket = s3.Bucket(BUCKET_NAME)
# for object in my_bucket.objects.all():
#     print(object)
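# Make sure the local download directory exists before writing into it
os.makedirs('/tmp/data', exist_ok=True)
try: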
    s3.Bucket(BUCKET_NAME).download_file(KEY, '/tmp/data/dataset-original.zip')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise

In [3]:
#fname = mx.test_utils.download(url='http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz', dirname='/tmp/data', overwrite=True)
fname = "/tmp/data/dataset-original.zip"

#tar = tarfile.open(fname)
#tar.extractall(path=os.path.join('/tmp','data'))
#tar.close()

import zipfile
# The context manager closes the archive even if extraction fails
with zipfile.ZipFile(fname, 'r') as zip_ref:
    zip_ref.extractall(path=os.path.join('/tmp/data','dataset-original'))

Let's take a look at the data. Under the extraction folder (`/tmp/data/dataset-original`), every category has its own subfolder.
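As a quick sanity check of this one-subfolder-per-category layout, the sketch below counts image files in each category folder using only the standard library. The helper name `count_images_per_class` and the extension list are assumptions made for illustration; they are not part of MXNet or `im2rec.py`. It also assumes the category folders sit directly under the root you pass in.

```python
import os

# Illustrative extension list (an assumption, not taken from im2rec.py)
IMAGE_EXTS = ('.jpg', '.jpeg', '.png')

def count_images_per_class(root):
    """Return {category_name: image_count} for each immediate subfolder of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if not os.path.isdir(folder):
            continue  # skip stray files that sit next to the category folders
        counts[name] = sum(
            1 for f in os.listdir(folder) if f.lower().endswith(IMAGE_EXTS)
        )
    return counts
```

For the data extracted above, this could be called as `count_images_per_class(os.path.join('/tmp/data', 'dataset-original'))`. If the archive unpacks into an extra top-level folder, pass that folder instead.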

Now let's convert them into RecordIO format using the `im2rec.py` utility script.
First, we need to make a list that contains all the image files and their categories:

In [4]:
#im2rec_path = mx.test_utils.get_im2rec_path()
im2rec_path = "/home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/tools/im2rec.py"
data_path = os.path.join('/tmp/data','dataset-original')
prefix_path = os.path.join('/tmp/data','trashnet')

with open(os.devnull, 'wb') as devnull:
    subprocess.check_call(['python', im2rec_path, '--list', '--recursive', prefix_path, data_path],
                          stdout=devnull)

The resulting list file (`/tmp/data/trashnet.lst`) is in the format `index\t(one or more labels)\tpath`. In this case, there is only one label for each image, but you can modify the list to add more labels for multi-label training.
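To make the list format concrete, here is a minimal sketch that parses and builds one tab-separated list line. The helper names `parse_lst_line` and `format_lst_line` and the sample path are hypothetical and exist only for this illustration. The sketch assumes labels are written as floating-point numbers and that the path is always the last field.

```python
def parse_lst_line(line):
    """Split one list-file line into (index, labels, path)."""
    fields = line.rstrip('\n').split('\t')
    index = int(fields[0])
    labels = [float(x) for x in fields[1:-1]]  # every field between index and path
    path = fields[-1]
    return index, labels, path

def format_lst_line(index, labels, path):
    """Build one list-file line; pass several labels for multi-label training."""
    parts = [str(index)] + ['%f' % label for label in labels] + [path]
    return '\t'.join(parts) + '\n'
```

Adding a second label to an image then amounts to parsing its line, appending to `labels`, and writing the line back with `format_lst_line`.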

Then we can use this list to create our RecordIO file:

In [5]:
with open(os.devnull, 'wb') as devnull:
    subprocess.check_call(['python', im2rec_path, '--num-thread=4', '--quality=100', '--resize=480', prefix_path, data_path],
                          stdout=devnull)

In [6]:
import boto3
BUCKET_NAME = 'deeplens-image-classification-varunrao'
KEY1 = '{}/train'.format(project)
KEY2 = '{}/validation'.format(project)

s3 = boto3.resource('s3')

try:
    # The same .rec file is uploaded under both the train and validation prefixes
    s3.Bucket(BUCKET_NAME).upload_file('/tmp/data/trashnet.rec', KEY1 + "/trashnet.rec")
    s3.Bucket(BUCKET_NAME).upload_file('/tmp/data/trashnet.rec', KEY2 + "/trashnet.rec")
except boto3.exceptions.S3UploadFailedError as e:
    # upload_file reports S3 client errors as S3UploadFailedError
    print("Upload failed: {}".format(e))
    raise

The RecordIO files are now saved in `/tmp/data`.

#### Using ImageRecordIter
[__`ImageRecordIter`__](http://mxnet.io/api/python/io/io.html#mxnet.io.ImageRecordIter) can be used for loading image data saved in RecordIO format. To use `ImageRecordIter`, simply create an instance by loading your record file:

In [None]:
data_iter = mx.io.ImageRecordIter(
    path_imgrec=os.path.join('/tmp','data','trashnet.rec'),
    data_shape=(3, 227, 227), # output data shape. A 227x227 region will be cropped from the original image.
    batch_size=4, # number of samples per batch
    resize=256 # resize the shorter edge to 256 before cropping
    # ... you can add more augmentation options as defined in ImageRecordIter.
    )
data_iter.reset()
batch = data_iter.next()
data = batch.data[0]
for i in range(4):
    plt.subplot(1,4,i+1)
    plt.imshow(data[i].asnumpy().astype(np.uint8).transpose((1,2,0)))
plt.show()
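The comments in the cell above mention resizing the shorter edge to 256 and then cropping a 227x227 region. The arithmetic behind those two steps can be sketched in plain Python as below. The helper names are hypothetical, and the integer rounding and centered crop position are assumptions chosen for simplicity; the exact behavior inside `ImageRecordIter` may differ.

```python
def resize_short_dims(height, width, short):
    """Scale (height, width) so the shorter edge equals `short`, keeping aspect ratio."""
    if height > width:
        return short * height // width, short
    return short, short * width // height

def center_crop_box(height, width, crop_h, crop_w):
    """Return (x0, y0, crop_w, crop_h) for a crop window centered in the image."""
    if crop_h > height or crop_w > width:
        raise ValueError('crop is larger than the image')
    return (width - crop_w) // 2, (height - crop_h) // 2, crop_w, crop_h

# A 384x512 (height x width) image resized so that its shorter edge is 256
new_h, new_w = resize_short_dims(384, 512, 256)
# Centered 227x227 window inside the resized image
box = center_crop_box(new_h, new_w, 227, 227)
```

The check in `center_crop_box` reflects a general constraint: the crop size given by `data_shape` has to fit inside the image after resizing.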

#### Using ImageIter
[__ImageIter__](http://mxnet.io/api/python/io/io.html#mxnet.image.ImageIter) is a flexible interface that supports loading images from both RecordIO files and raw image files.

In [None]:
data_iter = mx.image.ImageIter(batch_size=4, data_shape=(3, 227, 227),
                              path_imgrec=os.path.join('/tmp','data','trashnet.rec'),
                              path_imgidx=os.path.join('/tmp','data','trashnet.idx') )
data_iter.reset()
batch = data_iter.next()
data = batch.data[0]
for i in range(4):
    plt.subplot(1,4,i+1)
    plt.imshow(data[i].asnumpy().astype(np.uint8).transpose((1,2,0)))
plt.show()
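Both plotting loops above call `transpose((1,2,0))` because the batches use a (batch, channels, height, width) layout, while `plt.imshow` expects (height, width, channels). The NumPy-only sketch below illustrates that conversion on a stand-in array. The helper name `to_displayable` is hypothetical, and the clipping step is an addition not present in the cells above.

```python
import numpy as np

# Stand-in for one batch from an image iterator: (batch, channels, height, width)
batch = np.zeros((4, 3, 227, 227), dtype=np.float32)

def to_displayable(image_chw):
    """Convert one (C, H, W) image to an (H, W, C) uint8 array for plotting."""
    hwc = image_chw.transpose((1, 2, 0))  # move the channel axis to the end
    # Clip to the valid pixel range before casting (extra safety step)
    return np.clip(hwc, 0, 255).astype(np.uint8)

first = to_displayable(batch[0])
```

With a real batch, `to_displayable(data[i].asnumpy())` would play the same role as the inline expression used in the loops.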



