### Workflow

1. Upload the dataset from `MelanomaDetection.zip` to the GPU server. Strictly speaking, you do not need a GPU to complete this milestone. The objective of this step is to make sure you know how to get your dataset onto the machine that hosts your GPU and how to access it from there.

2. Write a custom class for the unlabeled images that inherits `torch.utils.data.Dataset` and overrides the following methods:
   * `__init__(self, dir_path, transform=None)`: the constructor should take in a path to the directory containing images and an optional transform argument for image pre-processing and augmentation.
   * `__len__(self)`: should return the number of images in the dataset.
   * `__getitem__(self, i)`: should return the ith image in the set.


3. Write a custom class for the labeled images that inherits `torch.utils.data.Dataset` and overrides the following methods:
   * `__init__(self, dir_path, transform=None)`: the constructor should take in a path to the directory containing images and an optional transform argument for image pre-processing and augmentation.
   * `__len__(self)`: should return the number of images in the dataset.
   * `__getitem__(self, i)`: should return the ith image in the set as well as its label.


4. Instantiate both classes and create two `torch.utils.data.DataLoader` objects, one for the unlabeled dataset and one for the labeled dataset. Use each loader to print out one batch of data.

5. After looking at the images, what transformations do you propose to use for the pre-processing and the data augmentation?

### Resources

* https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
* https://nanonets.com/blog/data-augmentation-how-to-use-deep-learning-when-you-have-limited-data-part-2/
* Manning Chapters - PyTorch Book
  - Augmentation - https://livebook.manning.com/book/deep-learning-with-pytorch/chapter-12
  - DataLoading - https://livebook.manning.com/book/deep-learning-with-pytorch/chapter-10

### Download the Data

In [1]:
url = 'https://liveproject-resources.s3.amazonaws.com/other/MelanomaDetection.zip'
pth = './data/MelanomaDetection.zip'
unp = './data/MelanomaDetection'


In [2]:
import urllib.request
import os.path
import re
from zipfile import ZipFile
from skimage import io

from torch.utils.data import Dataset, DataLoader

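# Create the target directory first; urlretrieve does not create missing folders.
os.makedirs(os.path.dirname(pth), exist_ok=True)

# Download and unpack only if the archive is not already present.
if not os.path.exists(pth):
    urllib.request.urlretrieve(url, pth)
    with ZipFile(pth, 'r') as zipObj:
        zipObj.extractall('./data')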


### Write an Unlabeled Dataset Class
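
This cell is still empty, so here is a minimal sketch of what step 2 of the workflow asks for. The class name `MIUnlabeled`, the `files` attribute, and the list of accepted file extensions are assumptions made for illustration; nothing in the notebook fixes them. Unlike an eager loader, this version stores only file names and reads each image when it is requested.

```python
import os

from skimage import io
from torch.utils.data import Dataset


class MIUnlabeled(Dataset):
    """Unlabeled images from a single directory (hypothetical class name)."""

    # Assumption: every image in the directory uses one of these extensions.
    EXTENSIONS = ('.jpg', '.jpeg', '.png')

    def __init__(self, dir_path, transform=None):
        self.dir_path = dir_path
        self.transform = transform
        # Store only the sorted file names; pixels are read lazily in __getitem__.
        self.files = sorted(
            f for f in os.listdir(dir_path)
            if f.lower().endswith(self.EXTENSIONS)
        )

    def __len__(self):
        return len(self.files)

    def __getitem__(self, i):
        image = io.imread(os.path.join(self.dir_path, self.files[i]))
        if self.transform is not None:
            image = self.transform(image)
        return image
```

Reading lazily keeps memory use low for a large unlabeled set, at the cost of one disk read per item.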

### Write a Labeled Dataset Class

In [17]:
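class MILabeled(Dataset):
    """Labeled images; the label is the digit that follows an underscore
    and precedes a dot in the file name."""

    def __init__(self, dir_path, transform=None):
        self.transform = transform
        self.dir_path = dir_path

        # Keep only the files whose names carry a label.
        files = os.listdir(dir_path)
        self.labeled_images = list(filter(None, [self._parse_files(f) for f in files]))

    def __len__(self):
        # __len__ takes no index argument and must return the count.
        return len(self.labeled_images)

    def __getitem__(self, i):
        label, image = self.labeled_images[i]
        # Apply the optional pre-processing / augmentation to the image only.
        if self.transform is not None:
            image = self.transform(image)
        return (label, image)

    def _parse_files(self, fn):
        # Read the image eagerly and pair it with the label parsed from its name.
        pth = os.path.join(self.dir_path, fn)
        m = re.search(r'_(\d)\.', fn)
        if m:
            label = int(m.group(1))
            f_arr = io.imread(pth)
            return (label, f_arr)
        return None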

### Instantiate Datasets and DataLoaders
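
Step 4 of the workflow asks for one `DataLoader` per dataset and one printed batch from each. The sketch below shows the pattern on a stand-in dataset of random tensors so that it runs without the melanoma images. `RandomImages`, the batch size of 4, and the 32 x 32 image size are assumptions for illustration only.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomImages(Dataset):
    """Hypothetical stand-in that mimics a labeled image dataset."""

    def __init__(self, n=10, size=32):
        # Assumption: images are already tensors of identical shape (C, H, W).
        self.images = torch.rand(n, 3, size, size)
        self.labels = torch.randint(0, 2, (n,))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        return self.images[i], self.labels[i]


loader = DataLoader(RandomImages(), batch_size=4, shuffle=True)

# Pull a single batch; the default collate function stacks items along dim 0.
images, labels = next(iter(loader))
print(images.shape, labels.shape)  # torch.Size([4, 3, 32, 32]) torch.Size([4])
```

The same two lines (`DataLoader(...)` and `next(iter(loader))`) should work with `MILabeled` in place of the stand-in. The default collate function converts NumPy arrays to tensors, but it can only stack images that share one shape, so images of differing sizes would need a resizing transform first.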

### Transforms

## Testing

In [18]:
labeled_ds = MILabeled(unp + '/labeled')

In [19]:
labeled_ds[2]

(1, Array([[[190, 160, 158],
         [189, 159, 157],
         [189, 159, 157],
         ...,
         [187, 156, 164],
         [185, 153, 164],
         [184, 152, 163]],
 
        [[187, 159, 158],
         [187, 159, 158],
         [188, 160, 159],
         ...,
         [187, 158, 163],
         [187, 156, 162],
         [185, 154, 160]],
 
        [[183, 157, 158],
         [184, 158, 159],
         [186, 160, 161],
         ...,
         [189, 161, 160],
         [188, 160, 159],
         [187, 159, 158]],
 
        ...,
 
        [[202, 169, 162],
         [202, 171, 166],
         [203, 172, 169],
         ...,
         [194, 164, 176],
         [192, 164, 178],
         [189, 161, 176]],
 
        [[208, 171, 163],
         [206, 171, 165],
         [205, 172, 165],
         ...,
         [191, 165, 178],
         [189, 165, 179],
         [184, 160, 176]],
 
        [[211, 173, 164],
         [209, 172, 164],
         [206, 171, 165],
         ...,
         [192, 168, 182],