## Data Processing Tasks:
- First Task: Sort the images from the HAM10000 dataset into their respective class folders based on the metadata file
- Second Task: Add the Dermnet images to their respective folders (done manually; the Dermnet images are already sorted)
- Third Task: Add the Dermnet watermark to all images

---
**Task 1:** Sorting ```HAM10000``` into a new directory ```sorted```

In [1]:
import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


In [11]:
import pandas as pd
import os
import shutil

In [5]:
metadata = pd.read_csv('Datasets/HAM10000/HAM10000_metadata')

In [6]:
metadata.head

<bound method NDFrame.head of          lesion_id      image_id     dx dx_type   age     sex localization  \
0      HAM_0000118  ISIC_0027419    bkl   histo  80.0    male        scalp   
1      HAM_0000118  ISIC_0025030    bkl   histo  80.0    male        scalp   
2      HAM_0002730  ISIC_0026769    bkl   histo  80.0    male        scalp   
3      HAM_0002730  ISIC_0025661    bkl   histo  80.0    male        scalp   
4      HAM_0001466  ISIC_0031633    bkl   histo  75.0    male          ear   
...            ...           ...    ...     ...   ...     ...          ...   
10010  HAM_0002867  ISIC_0033084  akiec   histo  40.0    male      abdomen   
10011  HAM_0002867  ISIC_0033550  akiec   histo  40.0    male      abdomen   
10012  HAM_0002867  ISIC_0033536  akiec   histo  40.0    male      abdomen   
10013  HAM_0000239  ISIC_0032854  akiec   histo  80.0    male         face   
10014  HAM_0003521  ISIC_0032258    mel   histo  70.0  female         back   

            dataset  
0      vidi

In [15]:
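HAM10000 = 'Datasets/HAM10000/HAM10000_images'
sortedHAM10000 = 'Datasets/HAM10000/sorted'

# Move each image into a subfolder named after its diagnosis label (dx)
for index, row in metadata.iterrows():
    image = row['image_id'] + '.JPG'
    source = os.path.join(HAM10000, image)
    dest_dir = os.path.join(sortedHAM10000, row['dx'])
    if os.path.isfile(source):
        # shutil.move does not create missing parent folders
        os.makedirs(dest_dir, exist_ok=True)
        shutil.move(source, os.path.join(dest_dir, image))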
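
One way to check the result of the sort is to count the images in each class folder and compare the totals with `metadata['dx'].value_counts()`. The helper below is a minimal sketch and not part of the original notebook; the function name `count_images_per_class` and the list of accepted extensions are assumptions.

```python
import os
from collections import Counter


def count_images_per_class(root, extensions=('.jpg', '.jpeg', '.png')):
    """Count image files in each immediate subfolder of root."""
    counts = Counter()
    for label in sorted(os.listdir(root)):
        folder = os.path.join(root, label)
        if not os.path.isdir(folder):
            continue  # ignore stray files next to the class folders
        counts[label] = sum(
            1 for name in os.listdir(folder)
            if name.lower().endswith(extensions)  # case-insensitive match
        )
    return counts
```

Calling `count_images_per_class('Datasets/HAM10000/sorted')` after the move should give one entry per `dx` label; a mismatch with the metadata counts would point to images that were missing from the source folder or had a different file extension.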

---
**Task 2:** Merge the Dermnet images into the sorted HAM10000 directory, then move the merged directory (named ```skin_cancer_master```) to the location this Jupyter notebook runs from. This was done manually using the mapping below:

- ```akiec_bcc``` - Contains the ```akiec```, ```bcc```, and ```Actinic keratoses, Bowen's disease, Basal Cell Carcinoma, other Malignant lesions``` folders (precancerous conditions)
- ```bkl``` - Contains the ```bkl``` and ```Seborrheic Keratoses and other Benign Tumors``` folders
- ```df``` - Standalone folder with original ```df``` data
- ```mel``` - Contains data from ```mel``` and melanoma images from ```Melanoma Skin Cancer Nevi and Moles``` folders
- ```nv``` - Contains data from ```nv``` and nevus images from ```Melanoma Skin Cancer Nevi and Moles``` folders
- ```vasc``` - Contains data from the ```vasc``` and ```Vascular Tumors``` folders
- ```other``` - Contains all other images from Dermnet dataset
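
The whole-folder parts of the mapping above could also be scripted. The following is a rough sketch under stated assumptions, not what was actually done (the merge was manual): `FOLDER_MAP` and `merge_folders` are hypothetical names, files are copied rather than moved, and a file with the same name as an earlier one overwrites it. The ```Melanoma Skin Cancer Nevi and Moles``` folder is deliberately left out because its images have to be split between ```mel``` and ```nv``` by content, and the ```other``` bucket is not handled either.

```python
import os
import shutil

# Source folder name -> merged folder name, taken from the list above.
# Folders that need a per-image decision are intentionally not listed.
FOLDER_MAP = {
    'akiec': 'akiec_bcc',
    'bcc': 'akiec_bcc',
    "Actinic keratoses, Bowen's disease, Basal Cell Carcinoma, "
    "other Malignant lesions": 'akiec_bcc',
    'bkl': 'bkl',
    'Seborrheic Keratoses and other Benign Tumors': 'bkl',
    'df': 'df',
    'vasc': 'vasc',
    'Vascular Tumors': 'vasc',
}


def merge_folders(source_roots, dest_root, folder_map=FOLDER_MAP):
    """Copy files from mapped subfolders of each source root into dest_root."""
    copied = 0
    for root in source_roots:
        for name in sorted(os.listdir(root)):
            src_dir = os.path.join(root, name)
            target = folder_map.get(name)
            if target is None or not os.path.isdir(src_dir):
                continue  # unmapped folders are left for manual handling
            dst_dir = os.path.join(dest_root, target)
            os.makedirs(dst_dir, exist_ok=True)
            for fname in os.listdir(src_dir):
                src = os.path.join(src_dir, fname)
                if os.path.isfile(src):
                    shutil.copy2(src, os.path.join(dst_dir, fname))
                    copied += 1
    return copied
```

Copying instead of moving keeps both source datasets intact, so the merge can be rerun if the mapping changes.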

---
**Task 3:** Add the Dermnet watermark to each image

In [49]:
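from PIL import Image

watermark = Image.open('Blue_Watermark.png').convert('RGBA')
images = 'skin_cancer_master_unwatermarked'
new_images = 'skin_cancer_master_watermarked'

for subfolder in os.listdir(images):
    # Create the output class folder if it does not exist yet
    os.makedirs(os.path.join(new_images, subfolder), exist_ok=True)
    for image in os.listdir(os.path.join(images, subfolder)):
        if image.endswith(('jpg', 'JPG')):
            img = Image.open(os.path.join(images, subfolder, image)).convert('RGBA')
            # Center the watermark; its alpha channel is used as the paste mask
            img.paste(watermark, ((img.width - watermark.width) // 2, (img.height - watermark.height) // 2), watermark)
            # Save as PNG, replacing the original extension rather than appending to it
            img.save(os.path.join(new_images, subfolder, os.path.splitext(image)[0] + '.png'))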

In [55]:
# Try the enlarged gray watermark on a single image first
watermark = Image.open('Enlarged_Gray_Watermark.png').convert('RGBA')
img = Image.open(os.path.join(images, 'nv', 'atypical-nevi-8.JPG')).convert('RGBA')
img.paste(watermark, ((img.width - watermark.width) // 2, (img.height - watermark.height) // 2), watermark)
img.save("one.png")

In [56]:
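from PIL import Image

watermark = Image.open('Enlarged_Gray_Watermark.png').convert('RGBA')
images = 'skin_cancer_master_unwatermarked'
new_images = 'skin_cancer_master_gray'

for subfolder in os.listdir(images):
    # Create the output class folder if it does not exist yet
    os.makedirs(os.path.join(new_images, subfolder), exist_ok=True)
    for image in os.listdir(os.path.join(images, subfolder)):
        if image.endswith(('jpg', 'JPG')):
            img = Image.open(os.path.join(images, subfolder, image)).convert('RGBA')
            # Center the watermark; its alpha channel is used as the paste mask
            img.paste(watermark, ((img.width - watermark.width) // 2, (img.height - watermark.height) // 2), watermark)
            # Save as PNG, replacing the original extension rather than appending to it
            img.save(os.path.join(new_images, subfolder, os.path.splitext(image)[0] + '.png'))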
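
The watermarking cells in Task 3 repeat the same steps with different watermark files and output folders, so those steps could be pulled into small helpers. This is a sketch only: `centered_offset`, `png_name`, and `watermark_file` are hypothetical names that do not appear in the notebook, and Pillow is imported inside `watermark_file` purely so the two arithmetic helpers can be used without it.

```python
import os


def centered_offset(image_size, mark_size):
    """Top-left corner that centers a mark of mark_size on image_size."""
    (image_w, image_h), (mark_w, mark_h) = image_size, mark_size
    # Negative values mean the mark is larger than the image on that axis.
    return ((image_w - mark_w) // 2, (image_h - mark_h) // 2)


def png_name(filename):
    """Swap the file extension for .png, e.g. 'a.JPG' becomes 'a.png'."""
    return os.path.splitext(filename)[0] + '.png'


def watermark_file(src, dst, watermark):
    """Paste an RGBA watermark onto the image at src and save it to dst."""
    from PIL import Image  # local import: the helpers above do not need Pillow

    img = Image.open(src).convert('RGBA')
    # Passing the watermark as the third argument uses its alpha channel
    # as the paste mask, so transparent areas leave the image untouched.
    img.paste(watermark, centered_offset(img.size, watermark.size), watermark)
    os.makedirs(os.path.dirname(dst) or '.', exist_ok=True)
    img.save(dst)
```

With these helpers, each loop body reduces to a single call such as `watermark_file(os.path.join(images, subfolder, image), os.path.join(new_images, subfolder, png_name(image)), watermark)`.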