# Manage Dataset

I've personally downloaded the files from drive and uploaded them. In this notebook, I will create a custom directory `./storage/Training/Merged/`. This will contain the merged outputs of combining the `fg` of images.


Name convention : 
Merged Adobe Images : `BGCODE_AD_FILENAME.JPG`

Merged Other Images : `BGCODE_OT_FILENAME.JPG`

We have currently 115 background images. 358 Adobe Images and 73 Other Images. Total 431 uinque images. Combined with 115 background images, we have 49565 images.

In [1]:
import os
print(len(os.listdir("./storage/Training/Adobe/fg")))
print(len(os.listdir("./storage/Training/Other/fg")))

358
73


In [2]:
ADOBE_FG = "./storage/Training/Adobe/fg/"
ADOBE_ALPHA = "./storage/Training/Adobe/alpha/"
ADOBE_TRIMAP = "./storage/Training/Adobe/trimaps/"
BACKGROUND = "./storage/Training/Background/"

Custom Dataloader only needs Image, Alpha and Trimaps. For Adobe, Trimaps are available. But for others, Trimaps needs to be generated.

In [3]:
MERGED = "./storage/Training/Merged/"
MERGED_ALPHA = "./storage/Training/Merged/Alpha"
MERGED_TRIMAP = "./storage/Training/Merged/Trimap"

In [4]:
from PIL import Image
import math
import time

In [5]:
def composite4(fg, bg, a, w, h):
    
    bbox = fg.getbbox()
    bg = bg.crop((0,0,w,h))
    
    fg_list = fg.load()
    bg_list = bg.load()
    a_list = a.load()
    
    for y in range(h):
        for x in range (w):
            alpha = a_list[x,y] / 255
            t = fg_list[x,y][0]
            t2 = bg_list[x,y][0]
            if alpha >= 1:
                r = int(fg_list[x,y][0])
                g = int(fg_list[x,y][1])
                b = int(fg_list[x,y][2])
                bg_list[x,y] = (r, g, b, 255)
            elif alpha > 0:
                r = int(alpha * fg_list[x,y][0] + (1-alpha) * bg_list[x,y][0])
                g = int(alpha * fg_list[x,y][1] + (1-alpha) * bg_list[x,y][1])
                b = int(alpha * fg_list[x,y][2] + (1-alpha) * bg_list[x,y][2])
                bg_list[x,y] = (r, g, b, 255)

    return bg

In [6]:
from time import time
def main(fg_path, a_path, tri_path,bg_path, out_path):
    fg_files = os.listdir(fg_path)
    a_files = os.listdir(a_path)
    bg_files = os.listdir(bg_path)

    bg_iter = iter(bg_files)
    image_count = 0
    total = 0
    for im_name in fg_files:
        skip = [".ipynb_checkpoints","locked_00000.jpg", "pexels-photo-106368.jpg", "035A4308.jpg", "archeology_00050.jpg"
                , "035A4546.jpg", "035A4310.jpg", "035A4548.jpg", "archeology_00040.jpg", "035A4457.jpg", "archeology_00120.jpg"
               "archeology_00145.jpg", "godiva_close_00035.jpg", "mmtest_00090.jpg"
               ]
        if im_name in skip:
            image_count+=1
            print("\n"+str(image_count)+"/358 "+im_name, "Skipped")
            continue
        image_count+=1
        print("\n"+str(image_count)+"/358 "+im_name, end=" ")
        start_time = time()
        im = Image.open(fg_path + im_name);
        a = Image.open(a_path + im_name);
        tri = Image.open(tri_path + im_name)
        bbox = im.size
        w = bbox[0]
        h = bbox[1]

        if im.mode != 'RGB' and im.mode != 'RGBA':
            im = im.convert('RGB')

        bcount = 0 
        for bg_name in bg_files:

#             bg_name = next(bg_iter)        
            if bg_name == ".ipynb_checkpoints" or os.path.exists(out_path + "/Image/"+ bg_name[:len(im_name)-4] +"_AD_" + im_name[:len(im_name)-4]+'.jpg'):
                bcount += 1
                if bcount%10 == 0:
                    print(bcount, end=" ")
                continue
            total+=1
            bg = Image.open(bg_path + bg_name)
            if bg.mode != 'RGB':
                bg = bg.convert('RGB')

            bg_bbox = bg.size
            bw = bg_bbox[0]
            bh = bg_bbox[1]
            wratio = w / bw
            hratio = h / bh
            ratio = wratio if wratio > hratio else hratio     
            if ratio > 1:        
                bg = bg.resize((math.ceil(bw*ratio),math.ceil(bh*ratio)), Image.BICUBIC)

            out = composite4(im, bg, a, w, h)
            

            out.save(out_path + "/Image/"+ bg_name[:len(im_name)-4] +"_AD_" + im_name[:len(im_name)-4]+'.jpg', "JPEG")
            a.save(out_path + "/Alpha/"+ bg_name[:len(im_name)-4] +"_AD_" + im_name[:len(im_name)-4]+'.jpg', "JPEG")        
            tri.save(out_path + "/Trimap/"+ bg_name[:len(im_name)-4] +"_AD_" + im_name[:len(im_name)-4]+'.jpg', "JPEG")        


            bcount += 1
            if bcount%10 == 0:
                print(bcount, end=" ")
        end_time = time()
        print("\nTime taken :", end_time-start_time)
    print("Total Images :",total)

In [7]:
main(ADOBE_FG,ADOBE_ALPHA,ADOBE_TRIMAP,BACKGROUND, MERGED )


1/358 locked_00000.jpg Skipped

2/358 pexels-photo-106368.jpg Skipped

3/358 035A4308.jpg Skipped

4/358 archeology_00050.jpg Skipped

5/358 035A4546.jpg Skipped

6/358 035A4310.jpg Skipped

7/358 035A4548.jpg Skipped

8/358 archeology_00040.jpg Skipped

9/358 035A4457.jpg Skipped

10/358 archeology_00120.jpg 10 20 30 40 50 60 70 80 90 100 110 
Time taken : 0.2323620319366455

11/358 archeology_00145.jpg 10 20 30 40 50 60 70 80 90 100 110 
Time taken : 0.110626220703125

12/358 godiva_close_00035.jpg Skipped

13/358 mmtest_00090.jpg Skipped

14/358 wedding-dress-1174168.jpg 10 20 30 40 50 60 70 80 90 100 110 
Time taken : 0.21634316444396973

15/358 archeology_00070.jpg 10 20 30 40 50 60 70 80 90 100 110 
Time taken : 0.14394521713256836

16/358 model-873690_960_720.jpg 10 20 30 40 50 60 70 80 90 100 110 
Time taken : 0.09038543701171875

17/358 forests-231066_1920.jpg 10 20 30 40 50 60 70 80 90 100 110 
Time taken : 0.1506023406982422

18/358 squirrel-493790_1920.jpg 10 20 30 40 50 6

In [8]:
print("No. of Merged Images :",len(os.listdir("./storage/Training/Merged/Image")))

No. of Merged Images : 41245


Due to time constraints, we are training the model on 41K images rather than 49K images as it takes very long to merge images and then we have to generate trimaps for these images also.

Training part of the images will be handled in the different notebook

---
                                                           END