This file creates the combined dataset used by the models and should be run only once. Running it two or more times changes the number of images in the final dataset and introduces noisy data. Steps before running this file:
- STEP 1: Download the original dataset and open the v2_cam1_cam2_split_by_driver folder, which is version 2 of the dataset. Copy only these 3 folders: Camera 1, Camera 2, skin_nonskin_pixels.
- STEP 2: In the working directory of this file, create a folder named "Distracted Driver Dataset" (the name is case-sensitive and includes the spaces) and paste the 3 folders above into it.
- STEP 3: Open this file and run it once. DO NOT RUN IT AGAIN.

NOTE: 
- If you accidentally run this file twice or mess up the combined dataset, delete the Combined folder and rerun this file.
- DO NOT EDIT ANY FOLDER/FILE in the original dataset.
- This process cannot be done manually by copying files and pasting them into the new folders: some images share the same file name, so pasting would overwrite them and images would go missing.
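
The "run only once" rule can also be enforced in code. The following is a minimal sketch, not part of the original notebook: the helper name `ensure_not_already_combined` is hypothetical, and it assumes the Combined folder sits directly under the dataset root, as in the paths defined later in this file.

```python
import os

def ensure_not_already_combined(dataset_root):
    """Return the path of the Combined folder, or raise if it already exists.

    Hypothetical helper: refusing to continue when Combined is present
    prevents a second run from mixing old and new images.
    """
    combined = os.path.join(dataset_root, "Combined")
    if os.path.exists(combined):
        raise RuntimeError(
            "Combined folder already exists at {}. "
            "Delete it before rerunning this file.".format(combined)
        )
    return combined
```

Calling `ensure_not_already_combined(os.path.join(os.getcwd(), "Distracted Driver Dataset"))` before the first cell would stop a second run with a clear error instead of silently changing the dataset.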

In [24]:
import os
import shutil

In [25]:
CAM1 = os.path.join(os.getcwd(),"Distracted Driver Dataset","Camera 1")
CAM2 = os.path.join(os.getcwd(),"Distracted Driver Dataset","Camera 2")
CAM1_TEST = os.path.join(CAM1, "test")
CAM2_TEST = os.path.join(CAM2, "test")
CAM1_TRAIN = os.path.join(CAM1, "train")
CAM2_TRAIN = os.path.join(CAM2, "train")
COMBINED_DIR = os.path.join(os.getcwd(),"Distracted Driver Dataset","Combined")
COMBINED_TEST = os.path.join(COMBINED_DIR, "test")
COMBINED_TRAIN = os.path.join(COMBINED_DIR, "train")

In [26]:
CLASS = ["c0","c1","c2","c3","c4","c5","c6","c7","c8","c9"]
CAM1_TEST_CLS = [os.path.join(CAM1_TEST, cls) for cls in CLASS]
CAM2_TEST_CLS = [os.path.join(CAM2_TEST, cls) for cls in CLASS]
CAM1_TRAIN_CLS = [os.path.join(CAM1_TRAIN, cls) for cls in CLASS]
CAM2_TRAIN_CLS = [os.path.join(CAM2_TRAIN, cls) for cls in CLASS]
COMBINED_TEST_CLS = [os.path.join(COMBINED_TEST, cls) for cls in CLASS]
COMBINED_TRAIN_CLS = [os.path.join(COMBINED_TRAIN, cls) for cls in CLASS]


#### Check if datasets exist

In [27]:
if not os.path.exists(CAM1_TEST):
  print("Test folder for Camera 1 does not exist")
if not os.path.exists(CAM2_TEST):
  print("Test folder for Camera 2 does not exist")
if not os.path.exists(CAM1_TRAIN):
  print("Train folder for Camera 1 does not exist")
if not os.path.exists(CAM2_TRAIN):
  print("Train folder for Camera 2 does not exist")
for cls in range(10):
  if not os.path.exists(CAM1_TEST_CLS[cls]):
    print("Test folder for Camera 1 class {} does not exist".format(cls))
  if not os.path.exists(CAM2_TEST_CLS[cls]):
    print("Test folder for Camera 2 class {} does not exist".format(cls))
  if not os.path.exists(CAM1_TRAIN_CLS[cls]):
    print("Train folder for Camera 1 class {} does not exist".format(cls))
  if not os.path.exists(CAM2_TRAIN_CLS[cls]):
    print("Train folder for Camera 2 class {} does not exist".format(cls))
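
The checks above only print a message, so later cells would still run and fail when `os.listdir` hits a missing folder. A stricter variant could collect every missing folder and stop immediately. This is a sketch under that assumption; `find_missing_dirs` and `require_dirs` are hypothetical names, not part of the original notebook.

```python
import os

def find_missing_dirs(paths):
    """Return the subset of paths that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]

def require_dirs(paths):
    """Raise FileNotFoundError listing every missing directory at once."""
    missing = find_missing_dirs(paths)
    if missing:
        raise FileNotFoundError("Missing folders:\n" + "\n".join(missing))
```

With the lists defined earlier, a single call such as `require_dirs(CAM1_TEST_CLS + CAM2_TEST_CLS + CAM1_TRAIN_CLS + CAM2_TRAIN_CLS)` would cover all 40 class folders.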

#### Combine dataset

The original dataset is split into two folders, one for Camera 1 and one for Camera 2. The following code merges the test and train folders of both cameras into a single Combined dataset folder. It also merges class c3 (Text Left) into c1 (Text Right) and class c4 (Phone Left) into c2 (Phone Right).


The new class order of the dataset is:
- c0: Safe driving
- c1: Text
- c2: Phone
- c3: Adjusting Radio
- c4: Drinking
- c5: Reaching Behind
- c6: Hair or Makeup
- c7: Talking to Passenger
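
The relabeling described above can be summarized as a lookup table from the original 10 labels to the 8 new ones. This is only an illustrative sketch: the names `OLD_TO_NEW`, `NEW_NAMES` and `new_label` are hypothetical and are not used by the cells in this file, which perform the same mapping by copying and renaming folders.

```python
# Original label -> new label after merging left/right variants
# (c3 into c1, c4 into c2) and shifting c5..c9 down by two.
OLD_TO_NEW = {
    "c0": "c0", "c1": "c1", "c2": "c2",
    "c3": "c1",  # Text Left joins Text
    "c4": "c2",  # Phone Left joins Phone
    "c5": "c3", "c6": "c4", "c7": "c5", "c8": "c6", "c9": "c7",
}

NEW_NAMES = {
    "c0": "Safe Driving", "c1": "Text", "c2": "Phone",
    "c3": "Adjusting Radio", "c4": "Drinking", "c5": "Reaching Behind",
    "c6": "Hair or Makeup", "c7": "Talking to Passenger",
}

def new_label(old_label):
    """Map an original class label (c0..c9) to its new label (c0..c7)."""
    return OLD_TO_NEW[old_label]
```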

In [28]:
if not os.path.exists(COMBINED_DIR):
  os.mkdir(COMBINED_DIR)
  print("Creating Combined dataset folder")
if not os.path.exists(COMBINED_TEST):
  os.mkdir(COMBINED_TEST)
  print("Creating Combined test folder")
if not os.path.exists(COMBINED_TRAIN):
  os.mkdir(COMBINED_TRAIN)
  print("Creating Combined train folder")

Creating Combined dataset folder
Creating Combined test folder
Creating Combined train folder


#### Combine camera 1 and 2 test/train sets

Some files in the Camera 1 and Camera 2 datasets share the same name, so copying them directly into one folder would lose quite a lot of images (~2000). Therefore, each image is renamed with a camera-index and class-index prefix before it is copied into the combined folders.
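
To see why the prefix matters, here is a small self-contained sketch that reproduces the collision in a temporary directory. The folder names `cam1`/`cam2`, the file name `img_1.jpg` and the helper `copy_all` are made up for the demonstration; only the prefix scheme (camera index followed by class index) follows this file.

```python
import os
import shutil
import tempfile

def copy_all(src_dirs, dest, cls, use_prefix):
    """Copy every file from src_dirs into dest and return the resulting names."""
    os.makedirs(dest, exist_ok=True)
    for i, src in enumerate(src_dirs):
        for name in os.listdir(src):
            # Same scheme as the notebook: camera index + class index + name
            new_name = str(i) + str(cls) + name if use_prefix else name
            shutil.copy(os.path.join(src, name), os.path.join(dest, new_name))
    return sorted(os.listdir(dest))

# Two fake camera folders that both contain a file called img_1.jpg
root = tempfile.mkdtemp()
cam_dirs = []
for cam in ("cam1", "cam2"):
    cam_dir = os.path.join(root, cam)
    os.mkdir(cam_dir)
    with open(os.path.join(cam_dir, "img_1.jpg"), "w") as f:
        f.write(cam)
    cam_dirs.append(cam_dir)

plain = copy_all(cam_dirs, os.path.join(root, "plain"), 0, use_prefix=False)
prefixed = copy_all(cam_dirs, os.path.join(root, "prefixed"), 0, use_prefix=True)
print(plain)     # ['img_1.jpg']: the second copy overwrote the first
print(prefixed)  # ['00img_1.jpg', '10img_1.jpg']: both images kept
```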

In [29]:
test_dirs  = [CAM1_TEST_CLS, CAM2_TEST_CLS]
train_dirs = [CAM1_TRAIN_CLS, CAM2_TRAIN_CLS]

for i in range(2):
  for cls in range(10):
    if not os.path.exists(COMBINED_TEST_CLS[cls]):
      os.mkdir(COMBINED_TEST_CLS[cls])
      print("Creating Combined test folder for class {}".format(cls))
    for test_image in os.listdir(test_dirs[i][cls]):
      # Prefix with camera index and class index so names stay unique
      new_name = str(i) + str(cls) + test_image
      old_test_image_path = os.path.join(test_dirs[i][cls], test_image)
      destination_path = os.path.join(COMBINED_TEST_CLS[cls], new_name)
      shutil.copy(old_test_image_path, destination_path)

    if not os.path.exists(COMBINED_TRAIN_CLS[cls]):
      os.mkdir(COMBINED_TRAIN_CLS[cls])
      print("Creating Combined train folder for class {}".format(cls))
    for train_image in os.listdir(train_dirs[i][cls]):
      # Prefix with camera index and class index so names stay unique
      new_name = str(i) + str(cls) + train_image
      old_train_image_path = os.path.join(train_dirs[i][cls], train_image)
      destination_path = os.path.join(COMBINED_TRAIN_CLS[cls], new_name)
      shutil.copy(old_train_image_path, destination_path)

Creating Combined test folder for class 0
Creating Combined train folder for class 0
Creating Combined test folder for class 1
Creating Combined train folder for class 1
Creating Combined test folder for class 2
Creating Combined train folder for class 2
Creating Combined test folder for class 3
Creating Combined train folder for class 3
Creating Combined test folder for class 4
Creating Combined train folder for class 4
Creating Combined test folder for class 5
Creating Combined train folder for class 5
Creating Combined test folder for class 6
Creating Combined train folder for class 6
Creating Combined test folder for class 7
Creating Combined train folder for class 7
Creating Combined test folder for class 8
Creating Combined train folder for class 8
Creating Combined test folder for class 9
Creating Combined train folder for class 9


#### Combine class c1 (Text Right) with c3 (Text Left) and c2 (Phone Right) with c4 (Phone Left) on Combined dataset

In [30]:
combined_dirs = [COMBINED_TEST_CLS, COMBINED_TRAIN_CLS] 
for combined_dir in combined_dirs: 
  for image in os.listdir(combined_dir[3]):
    image_path = os.path.join(combined_dir[3], image)
    destination_path = os.path.join(combined_dir[1], image)
    shutil.copy(image_path, destination_path)
  shutil.rmtree(combined_dir[3])
  
  for image in os.listdir(combined_dir[4]):
    image_path = os.path.join(combined_dir[4], image)
    destination_path = os.path.join(combined_dir[2], image)
    shutil.copy(image_path, destination_path)
  shutil.rmtree(combined_dir[4])

#### Relabel classes c5 to c9 as c3 to c7

In [31]:
if len(COMBINED_TEST_CLS) == 10 and len(COMBINED_TRAIN_CLS) == 10:
  for cls in range(5,10):
    NEW_COMBINED_TEST_CLS = os.path.join(COMBINED_TEST, "c{}".format(cls-2))
    NEW_COMBINED_TRAIN_CLS = os.path.join(COMBINED_TRAIN, "c{}".format(cls-2))
    if os.path.exists(COMBINED_TEST_CLS[cls]):
      os.rename(COMBINED_TEST_CLS[cls], NEW_COMBINED_TEST_CLS)
    if os.path.exists(COMBINED_TRAIN_CLS[cls]):
      os.rename(COMBINED_TRAIN_CLS[cls], NEW_COMBINED_TRAIN_CLS)
  COMBINED_TEST_CLS.remove(os.path.join(COMBINED_TEST, "c8"))
  COMBINED_TEST_CLS.remove(os.path.join(COMBINED_TEST, "c9"))
  COMBINED_TRAIN_CLS.remove(os.path.join(COMBINED_TRAIN, "c8"))
  COMBINED_TRAIN_CLS.remove(os.path.join(COMBINED_TRAIN, "c9"))

#### Count the number of images in each class in the test and train sets

In [32]:
NEW_CLASS = [["c0", "Safe Driving"], ["c1", "Text"], ["c2", "Phone"], 
         ["c3", "Adjusting Radio"], ["c4", "Drinking"], 
         ["c5", "Reaching Behind"], ["c6", "Hair or Makeup"], 
         ["c7", "Talking to Passenger"]]

In [33]:
total_images_ori = 0
for cls in range(10):
  total_images_ori += len(os.listdir(CAM1_TEST_CLS[cls]))
  total_images_ori += len(os.listdir(CAM2_TEST_CLS[cls]))
  total_images_ori += len(os.listdir(CAM1_TRAIN_CLS[cls]))
  total_images_ori += len(os.listdir(CAM2_TRAIN_CLS[cls]))
print("Total number of images in the original dataset: {}".format(total_images_ori))

total_images_new = 0
for cls in range(8):
  total_images_new += len(os.listdir(COMBINED_TEST_CLS[cls]))
  total_images_new += len(os.listdir(COMBINED_TRAIN_CLS[cls]))
print("Total number of images in the new dataset: {}".format(total_images_new))
if total_images_ori == total_images_new:
  print("--> The two datasets have the same number of images")
else:
  print("--> The two datasets do not have the same number of images")

Total number of images in the original dataset: 14478
Total number of images in the new dataset: 14478
--> The two datasets have the same number of images


In [35]:
total_test = 0
for cls in range(8):
  num_images = len(os.listdir(COMBINED_TEST_CLS[cls]))
  total_test += num_images
  print("Number of test images in class {}({}) is {}".format(NEW_CLASS[cls][0], NEW_CLASS[cls][1], num_images))
print("The total number of test images is {}".format(total_test))
print("================================================================")

total_train = 0
for cls in range(8):
  num_images = len(os.listdir(COMBINED_TRAIN_CLS[cls]))
  total_train += num_images
  print("Number of train images in class {}({}) is {}".format(NEW_CLASS[cls][0], NEW_CLASS[cls][1], num_images))
print("The total number of train images is {}".format(total_train))
print("================================================================")

Number of test images in class c0(Safe Driving) is 346
Number of test images in class c1(Text) is 393
Number of test images in class c2(Phone) is 364
Number of test images in class c3(Adjusting Radio) is 170
Number of test images in class c4(Drinking) is 143
Number of test images in class c5(Reaching Behind) is 143
Number of test images in class c6(Hair or Makeup) is 146
Number of test images in class c7(Talking to Passenger) is 218
The total number of test images is 1923
Number of train images in class c0(Safe Driving) is 2640
Number of train images in class c1(Text) is 2449
Number of train images in class c2(Phone) is 2212
Number of train images in class c3(Adjusting Radio) is 953
Number of train images in class c4(Drinking) is 933
Number of train images in class c5(Reaching Behind) is 891
Number of train images in class c6(Hair or Makeup) is 898
Number of train images in class c7(Talking to Passenger) is 1579
The total number of train images is 12555


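As we can see, this dataset is not balanced: in the train set, c0, c1 and c2 each have more than 2,200 images, while c3 to c6 have between 891 and 953 images each.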
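
One common way to account for this imbalance during training is to weight each class inversely to its frequency. The sketch below applies the widely used "balanced" heuristic, weight = total / (number of classes × class count), to the train counts printed above. It is only an illustration: this file does not say how the models handle the imbalance, and the variable names are hypothetical.

```python
# Train-set counts copied from the output above
train_counts = {
    "c0": 2640, "c1": 2449, "c2": 2212, "c3": 953,
    "c4": 933, "c5": 891, "c6": 898, "c7": 1579,
}

total = sum(train_counts.values())
num_classes = len(train_counts)

# Inverse-frequency weights: rarer classes get larger weights
class_weights = {
    cls: total / (num_classes * count)
    for cls, count in train_counts.items()
}

# Ratio between the largest and smallest class
imbalance_ratio = max(train_counts.values()) / min(train_counts.values())

for cls, weight in class_weights.items():
    print("{}: {:.3f}".format(cls, weight))
print("Largest/smallest class ratio: {:.2f}".format(imbalance_ratio))
```

With these weights, the weighted class counts are all equal, so each class contributes the same total weight to a weighted loss.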