<a href="https://colab.research.google.com/github/teddylew12/Waymo_Open_DataSet/blob/master/Generate_Weather_Dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Generating Weather Images

This notebook preprocesses the Waymo Open Dataset TFRecord files into an image dataset for our Weather Classifier Competition (see Faster-RCNN-Part-2!).


## Imports

In [0]:
from google.colab import drive
# Mount at /content/drive so that the DIRNAME path used below resolves
drive.mount('/content/drive')


In [0]:
# Install the dependencies first, then import them
!pip3 install waymo-open-dataset
!pip3 install tensorflow==2.0.0
!pip3 install simple_waymo_open_dataset_reader
import os
import tensorflow as tf
import math
import numpy as np
import itertools
import random
from io import BytesIO  # needed to decode the encoded camera images
from PIL import Image
from tqdm import tqdm
from glob import glob
from simple_waymo_open_dataset_reader import WaymoDataFileReader
from simple_waymo_open_dataset_reader import dataset_pb2, label_pb2
from simple_waymo_open_dataset_reader import utils


In [0]:
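# Change DIRNAME to the directory that holds your .tfrecord files
DIRNAME = "/content/drive/Shared drives/Waymo Project/Data/CleanTest/weather_images"
os.chdir(DIRNAME)
# Create the output folder for the extracted images (no error if it already exists)
os.makedirs("weather_images", exist_ok=True)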

## Class Imbalance Prevention

The main challenge of creating this dataset is the massive class imbalance between sunny and rainy images. 



In [None]:
# getWeather is defined in the next code cell; run that cell first
sunny = 0
rainy = 0
for file in glob(DIRNAME + "/*.tfrecord"):
    if getWeather(file) == "sunny":
        sunny += 1
    else:
        rainy += 1
print(f"There are {sunny} sunny and {rainy} rainy files")

If we were to naively use all of the files, our model would be biased towards always classifying the weather as sunny. To avoid this, we build an evenly balanced dataset by keeping only a random subset of the sunny files, equal in number to the rainy files.
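
The undersampling idea can be tried on plain Python lists before touching the real files. The sketch below is only an illustration: the helper name `undersample_majority`, its `seed` parameter, and the example file names are assumptions that do not appear elsewhere in this notebook.

```python
import random

def undersample_majority(majority, minority, seed=None):
    """Return a random subset of `majority` no longer than `minority`."""
    rng = random.Random(seed)  # a fixed seed makes the subset reproducible
    k = min(len(majority), len(minority))
    return rng.sample(list(majority), k)

# Hypothetical file names, for illustration only
sunny_example = [f"sunny_segment_{i}.tfrecord" for i in range(10)]
rainy_example = [f"rainy_segment_{i}.tfrecord" for i in range(3)]

balanced_sunny = undersample_majority(sunny_example, rainy_example, seed=0)
print(len(balanced_sunny), len(rainy_example))
```

Sampling without replacement keeps every selected sunny file distinct, and passing a seed lets the same subset be regenerated later.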

In [0]:

def getWeather(filename):
    # Read the weather label from the context of the file's first frame
    datafile = WaymoDataFileReader(filename)
    for frame in datafile:
        return frame.context.stats.weather

def generate_file_list(dirname):
    sun_files =[]
    rain_files =[]
    for fname in os.listdir(dirname):
        #Check that our file is an tfrecord file
        if not fname.endswith(".tfrecord"):
            continue
        fullname = os.path.join(dirname,fname)
        if getWeather(fullname) == "sunny":
            sun_files.append(fname)
        else:
            rain_files.append(fname)
    # There will be an imbalance (many more sunny files), so let's make them equal
    num_files = len(rain_files)
    print(num_files)
    # random.shuffle shuffles in place and returns None, so do not reassign its result
    random.shuffle(sun_files)
    sun_files = sun_files[:num_files]
    assert len(sun_files) == len(rain_files)
    return sun_files, rain_files

sun_files,rain_files = generate_file_list(DIRNAME)


In [0]:
def save_image(image, savename):
    # Decode the encoded image bytes and save them under the given name
    bytes_img = BytesIO(image)
    im = Image.open(bytes_img)
    im.save(os.path.join('weather_images', savename))

def create_new_images(sun_arr, rain_arr):
    # Save the camera images from every file, named by weather label and a running count
    for label, files in (("sunny", sun_arr), ("rainy", rain_arr)):
        count = 0
        for fname in tqdm(files):
            datafile = WaymoDataFileReader(fname)
            for frame in datafile:
                for cam_idx, camera_image in enumerate(frame.images):
                    # Skip the images at indices 2 and 4
                    if cam_idx in (2, 4):
                        continue
                    save_name = f"{label}{count}.png"
                    save_image(camera_image.image, save_name)
                    count += 1

create_new_images(sun_files, rain_files)
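
Balancing the number of files may not balance the number of images exactly, since files can contain different numbers of frames. A quick sanity check is to count the saved images per class by file-name prefix. The helper `count_by_label` and the example names below are hypothetical and only sketch the idea.

```python
from collections import Counter

def count_by_label(filenames, labels=("sunny", "rainy")):
    """Count file names by the weather label they start with."""
    counts = Counter({label: 0 for label in labels})
    for name in filenames:
        for label in labels:
            if name.startswith(label):
                counts[label] += 1
                break  # a file name matches at most one label
    return counts

# Hypothetical file names, for illustration only
example_names = ["sunny0.png", "sunny1.png", "rainy0.png", "notes.txt"]
print(count_by_label(example_names))
```

In this notebook it could be called as `count_by_label(os.listdir("weather_images"))` once the images have been written.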
