# Generate label dataframes from JSON files

 * The goal of this notebook is to load the JSON label files into dataframes.
 * There are three camera viewing angles, so three dataframes are generated.
 * The three dataframes are then merged into one.
 * The merged dataframe specification:
   * Index: timestamp at one-second resolution, converted from epoch seconds
   * Data columns: left, right, op

In [1]:
import pandas as pd
import numpy as np
import os
import json
import time

from calendar import timegm

Specify the necessary parameters to start:
  * `base_dir`: label root directory
  * `cam_angle`: camera angle, which is also a component of the path
  * `f_prefix`: label file prefix
  * `f_suffix`: label file suffix

In [2]:
base_dir = '/home/yang/research/dock/explicator/tri-cam-labeled'
cam_angle = 'left'
f_prefix = 'out_'
f_suffix = '_fin-header-pos-labels.json'

Find all the folders under the camera-angle directory. `os.walk` yields the top-level folder first, so remove the first entry.

In [3]:
fol = [x[0] for x in os.walk(os.path.join(base_dir, cam_angle))]
del fol[0]
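
As a quick check of the claim that the first entry is the top-level folder, the sketch below walks a throwaway directory tree. The folder names are made up for illustration. It also shows `os.scandir` as one possible alternative that lists only the immediate subfolders, which may matter if the label folders ever contain nested directories, since `os.walk` recurses into them.

```python
import os
import tempfile

# Build a throwaway tree shaped like the label directory (folder names are made up).
with tempfile.TemporaryDirectory() as root:
    for name in ("2019-07-15_19-15-00", "2019-07-15_19-16-00"):
        os.makedirs(os.path.join(root, name))

    walked = [x[0] for x in os.walk(root)]
    # os.walk is top-down by default, so the root itself comes first.
    first_is_root = walked[0] == root
    # The order of the subfolders is not guaranteed, hence the sort.
    subfolders = sorted(walked[1:])

    # Alternative: list only the immediate subfolders, never the root.
    direct = sorted(e.path for e in os.scandir(root) if e.is_dir())

print(first_is_root, subfolders == direct)
```

Because the dataframes are sorted by timestamp later on, the arbitrary folder order returned by `os.walk` does not affect the result.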

Initialize two lists: one for the parsed timestamps and one for the text labels.

In [4]:
ts = []
labels = []

Go through the label file in each folder:
  1. Read the JSON file.
  2. Compute each timestamp as `ts = base_ts + clip_idx`, where `base_ts` is parsed from the folder name and `clip_idx` from the clip's file name. Each clip is 1 second long, so each increment of `clip_idx` advances the time by 1 second. Append the computed timestamp to the `ts` list.
  3. Append the label to the `labels` list.

In [5]:
for f in fol:
    # The folder name is the recording start time, interpreted as UTC.
    base_ts_str = os.path.basename(f)
    utc_ts = time.strptime(base_ts_str, "%Y-%m-%d_%H-%M-%S")
    base_ts = timegm(utc_ts)
    with open(os.path.join(f, f_prefix + os.path.basename(f) + f_suffix), 'r') as ff:
        j = json.loads(ff.read())
        for item in j:
            # The clip index is the sixth underscore-separated field of the video file name.
            clip_idx = int(os.path.basename(item['video']).split('_')[5].replace('.mp4', ''))
            ts.append(base_ts + clip_idx)
            labels.append(item['label'])
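
The timestamp arithmetic can be checked in isolation. The sketch below wraps it in a hypothetical helper, `clip_timestamp`, and applies it to a made-up video file name. The real clip naming scheme is not shown in this notebook, so the example name is an assumption chosen only so that the sixth underscore-separated field holds the clip index.

```python
import os
import time
from calendar import timegm

def clip_timestamp(folder_name, video_path):
    """Return the epoch second for one clip: folder start time plus clip index."""
    # timegm treats the parsed struct_time as UTC (time.mktime would use local time).
    base_ts = timegm(time.strptime(folder_name, "%Y-%m-%d_%H-%M-%S"))
    # The clip index is the sixth underscore-separated field of the file name.
    clip_idx = int(os.path.basename(video_path).split('_')[5].replace('.mp4', ''))
    return base_ts + clip_idx

# Hypothetical file name, chosen only so that field 5 holds the clip index.
example = clip_timestamp("2019-07-15_19-15-00",
                         "/tmp/out_2019-07-15_19-15-00_fin_clip_7.mp4")
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(example)))
# prints 2019-07-15 19:15:07
```

Using `timegm` rather than `time.mktime` keeps the result independent of the machine's local timezone.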

Quickly verify that the two lists have the same length.

In [6]:
print(len(ts), len(labels))

3606 3606


Create a pandas dataframe from the `ts` and `labels` lists. Convert `ts` to datetimes, set it as the index, and sort by it.

In [7]:
left_df = pd.DataFrame({'ts': ts, 'label': labels})

left_df.index = pd.to_datetime(left_df['ts'], unit='s')

left_df = left_df.sort_index().drop(columns=['ts'])
left_df.head(10)

ts,label
2019-07-15 19:15:00,header up
2019-07-15 19:15:01,header up
2019-07-15 19:15:02,header up
2019-07-15 19:15:03,header up
2019-07-15 19:15:04,header up
2019-07-15 19:15:05,header up
2019-07-15 19:15:06,header up
2019-07-15 19:15:07,header up
2019-07-15 19:15:08,header up
2019-07-15 19:15:09,header up
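
Before merging on the index, it may be worth checking that no second is labeled twice, because a merge on a non-unique index multiplies the matching rows. The sketch below uses made-up toy values in place of the real `ts` and `labels` lists and builds the dataframe in one step by passing the converted timestamps directly as the index.

```python
import pandas as pd

# Toy inputs standing in for the `ts` and `labels` lists (values are made up).
ts = [1563218102, 1563218100, 1563218101]
labels = ['header up', 'header up', 'none']

# Pass the converted timestamps directly as the index, then sort.
df = pd.DataFrame({'label': labels},
                  index=pd.to_datetime(ts, unit='s').rename('ts')).sort_index()

# Checks worth running before merging on the index.
is_unique = df.index.is_unique                  # no second is labeled twice
max_gap = df.index.to_series().diff().max()     # largest gap between consecutive labels

print(is_unique, max_gap)
```

A `max_gap` larger than one second would indicate missing clips within the labeled period.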


Apply the same procedure to the `right` and `op` JSON files.

In [8]:
cam_angle = 'right'

In [9]:
fol = [x[0] for x in os.walk(os.path.join(base_dir, cam_angle))]
del fol[0]

ts = []
labels = []

for f in fol:
    base_ts_str = os.path.basename(f)
    utc_ts = time.strptime(base_ts_str, "%Y-%m-%d_%H-%M-%S")
    base_ts = timegm(utc_ts)
    with open(os.path.join(f, f_prefix + os.path.basename(f) + f_suffix), 'r') as ff:
        j = json.loads(ff.read())
        for item in j:
            clip_idx = int(os.path.basename(item['video']).split('_')[5].replace('.mp4', ''))
            ts.append(base_ts + clip_idx)
            labels.append(item['label'])
            
right_df = pd.DataFrame({'ts': ts, 'label': labels})
right_df.index = pd.to_datetime(right_df['ts'], unit='s')
right_df = right_df.sort_index().drop(columns=['ts'])
right_df.head(10)

ts,label
2019-07-15 19:15:05,header up
2019-07-15 19:15:06,header up
2019-07-15 19:15:07,header up
2019-07-15 19:15:08,header up
2019-07-15 19:15:09,header up
2019-07-15 19:15:10,header up
2019-07-15 19:15:11,header up
2019-07-15 19:15:12,header up
2019-07-15 19:15:13,header up
2019-07-15 19:15:14,header up


In [10]:
cam_angle = 'op'

In [11]:
fol = [x[0] for x in os.walk(os.path.join(base_dir, cam_angle))]
del fol[0]

ts = []
labels = []

for f in fol:
    base_ts_str = os.path.basename(f)
    utc_ts = time.strptime(base_ts_str, "%Y-%m-%d_%H-%M-%S")
    base_ts = timegm(utc_ts)
    with open(os.path.join(f, f_prefix + os.path.basename(f) + f_suffix), 'r') as ff:
        j = json.loads(ff.read())
        for item in j:
            clip_idx = int(os.path.basename(item['video']).split('_')[5].replace('.mp4', ''))
            ts.append(base_ts + clip_idx)
            labels.append(item['label'])
            
op_df = pd.DataFrame({'ts': ts, 'label': labels})
op_df.index = pd.to_datetime(op_df['ts'], unit='s')
op_df = op_df.sort_index().drop(columns=['ts'])
op_df.head(10)

ts,label
2019-07-15 19:16:10,none
2019-07-15 19:16:11,none
2019-07-15 19:16:12,none
2019-07-15 19:16:13,none
2019-07-15 19:16:14,none
2019-07-15 19:16:15,none
2019-07-15 19:16:16,none
2019-07-15 19:16:17,none
2019-07-15 19:16:18,none
2019-07-15 19:16:19,none
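
Since the same steps run once per camera angle, they could be wrapped in a function. The sketch below is one possible refactoring, not the notebook's own code: `load_labels` is a hypothetical helper name, and the demo at the bottom runs it against a temporary directory with made-up file names and labels so that it works without the real data.

```python
import json
import os
import tempfile
import time
from calendar import timegm

import pandas as pd

def load_labels(base_dir, cam_angle, f_prefix, f_suffix):
    """Load every label file for one camera angle into a time-indexed dataframe."""
    ts, labels = [], []
    # Skip the first entry of os.walk, which is the camera-angle folder itself.
    folders = [x[0] for x in os.walk(os.path.join(base_dir, cam_angle))][1:]
    for f in folders:
        name = os.path.basename(f)
        base_ts = timegm(time.strptime(name, "%Y-%m-%d_%H-%M-%S"))
        with open(os.path.join(f, f_prefix + name + f_suffix), 'r') as ff:
            for item in json.load(ff):
                clip = os.path.basename(item['video'])
                clip_idx = int(clip.split('_')[5].replace('.mp4', ''))
                ts.append(base_ts + clip_idx)
                labels.append(item['label'])
    df = pd.DataFrame({'ts': ts, 'label': labels})
    df.index = pd.to_datetime(df['ts'], unit='s')
    return df.sort_index().drop(columns=['ts'])

# Demo on a temporary tree; all names and labels below are made up.
with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, 'left', '2019-07-15_19-15-00')
    os.makedirs(folder)
    items = [
        {'video': 'clips/out_2019-07-15_19-15-00_fin_clip_1.mp4', 'label': 'none'},
        {'video': 'clips/out_2019-07-15_19-15-00_fin_clip_0.mp4', 'label': 'header up'},
    ]
    with open(os.path.join(folder, 'out_2019-07-15_19-15-00_labels.json'), 'w') as fh:
        json.dump(items, fh)
    demo_df = load_labels(root, 'left', 'out_', '_labels.json')

print(demo_df)
```

With such a helper, the three dataframes could be produced by calling `load_labels(base_dir, angle, f_prefix, f_suffix)` for each of `'left'`, `'right'`, and `'op'`.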


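Merge the `left`, `right`, and `op` label dataframes into one. An outer join keeps every timestamp that appears in at least one camera's labels; cameras without a label at that second get a missing value.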

In [17]:
m = pd.merge(left_df, right_df, how='outer', left_index=True, right_index=True, suffixes=('_left', '_right'))
labels_df = pd.merge(m, op_df, how='outer', left_index=True, right_index=True)
labels_df.rename(columns={'label_left': 'left', 'label_right': 'right', 'label': 'op'}, inplace=True)
labels_df.head(10)

ts,left,right,op
2019-07-15 19:15:00,header up,,
2019-07-15 19:15:01,header up,,
2019-07-15 19:15:02,header up,,
2019-07-15 19:15:03,header up,,
2019-07-15 19:15:04,header up,,
2019-07-15 19:15:05,header up,header up,
2019-07-15 19:15:06,header up,header up,
2019-07-15 19:15:07,header up,header up,
2019-07-15 19:15:08,header up,header up,
2019-07-15 19:15:09,header up,header up,
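
To see how the outer merge aligns cameras that start at different times, the sketch below repeats the merge on three tiny dataframes with made-up labels, each offset by one second. It also shows `pd.concat` along the columns as a possible one-step alternative, which should produce the same table; `toy` is a hypothetical helper used only for this illustration.

```python
import pandas as pd

def toy(start, labels):
    """Build a small label dataframe indexed by consecutive seconds."""
    idx = pd.date_range(start, periods=len(labels),
                        freq=pd.Timedelta(seconds=1), name='ts')
    return pd.DataFrame({'label': labels}, index=idx)

left = toy('2019-07-15 19:15:00', ['header up'] * 3)
right = toy('2019-07-15 19:15:01', ['header up'] * 3)
op = toy('2019-07-15 19:15:02', ['none'] * 3)

# Two chained outer merges on the index, as in the cell above.
m = pd.merge(left, right, how='outer', left_index=True, right_index=True,
             suffixes=('_left', '_right'))
merged = pd.merge(m, op, how='outer', left_index=True, right_index=True)
merged = merged.rename(columns={'label_left': 'left', 'label_right': 'right',
                                'label': 'op'})

# Alternative: align all three label columns on the index in one step.
concat = pd.concat([left['label'], right['label'], op['label']],
                   axis=1, keys=['left', 'right', 'op'])

print(merged)
```

The merged table spans the union of the three time ranges, with missing values wherever a camera has no label for that second.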


Fill the missing labels with `not available` and save the final merged dataframe to an HDF5 file.

In [16]:
labels_df = labels_df.fillna('not available')
labels_df.to_hdf('./labels.h5', key='labels')