# Handling labels
---

# edit_labels
Set the input file paths in the Excel file `input.xlsx`. edit_labels starts in one of two modes, depending on which parameters are set.<BR>
**Mode-1. Initial labeling**
    
    (set: labeling_path, labeled_h5, inferred path, inferred_video)<BR>
**Mode-2. Refine labels by adding labeled frames**
    
    (set: labeling_path, labeled_h5, training_path, labeled_for_train_pickle, inferred path, inferred_video, inferred_h5)<BR>
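
The two parameter sets can be compared with a small check before starting the tool. This is only a sketch: `guess_mode` is a hypothetical helper, not part of edit_labels, and the key `inferred_path` is an assumed spelling of the "inferred path" column. It treats `None` and empty strings as "not set".

```python
# Hypothetical helper (not part of edit_labels): it mirrors the two
# parameter sets listed above to show which mode a row would select.
MODE1_KEYS = {"labeling_path", "labeled_h5", "inferred_path", "inferred_video"}
MODE2_EXTRA_KEYS = {"training_path", "labeled_for_train_pickle", "inferred_h5"}


def guess_mode(row):
    """Return 'refine' (Mode-2), 'initial' (Mode-1), or 'incomplete'."""
    # Treat None and empty strings as "not set".
    filled = {key for key, value in row.items() if value not in (None, "")}
    if (MODE1_KEYS | MODE2_EXTRA_KEYS) <= filled:
        return "refine"
    if MODE1_KEYS <= filled:
        return "initial"
    return "incomplete"


example_row = {
    "labeling_path": "labeled-data/video",
    "labeled_h5": "CollectedData_wi.h5",
    "inferred_path": "videos",
    "inferred_video": "video.mp4",
    "inferred_h5": "",  # left empty, so Mode-2 is not selected
}
print(guess_mode(example_row))
```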

# Case-1. Initial labeling
#### 1. For initial labeling, we assume you have already done the following
1. Run `dlc.extract_frames`. It extracts png images under `(dlc_root)/labeled-data/(video)/`.
2. Run `dlc.label_frames`. Load the frames and save. It creates `CollectedData_(scorer).h5`.

#### 2. Specify input files in Excel input.xlsx
- **labeling_path**: folder that contains the png images and `CollectedData_(scorer).h5`, i.e. `(dlc_root)/labeled-data/(video)/`
    - **labeled_h5:** `CollectedData_(scorer).h5`
- **inferred path**: folder that contains inferred_video.
    - **inferred_video:** video file name
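
Before running the tool it can help to confirm that the labeling folder really contains what Mode-1 expects. The following is a minimal sketch using only the standard library; `check_labeling_path` is a hypothetical helper, not part of edit_labels or DeepLabCut.

```python
from pathlib import Path


def check_labeling_path(labeling_path, labeled_h5):
    """Return a list of problems found in a labeling folder (empty if none)."""
    folder = Path(labeling_path)
    if not folder.is_dir():
        return [f"folder not found: {folder}"]
    problems = []
    # extract_frames is expected to have written png images here
    if not any(folder.glob("*.png")):
        problems.append("no png frames found")
    # label_frames is expected to have written CollectedData_(scorer).h5 here
    if not (folder / labeled_h5).is_file():
        problems.append(f"missing {labeled_h5}")
    return problems
```

For example, `check_labeling_path('(dlc_root)/labeled-data/(video)', 'CollectedData_wi.h5')` returns an empty list when both the frames and the h5 file are present.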

#### 3. Run edit_labels
Pressing **Esc** quits the tool and generates a `(date)-(time)-extracted` folder that contains `extracted.h5`, `extracted.csv`, and the `png` image files. Move all of these files to `(dlc_root)/labeled-data/(video)/`.
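
Moving the extracted files can be scripted. The sketch below is one possible way, assuming the extracted folder holds only plain files; `move_extracted` is a hypothetical helper name, and it refuses to overwrite anything that already exists in the destination.

```python
import shutil
from pathlib import Path


def move_extracted(extracted_dir, labeled_video_dir):
    """Move every file from the extracted folder into the labeled-data folder."""
    src = Path(extracted_dir)
    dst = Path(labeled_video_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for item in sorted(src.iterdir()):
        if not item.is_file():
            continue
        target = dst / item.name
        # never silently replace an existing label file or frame
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        shutil.move(str(item), str(target))
        moved.append(item.name)
    return moved
```

A call such as `move_extracted('(date)-(time)-extracted', '(dlc_root)/labeled-data/(video)')` returns the names of the files it moved.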

# Case-2. Refine labels by adding labeled frames
**Attention:** edit_labels cannot be started from Jupyter with a likelihood graph. If you need it, run edit_labels from VS Code or a command prompt with `plot_type='raster'`. (2022/06/02 wi)<BR>

#### 1. For refinement of labeled data, we assume you have the following
1. You have run the first labeling and training, so the following exist:
    - `(dlc_root)/dlc-models`
    - `(dlc_root)/evaluation-results`
    - `(dlc_root)/labeled-data`
    - `(dlc_root)/training-datasets`
    - `(dlc_root)/videos`
    - `(dlc_root)/config.yaml`
2. Inferred videos
    The videos may be located in a different folder from `(dlc_root)`.
    
#### 2. Make backups of the labeled data for previous training
Make a new folder and copy all files into it, including the frame images used for training (`imgxxxxx.png`) and `CollectedData_(scorer).h5`.<BR>
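
The backup step could look like the sketch below. It assumes the backup is a subfolder of the video's labeled-data folder (the path `...\3_iteration-0\CollectedData_wi.h5` used later in this notebook suggests that layout); `backup_labeled_data` and its `backup_name` argument are hypothetical.

```python
import shutil
from pathlib import Path


def backup_labeled_data(labeled_video_dir, backup_name):
    """Copy all files of a labeled-data folder into a new backup subfolder."""
    src = Path(labeled_video_dir)
    dst = src / backup_name
    dst.mkdir()  # fails if the backup folder already exists
    copied = []
    for item in sorted(src.iterdir()):
        # copy plain files only (png frames, CollectedData_(scorer).h5, csv, ...)
        if item.is_file():
            shutil.copy2(item, dst / item.name)
            copied.append(item.name)
    return dst, copied
```

Because the files are copied rather than moved, the originals stay in place for the next round of labeling.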
    
#### 3. Specify input files in Excel input.xlsx

- **labeling_path** that contains the **aggregated** labeled coordinate h5 file.
    `(dlc_root)/training-datasets/(iteration-?)/UnaugmentedDataSet_(project)(date)`
    - **labeled_h5**: `CollectedData_(scorer).h5`
    
- **training_path** that contains `Documentation_data-(project)_95shuffle1.pickle`, which records which frames were used for training and which for testing.
    `(dlc_root)/training-datasets/(iteration-?)/UnaugmentedDataSet_(project)(date)`
    - **labeled_for_train_pickle**: `Documentation_data-(project)_95shuffle1.pickle`
    
- **inferred path** that contains inferred_video and the h5 file of inferred coordinates.
    - **inferred_video:** video file name
    - **inferred_h5:** h5 file of inferred coordinates. `(project)(scorer)(project)(date)shuffle1_200000_el.h5`
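
The training-dataset folder and the pickle name follow a fixed pattern, so they can be assembled from a few values. This sketch infers the pattern from the example path used later in this notebook (`training-datasets\iteration-0\UnaugmentedDataSet_homecage_test01Jun7`). The helper names are hypothetical, and reading `95` as the training percentage and `1` as the shuffle index is an assumption.

```python
from pathlib import Path


def training_dataset_dir(dlc_root, iteration, project, date):
    """Folder that holds the aggregated labels and the Documentation pickle."""
    return (Path(dlc_root) / "training-datasets" / f"iteration-{iteration}"
            / f"UnaugmentedDataSet_{project}{date}")


def documentation_pickle_name(project, train_percent=95, shuffle=1):
    """File name of the pickle that records the train/test split (assumed pattern)."""
    return f"Documentation_data-{project}_{train_percent}shuffle{shuffle}.pickle"


# example values taken from the paths that appear in this notebook
folder = training_dataset_dir("dlc_root", 0, "homecage_test01", "Jun7")
print(folder / documentation_pickle_name("homecage_test01"))
```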

**Note:** The Excel file can contain multiple rows. Select the row with the second argument of `el.read_input`, e.g. `el.read_input(input_path, 1)`.
    
#### 4. Run edit_labels 
Pressing **Esc** quits the tool and generates a `(date)-(time)-extracted` folder that contains `extracted.h5`, `extracted.csv`, and the `png` image files. Move all of these files to `(dlc_root)/labeled-data/(video)/`.

In [3]:
import edit_labels as el
import os

if __name__ == '__main__':

    # input data

    input_path = 'input.xlsx'

    if os.path.exists(input_path):
        inferred_video, inferred_h5, labeled_h5, labeled_for_train_pickle = el.read_input(
            input_path, 5)
    else:
        ############################
        # example data
        # inferred video
        inferred_video = r'input_data\rpicam-01_1806_20210722_212134.mp4'
        # inferred result h5
        inferred_h5 = r'input_data\rpicam-01_1806_20210722_212134DLC_dlcrnetms5_homecage_test01May17shuffle1_200000_el.h5'
        # labeled data for training
        labeled_h5 = r'input_data\CollectedData_DJ.h5'
        # information which frame is used for training or testing
        labeled_for_train_pickle = r'input_data\Documentation_data-homecage_test01_95shuffle1.pickle'

    # video display magnification factor
    mag_factor = 1
    # set window size and position. win_y_len_axis is only for x-axis window.
    window_geo = {'win_x_len': 1000, 'win_y_len': 100, 'win_y_len_axis': 30,
                  'win_x_origin': 0, 'win_y_origin': 0}

    el.start(inferred_video=inferred_video, inferred_h5=inferred_h5,
          labeled_h5=labeled_h5, labeled_for_train_pickle=labeled_for_train_pickle,
          mag_factor=mag_factor, window_geo=window_geo, plot_type='')

#########################################################
video resolution: (960, 1280, 3)
total frame number: 18000
#########################################################
#########################################################
## The ratio of Nan to the entire video frames. (total:  18000  frames)
sub1snout:     1.00
sub1leftear:     1.00
sub1rightear:     1.00
sub1tailbase:     1.00
sub2snout:     1.00
sub2leftear:     1.00
sub2rightear:     1.00
sub2tailbase:     1.00
#########################################################
Reading _rpicam-01_1819_20210723_102136_track_freeze.csv
Writing _rpicam-01_1819_20210723_102136_track_freeze.csv
Writing _rpicam-01_1819_20210723_102136_freeze.csv


---

# Step3. Check index and column and merge labeled h5 files for re-training

#### 1. Read labeled coordinate h5 files and process them

**extracted labels**

In [24]:
import pandas as pd
import numpy as np
import os

extracted_labels = '20220607-202525-extracted/extracted.h5'
df_ext = pd.read_hdf(extracted_labels)
df_ext

FileNotFoundError: File 20220607-202525-extracted/extracted.h5 does not exist

**previous labels**

In [25]:
previous_labels = r'W:\wataru\dlc_data\homecage_test01-wi-2022-06-03\labeled-data\rpicam-01_1806_20210722_212134\3_iteration-0\CollectedData_wi.h5'
df_pre = pd.read_hdf(previous_labels)
df_pre

,,scorer,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi
,,individuals,sub1,sub1,sub1,sub1,sub1,sub1,sub1,sub1,sub2,sub2,sub2,sub2,sub2,sub2,sub2,sub2
,,bodyparts,snout,snout,leftear,leftear,rightear,rightear,tailbase,tailbase,snout,snout,leftear,leftear,rightear,rightear,tailbase,tailbase
,,coords,x,y,x,y,x,y,x,y,x,y,x,y,x,y,x,y
labeled-data,rpicam-01_1806_20210722_212134,img00053.png,918.000,381.000,986.000,322.000,912.000,292.000,934.187,177.223,1054.000,408.000,,,1034.000,329.000,1025.000,234.000
labeled-data,rpicam-01_1806_20210722_212134,img00266.png,,,101.000,295.000,158.000,243.000,168.000,459.000,849.000,203.000,908.000,216.000,914.147,177.672,1015.000,337.000
labeled-data,rpicam-01_1806_20210722_212134,img00276.png,,,91.000,327.000,133.000,261.000,181.000,457.000,850.939,207.798,902.420,228.776,912.733,181.224,1014.955,334.889
labeled-data,rpicam-01_1806_20210722_212134,img00305.png,,,,,,,180.000,455.000,845.321,205.037,903.844,229.779,914.574,179.877,1015.088,335.446
labeled-data,rpicam-01_1806_20210722_212134,img00334.png,,,133.000,638.000,,,135.000,364.000,844.367,202.776,902.947,234.878,914.641,180.656,1014.526,336.456
labeled-data,rpicam-01_1806_20210722_212134,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
labeled-data,rpicam-01_1806_20210722_212134,img17492.png,281.519,944.115,213.000,860.000,,,80.000,558.000,808.439,525.219,874.485,481.606,796.498,484.972,835.000,189.000
labeled-data,rpicam-01_1806_20210722_212134,img17515.png,,,118.000,593.000,119.000,424.000,591.000,669.000,1003.110,539.031,1003.799,574.080,1049.960,568.630,1015.000,787.000
labeled-data,rpicam-01_1806_20210722_212134,img17842.png,,,,,,,171.000,732.000,1001.528,509.651,987.000,546.000,1046.000,555.000,991.154,765.699
labeled-data,rpicam-01_1806_20210722_212134,img17860.png,,,,,,,112.000,721.000,859.036,468.383,843.208,526.852,907.415,497.159,985.000,747.000


In [28]:
df_pre.columns.levels[0][0]

'wi'

**Adjust the index or columns if necessary**<BR>
    Usually only the video name (index) or the scorer (columns) needs to be changed.

In [5]:
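def change_index_column(_df, scorer, video):
    """Rename the scorer (column level 0) and the video name (index level 1)."""
    # replace the first scorer name in the column MultiIndex with `scorer`
    _a = _df.columns.levels[0].str.replace(list(_df.columns.levels[0])[0], scorer, regex=False)
    _df.columns = _df.columns.set_levels(_a, level=0)

    # replace the first video name in the index MultiIndex with `video`
    _b = _df.index.levels[1].str.replace(list(_df.index.levels[1])[0], video, regex=False)
    _df.index = _df.index.set_levels(_b, level=1)

    # note: _df is modified in place and also returned
    return _df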

video_name = 'rpicam-01_1819_20210723_102136'
scorer = 'wi'

df_pre = change_index_column(df_pre,scorer, video_name)
df_pre

,,scorer,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi
,,individuals,sub1,sub1,sub1,sub1,sub1,sub1,sub1,sub1,sub2,sub2,sub2,sub2,sub2,sub2,sub2,sub2
,,bodyparts,snout,snout,leftear,leftear,rightear,rightear,tailbase,tailbase,snout,snout,leftear,leftear,rightear,rightear,tailbase,tailbase
,,coords,x,y,x,y,x,y,x,y,x,y,x,y,x,y,x,y
labeled-data,rpicam-01_1819_20210723_102136,img00053.png,918.000,381.000,986.000,322.000,912.000,292.000,934.187,177.223,1054.000,408.000,,,1034.000,329.000,1025.000,234.000
labeled-data,rpicam-01_1819_20210723_102136,img00266.png,,,101.000,295.000,158.000,243.000,168.000,459.000,849.000,203.000,908.000,216.000,914.147,177.672,1015.000,337.000
labeled-data,rpicam-01_1819_20210723_102136,img00276.png,,,91.000,327.000,133.000,261.000,181.000,457.000,850.939,207.798,902.420,228.776,912.733,181.224,1014.955,334.889
labeled-data,rpicam-01_1819_20210723_102136,img00305.png,,,,,,,180.000,455.000,845.321,205.037,903.844,229.779,914.574,179.877,1015.088,335.446
labeled-data,rpicam-01_1819_20210723_102136,img00334.png,,,133.000,638.000,,,135.000,364.000,844.367,202.776,902.947,234.878,914.641,180.656,1014.526,336.456
labeled-data,rpicam-01_1819_20210723_102136,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
labeled-data,rpicam-01_1819_20210723_102136,img17492.png,281.519,944.115,213.000,860.000,,,80.000,558.000,808.439,525.219,874.485,481.606,796.498,484.972,835.000,189.000
labeled-data,rpicam-01_1819_20210723_102136,img17515.png,,,118.000,593.000,119.000,424.000,591.000,669.000,1003.110,539.031,1003.799,574.080,1049.960,568.630,1015.000,787.000
labeled-data,rpicam-01_1819_20210723_102136,img17842.png,,,,,,,171.000,732.000,1001.528,509.651,987.000,546.000,1046.000,555.000,991.154,765.699
labeled-data,rpicam-01_1819_20210723_102136,img17860.png,,,,,,,112.000,721.000,859.036,468.383,843.208,526.852,907.415,497.159,985.000,747.000


**Concatenate the two labels if necessary**

In [9]:
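def concat_labels(_df1, _df2):
    """Merge two label frames for the same video; rows of _df1 win on duplicate frames."""
    # both frames must refer to the same video (index level 1)
    if list(_df1.index.levels[1])[0] == list(_df2.index.levels[1])[0]:
        _df = pd.concat([_df1, _df2])
        
        # for frames present in both, keep the row from _df1 (listed first), then sort by image name
        _df = _df[~_df.index.duplicated(keep='first')].sort_index(level=2)
        
        # save next to the extracted labels (uses the global `extracted_labels`)
        merged_h5_path = os.path.join(os.path.split(extracted_labels)[0], 'CollectedData_wi.h5')
        _df.to_hdf(merged_h5_path,key='df_output', mode='w')
        print('saved ', merged_h5_path)
        
        return _df

    else:
        print('video names do not match!')
        return None


# pass the newly extracted labels first so that they override the previous ones
concat_labels(df_ext, df_pre)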

saved  20220607-202525-extracted\CollectedData_wi.h5


,,scorer,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi,wi
,,individuals,sub1,sub1,sub1,sub1,sub1,sub1,sub1,sub1,sub2,sub2,sub2,sub2,sub2,sub2,sub2,sub2
,,bodyparts,snout,snout,leftear,leftear,rightear,rightear,tailbase,tailbase,snout,snout,leftear,leftear,rightear,rightear,tailbase,tailbase
,,coords,x,y,x,y,x,y,x,y,x,y,x,y,x,y,x,y
labeled-data,rpicam-01_1819_20210723_102136,img00053.png,,,326.000,254.000,326.000,254.000,327.0,254.0,339.000,376.000,339.000,376.000,339.000,376.000,339.000,376.000
labeled-data,rpicam-01_1819_20210723_102136,img00266.png,,,101.000,295.000,158.000,243.000,168.0,459.0,849.000,203.000,908.000,216.000,914.147,177.672,1015.000,337.000
labeled-data,rpicam-01_1819_20210723_102136,img00276.png,,,91.000,327.000,133.000,261.000,181.0,457.0,850.939,207.798,902.420,228.776,912.733,181.224,1014.955,334.889
labeled-data,rpicam-01_1819_20210723_102136,img00305.png,,,,,,,180.0,455.0,845.321,205.037,903.844,229.779,914.574,179.877,1015.088,335.446
labeled-data,rpicam-01_1819_20210723_102136,img00334.png,,,133.000,638.000,,,135.0,364.0,844.367,202.776,902.947,234.878,914.641,180.656,1014.526,336.456
labeled-data,rpicam-01_1819_20210723_102136,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
labeled-data,rpicam-01_1819_20210723_102136,img17492.png,281.519,944.115,213.000,860.000,,,80.0,558.0,808.439,525.219,874.485,481.606,796.498,484.972,835.000,189.000
labeled-data,rpicam-01_1819_20210723_102136,img17515.png,,,118.000,593.000,119.000,424.000,591.0,669.0,1003.110,539.031,1003.799,574.080,1049.960,568.630,1015.000,787.000
labeled-data,rpicam-01_1819_20210723_102136,img17842.png,,,,,,,171.0,732.0,1001.528,509.651,987.000,546.000,1046.000,555.000,991.154,765.699
labeled-data,rpicam-01_1819_20210723_102136,img17860.png,,,,,,,112.0,721.0,859.036,468.383,843.208,526.852,907.415,497.159,985.000,747.000
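
In the merged output above, `img00053.png` carries the values from the extracted labels, not from the previous ones. That follows from `keep='first'` combined with passing `df_ext` first. A toy example with made-up frames and a single `x` column (an illustration only, not the real data) shows the same behaviour:

```python
import pandas as pd

# toy stand-ins for the extracted (new) and previous (old) labels
index_new = pd.MultiIndex.from_tuples([
    ("labeled-data", "video", "img00053.png"),
    ("labeled-data", "video", "img00400.png"),
])
index_old = pd.MultiIndex.from_tuples([
    ("labeled-data", "video", "img00053.png"),
    ("labeled-data", "video", "img00266.png"),
])
df_new = pd.DataFrame({"x": [326.0, 500.0]}, index=index_new)
df_old = pd.DataFrame({"x": [918.0, 101.0]}, index=index_old)

# new labels first: for a duplicated frame the first occurrence is kept
merged = pd.concat([df_new, df_old])
merged = merged[~merged.index.duplicated(keep="first")].sort_index(level=2)
print(merged)
```

Swapping the order to `pd.concat([df_old, df_new])` would keep the old value for `img00053.png` instead.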


#### 2. Copy `CollectedData_xx.h5` and the png image files to `(dlc_root)/labeled-data/(video)/`
#### 3. Go to re-training

---

# Debugging for Dalton's training dataset 

# Re-construct labeled dataframe from the pickle file 

In [148]:
# read the pickle file
train_pickle_path = r'W:\dalton\dlc_data\homecage_test01-DJ-2022-06-07\training-datasets\iteration-0\UnaugmentedDataSet_homecage_test01Jun7\Documentation_data-homecage_test01_95shuffle1.pickle'

train_pickle = pd.read_pickle(train_pickle_path)

# extract index
tuples = [train_pickle[0][frame_id]['image'] for frame_id in range(len(train_pickle[0]))]
indexNames = pd.MultiIndex.from_tuples(tuples)

# column labels
scorer = 'wi'

col0 = [scorer]
col1 = ['sub1', 'sub2']
col2 = ['snout', 'leftear', 'rightear', 'tailbase']
col3 = ['x', 'y']
columnName = pd.MultiIndex.from_product([col0, col1, col2, col3], names=[
                                        "scorer", "individuals", "bodyparts", "coords"])

# extract coordinates as array
subjects = 2
bodyparts = 4

stacked_coord = []

for frame_id in range(len(train_pickle[0])):
    # start from an all-NaN row so that missing labels stay NaN
    coord = np.full(subjects*bodyparts*2, np.nan)

    # extract coords from one frame
    one_frame_coords = train_pickle[0][frame_id]['joints']

    # some coords are missing depending on subject and bodypart
    for sub in one_frame_coords.keys():
        for bodypart in range(len(one_frame_coords[sub])):
            # _a = [bodypart id, x, y]
            _a = one_frame_coords[sub][bodypart]
            # column offset: 2 coords per bodypart, bodyparts*2 columns per subject
            _base = int(_a[0])*2 + sub*bodyparts*2
            coord[_base:_base+2] = _a[1:3]

    stacked_coord.append(coord)

# one row per frame
stacked_coord = np.vstack(stacked_coord)

# assemble dataframe
df_labeled = pd.DataFrame(
    stacked_coord, index=indexNames, columns=columnName)
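
The offset `_base` used in the loop above decides which pair of columns a joint lands in. A small worked example makes the mapping explicit. It assumes, as the loop does, that each joint row is `[bodypart id, x, y]` and that subjects are keyed `0, 1, ...`; `flat_offset` is a hypothetical helper name.

```python
from itertools import product


def flat_offset(sub, bodypart_id, n_bodyparts, n_coords=2):
    """Index of the x column for (subject, bodypart) in the flattened row."""
    return sub * n_bodyparts * n_coords + bodypart_id * n_coords


# same column order as pd.MultiIndex.from_product([col1, col2, col3])
columns = list(product(["sub1", "sub2"],
                       ["snout", "leftear", "rightear", "tailbase"],
                       ["x", "y"]))

# subject key 1 (sub2), bodypart id 2 (rightear): 1*4*2 + 2*2 = 12
offset = flat_offset(sub=1, bodypart_id=2, n_bodyparts=4)
print(offset, columns[offset], columns[offset + 1])
```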

---