# Manual Classification

This notebook builds an Excel workbook containing the images so that they are easy to label manually. It loops through the RGB images and inserts each one into a worksheet row alongside its TOID and grid reference, with an empty column for the label.

#### Labels
- 1: CAR present in image
- 2: YES, manmade surface big enough to park, but no car present
- 3: NO parking, too small, too green
- 4: not sure


I asked 5 volunteers to label about 266 rows each of the resulting Excel workbook. I labelled all 1328 images myself, so every image has my label plus one other person's label for comparison. I may look again at the images we disagree on, or use both my set and the others' set for validation.


## Libraries

In [2]:
import os
from os import path
import glob
import sys
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns

In [7]:
import xlsxwriter

In [56]:
import rasterio
from rasterio._base import gdal_version
#https://rasterio.readthedocs.io/en/latest/
import rasterio.warp
import rasterio.features
from rasterio import plot #essential to plot images in BNG, in correct position, and full RGB 

from rasterio.mask import mask

import pprint

## Create png version of the tif images for writing to excel cells

https://clouds.eos.ubc.ca/~phil/courses/atsc301/html/rasterio_demo.html

Change the driver to PNG to convert the tif files to png files. For each png image a png.aux.xml file is also created; presumably this stores the geolocation information. I don't need that here, because the only purpose of the conversion is to allow automatic insertion into Excel: the xlsxwriter method for writing images to Excel cells does not work for tif files.

In [57]:
#collect tif picture paths
rgb_cg_paths = glob.glob(
    '../jigsaw_output/rgb_gdn_cropped/*.tif', recursive=True)

In [95]:
for tif_filename in rgb_cg_paths:
    
    #get grid and TOID from tif_filename (file names end in GRID_TOID.tif)
    grid = tif_filename[-27:-21]
    toid = tif_filename[-20:-4]
    #set png_filename
    png_filename = '../validation/png_image/' + grid + '_' + toid + '.png'
    
    with rasterio.open(tif_filename) as infile:
        profile=infile.profile
        # change the driver name from GTiff to PNG
        profile['driver']='PNG'
    
        raster=infile.read()
        with rasterio.open(png_filename, 'w', **profile) as dst:
            dst.write(raster)
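
The fixed negative slices used above only work if every file name ends in a 6-character grid reference, an underscore, a 16-character TOID and a 4-character extension. A slightly more defensive way to recover the same two values is to split the file stem on the underscore. The sketch below assumes that `GRID_TOID` naming pattern (inferred from the slice offsets); `parse_grid_toid` is a hypothetical helper and the example file name is made up from values in the label table, neither is part of the original notebook.

```python
from pathlib import Path

def parse_grid_toid(image_path):
    """Split a path like '.../TQ1980_1000001778697865.tif' into (grid, toid).

    Assumes the file stem is '<GRID>_<TOID>'; raises ValueError otherwise.
    """
    stem = Path(image_path).stem            # file name without directory or extension
    grid, sep, toid = stem.partition('_')   # split on the first underscore only
    if not sep or not grid or not toid:
        raise ValueError(f"unexpected file name: {image_path}")
    return grid, toid

# Hypothetical example path following the assumed naming pattern
example = '../jigsaw_output/rgb_gdn_cropped/TQ1980_1000001778697865.tif'
grid, toid = parse_grid_toid(example)
print(grid, toid)
```

Because the helper does not depend on string length, it would give the same result for the `.png` copies and would fail loudly on a file that does not follow the pattern.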


## Create xlsx workbook and insert png images

Inserting images with xlsxwriter does not work for tif files, which is why I converted them to png above.

https://xlsxwriter.readthedocs.io/example_images.html

https://www.youtube.com/watch?v=3mWqQlYlFlY

In [99]:
#create new workbook & worksheet
wb = xlsxwriter.Workbook("../validation/excel_insert_images_test.xlsx")
ws = wb.add_worksheet()

#resize cells
ws.set_column('D:D', 100) #column for the pics
ws.set_default_row(height = 600)

#Write some data headers.
ws.write('A1', 'Label')
ws.write('B1', 'GRID')
ws.write('C1', 'TOID')
ws.write('D1', 'RGB_25cm_png_image')

#collect png picture paths
images = glob.glob(
    '../validation/png_image/*.png', recursive=True)

#insert one row per image: grid and TOID as text, then the image itself
#images are rescaled on insertion

#start in row 1, col 1
row = 1 #row 0 holds the headers
col = 1 #col 0 is left empty for the manual label

for image_path in images:
    
    #get grid and TOID from image_path
    grid = image_path[-27:-21]
    toid = image_path[-20:-4]
    
    #write grid and toid
    ws.write_string(row, col, grid)
    ws.write_string(row, col + 1, toid)
    
    #insert images
    ws.insert_image(row
                    , col + 2
                    , image_path
                    , {
                        'x_scale' : 5, 'y_scale' : 5 #rescale images
                        , 'x_offset' : 5, 'y_offset' : 5 #pixel offset from the cell's top-left corner
                        , 'positioning' : 2 #see bottom
                    })
    
    row += 1

wb.close()

#Notes
#'positioning' : 1 = move and size with cells
#'positioning' : 2 = move but don't size with cells - want to keep images on same scale as each other

## RESULTS

I used Excel to collate the results into one csv file containing my labels and the others' labels, and to review the labels where we disagreed.

- label_tjf: labelled solely by me
- label_others: split into 5 sets of about 266 rows, labelled by colleagues: Andrew Kelly, Lisa Eyers, Amy Pearce, Ellie Page. Nicola George also volunteered but was unable to take part, so I relabelled her set myself
- label_review: looking only at the disagreements between label_tjf and label_others, I relabelled about 259 images and removed any 4s (don't knows), so label_review only contains 1, 2 and 3.
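
The split itself was done in Excel, so the following is only a sketch of how 1328 rows could be divided into 5 near-equal sets of about 266 rows. `split_into_sets` is a hypothetical helper, and the choice to give the extra rows to the first sets is an assumption.

```python
def split_into_sets(n_rows, n_sets):
    """Return (start, stop) row ranges that split n_rows into n_sets near-equal parts."""
    base, extra = divmod(n_rows, n_sets)
    ranges = []
    start = 0
    for i in range(n_sets):
        size = base + (1 if i < extra else 0)  # the first `extra` sets get one more row
        ranges.append((start, start + size))
        start += size
    return ranges

ranges = split_into_sets(1328, 5)
sizes = [stop - start for start, stop in ranges]
print(ranges)
print(sizes)
```

Each `(start, stop)` pair is a half-open range of zero-based row positions, so together the ranges cover every row exactly once.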


In [22]:
#read the manual labels file into manLab
manLab = pd.read_csv(
    "../validation/manual_labelling/tq1980_labels.csv"
    , dtype = {
        'GRID' : object
        , 'TOID' : object
        , 'label_tjf' : np.int32
        , 'label_others' : np.int32
        , 'label_review' : np.int32

    }
)

In [23]:
manLab

,GRID,TOID,label_tjf,label_others,label_review
0,TQ1980,1000001778697865,1,1,1
1,TQ1980,1000001778701813,1,1,1
2,TQ1980,1000001778708189,3,3,3
3,TQ1980,1000001778701807,4,1,2
4,TQ1980,1000001778697871,1,1,1
...,...,...,...,...,...
1323,TQ1980,1000001778697840,2,2,2
1324,TQ1980,5000005138038924,3,3,3
1325,TQ1980,1000001778708184,3,3,3
1326,TQ1980,1000001778697868,2,2,2


## Add agree column to compare my labels to others'

Overall agreement is about 80%, which is pretty good. Most of the disagreement came from images I had labelled 3 and others had labelled 2. I noticed this repeatedly during the review and found that I kept these images as 3. They were mostly too small to admit a car; I think my greater familiarity with the data and with the sizes/scale of the images allowed me to judge better by size alone. I am also more used to spotting the roadside edge.

I did consider including the roadside edge in the image for labelling, but at the time this seemed an extra step that I did not have time for. It would, however, make things much easier for the human labellers, who would then be able to see access points as well.

I did change some of my original labels to match the others' label and, on occasion, selected a completely different label. I noticed more bins and bike sheds masquerading as cars this time.

Distinguishing vegetation from non-vegetation in shadow still seems to be the biggest problem.

In [25]:
manLab['tjf_vs_others'] = manLab['label_tjf'] == manLab['label_others']

In [26]:
sum(manLab['tjf_vs_others']) / manLab.shape[0]

0.8049698795180723

In [27]:
manLab.head()

,GRID,TOID,label_tjf,label_others,label_review,tjf_vs_others
0,TQ1980,1000001778697865,1,1,1,True
1,TQ1980,1000001778701813,1,1,1,True
2,TQ1980,1000001778708189,3,3,3,True
3,TQ1980,1000001778701807,4,1,2,False
4,TQ1980,1000001778697871,1,1,1,True


In [29]:
pd.crosstab( manLab['label_tjf'], manLab['label_others'])
#biggest disagreement: my label 3 vs others' label 2, then my label 2 vs others' label 3

label_tjf \ label_others,1,2,3,4
1,300,16,7,0
2,26,305,43,6
3,4,118,464,30
4,1,8,0,0
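
As a cross-check, the agreement rate and the ranking of disagreements can be recomputed from the crosstab counts alone. This is a minimal sketch in plain Python with the counts copied by hand from the table above; the variable names are illustrative and not part of the original notebook.

```python
# Counts copied from the label_tjf (rows) vs label_others (columns) crosstab
counts = {
    1: {1: 300, 2: 16, 3: 7, 4: 0},
    2: {1: 26, 2: 305, 3: 43, 4: 6},
    3: {1: 4, 2: 118, 3: 464, 4: 30},
    4: {1: 1, 2: 8, 3: 0, 4: 0},
}

total = sum(sum(row.values()) for row in counts.values())
agree = sum(counts[k][k] for k in counts)   # diagonal: both labels identical
agreement_rate = agree / total

# Off-diagonal cells as (count, my label, others' label), largest first
disagreements = sorted(
    ((n, mine, theirs)
     for mine, row in counts.items()
     for theirs, n in row.items()
     if mine != theirs and n > 0),
    reverse=True,
)

print(total, agree, agreement_rate)
print(disagreements[:3])
```

The diagonal holds 1069 of the 1328 images, which is the same roughly 80% figure as the `tjf_vs_others` column gives, and the largest off-diagonal cell is my 3 against others' 2.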


In [30]:
pd.crosstab( manLab['label_tjf'], manLab['label_review'])

label_tjf \ label_review,1,2,3
1,310,9,4
2,8,349,23
3,0,20,596
4,0,3,6


In [31]:
pd.crosstab( manLab['label_others'], manLab['label_review'])

label_others \ label_review,1,2,3
1,308,17,6
2,7,333,107
3,3,25,486
4,0,6,30
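
The two review crosstabs can be summarised by the share of each original label set that equals the reviewed label. The sketch below copies the counts by hand from the two tables; `share_unchanged` is a hypothetical helper name. Rows for label 4 can never match, because label_review contains no 4s.

```python
# Rows: original label (1-4); list positions: label_review 1, 2, 3
tjf_vs_review = {1: [310, 9, 4], 2: [8, 349, 23], 3: [0, 20, 596], 4: [0, 3, 6]}
others_vs_review = {1: [308, 17, 6], 2: [7, 333, 107], 3: [3, 25, 486], 4: [0, 6, 30]}

def share_unchanged(table):
    """Fraction of images whose original label equals the reviewed label."""
    total = sum(sum(row) for row in table.values())
    same = sum(row[label - 1] for label, row in table.items() if label <= len(row))
    return same / total

tjf_share = share_unchanged(tjf_vs_review)
others_share = share_unchanged(others_vs_review)
print(round(tjf_share, 3), round(others_share, 3))
```

Since I carried out the review myself, the higher share for label_tjf mainly reflects that I kept most of my own labels, not an independent measure of accuracy.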


## Relabel based on label_review and combine labels 1 and 2

- 1: YES, parking potential (label_review 1 or 2)
- 0: NO, parking not possible (label_review 3)

In [36]:
manLab['label'] = (
    manLab['label_review'].apply(lambda x: 1 if x <= 2 else 0)
)
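
The `x <= 2` test above relies on label_review containing only 1, 2 and 3; any stray 4 would silently become 0. An explicit mapping makes that assumption visible and fails on anything unexpected. This is a minimal sketch, not the notebook's method: `BINARY_LABEL` and `to_binary` are hypothetical names, and the review counts are the column totals of the label_review crosstabs.

```python
# Explicit mapping: labels 1 and 2 mean parking potential, 3 means none
BINARY_LABEL = {1: 1, 2: 1, 3: 0}

def to_binary(label_review):
    """Map a reviewed label to 1 (parking potential) or 0 (no parking)."""
    try:
        return BINARY_LABEL[label_review]
    except KeyError:
        raise ValueError(f"unexpected label_review value: {label_review!r}") from None

# Column totals of label_review taken from the crosstabs (318 + 381 + 629 = 1328)
review_counts = {1: 318, 2: 381, 3: 629}
binary_counts = {0: 0, 1: 0}
for label, n in review_counts.items():
    binary_counts[to_binary(label)] += n
print(binary_counts)
```

With pandas, the same dictionary could be passed to `Series.map`, which leaves unmapped values as NaN instead of raising.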

In [38]:
manLab

,GRID,TOID,label_tjf,label_others,label_review,tjf_vs_others,label
0,TQ1980,1000001778697865,1,1,1,True,1
1,TQ1980,1000001778701813,1,1,1,True,1
2,TQ1980,1000001778708189,3,3,3,True,0
3,TQ1980,1000001778701807,4,1,2,False,1
4,TQ1980,1000001778697871,1,1,1,True,1
...,...,...,...,...,...,...,...
1323,TQ1980,1000001778697840,2,2,2,True,1
1324,TQ1980,5000005138038924,3,3,3,True,0
1325,TQ1980,1000001778708184,3,3,3,True,0
1326,TQ1980,1000001778697868,2,2,2,True,1


In [43]:
manLab['label'].value_counts()

1    699
0    629
Name: label, dtype: int64

In [44]:
manLab.to_csv("../validation/manual_labelling/tq1980_labels_2.csv", index=False)