# CCTV Rain-Dry Image Prediction Post-processing
by Tio

No prediction is perfect. Misclassifications should be expected no matter how much training the model has had. The model output is therefore passed through a step called 'post-processing', which adjusts it toward the conditions we expect while avoiding arbitrary manipulation. This notebook documents and explores that process.

In [12]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

## 1. Moving Average

When using a moving average, you combine the current prediction with the preceding ones (n values in total) and take their mean. You then apply a threshold to that mean: the threshold determines whether the corrected prediction is 1 (rain) or 0 (dry). This notebook uses a threshold of 0.5.
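
Written out, and assuming the window counts the current prediction plus the n-1 before it with a threshold of 0.5 (the convention of the `moving_average` function in this notebook), the correction can be expressed as follows. The symbols $p_t$ (raw prediction at time $t$), $\bar{p}_t$ (window mean) and $\hat{p}_t$ (corrected prediction) are notation introduced here for illustration only.

$$
\bar{p}_t = \frac{1}{n}\sum_{k=0}^{n-1} p_{t-k},
\qquad
\hat{p}_t =
\begin{cases}
1 & \text{if } \bar{p}_t \ge 0.5 \\
0 & \text{otherwise}
\end{cases}
$$

For example, with n = 3 and predictions (0, 1, 0), the mean is 1/3, which is below 0.5, so the corrected prediction is 0.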

<img src="Illustration - MA.png" style="width: 500px;"/>

In [2]:
# define dummy data
data = [[1, 1, 1], [2, 1, 1], [3, 1, 1], [4, 0, 1], [5, 1, 1], [6, 0, 1], [7, 1, 1], [8, 1, 1], 
        [9, 1, 1], [10, 1, 1], [11, 0, 0], [12, 0, 0], [13, 0, 0], [14, 0, 0]]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['time', 'pred', 'obs'])
df

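|   | time | pred | obs |
|---|------|------|-----|
| 0 | 1 | 1 | 1 |
| 1 | 2 | 1 | 1 |
| 2 | 3 | 1 | 1 |
| 3 | 4 | 0 | 1 |
| 4 | 5 | 1 | 1 |
| 5 | 6 | 0 | 1 |
| 6 | 7 | 1 | 1 |
| 7 | 8 | 1 | 1 |
| 8 | 9 | 1 | 1 |
| 9 | 10 | 1 | 1 |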


In [3]:
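def moving_average(df, ma):
    """Smooth df['pred'] with a trailing window of `ma` rows, thresholded at 0.5."""
    corrected = []
    for i in range(df.shape[0]):
        if i < ma - 1:
            # Not enough history to fill the window yet
            value = np.nan
        else:
            # Mean of the current prediction and the ma-1 before it
            window_mean = df['pred'].iloc[i - (ma - 1): i + 1].mean()
            value = 0 if window_mean < 0.5 else 1
        corrected.append(value)
    return corrected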

In [4]:
ma3 = moving_average(df, 3)
df['ma3'] = ma3
df

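|   | time | pred | obs | ma3 |
|---|------|------|-----|-----|
| 0 | 1 | 1 | 1 | NaN |
| 1 | 2 | 1 | 1 | NaN |
| 2 | 3 | 1 | 1 | 1.0 |
| 3 | 4 | 0 | 1 | 1.0 |
| 4 | 5 | 1 | 1 | 1.0 |
| 5 | 6 | 0 | 1 | 0.0 |
| 6 | 7 | 1 | 1 | 1.0 |
| 7 | 8 | 1 | 1 | 1.0 |
| 8 | 9 | 1 | 1 | 1.0 |
| 9 | 10 | 1 | 1 | 1.0 |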
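
The same trailing-window correction can also be written without an explicit loop. The sketch below is one possible vectorized variant using pandas' `rolling`; the function name `moving_average_rolling` and the `threshold` parameter are hypothetical names chosen for illustration, and the input is the `pred` column of the dummy data defined earlier.

```python
import pandas as pd

def moving_average_rolling(pred, window, threshold=0.5):
    """Trailing-window mean thresholded to 0/1; NaN until the window is full.

    Hypothetical helper, meant to mirror the loop-based moving_average.
    """
    pred = pd.Series(pred, dtype=float)
    mean = pred.rolling(window=window).mean()       # NaN for the first window-1 rows
    corrected = (mean >= threshold).astype(float)   # 1.0 = rain, 0.0 = dry
    return corrected.where(mean.notna())            # restore NaN where no mean exists

# 'pred' column of the dummy data used in this notebook
pred = [1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0]
ma3_rolling = moving_average_rolling(pred, 3)
print(ma3_rolling.tolist())
```

Keeping the threshold as a parameter makes it easy to try values other than 0.5 without touching the windowing logic.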


Drawbacks of this method:
1. The first ma-1 rows can never be corrected, because the window is not yet full (look at the first 2 rows!).
2. The result depends strongly on the window size `ma` you choose.
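
One way to get a feel for both drawbacks is to rerun the correction with several window sizes and compare each result with the `obs` column. The sketch below does this on the dummy data; the helper `corrected`, the `results` dictionary and the window sizes 3, 5 and 7 are arbitrary choices made for illustration.

```python
import pandas as pd

# 'pred' and 'obs' columns of the dummy data used in this notebook
pred = pd.Series([1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0])
obs = pd.Series([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0])

def corrected(pred, window):
    """Trailing moving average thresholded at 0.5 (hypothetical helper)."""
    mean = pred.rolling(window).mean()
    return (mean >= 0.5).astype(float).where(mean.notna())

results = {}
for window in (3, 5, 7):
    c = corrected(pred, window)
    valid = c.notna()                                # rows where the window was full
    uncorrected = int((~valid).sum())                # rows "sacrificed" at the start
    agreement = float((c[valid] == obs[valid]).mean())  # share of valid rows matching obs
    results[window] = (uncorrected, agreement)
    print(f"window={window}: uncorrected rows={uncorrected}, agreement={agreement:.3f}")
```

With only 14 dummy rows the numbers are illustrative at best, but the pattern to look for is that a larger window leaves more rows uncorrected and reacts more slowly when the observed state changes.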

## 2. In-between Mode

This method uses the mode (the most frequent value) of a window centred on the current time step as the corrected prediction. You choose the window size n, an odd number greater than 1, so that the window covers the same number of time steps before and after the current one.

<img src="Illustraltion In-Between mode.png" style="width: 500px;"/>

In [7]:
df2 = df.copy()  # copy, so that adding columns to df2 leaves df unchanged

In [61]:
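def ibmode(df, n):
    """Replace each prediction with the mode of a window of n rows centred on it."""
    if n % 2 == 0 or n <= 1:
        raise ValueError('n must be an odd number greater than 1')
    ib = n // 2  # number of rows taken on each side of the current row

    corrected = []
    for i in range(df.shape[0]):
        if i - ib < 0 or i + ib > df.shape[0] - 1:
            # Window would run past the data: keep the original prediction
            value = df['pred'].iloc[i]
        else:
            window = df['pred'].iloc[i - ib: i + ib + 1].to_numpy()
            # np.atleast_1d copes with SciPy versions that return a scalar mode
            value = np.atleast_1d(stats.mode(window).mode)[0]
        corrected.append(value)
    return corrected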

In [65]:
ibm = ibmode(df, 5)
df2['mode5'] = ibm
df2

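|   | time | pred | obs | ma3 | mode5 |
|---|------|------|-----|-----|-------|
| 0 | 1 | 1 | 1 | NaN | 1 |
| 1 | 2 | 1 | 1 | NaN | 1 |
| 2 | 3 | 1 | 1 | 1.0 | 1 |
| 3 | 4 | 0 | 1 | 1.0 | 1 |
| 4 | 5 | 1 | 1 | 1.0 | 1 |
| 5 | 6 | 0 | 1 | 0.0 | 1 |
| 6 | 7 | 1 | 1 | 1.0 | 1 |
| 7 | 8 | 1 | 1 | 1.0 | 1 |
| 8 | 9 | 1 | 1 | 1.0 | 1 |
| 9 | 10 | 1 | 1 | 1.0 | 1 |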
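
For binary predictions and an odd window, the mode of a window is simply the majority value, so the in-between mode can also be sketched with a centred rolling mean. This is an alternative formulation, not the `ibmode` function itself; the name `ibmode_rolling` is hypothetical, and rows too close to either end keep their original prediction, as in `ibmode`.

```python
import pandas as pd

def ibmode_rolling(pred, n):
    """Centred majority vote over n rows for 0/1 predictions (hypothetical helper)."""
    if n % 2 == 0 or n <= 1:
        raise ValueError("n must be an odd number greater than 1")
    pred = pd.Series(pred)
    # Mean of the n rows centred on each row; NaN where the window does not fit
    centred_mean = pred.rolling(window=n, center=True).mean()
    # With 0/1 values and odd n, a mean above 0.5 means 1 is the mode
    majority = (centred_mean > 0.5).astype(int)
    # Keep the original prediction at the edges
    return majority.where(centred_mean.notna(), pred)

# 'pred' column of the dummy data used in this notebook
pred = [1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0]
mode5_rolling = ibmode_rolling(pred, 5)
print(mode5_rolling.tolist())
```

This shortcut only holds for two classes; with more than two labels a true mode calculation would be needed.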
