# Welcome

**Welcome to the first version of the iScape Sensor Analysis tools. The framework is built to help with the calibration of the sensors during field tests, and it aims to be the primary tool for manipulating the sensor data.**

![](https://i.imgur.com/sAOtjv3.jpg)

The current notebook supports data from the iScape Citizen Sensors we provided you, and we will add support for integrating data from existing equipment. Currently all data is loaded from CSV files, but the framework is also prepared to get live data directly from the Smart Citizen API in the future.

The primary goal of the tools is to help us validate the different iScape sensors and calculate calibration values that might later be applied automatically to the data the sensors push online.

_Notice this is currently a single Notebook that should work as a boilerplate for new Notebooks, each with its own specific purpose._

[Complete Documentation](https://hackmd.io/KYTghgHArGAMBmBaAxrA7GxAWAjAZjEQhwiQjQDY08ATNYrWAJiA/publish)
[Github Repository](https://github.com/fablabbcn/smartcitizen-iscape-data)

# Get started

Are you new to Jupyter Notebooks? Please, visit the [How to install the framework](https://hackmd.io/s/Hkb-Cw0rb#how-to-install-the-framework) documentation we prepared.

Is this your first time running this Notebook? Then select and run (⇧↩) the cell below to install all the dependencies.

*Are you worried about some existing Python libraries you do not want to upgrade? Check the [documentation]()*

In [None]:
! pip install pytz==2017.2 fileupload==0.1.2 ipywidgets==6.0.0 pandas==0.20.1 numpy==1.12.1 matplotlib==2.0.2 seaborn==0.8.0
! jupyter nbextension install --py fileupload 
! jupyter nbextension enable --py fileupload
! jupyter nbextension install --py widgetsnbextension 
! jupyter nbextension enable --py widgetsnbextension
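
If you are unsure whether the install cell would change libraries you already have, you can compare the pinned versions with what is currently installed before running it. The following is a minimal sketch, not part of the notebook: the helper names `parse_pins`, `installed_version` and `report` are hypothetical, and it assumes Python 3.8 or newer for `importlib.metadata`.

```python
from importlib.metadata import version, PackageNotFoundError  # Python 3.8+

# The same pins used by the install cell above.
PINNED = ("pytz==2017.2 fileupload==0.1.2 ipywidgets==6.0.0 pandas==0.20.1 "
          "numpy==1.12.1 matplotlib==2.0.2 seaborn==0.8.0")

def parse_pins(spec):
    """Turn 'name==version name==version ...' into a {name: version} dict."""
    return dict(item.split('==', 1) for item in spec.split())

def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

def report(pins):
    """List (package, pinned version, installed version) for every pin."""
    return [(name, wanted, installed_version(name))
            for name, wanted in sorted(pins.items())]

pins = parse_pins(PINNED)
for name, wanted, found in report(pins):
    status = 'same' if found == wanted else 'differs or missing'
    print('{:12s} pinned {:8s} installed {!s:10s} {}'.format(name, wanted, found, status))
```

Packages reported as differing would be replaced by the pinned version when the install cell runs.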

# Import data

Load the data from your Smart Citizen Kits in CSV format to analyse it or prepare it for other applications. You can load a single file, but to take full advantage of the functionalities you should load two files.

_In the coming days we will add the possibility to load data from your existing sensors so that you can compare them easily with the Kits._

Select and run (⇧↩) the cell below. Then use the interface to select one or two files. You can also select the timezone used to localize the data and download a clean, time-localized version of the data for further processing in other software.

In [None]:
from IPython.display import display, Markdown, FileLink, FileLinks, clear_output
import io, pytz, os, time, datetime, fileupload
import ipywidgets as widgets
import pandas as pd
import numpy as np

start_date = 0
end_date = 0

def _upload():
    
    _upload_widget = fileupload.FileUploadWidget()
    _tz_widget = widgets.Dropdown(options=pytz.common_timezones, value='UTC', description='Timezone: ')
    _mm_widget = widgets.Checkbox(description='Remove MICS metadata', value=True)

    def _cb(change):
        
        if len(readings) == 2:
            print('For now we can only process two files at the same time')
            return
        
        # get file
        decoded = io.StringIO(change['owner'].data.decode('utf-8'))
        filename = change['owner'].filename 
        fileData = io.StringIO(change['new'].decode('utf-8'))
        df = pd.read_csv(fileData).set_index('Time')
        
        # prepare dataframe
        df.index = pd.to_datetime(df.index).tz_localize('UTC').tz_convert(_tz_widget.value)
        df.sort_index(inplace=True)
        df = df.groupby(pd.TimeGrouper(freq='2Min')).aggregate(np.mean)
        if _mm_widget.value:
            df.drop([i for i in df.columns if 'heat' in i or 'load' in i or 'Unnamed' in i], axis=1, inplace=True)
        df.columns = [c.split('-', 1)[0] for c in df.columns]
        df = df[df.index > '2001-01-01T00:00:01Z']
        # apply the optional date limits cumulatively so one filter does not undo another
        if start_date: df = df[df.index > start_date]
        if end_date: df = df[df.index < end_date]
        readings[filename] = df
        listFiles(filename)
    
    # widgets
    _upload_widget.observe(_cb, names='data')
    _hb = widgets.HBox([_upload_widget, _tz_widget, widgets.HTML(' '),_mm_widget])
    
    display(_hb)

def delFile(b):
    clear_output()
    for d in list(b.hbl.children): d.close()
    readings.pop(b.f)

def describeFile(b):
    clear_output()
    display(readings[b.f].describe())
    
def exportFile(b):
    export_dir = 'exports'
    if not os.path.exists(export_dir): os.mkdir(export_dir)
    stamp = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%dT%H:%M:%S')
    savePath = os.path.join(export_dir, b.f + '_clean_' + stamp + '.csv')
    if not os.path.exists(savePath):
        readings[b.f].to_csv(savePath, sep=",")
        display(FileLink(savePath))
    else:
        display(widgets.HTML(' File Already exists!'))
    
def listFiles(filename):
    clear_output()
    temp = list(fileList.children)
    cb = widgets.Button(icon='close',layout=widgets.Layout(width='30px'))
    cb.on_click(delFile)
    cb.f = filename
    eb = widgets.Button(description='Export processed CSV', layout=widgets.Layout(width='180px'))
    eb.on_click(exportFile)
    eb.f = filename
    sb = widgets.Button(description='describe', layout=widgets.Layout(width='80px'))
    sb.on_click(describeFile)
    sb.f = filename
    hbl = widgets.HBox([cb, widgets.HTML(' <b>'+filename+'</b> \t'), sb, eb])
    cb.hbl = hbl
    eb.hbl = hbl
    temp.append(hbl)
    fileList.children = temp

readings = {}
display(widgets.HTML('<hr><h3>Select CSV files</h3>'))
_upload()
fileList = widgets.VBox([widgets.HTML('<hr>')])
display(fileList)
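
The preparation steps inside `_cb` can also be tried outside the widget interface. The following is a minimal sketch, assuming a CSV with a `Time` column in UTC as the cell above expects. The sample rows, the column names and the `Europe/Madrid` timezone are made up for illustration, and `prepare` is a hypothetical helper. `pd.TimeGrouper` was removed in later pandas releases, so the sketch uses `resample`, which groups readings into the same fixed time bins.

```python
import io
import pandas as pd

# Made-up sample in the assumed layout: a 'Time' column in UTC plus sensor columns.
SAMPLE_CSV = u"""Time,Temperature-C,Humidity-%
2017-07-01 10:00:10,20.0,50.0
2017-07-01 10:01:10,22.0,52.0
2017-07-01 10:02:10,24.0,54.0
2017-07-01 10:03:10,26.0,56.0
"""

def prepare(csv_text, tz='Europe/Madrid', freq='2min'):
    df = pd.read_csv(io.StringIO(csv_text)).set_index('Time')
    # Timestamps are assumed to be UTC; convert them to the chosen timezone.
    df.index = pd.to_datetime(df.index).tz_localize('UTC').tz_convert(tz)
    df = df.sort_index()
    # Average all readings that fall into the same 2-minute bin.
    df = df.resample(freq).mean()
    # Keep only the sensor name, dropping everything after the first '-'.
    df.columns = [c.split('-', 1)[0] for c in df.columns]
    return df

clean = prepare(SAMPLE_CSV)
print(clean)
```

Averaging into 2-minute bins puts both devices on the same time grid, which is what later allows their readings to be merged on the timestamp index.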



# Plot

This allows you to plot your data over time by sensor and device. In upcoming iterations you will be able to select the time range you want. The code can also serve as a starting point for customizing your own dedicated plots.

Select and run (⇧↩) the cell below. Then use the interface to select what data to display.

In [None]:
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(color_codes=True)

%matplotlib inline
matplotlib.style.use('seaborn-whitegrid')

# remap values to a 0-1 range
remap = False

toshow = []

def show_sensors(Source):
    _sensor_drop.options = [s for s in list(readings[Source].columns)]
    _sensor_drop.source = Source

def clear_all(b):
    clear_output()
    del toshow[:]
        
def add_sensor(b):
    clear_output()
    d = [_sensor_drop.source, _sensor_drop.value]
    if d not in toshow: toshow.append(d)
    
    plot_data = readings[toshow[0][0]].loc[:,(toshow[0][1],)]
        
    if len(toshow) > 1:
        for i in range(1, len(toshow)):
            plot_data = pd.merge(plot_data, readings[toshow[i][0]].loc[:,(toshow[i][1],)], left_index=True, right_index=True)
    
    if remap:
        plot_data = plot_data.sub(plot_data.min())
        plot_data = plot_data.div(plot_data.max())
        
    changed = []
    for i in range(len(plot_data.columns)):
        changed.append(toshow[i][0] + ' - '+ plot_data.columns[i])
    plot_data.columns = changed
    sns.set(font_scale=1.4)  
    ax = plot_data.plot(figsize=(14, 8), linewidth=2, alpha=0.75)
    ax.legend(bbox_to_anchor=(1, 1), loc=4)
    ax.grid(which='both')                                                                                           
    ax.grid(which='minor', alpha=1)                                                
    ax.grid(which='major')
    
    if remap: plt.ylim(0, 1.05)
    

layout=widgets.Layout(width='350px')
_kit = widgets.Dropdown(options=[k for k in readings.keys()], layout=layout)
_kit_drop = widgets.interactive(show_sensors, Source=_kit, layout=layout)
_sensor_drop = widgets.Dropdown(layout=layout)
_b_add = widgets.Button(description='Add', layout=widgets.Layout(width='100px'))
_b_add.on_click(add_sensor)
_b_reset = widgets.Button(description='Clear all', layout=widgets.Layout(width='100px'))
_b_reset.on_click(clear_all)
_sensor_box = widgets.HBox([_sensor_drop, _b_add, _b_reset])
root_box = widgets.VBox([_kit_drop, _sensor_box])
display(root_box)
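
Two steps in `add_sensor` are easy to miss: series from different devices are joined on their timestamps, and with `remap = True` each column is rescaled to the 0-1 range so that sensors with different units can share one axis. The following is a minimal sketch of both steps on made-up data; the frames `kit_a` and `kit_b` are hypothetical stand-ins for entries of `readings`.

```python
import pandas as pd

# Two made-up devices whose recordings only partly overlap in time.
idx_a = pd.date_range('2017-07-01 10:00', periods=4, freq='2min')
idx_b = pd.date_range('2017-07-01 10:02', periods=4, freq='2min')
kit_a = pd.DataFrame({'Temperature': [20.0, 22.0, 24.0, 26.0]}, index=idx_a)
kit_b = pd.DataFrame({'Humidity': [50.0, 40.0, 60.0, 45.0]}, index=idx_b)

# Inner join on the timestamp index: only instants present in both devices survive.
merged = pd.merge(kit_a, kit_b, left_index=True, right_index=True)

# Min-max remap: shift each column so it starts at 0, then divide by its new maximum.
shifted = merged.sub(merged.min())
remapped = shifted.div(shifted.max())
print(remapped)
```

Because the join is an inner join, the plot only covers the period in which all selected devices were recording.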

# Cross-sensor interferences

This module displays a simple correlation table per device, which can help you get an overview of possible cross-sensor interferences.

Select and run (⇧↩) the cell below. Then use the interface to select what data to display.

In [None]:
def paint(Source):
    clear_output()
    fig, ax = plt.subplots(figsize=(9, 6))
    sns.set(font_scale=1.3)
    sns.heatmap(readings[Source].corr(), annot=True, fmt='.2f', linewidths=.5, ax=ax, cmap=sns.color_palette("Blues"))
    plt.show()

_kit = widgets.Dropdown(options=[k for k in readings.keys()], layout=layout)
_kit_drop = widgets.interactive(paint, Source=_kit, layout=layout)
display(_kit_drop)
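
The heatmap is drawn from `readings[Source].corr()`, which returns the pairwise Pearson correlation between all columns of one device. The following is a minimal sketch of how to read that table, using a made-up device with three hypothetical sensors.

```python
import pandas as pd

# Made-up readings for a single device.
device = pd.DataFrame({
    'Temperature': [20.0, 21.0, 22.0, 23.0, 24.0],
    'Humidity':    [60.0, 58.0, 56.0, 54.0, 52.0],  # falls linearly as temperature rises
    'Noise':       [40.0, 45.0, 39.0, 47.0, 41.0],  # only weakly related to the others
})

# Pairwise Pearson correlation (the pandas default method).
corr = device.corr()
print(corr.round(2))
```

Values close to 1 or -1 between two different sensors suggest that one reading follows the other, which is the kind of pattern to check for interference. Values near 0 suggest no linear relation.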

This gives a more detailed correlation overview and can be helpful as a first look before moving on to the correlation module.

Select and run (⇧↩) the cell below. Then use the interface to select what data to display.

In [None]:
def paint(Source):
    clear_output()
    sns.set(font_scale=1.4)
    # plot the device chosen in the dropdown instead of always the first loaded file
    g = sns.PairGrid(readings[Source])
    g = g.map(plt.scatter)

_kit = widgets.Dropdown(options=[k for k in readings.keys()], layout=layout)
_kit_drop = widgets.interactive(paint, Source=_kit, layout=layout)
display(_kit_drop)

# Sensor Correlations

This is the core module for understanding cross-device correlation and determination. It currently lets you look at the coefficient of determination between two different devices, and in the coming days it might support correlating the data against your reference equipment.

This work is especially important for the MICS sensors since they will all require an on-site calibration process.

Select and run (⇧↩) the cell below. Then use the interface to select what data to display.

In [None]:
def redraw(b):
    mergedData = pd.merge(readings[A_kit.value].loc[:,(A_sensors.value,)], readings[B_kit.value].loc[:,(B_sensors.value,)], left_index=True, right_index=True, suffixes=('_'+A_kit.value, '_'+B_kit.value))
    clear_output()
    
    #jointplot
    df = pd.DataFrame()
    A = A_sensors.value + ' - ' + A_kit.value
    B = B_sensors.value + ' - ' + B_kit.value
    df[A] = mergedData.iloc[:,0]
    df[B] = mergedData.iloc[:,1]
    sns.set(font_scale=1.3)
    sns.jointplot(A, B, data=df, kind="reg", color="b", size=12, scatter_kws={"s": 80});
    
    pearsonCorr = list(df.corr('pearson')[list(df.columns)[0]])[-1]
    print('Pearson correlation coefficient: ' + str(pearsonCorr))
    print('Coefficient of determination R²: ' + str(pearsonCorr*pearsonCorr))

    # plot
    fig = plt.figure(figsize=(15, 15))
    ax = fig.add_subplot(2,1,1)
    ax.plot(df, linewidth=2, alpha=0.9)
    ax.legend(list(df.columns))

    # Rolling correlation
    roll = mergedData.iloc[:,0].rolling(12).corr(mergedData.iloc[:,1])
    ax1 = fig.add_subplot(2,1,2,sharex=ax)
    ax1.plot(roll, linewidth=2, alpha=0.9)
    plt.ylim([-1.1,1.1])
    ax1.axhline(0, color='red', alpha=0.35)
    
if len(readings) < 1: print("Please load some data first...")
else:
    layout=widgets.Layout(width='300px')
    b_redraw = widgets.Button(description='Redraw', layout=widgets.Layout(width='180px'))
    b_redraw.on_click(redraw)
    A_kit = widgets.Dropdown(options=[k for k in readings.keys()], layout=layout)
    A_sensors = widgets.Dropdown(options=list(list(readings.values())[0].columns), layout=layout)
    B_kit = widgets.Dropdown(options=[k for k in readings.keys()],  value=list(readings.keys())[0], layout=layout)
    B_sensors = widgets.Dropdown(options=list(list(readings.values())[0].columns), layout=layout)
    kit_box = widgets.HBox([A_kit, widgets.HTML('<h4><< Data source selection >></h4>') , B_kit], layout=widgets.Layout(justify_content='space-between'))
    sensor_box = widgets.HBox([A_sensors, widgets.HTML('<h4><< Sensor selection >></h4>') , B_sensors], layout=widgets.Layout(justify_content='space-between'))
    root_box = widgets.VBox([b_redraw, kit_box, sensor_box])
    display(root_box)
    redraw(b_redraw)
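
The three numbers `redraw` reports can be reproduced without the interface: the Pearson coefficient, its square (the coefficient of determination R²) and the rolling correlation over 12 samples, which is 24 minutes at the 2-minute resolution used on import. The following is a minimal sketch on synthetic data. The series `a` and `b` are made up, and `b` deliberately follows `a` in the first half and moves against it in the second half.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2017-07-01', periods=48, freq='2min')
t = np.arange(48)

# Synthetic sensor A, and a sensor B that follows A at first and then inverts.
a = pd.Series(np.sin(t / 4.0), index=idx, name='A')
gain = np.where(t < 24, 2.0, -2.0)
b = pd.Series(gain * a.values + 5.0, index=idx, name='B')

# Overall Pearson coefficient and coefficient of determination.
r = a.corr(b)
r2 = r ** 2

# Correlation inside a sliding window of 12 samples (24 minutes).
roll = a.rolling(12).corr(b)

print('Pearson r: ' + str(r))
print('R squared: ' + str(r2))
print(roll.dropna().round(2).iloc[[0, -1]])
```

A single overall coefficient can hide such changes of behaviour. The rolling correlation shows when two devices agree and when they drift apart, which is why the cell plots it under the time series.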