### Sensor data cleaning and visualization using plotly

This notebook shows how to read the data properly, clean it, and visualize it using the plotly package.


First, I will import the necessary packages.

In [157]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [158]:
%matplotlib inline

Next, I read the data from the local disk using the pandas `read_table()` function. The data contains sensor readings taken every minute. The file has more than 300k rows, and I am interested only in the rows after the first 250k, so I skip the earlier ones with the `skiprows` option.

In [165]:
sensor_df = pd.read_table('GS Sensor Data 093018.txt', squeeze=True, header=0, skiprows=250000,
                           parse_dates=[0], infer_datetime_format=True, low_memory=False, names=['Time','Stage','DO', 'pH','NH4','NO3','MFC'])
dim = sensor_df.shape
dim

(57078, 7)
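
To make the effect of `skiprows` concrete, here is a minimal sketch on a tiny in-memory table. The sample rows and the names `raw`, `tail_int` and `tail_range` are made up for illustration; they are not taken from the real sensor file. An integer `skiprows` skips that many physical lines, header included, so the column names have to be passed through `names`. A range that starts at 1 keeps the header line and skips only data lines.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for the real sensor file.
raw = (
    "Time\tStage\tDO\n"
    "3/20/2019 9:45:38 PM\tAnaerobic\t0.10\n"
    "3/20/2019 9:46:38 PM\tAnaerobic\t0.11\n"
    "3/20/2019 9:47:38 PM\tAnaerobic\t0.13\n"
    "3/20/2019 9:48:38 PM\tAnaerobic\t0.13\n"
)

# Integer skiprows: the header and the first two data lines are skipped,
# so the column names are supplied explicitly and header=None is used.
tail_int = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=3,
                       header=None, names=["Time", "Stage", "DO"])

# Range skiprows: line 0 (the header) is kept, data lines 1 and 2 are skipped.
tail_range = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=range(1, 3))

print(tail_int)
print(tail_range)
```

Both calls return the same two rows. Note that combining `header=0` with `names`, as in the cell above, tells pandas to treat the first remaining line as a header and replace it with `names`, so that one line is not kept as data.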

In [166]:
sensor_df.head()

,Time,Stage,DO,pH,NH4,NO3,MFC
0,3/20/2019 9:47:38 PM,Anaerobic,0.13,8.09,29.25,1.02,0.0
1,3/20/2019 9:48:38 PM,Anaerobic,0.13,8.06,29.5,1.02,0.0
2,3/20/2019 9:49:38 PM,Anaerobic,0.13,8.09,29.2,1.02,0.0
3,3/20/2019 9:50:38 PM,Anaerobic,0.13,8.12,29.15,1.02,0.0
4,3/20/2019 9:51:38 PM,Anaerobic,0.13,8.13,29.1,1.07,0.0


Let us check the data types.

In [167]:
sensor_df.dtypes

Time     object
Stage    object
DO       object
pH       object
NH4      object
NO3      object
MFC      object
dtype: object

As we can see, the data types were not inferred properly: every column was read as `object`. We therefore need to convert the data types manually.

#### Convert to datetime format

Here I use the pandas `to_datetime()` function to convert the time column to a proper datetime format so that the data can be used as a time series. We can specify the date format explicitly, but the function can also infer it automatically if we set `infer_datetime_format` to `True`. With `errors='coerce'`, any value that cannot be parsed becomes `NaT`.

In [168]:
# Assign the column directly; in recent pandas, writing through .loc[:, 'Time'] can keep the old object dtype
tm = sensor_df['Time']
sensor_df['Time'] = pd.to_datetime(tm, infer_datetime_format=True,
                                   errors='coerce')
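
As an alternative to inference, the format can be spelled out. The sketch below assumes the timestamps look like the ones in the `head()` output (month/day/year with a 12-hour clock and an AM/PM marker); the third value is a deliberately broken, made-up entry to show what `errors='coerce'` does.

```python
import pandas as pd

# Two timestamps in the style of the sensor file plus one invalid, made-up value.
times = pd.Series(["3/20/2019 9:47:38 PM", "3/20/2019 9:48:38 PM", "not a time"])

# %I is the 12-hour clock hour and %p the AM/PM marker.
parsed = pd.to_datetime(times, format="%m/%d/%Y %I:%M:%S %p", errors="coerce")

print(parsed)
```

The two valid strings become evening timestamps on 2019-03-20, and the invalid one becomes `NaT` instead of raising an error.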

#### Convert to numeric values

Columns 3 to 7 of the dataframe (`DO`, `pH`, `NH4`, `NO3` and `MFC`) are numeric, so we can convert them with `pd.to_numeric` and coerce any value that cannot be parsed to `NaN`, as shown below.

In [169]:
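# Convert column by column (apply's default axis) and assign by column label so the numeric dtype is kept
num_cols = sensor_df.columns[2:7]
sensor_df[num_cols] = sensor_df[num_cols].apply(pd.to_numeric, errors='coerce')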
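
The effect of `errors='coerce'` is easiest to see on a small example. The frame below is a made-up stand-in for the sensor columns; the `"Error"` entry and the empty string are hypothetical bad readings, not values observed in the real file.

```python
import pandas as pd

# Hypothetical string-typed readings with two unparseable entries.
demo = pd.DataFrame({
    "DO": ["0.13", "0.13", "Error"],
    "pH": ["8.09", "", "8.12"],
})

# Column-wise conversion; anything that is not a number becomes NaN.
clean = demo.apply(pd.to_numeric, errors="coerce")

print(clean.dtypes)
print(clean)
```

After the conversion both columns are `float64`, and the two bad entries show up as `NaN`, which can then be counted with `clean.isna().sum()` or dropped.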

#### Convert to string format

The second column, `Stage`, holds string data and can be converted as shown below.

In [170]:
stage = sensor_df['Stage']
sensor_df['Stage'] = stage.astype(str)

Let us set the time column as index.

In [171]:
sensor_df.set_index('Time',inplace=True)

In [172]:
sensor_df.tail()

Time,Stage,DO,pH,NH4,NO3,MFC
2019-04-30 19:55:48,Anaerobic,0.0,7.51,0.16,0.25,0.0
2019-04-30 19:56:48,Anaerobic,0.0,7.56,0.16,0.25,0.0
2019-04-30 19:57:48,Anaerobic,0.0,7.64,0.11,0.25,0.0
2019-04-30 19:58:48,Anaerobic,0.0,7.57,0.11,0.25,0.0
2019-04-30 19:59:48,Anaerobic,0.0,7.66,0.16,0.25,0.0


### Visualize using plotly

Now we can use the cleaned data to produce a plot. Note that the cell below is written against the Bokeh API (`figure`, `Range1d`, `LinearAxis`) rather than `plotly`.

In [137]:
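# Bokeh imports needed by this cell
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import LinearAxis, Range1d

output_notebook()

# NOTE: s_sensor_df is not defined in the cells shown here; it is assumed to be a
# version of sensor_df with columns named 'Ammonium Concentration', 'MFC Voltage' and 'pH'.
p = figure(title="Sensor data plot", x_axis_label='time', y_axis_label='y',
           width=900, height=450, x_axis_type="datetime")

# Define a second y-range, named "ph", spanning 5 to 9
p.extra_y_ranges = {"ph": Range1d(start=5, end=9)}

# Add the second axis on the right side of the plot
p.add_layout(LinearAxis(y_range_name="ph"), 'right')

p.line(s_sensor_df.index, s_sensor_df['Ammonium Concentration'], line_width=2, legend_label='NH4')
p.line(s_sensor_df.index, s_sensor_df['MFC Voltage'], line_width=2, legend_label='Air', line_color="red")
p.line(s_sensor_df.index, s_sensor_df['pH'], line_width=2, legend_label='pH', line_color="green",
       y_range_name="ph")

show(p)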

In [14]:
#from bokeh.plotting import figure, output_notebook, show
#from bokeh.models import LinearAxis, Range1d

#output_notebook()