# Envirosensor Data Cleaning

This notebook demonstrates the initial cleaning of the ICRI Envirosensor data from the Here East Digital Twin. This data was aggregated using the IBM IoT Watson platform in 2018.

The ICRI Envirosensor Data is included in the project's [GitHub repository](https://github.com/virtualarchitectures/ICRI_Envirosensor_Data_Analysis), and is also available for download from the IEEE data portal under a Creative Commons license: https://ieee-dataport.org/open-access/icri-envirosensor-data

NOTE: The data cleaned here was downloaded from the IBM IoT Watson platform archive for the Here East Digital Twin in early 2019. In addition to the original sensor payloads uploaded to IBM's platform, the downloaded data includes metadata added by the platform.
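To illustrate how platform metadata and the sensor payload can end up side by side in one table, here is a minimal sketch using `pd.json_normalize`. The record layout below (the `timestamp`, `deviceId`, `deviceType`, `eventType` and nested `data` keys) is a hypothetical assumption for illustration, not the documented schema of the IBM archive; the values are taken from the first row of `df.head()` further down.

```python
import pandas as pd

# HYPOTHETICAL record layout: an assumption for illustration only, not the
# documented IBM IoT Watson archive schema.
records = [
    {
        "timestamp": "2018-05-11T19:53:39.541Z",  # added by the platform
        "deviceId": "8015",
        "deviceType": "Envirosensor",
        "eventType": "event",
        "data": {"time": "2018-05-11T19:53:39.181958Z", "TMP": 34.44, "OPT": 4.57},
    }
]

# Nested keys are flattened into dotted column names such as "data.TMP".
flat = pd.json_normalize(records)

# Separate the sensor payload columns from the platform metadata columns.
payload_cols = [c for c in flat.columns if c.startswith("data.")]
metadata_cols = [c for c in flat.columns if not c.startswith("data.")]
print(payload_cols, metadata_cols)
```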

## Import dependencies

In [1]:
import pandas as pd
import sys

# access locally hosted modules in this GitHub repository
sys.path.append("..")
from src.envirosensor import clean_envirosensor_data
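Appending the relative path `".."` works because Python resolves relative `sys.path` entries against the current working directory at import time. A slightly sturdier variant, sketched below, adds an absolute path once. It assumes the notebook is started from a folder one level below the repository root, so the parent folder contains `src/`.

```python
import sys
from pathlib import Path

# Assumption: the working directory is one level below the repository root,
# so its parent is the folder that contains "src/".
repo_root = Path.cwd().resolve().parent

# Add the root only once, so re-running the cell does not grow sys.path.
if str(repo_root) not in sys.path:
    sys.path.append(str(repo_root))
```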

### Notebook configuration

In [2]:
# set pandas display option for floating point numbers
pd.set_option("float_format", "{:.2f}".format)
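The `float_format` option only changes how floats are displayed; the stored values keep full precision. The sketch below uses `pd.option_context` on a small synthetic frame (`df_demo` is illustrative) so the setting is applied temporarily rather than globally.

```python
import pandas as pd

# Small synthetic frame for illustration.
df_demo = pd.DataFrame({"TMP": [34.4412, 21.0567]})

# Apply the two-decimal display format only inside this block.
with pd.option_context("display.float_format", "{:.2f}".format):
    formatted = repr(df_demo)

# Outside the context, pandas falls back to its default formatting.
raw = repr(df_demo)
print(formatted)
print(raw)
```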

## Clean the Envirosensor data

In [3]:
df = clean_envirosensor_data()

Cleaning Envirosensor data...
Processing iotp_kfb22t_envirosensor_2018-w19.json...
Processing iotp_kfb22t_envirosensor_2018-w20.json...
Processing iotp_kfb22t_envirosensor_2018-w21.json...
Processing iotp_kfb22t_envirosensor_2018-w22.json...
Processing iotp_kfb22t_envirosensor_2018-w23.json...
Processing iotp_kfb22t_envirosensor_2018-w24.json...
Processing iotp_kfb22t_envirosensor_2018-w25.json...
Processing iotp_kfb22t_envirosensor_2018-w26.json...
Processing iotp_kfb22t_envirosensor_2018-w27.json...
Processing iotp_kfb22t_envirosensor_2018-w28.json...
Processing iotp_kfb22t_envirosensor_2018-w29.json...
Processing iotp_kfb22t_envirosensor_2018-w30.json...
Processing iotp_kfb22t_envirosensor_2018-w31.json...
Processing iotp_kfb22t_envirosensor_2018-w32.json...
Processing iotp_kfb22t_envirosensor_2018-w33.json...
Processing iotp_kfb22t_envirosensor_2018-w34.json...
Processing iotp_kfb22t_envirosensor_2018-w35.json...
Processing iotp_kfb22t_envirosensor_2018-w36.json...
Processing iotp_
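The log above suggests that `clean_envirosensor_data()` walks through one JSON file per week and tags each row with its source file (the `File` column below holds the file name without `.json`). The following is a hypothetical sketch of that loading step only, not the repository's implementation: `load_weekly_files` is an illustrative name, and it assumes each file holds a JSON array of flat records.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def load_weekly_files(data_dir: Path) -> pd.DataFrame:
    """Hypothetical loader: concatenate every weekly JSON file in data_dir.

    Assumes each file holds a JSON array of flat records; this is NOT the
    repository's clean_envirosensor_data implementation.
    """
    frames = []
    for path in sorted(data_dir.glob("iotp_*_envirosensor_*.json")):
        print(f"Processing {path.name}...")
        records = json.loads(path.read_text())
        frame = pd.DataFrame.from_records(records)
        frame.insert(0, "File", path.stem)  # file name without ".json"
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Demo with two tiny synthetic files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    for week in ("w19", "w20"):
        sample = [{"DeviceID": "8001", "TMP": 34.5}]
        name = f"iotp_kfb22t_envirosensor_2018-{week}.json"
        (tmp_dir / name).write_text(json.dumps(sample))
    demo = load_weekly_files(tmp_dir)
```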

## Review cleaned data

In [4]:
df.head()

| | File | PlatformTime | DeviceID | DeviceType | Event | DeviceTime | TMP | OPT | BAT | HDT | BAR | HDH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | iotp_kfb22t_envirosensor_2018-w19 | 2018-05-11 20:53:39.541000+01:00 | 8015 | Envirosensor | event | 2018-05-11 20:53:39.181958+01:00 | 34.44 | 4.57 | 34.76 | 34.62 | 1012.27 | 10.01 |
| 1 | iotp_kfb22t_envirosensor_2018-w19 | 2018-05-12 11:34:07.641000+01:00 | 8008 | Envirosensor | event | 2018-05-12 11:34:07.296419+01:00 | 34.09 | 6.25 | 34.77 | 34.33 | 1011.63 | 15.8 |
| 2 | iotp_kfb22t_envirosensor_2018-w19 | 2018-05-10 20:04:44.226000+01:00 | 8004 | Envirosensor | event | 2018-05-10 20:04:43.869317+01:00 | 34.78 | 6.67 | 35.05 | 34.9 | 1016.71 | 6.09 |
| 3 | iotp_kfb22t_envirosensor_2018-w19 | 2018-05-11 19:06:17.113000+01:00 | 8007 | Envirosensor | event | 2018-05-11 19:06:16.760284+01:00 | 36.47 | 224.48 | 36.69 | 36.63 | 1012.87 | 5.81 |
| 4 | iotp_kfb22t_envirosensor_2018-w19 | 2018-05-11 10:16:34.357000+01:00 | 8015 | Envirosensor | event | 2018-05-11 10:16:34.037201+01:00 | 33.78 | 9.16 | 34.13 | 33.99 | 1016.55 | 10.78 |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3258298 entries, 0 to 3258297
Data columns (total 12 columns):
 #   Column        Dtype                        
---  ------        -----                        
 0   File          object                       
 1   PlatformTime  datetime64[ns, UTC+01:00]    
 2   DeviceID      object                       
 3   DeviceType    object                       
 4   Event         object                       
 5   DeviceTime    datetime64[ns, Europe/London]
 6   TMP           float64                      
 7   OPT           float64                      
 8   BAT           float64                      
 9   HDT           float64                      
 10  BAR           float64                      
 11  HDH           float64                      
dtypes: datetime64[ns, Europe/London](1), datetime64[ns, UTC+01:00](1), float64(6), object(4)
memory usage: 298.3+ MB
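`df.info()` shows that `PlatformTime` carries a fixed `UTC+01:00` offset while `DeviceTime` uses the named `Europe/London` zone. A fixed offset never shifts for daylight saving time, so if the two columns are to be compared directly, one option is to convert both to the same named zone first. The sketch below does this with two timestamps copied from the first row of `df.head()` and computes the gap between device and platform time; the variable names are illustrative.

```python
import pandas as pd

# Two timestamps copied from the first row of df.head().
platform = pd.Series(pd.to_datetime(["2018-05-11 20:53:39.541000+01:00"]))
device = pd.Series(pd.to_datetime(["2018-05-11 20:53:39.181958+01:00"]))

# Strings ending in "+01:00" parse to a fixed offset; converting to the named
# zone gives both series the same time zone.
platform = platform.dt.tz_convert("Europe/London")
device = device.dt.tz_convert("Europe/London")

# Gap between the device timestamp and the platform timestamp.
gap = platform - device
print(gap.iloc[0])
```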


In [6]:
df.describe()

| | TMP | OPT | BAT | HDT | BAR | HDH |
|---|---|---|---|---|---|---|
| count | 3258176.0 | 3254638.0 | 3258276.0 | 3257881.0 | 3238076.0 | 3257037.0 |
| mean | 34.68 | 66.63 | 35.15 | 34.92 | 1018.1 | 18.17 |
| std | 1.11 | 68.33 | 1.13 | 1.1 | 5.76 | 5.19 |
| min | 21.06 | 0.0 | 21.17 | 21.32 | 5.94 | 3.74 |
| 25% | 34.0 | 3.68 | 34.44 | 34.25 | 1014.48 | 14.51 |
| 50% | 34.72 | 46.64 | 35.16 | 34.95 | 1017.58 | 18.3 |
| 75% | 35.41 | 117.04 | 35.93 | 35.67 | 1021.74 | 21.95 |
| max | 38.62 | 380.16 | 39.05 | 38.82 | 1040.0 | 59.76 |
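The `BAR` minimum of 5.94 sits far below its 25th percentile of 1014.48, which may indicate faulty readings. One common screening technique is Tukey's fences: flag values more than 1.5 × IQR below the first quartile or above the third. The sketch below applies it to a synthetic series; the 1.5 multiplier is the conventional choice, not something taken from this project.

```python
import pandas as pd

# Synthetic barometric readings; the last value mimics the suspicious
# minimum reported by describe().
bar = pd.Series([1014.5, 1017.6, 1021.7, 1018.1, 1016.0, 5.94], name="BAR")

# Quartiles and interquartile range.
q1, q3 = bar.quantile([0.25, 0.75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Readings outside Tukey's fences.
outliers = bar[(bar < low) | (bar > high)]
print(outliers)
```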


In [7]:
df.isna().sum()

File                0
PlatformTime        0
DeviceID            0
DeviceType          0
Event               0
DeviceTime          0
TMP               122
OPT              3660
BAT                22
HDT               417
BAR             20222
HDH              1261
dtype: int64
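Raw counts are easier to judge as a share of all rows: for example, the 20222 missing `BAR` values are roughly 0.62% of the 3258298 rows. A minimal sketch that reports both, shown on a small synthetic frame (`demo` and `missing` are illustrative names):

```python
import numpy as np
import pandas as pd

# Small synthetic frame with a few missing readings.
demo = pd.DataFrame({
    "TMP": [34.1, np.nan, 35.0, 34.7],
    "BAR": [1017.0, np.nan, np.nan, 1016.2],
})

# isna().mean() is the fraction of missing values per column.
missing = pd.DataFrame({
    "count": demo.isna().sum(),
    "percent": demo.isna().mean() * 100,
})
print(missing)
```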

In [8]:
counts_df = df.groupby("DeviceID").count()
counts_df

| DeviceID | File | PlatformTime | DeviceType | Event | DeviceTime | TMP | OPT | BAT | HDT | BAR | HDH |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8001 | 202829 | 202829 | 202829 | 202829 | 202829 | 202828 | 202609 | 202829 | 202821 | 202031 | 202769 |
| 8002 | 211637 | 211637 | 211637 | 211637 | 211637 | 211605 | 211402 | 211632 | 211582 | 209993 | 211516 |
| 8003 | 197912 | 197912 | 197912 | 197912 | 197912 | 197902 | 197645 | 197911 | 197856 | 196787 | 197816 |
| 8004 | 204820 | 204820 | 204820 | 204820 | 204820 | 204815 | 204518 | 204820 | 204798 | 203998 | 204712 |
| 8005 | 149832 | 149832 | 149832 | 149832 | 149832 | 149830 | 149698 | 149831 | 149822 | 148485 | 149817 |
| 8006 | 206790 | 206790 | 206790 | 206790 | 206790 | 206789 | 206515 | 206790 | 206772 | 205855 | 206718 |
| 8007 | 107278 | 107278 | 107278 | 107278 | 107278 | 107275 | 107119 | 107276 | 107270 | 106732 | 107239 |
| 8008 | 50331 | 50331 | 50331 | 50331 | 50331 | 50331 | 50262 | 50331 | 50324 | 50136 | 50304 |
| 8010 | 205596 | 205596 | 205596 | 205596 | 205596 | 205586 | 205352 | 205595 | 205569 | 204700 | 205488 |
| 8011 | 205818 | 205818 | 205818 | 205818 | 205818 | 205814 | 205581 | 205817 | 205794 | 204975 | 205711 |
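Because `count()` only counts non-null values, the gap between the always-populated `File` column and a sensor column is that device's number of missing readings; for device 8001, 202829 rows minus 202031 `BAR` values leaves 798 missing `BAR` readings. The sketch below computes these gaps for every column at once by subtracting the counts from `size()`, which counts all rows. The frame and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Small synthetic frame with two devices.
demo = pd.DataFrame({
    "DeviceID": ["8001", "8001", "8002", "8002", "8002"],
    "TMP": [34.1, np.nan, 35.0, 34.7, np.nan],
    "BAR": [1017.0, 1016.5, np.nan, 1016.2, 1015.9],
})

counts = demo.groupby("DeviceID").count()  # non-null values per column
rows = demo.groupby("DeviceID").size()     # all rows, nulls included

# rows minus non-null values = missing values per device and column.
missing_by_device = counts.rsub(rows, axis=0)
print(missing_by_device)
```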


## Explore cleaned data

In [10]:
df[df["TMP"].isnull()]

| | File | PlatformTime | DeviceID | DeviceType | Event | DeviceTime | TMP | OPT | BAT | HDT | BAR | HDH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10820 | iotp_kfb22t_envirosensor_2018-w19 | 2018-05-11 17:19:42.094000+01:00 | 8003 | Envirosensor | event | 2018-05-11 17:19:41.802412+01:00 | NaN | 199.68 | 35.53 | 35.30 | 1014.51 | 8.84 |
| 66386 | iotp_kfb22t_envirosensor_2018-w19 | 2018-05-12 08:32:51.505000+01:00 | 8019 | Envirosensor | event | 2018-05-12 08:32:51.138573+01:00 | NaN | 51.80 | 34.58 | 34.71 | 1012.21 | 12.46 |
| 67169 | iotp_kfb22t_envirosensor_2018-w19 | 2018-05-12 05:26:48.029000+01:00 | 8003 | Envirosensor | event | 2018-05-12 05:26:47.750896+01:00 | NaN | 34.72 | 34.77 | 34.62 | 1012.63 | 12.86 |
| 82979 | iotp_kfb22t_envirosensor_2018-w19 | 2018-05-10 15:38:46.859000+01:00 | 8016 | Envirosensor | event | 2018-05-10 15:38:46.566665+01:00 | NaN | 125.12 | 34.23 | 33.79 | 1016.37 | 8.03 |
| 122020 | iotp_kfb22t_envirosensor_2018-w20 | 2018-05-15 15:42:06.765000+01:00 | 8011 | Envirosensor | event | 2018-05-15 15:42:06.448965+01:00 | NaN | 22.64 | 33.19 | 33.01 | 1017.58 | 19.87 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3164316 | iotp_kfb22t_envirosensor_2018-w39 | 2018-09-30 12:27:24.588000+01:00 | 8012 | Envirosensor | event | 2018-09-30 12:27:24.252630+01:00 | NaN | 52.72 | 35.57 | 35.22 | 1018.91 | 10.13 |
| 3166112 | iotp_kfb22t_envirosensor_2018-w39 | 2018-09-25 11:36:09.918000+01:00 | 8010 | Envirosensor | event | 2018-09-25 11:36:09.659215+01:00 | NaN | 64.60 | 33.29 | 33.06 | 1038.74 | 12.54 |
| 3198652 | iotp_kfb22t_envirosensor_2018-w40 | 2018-10-01 20:27:58.492000+01:00 | 8017 | Envirosensor | event | 2018-10-01 20:27:58.236787+01:00 | NaN | 3.60 | 32.68 | 32.43 | 1025.21 | 9.78 |
| 3235011 | iotp_kfb22t_envirosensor_2018-w40 | 2018-10-01 20:17:34.188000+01:00 | 8003 | Envirosensor | event | 2018-10-01 20:17:33.880762+01:00 | NaN | 184.64 | 35.81 | 35.61 | 1025.91 | 7.08 |
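Rather than scrolling through the filtered rows, the missing `TMP` values can be summarised per device with `value_counts()` on the `DeviceID` column of the filtered rows. The sketch below uses a small synthetic frame whose device IDs echo the rows shown above; `demo` and `missing_tmp_by_device` are illustrative names.

```python
import numpy as np
import pandas as pd

# Small synthetic frame; only device 8011 has a TMP reading.
demo = pd.DataFrame({
    "DeviceID": ["8003", "8019", "8003", "8016", "8011"],
    "TMP": [np.nan, np.nan, np.nan, np.nan, 34.2],
})

# Keep the rows with a missing TMP value, then count them per device.
missing_tmp_by_device = demo.loc[demo["TMP"].isna(), "DeviceID"].value_counts()
print(missing_tmp_by_device)
```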
