# Notebook Objective 
Load, visualize, and understand the dataset

# Dataset Details

All sensors are re-sampled to the same 15 minute interval.
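
The notebook does not say how the re-sampling was done. As a rough sketch only, assuming raw readings arrive more often than every 15 minutes and are averaged per window, pandas `resample` could be used as below. The series `rawReadings` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw readings taken every 5 minutes (not the real sensor data)
rawIndex = pd.date_range('2018-01-01 00:00', periods=12, freq='5min')
rawReadings = pd.Series(np.arange(12, dtype=float), index=rawIndex, name='co2_demo')

# Average the readings that fall into each 15 minute window
resampled = rawReadings.resample('15min').mean()
print(resampled)
```

Averaging is only one possible aggregation; taking the last reading in each window is another option.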

__Internal Sensors Capturing Conditions in the Spheres__
- 4 Sensors for [Carbon Dioxide](https://en.wikipedia.org/wiki/Carbon_dioxide) CO2 [ ppm ]  
  - *column names: co2_1, co2_2, co2_3, co2_4*
  
  
- 4 Sensors for [Temperature](https://en.wikipedia.org/wiki/Temperature) [ degrees F ]
  - *column names: temp_1, temp_2, temp_3, temp_4*
  
  
- 4 Sensors for [Dewpoint](https://en.wikipedia.org/wiki/Dew_point) [ degrees F ]
  - *column names: dew_1, dew_2, dew_3, dew_4*


- 4 Sensors for [Relative Humidity](https://en.wikipedia.org/wiki/Relative_humidity) [ % ]
  - *column names: relH_1, relH_2, relH_3, relH_4*
  
__External Sensors Capturing Conditions in Downtown Seattle__
- 1 Sensor for External Temperature
  - *column name: externTemp_1*


- 1 Sensor for External Humidity 
  - *column name: externHumid_1*


- 1 Sensor for External Condition 
  - *column name: externCondition_1*
  - values: 0 = 'clear', 1 = 'cloudy', 2 = 'flurries', 3 = 'ice', 4 = 'partlyCloudy', 5 = 'rain', 6 = 'showers', 7 = 'snow', 8 = 'thunderstorms'



- 1 Sensor for External Sunrise 
  - *column name: externSunrise_1*
  - values: 0 = sun has set, 1 = sun is up



In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Unpack Dataset(s) 

In [None]:
!unzip data.zip

In [None]:
!cd data && ls

# Load Raw Data

In [None]:
dataset = pd.read_csv( './data/2018-01-01__2019-01-01__NConservatory__allMerged.csv',
                       index_col = 0, parse_dates = True )

# interpret the time-index as UTC, then convert it to Pacific time (PST/PDT) [ the native timezone of the Spheres ]
dataset.index = pd.to_datetime( dataset.index, utc=True ).tz_convert('America/Los_Angeles')
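
To see what the timezone conversion does, here is a small sketch on two made-up timestamps, assumed (as in the cell above) to be recorded in UTC. The names `naiveIndex` and `pacificIndex` are illustrative only.

```python
import pandas as pd

# Hypothetical timestamps without timezone information, assumed to be in UTC
naiveIndex = pd.to_datetime(['2018-01-01 08:00:00', '2018-07-01 07:00:00'])

# utc=True marks the timestamps as UTC; tz_convert shifts them to Pacific time
pacificIndex = pd.to_datetime(naiveIndex, utc=True).tz_convert('America/Los_Angeles')
print(pacificIndex)
```

Both timestamps land on local midnight, but with different UTC offsets: the January one falls in standard time (PST) and the July one in daylight time (PDT).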

In [None]:
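# LabelEncoder sorts the labels alphabetically, so the resulting codes match the table in Dataset Details
externConditionLabelEncoder = LabelEncoder()
externConditionLabelEncoder.fit(['clear', 'cloudy', 'partlyCloudy', 'rain', 'showers', 'thunderstorms', 'ice', 'flurries', 'snow']);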
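
The order in which labels are passed to `fit` does not matter, because `LabelEncoder` stores its classes sorted. The sketch below uses a separate encoder (the name `encoder` is illustrative) to print the code assigned to each label and to round-trip between labels and codes.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(['clear', 'cloudy', 'partlyCloudy', 'rain', 'showers',
             'thunderstorms', 'ice', 'flurries', 'snow'])

# classes_ is stored in sorted order, so the position of each label is its integer code
for code, label in enumerate(encoder.classes_):
    print(code, label)

# Round trip: labels -> integer codes -> labels
codes = encoder.transform(['rain', 'clear'])
labels = encoder.inverse_transform(codes)
print(codes, labels)
```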

# Explore Data

In [None]:
dataset.head()

In [None]:
dataset.index

In [None]:
dataset['co2_1'][0:10]

In [None]:
dataset['co2_1'][0:10].values

# Visualize Data

### Plotting parameters 
Enlarge the default figure size & reduce whitespace in margins.
For more options refer to
- http://matplotlib.org/api/figure_api.html#matplotlib.figure.Figure
- https://matplotlib.org/gallery/style_sheets/style_sheets_reference.html

In [None]:
plt.rcParams['figure.figsize'] = [ 20, 15 ]
plt.rcParams['figure.subplot.left'] = plt.rcParams['figure.subplot.bottom'] = .1
plt.rcParams['figure.subplot.right'] = plt.rcParams['figure.subplot.top'] = .9

### Visualize entire dataset 

In [None]:
dataset.plot(subplots=True);

### Selecting a subset of sensors to plot [ co2 ]

In [None]:
dataset[['co2_1', 'co2_2', 'co2_3', 'co2_4']].plot(subplots=True);

### First week of 2018 [ co2, temp, dew, relH ]
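The next cell plots each sensor explicitly with matplotlib calls, which gives finer control than `DataFrame.plot`. At 15 minute intervals there are 96 samples per day, so `0:7*96` selects the first 7 days of data.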

In [None]:
#plt.figure( figsize = (15,15) )

plt.subplot(4,1,1)
plt.title('co2')
plt.plot( dataset['co2_1'][0:7*96], '-xg')

plt.subplot(4,1,2)
plt.title('temperature')
plt.plot( dataset['temp_1'][0:7*96], '-ob') 

plt.subplot(4,1,3)
plt.title('dewpoint')
plt.plot( dataset['dew_1'][0:7*96], '-ok') 

plt.subplot(4,1,4)
plt.title('relative humidity')
plt.plot( dataset['relH_1'][0:7*96], '-m.') 

plt.show()
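
The positional slice `0:7*96` depends on the data having exactly 96 rows per day. As a sketch on made-up data, the same week can also be selected by date label on a datetime index. The names `demoSeries`, `samplesPerDay`, and `samplesPerWeek` are illustrative, and the two selections only agree when the series starts at local midnight and has no missing rows.

```python
import numpy as np
import pandas as pd

samplesPerDay = 24 * 60 // 15        # 96 samples per day at 15 minute intervals
samplesPerWeek = 7 * samplesPerDay   # 672 samples per week

# Hypothetical two-week series at 15 minute intervals (stand-in for a sensor column)
demoIndex = pd.date_range('2018-01-01', periods=14 * samplesPerDay,
                          freq='15min', tz='America/Los_Angeles')
demoSeries = pd.Series(np.zeros(len(demoIndex)), index=demoIndex)

# Select the first week by position and by date label
firstWeekByPosition = demoSeries.iloc[0:samplesPerWeek]
firstWeekByLabel = demoSeries.loc['2018-01-01':'2018-01-07']

print(len(firstWeekByPosition), len(firstWeekByLabel))
```

Label-based slicing includes the whole end date, and it keeps selecting the intended calendar days even if some rows are missing.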

### Closer look at temperature [ external and internal over the course of the first week ]

In [None]:
dataset[['externTemp_1', 'temp_1', 'externCondition_1']][0:96*7].plot(subplots=True);

### First week's external weather conditions

In [None]:
externConditionLabelEncoder.inverse_transform(dataset[ 'externCondition_1'][0:96*7])

### Histogram of External Weather Conditions

In [None]:
plt.figure(figsize=(15,10));

nPossibleConditions = len(externConditionLabelEncoder.classes_)

# one unit-wide bin per condition code, centered on the integers 0 .. nPossibleConditions-1
# ( with only nPossibleConditions-1 bins the last two codes would share a bin )
dataset['externCondition_1'].hist(bins=np.arange(nPossibleConditions + 1) - 0.5, rwidth=1);

# label each bin with its condition name
plt.xticks(range(nPossibleConditions), externConditionLabelEncoder.classes_);

plt.title('prevalence of external conditions'); plt.ylabel('# of 15 minute instances');
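
The same tally can be read off as numbers instead of bars. The sketch below counts made-up condition codes with `value_counts` and labels them using the code-to-name mapping from Dataset Details. `demoCodes` is hypothetical data, not the real column.

```python
import pandas as pd

# Hypothetical condition codes (not the real data)
demoCodes = pd.Series([0, 0, 5, 5, 5, 8, 1])

# Code-to-name mapping as listed in Dataset Details (position = code)
conditionNames = ['clear', 'cloudy', 'flurries', 'ice', 'partlyCloudy',
                  'rain', 'showers', 'snow', 'thunderstorms']

# Count each code, keeping a zero for conditions that never occur
counts = demoCodes.value_counts().reindex(range(len(conditionNames)), fill_value=0)
counts.index = conditionNames
print(counts)
```

Each count is a number of 15 minute instances, so multiplying by 0.25 would give hours spent in each condition.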