# Hands-on Environmental Sensing I
Construction and characterization of a low-cost CO<sub>2</sub> sensor.

### Description
In this notebook, the measured data are analyzed, plotted, and evaluated.
Go through the notebook step by step and complete the tasks described.

### Python Library Import
The first step is to import the Python packages we will use. pandas is indispensable for data scientists and is used to import, manipulate, and generate data. We also use matplotlib to plot the data.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

from matplotlib.dates import DateFormatter

%matplotlib inline

### Data import from the JSON file
Here we read all available data from a JSON Lines file (one JSON object per line). We could also filter or sort the data while loading, but since the dataset is small, this is not necessary.

In [None]:
# load data from json file
filename = 'sensor_data.json'

# Read JSON lines into DataFrame
sensor_data = pd.read_json("data/" + filename, lines=True)

# Convert timestamp to datetime format and set it as index
sensor_data['timestamp'] = pd.to_datetime(sensor_data['timestamp'])
sensor_data.set_index('timestamp', inplace=True)

sensor_data.head()

In [None]:
# print datatypes of the columns
sensor_data.dtypes
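
If you want to see what the loading step does without the real file, the following minimal sketch parses two hand-written JSON Lines records. The field names (`timestamp`, `value`, `temperature`, `humidity`) and the numbers are assumptions for illustration only; check `sensor_data.dtypes` for the columns your file actually contains.

```python
import io
import pandas as pd

# Hypothetical sample in JSON Lines format: one JSON object per line.
# Field names and values are illustrative assumptions, not real measurements.
sample_lines = io.StringIO(
    '{"timestamp": "2024-01-01 12:00:00", "value": 415.0, "temperature": 21.3, "humidity": 40.1}\n'
    '{"timestamp": "2024-01-01 12:00:10", "value": 417.5, "temperature": 21.3, "humidity": 40.0}\n'
)

# lines=True tells pandas to parse each line as a separate record
sample = pd.read_json(sample_lines, lines=True)

# Same steps as in the cell above: parse the timestamps and use them as index
sample["timestamp"] = pd.to_datetime(sample["timestamp"])
sample = sample.set_index("timestamp")

print(sample.dtypes)
```

After `set_index`, the timestamps live in `sample.index` and are no longer a regular column.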

## Basic information and statistics about the data

We have now imported the data from the file and can work with it in the notebook.

With `df.head()` you can quickly display the first rows of a DataFrame to get a first overview. Now we will examine the data in more detail and:

- Restrict it to the relevant time period
- Calculate the arithmetic mean
- Calculate the variance

In [None]:
# timeframe of the data (timestamp is the index after set_index above)
print(f'The first data point is from: {sensor_data.index.min()}')
print(f'The last data point is from: {sensor_data.index.max()}')

# TODO: Choose the timeframe of interest with the loc operator and apply it to the dataframe.
#       Example: co2_sel = sensor_data.loc['YYYY-MM-DD HH:MM:SS':'YYYY-MM-DD HH:MM:SS']
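
As a rough illustration of how `.loc` slicing behaves on a datetime index, here is a small sketch on synthetic data. The DataFrame `demo`, its `value` column, and the dates are made up for this example and stand in for `sensor_data`.

```python
import pandas as pd

# Synthetic stand-in for sensor_data: one reading every 10 s for one hour
idx = pd.date_range("2024-01-01 12:00:00", periods=360, freq="10s")
demo = pd.DataFrame({"value": range(360)}, index=idx)

# Label-based slicing on a DatetimeIndex includes both endpoints
demo_sel = demo.loc["2024-01-01 12:10:00":"2024-01-01 12:20:00"]

print(demo_sel.index.min(), demo_sel.index.max(), len(demo_sel))
```

Unlike ordinary Python slices, the end label is part of the selection, so the window above contains both 12:10:00 and 12:20:00.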

In [None]:
# You can use df['column_name'].mean() to calculate the average of a dataframe column.

# TODO: Apply .mean() and .var() to the 'value' column of your selected dataframe (e.g. co2_sel['value']).
#       You can also try out other functions such as .sum(), .std(), etc.
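
To get a feeling for these functions before applying them to your data, here is a small sketch on five made-up numbers. The Series `demo_values` is hypothetical; the point is only to show which statistic each call returns.

```python
import pandas as pd

# Five made-up readings, used only to illustrate the functions
demo_values = pd.Series([400.0, 410.0, 420.0, 430.0, 440.0], name="value")

mean_value = demo_values.mean()  # arithmetic mean
var_value = demo_values.var()    # sample variance (pandas divides by n - 1 by default)
std_value = demo_values.std()    # sample standard deviation, the square root of the variance

print(mean_value, var_value, std_value)
```

Note that pandas uses `ddof=1` by default, so `.var()` and `.std()` return the sample statistics rather than the population ones. Pass `ddof=0` if your course script defines the variance with a division by n.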

---
## Apply conversion factors
As described in the script, we need to apply conversion factors to convert the ppm value to µg/m^3. This is important, since the unit of the official limits is µg/m^3. 

In [None]:
# Convert the value from ppm to µg/m^3 and store it in a new column. 

# TODO: Look for the conversion factors in your course script and apply them to all values in the value 
#       column of the dataframe. Store the converted values in a new column. Investigate the converted
#       table with df.head()
# Hint: With df[NewColumn] = df[OldColumn] * value you can easily convert the values; 1 ppm = 1000 ppb 
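
The sketch below shows one common way to derive such a factor from the ideal gas law: concentration in µg/m³ equals ppm × molar mass / molar volume × 1000. It assumes CO<sub>2</sub> (molar mass 44.01 g/mol) and a molar volume of about 24.45 L/mol, which holds roughly at 25 °C and 1 atm. The helper `ppm_to_ug_m3` and the column name `value_ug_m3` are hypothetical, and the factors in your course script may use different reference conditions, so treat the script as authoritative.

```python
import pandas as pd

MOLAR_MASS_CO2 = 44.01  # g/mol
MOLAR_VOLUME = 24.45    # L/mol, ideal gas at about 25 °C and 1 atm (assumption)


def ppm_to_ug_m3(ppm, molar_mass, molar_volume=MOLAR_VOLUME):
    """Convert a volume mixing ratio in ppm to a mass concentration in µg/m³."""
    # ppm * M / Vm gives mg/m³; multiplying by 1000 converts to µg/m³
    return ppm * molar_mass / molar_volume * 1000.0


# Made-up ppm values standing in for the 'value' column
demo_conv = pd.DataFrame({"value": [400.0, 1000.0]})
demo_conv["value_ug_m3"] = ppm_to_ug_m3(demo_conv["value"], MOLAR_MASS_CO2)

print(demo_conv.head())
```

Because the function works on whole columns, the same call converts every row at once.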


---
## Plot the data

Now it is time to plot the values of each sensor and investigate the time series. Below is an example that plots the `value` column of three dataframes (`no2`, `co`, `o3`) in stacked panels with a shared time axis. Find a good way to investigate the data, and play around with scales, colors, and plot types.<br>
You can find example plots and descriptions [here](https://matplotlib.org/stable/plot_types/index.html). 

In [None]:
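# Value plot: three stacked panels sharing one time axis.
# Note: no2, co and o3 are not defined in the cells above; substitute your own
# dataframes (e.g. the selection from the timeframe step).
fig, ax = plt.subplots(3, 1, figsize=(16, 9), sharex=True)

ax[0].plot(no2["value"])
ax[1].plot(co["value"])
ax[2].plot(o3["value"])

# Show only hours and minutes on the shared x-axis
hh_mm = DateFormatter('%H:%M')
ax[2].xaxis.set_major_formatter(hh_mm)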

## Plot temperature and humidity in same timeframe

Using the example above, plot the temperature and humidity in the same timeframe.

In [None]:
# TODO: Plot temperature and humidity in the same timeframe
# Hint: Use the example above. The temperature and humidity in all dataframes are the same,
#       therefore you only need to plot them once.
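
One possible layout is sketched below on synthetic data. The column names `temperature` and `humidity`, the units in the axis labels, and the DataFrame `demo_env` are assumptions; replace them with the columns and units of your own dataframe.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter

# Synthetic stand-in data: one reading per minute for two hours
idx = pd.date_range("2024-01-01 12:00:00", periods=120, freq="1min")
demo_env = pd.DataFrame(
    {
        "temperature": 21.0 + np.sin(np.linspace(0, 3, 120)),
        "humidity": 40.0 + 5.0 * np.cos(np.linspace(0, 3, 120)),
    },
    index=idx,
)

# Two stacked panels that share the time axis
fig, ax = plt.subplots(2, 1, figsize=(16, 6), sharex=True)

ax[0].plot(demo_env["temperature"], color="tab:red")
ax[0].set_ylabel("Temperature (°C)")

ax[1].plot(demo_env["humidity"], color="tab:blue")
ax[1].set_ylabel("Relative humidity (%)")

# Show only hours and minutes on the shared x-axis
ax[1].xaxis.set_major_formatter(DateFormatter("%H:%M"))
```

Separate panels keep the two quantities readable even though their units and ranges differ.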



---
## Interpretation of the data
- What can you deduce from the graphs above?
- Do you think that the mean is a representative value considering the plot?
- Are the sensors suited for the shown evaluation?

## Resampling of the data

With the function `df.resample('resample_time')` you can group the data into intervals of a fixed length. For example, `df.resample('M').mean()` returns a dataframe holding the monthly average of the data in `df` (pandas 2.2 and later prefer the alias `'ME'` for month end). We will now use this function to calculate 5-minute mean and median values.

In [None]:
# TODO: Use df.resample() to calculate a new dataframe holding the 5min average and median values. 
#       Plot the resampled values. 
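
The following sketch shows the idea on ten made-up readings, one per minute, with a single outlier. The DataFrame `demo_rs` and its values are hypothetical; with your data you would call `.resample('5min')` on your own dataframe instead.

```python
import pandas as pd

# Ten made-up readings, one per minute; the fifth value is a deliberate outlier
idx = pd.date_range("2024-01-01 12:00:00", periods=10, freq="1min")
demo_rs = pd.DataFrame(
    {"value": [400, 402, 404, 406, 500, 410, 412, 414, 416, 418]},
    index=idx,
)

# Group the readings into 5-minute bins and aggregate each bin
mean_5min = demo_rs["value"].resample("5min").mean()
median_5min = demo_rs["value"].resample("5min").median()

# Put both aggregates side by side for comparison
summary = pd.DataFrame({"mean": mean_5min, "median": median_5min})
print(summary)
```

In the first bin the outlier pulls the mean well above the median, which is one reason to compare both when judging how representative an average is.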

