# CAPUEE 2019: Data Visualization Using Python

## Introduction
So far, we have worked with Arduino, Raspberry Pi, APIs, and Python with a common objective: retrieving data for a specific purpose. With data, we can observe how the load we are controlling behaves and schedule its operation according to specific signals, such as light intensity, market prices, or time.

A useful way to check that the load behaves as intended is to use data visualization tools. Data visualization has become a key activity in companies for drawing conclusions and deciding on next steps.

In this session, we will use a wind turbine dataset shared by DTU under the following DOIs: 10.11583/DTU.7856891 and 10.11583/DTU.7856888. It contains historical observations of the V52 wind turbine, which we will load from a CSV file.

![V52 Wind Turbine](./images/v52turbine_3.jpg)

We will use several Python libraries to visualize these data: matplotlib, plotly, and Dash.

## Importing Libraries



In [3]:
# data processing
import pandas as pd 
# numerical library 
import numpy as np
# dates and times
import datetime
# data visualization libraries
import matplotlib.pyplot as plt 

## Loading Data

In [23]:
df = pd.read_csv('./data/V52_ExtensiveData.csv', sep='\t', skiprows=12)
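
The call above relies on two `read_csv` options: `sep='\t'` for tab-separated columns and `skiprows` to jump over lines that precede the header. The sketch below illustrates both on a small in-memory text. The two metadata lines and the three-column layout are invented for illustration and are not the actual layout of `V52_ExtensiveData.csv`.

```python
import io

import pandas as pd

# Hypothetical miniature file: two metadata lines, then a tab-separated table.
raw = (
    "# station: example\n"
    "# units: m/s, kW\n"
    "Date\tWsp_44m\tActPow\n"
    "201801010000\t4.71803\t64.6673\n"
    "201801010010\t5.441\t70.8152\n"
)

# skiprows=2 drops the metadata lines, so the third line is read as the header.
sample = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=2)

print(sample.columns.tolist())
print(sample.dtypes)
```

If `skiprows` is too small, the metadata lines are parsed as data and the column names come out wrong, so it is worth opening the file in a text editor to count the header lines before loading it.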

## First look at the dataset

In [24]:
df.head()

| | Date | Wsp_44m | Wdir_41m | ActPow | RePow | ActPow_std | Wsp_44m_std | Wdir_41m_std | stability |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 201801010000 | 4.71803 | 200.743 | 64.6673 | 0.00086 | 33.4251 | 0.566131 | 6.4573 | 1.0 |
| 1 | 201801010010 | 5.441 | 201.768 | 70.8152 | -0.000657 | 26.3829 | 0.765691 | 6.6694 | 1.0 |
| 2 | 201801010020 | 5.32178 | 197.962 | 80.8037 | -0.000617 | 30.2002 | 0.603442 | 6.99113 | 1.0 |
| 3 | 201801010030 | 5.95325 | 204.606 | 86.1123 | -0.00237 | 43.1192 | 0.872915 | 5.47062 | 1.0 |
| 4 | 201801010040 | 6.17765 | 204.398 | 110.857 | 0.001033 | 29.9507 | 0.55016 | 4.93713 | 1.0 |


In [25]:
df.shape

(52241, 9)

In [26]:
df.dtypes

Date              int64
Wsp_44m         float64
Wdir_41m        float64
ActPow          float64
RePow           float64
ActPow_std      float64
Wsp_44m_std     float64
Wdir_41m_std    float64
stability       float64
dtype: object

Questions for you: 

1. Are all the columns in the right format? 
2. Do we have to change any of the types? 
3. Are the column names useful to us? What information do they convey?

## Changing column types

In [27]:
df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d%H%M')
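
The format string `'%Y%m%d%H%M'` reads each value as four-digit year, month, day, hour, and minute with no separators. As a minimal sketch, the example below parses three timestamps copied from the table above and checks the spacing between them. The explicit `astype(str)` cast and the `errors="coerce"` demonstration are additions for illustration, not part of the original cell.

```python
import pandas as pd

# Integer timestamps as they appear in the Date column (YYYYMMDDHHMM).
stamps = pd.Series([201801010000, 201801010010, 201801010020])

# Casting to str first makes the parsing step explicit.
parsed = pd.to_datetime(stamps.astype(str), format="%Y%m%d%H%M")

# Consecutive differences reveal the sampling interval of the series.
steps = parsed.diff().dropna()
print(parsed)
print(steps.unique())

# errors="coerce" turns values that do not match the format into NaT
# instead of raising, which helps when a file contains malformed rows.
messy = pd.Series(["201801010000", "not-a-date"])
coerced = pd.to_datetime(messy, format="%Y%m%d%H%M", errors="coerce")
print(coerced.isna().sum())
```

For these three rows the observations are 10 minutes apart; whether that interval holds for the whole file is something to verify on the full `Date` column.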

In [28]:
df.head()

| | Date | Wsp_44m | Wdir_41m | ActPow | RePow | ActPow_std | Wsp_44m_std | Wdir_41m_std | stability |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-01-01 00:00:00 | 4.71803 | 200.743 | 64.6673 | 0.00086 | 33.4251 | 0.566131 | 6.4573 | 1.0 |
| 1 | 2018-01-01 00:10:00 | 5.441 | 201.768 | 70.8152 | -0.000657 | 26.3829 | 0.765691 | 6.6694 | 1.0 |
| 2 | 2018-01-01 00:20:00 | 5.32178 | 197.962 | 80.8037 | -0.000617 | 30.2002 | 0.603442 | 6.99113 | 1.0 |
| 3 | 2018-01-01 00:30:00 | 5.95325 | 204.606 | 86.1123 | -0.00237 | 43.1192 | 0.872915 | 5.47062 | 1.0 |
| 4 | 2018-01-01 00:40:00 | 6.17765 | 204.398 | 110.857 | 0.001033 | 29.9507 | 0.55016 | 4.93713 | 1.0 |


## Exploratory Data Analysis (EDA)

In [29]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52241 entries, 0 to 52240
Data columns (total 9 columns):
Date            52241 non-null datetime64[ns]
Wsp_44m         48988 non-null float64
Wdir_41m        48988 non-null float64
ActPow          52103 non-null float64
RePow           52103 non-null float64
ActPow_std      52103 non-null float64
Wsp_44m_std     48988 non-null float64
Wdir_41m_std    48988 non-null float64
stability       52241 non-null float64
dtypes: datetime64[ns](1), float64(8)
memory usage: 3.6 MB
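
The non-null counts above are lower than the 52241 rows for several columns: the wind columns have 52241 - 48988 = 3253 missing values each, and the power columns have 52241 - 52103 = 138. One way to get these numbers directly is `isna()` followed by `sum()` or `mean()`. The sketch below shows the idea on a small made-up frame; the values in `toy` are invented, and only the column names are borrowed from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame with gaps, reusing three column names from df.
toy = pd.DataFrame({
    "Wsp_44m": [4.7, np.nan, 5.3, np.nan],
    "ActPow": [64.7, 70.8, np.nan, 86.1],
    "stability": [1.0, 1.0, 1.0, 1.0],
})

# isna() marks missing cells; sum() counts them per column.
missing = toy.isna().sum()

# mean() of the boolean mask gives the fraction missing; scale to percent.
share = toy.isna().mean() * 100

print(missing)
print(share)
```

Running the same two lines on `df` would give the per-column gap counts, which is useful to know before plotting, since missing values may show up as breaks in line charts.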
