<h1 style="color:green"><b>Machine Learning - getting started with Python</b></h1>

<h2 style="color:green">Background</h2> 
<p>Machine Learning (ML) is becoming increasingly important in the renewable energy sector. One example is the use of ML to detect when equipment may be failing and needs servicing. In this notebook, we will work through an example of using Python to read and plot sensor state-of-health (SOH) data, which are the initial stages of developing a machine learning model.</p>

<p>The Python standard library contains over 200 modules (pre-written code and functions), which we can use in our scripts. The <i>Python Package Index</i> (PyPI) contains an additional half a million packages, developed and shared by the Python community. Many of these directly support the stages of Machine Learning, as shown below:</p>

<div style="text-align: center;">
    <img src="process.jpg" alt="stages of machine learning">
</div>

<p>The data set we will be using contains the input voltage and power of a seismometer. Not only are these parameters crucial for designing solar arrays for power, they can also tell us about the health of the sensor, as we saw in the PowerPoint.</p>

<h2 style="color:green">Aims</h2> 
    - To introduce Python as an ML tool<br>
    - To work through examples of two of the most commonly used Python packages in the early stages of ML: pandas and matplotlib.<br><br>


<b>Pandas</b> - <i>Panel Data / Python data analysis</i>: data structures (Series and DataFrames) with support for reading data from sources such as Excel or CSV files

<b>Matplotlib</b>: Data visualisation through static, interactive and animated plots

<div class="alert alert-success"> Although this notebook focuses on pandas and matplotlib, many other packages are available.</div>

<h1 style="color:green">1) Import the packages</h1>
The first step when using pre-written functions is to <i>import</i> them.

In [None]:
# import the packages to access all their functions
import pandas as pd

# import a specific module from matplotlib
from matplotlib import pyplot as plt

<h1 style="color:green">2) Using Pandas</h1>
Pandas is a powerful data manipulation package with functions that support data analysis, statistics and data cleaning. It can also read data from a variety of file types and formats.<br><br>

The two main data structures are the <i>Series</i>, which can be thought of as a 1D structure (e.g. a column), and the <i>DataFrame</i>, which can be thought of as a 2D structure (e.g. a table with multiple columns).
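To make the difference concrete, here is a minimal sketch that builds one of each by hand. The values are made up for illustration only and are not taken from the seismometer file; the column names simply mirror the ones used later in this notebook.

```python
import pandas as pd

# Made-up values for illustration only (not real sensor readings)
voltage = pd.Series([12.1, 12.3, 12.0], name="input_voltage")

table = pd.DataFrame({
    "input_voltage": [12.1, 12.3, 12.0],
    "sensor_power": [1.9, 2.0, 1.8],
})

# shape reports the size along each dimension
print(voltage.shape)  # one dimension: 3 values
print(table.shape)    # two dimensions: 3 rows, 2 columns

# Selecting a single column from a DataFrame gives back a Series
column = table["sensor_power"]
print(type(column).__name__)
```

Selecting one column with square brackets is the same pattern used in the plotting cells further down, where each column is passed to matplotlib as a Series.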

We will be using the <i>read_csv</i> function to read the seismometer SOH file, <i>myData1.csv</i>, into a pandas DataFrame:

In [None]:
# read_csv is a function from the pandas library (imported as 'pd');
# parse_dates tells it to convert the 'date' column into datetime objects
myData = pd.read_csv('data/myData1.csv', parse_dates=['date'])

In [None]:
# head is a DataFrame method that returns the first n rows. n is optional, with 5 being the default.
myData.head(n=7)

<b>Question:</b> Can you think of a way to display the last 7 entries from our dataFrame?

A pandas DataFrame can hold columns of different data types, and these can be viewed through its <i>dtypes</i> attribute. Note the type of the date column in the DataFrame we have just created.

In [None]:
# viewing the data types within our dataframe
myData.dtypes
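To see what <i>parse_dates</i> changes, the sketch below reads the same small piece of CSV text twice, once without and once with the option. The CSV text is made up for illustration and only assumes the three column names used in this notebook; <i>io.StringIO</i> lets pandas read from a string as if it were a file.

```python
import io

import pandas as pd

# Made-up CSV text for illustration only (not the contents of myData1.csv)
csv_text = """date,input_voltage,sensor_power
2024-01-01 00:00:00,12.1,1.9
2024-01-02 00:00:00,12.3,2.0
"""

# Without parse_dates the date column is read as plain text
plain = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the date column is converted to datetime values
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(plain.dtypes)
print(parsed.dtypes)

# Datetime columns support date-aware operations through the .dt accessor
print(parsed["date"].dt.day_name())
```

Having real datetime values matters for plotting: matplotlib can then space and label the x-axis by time rather than treating each date as an unrelated text label.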

<h1 style="color:green">3) Using Matplotlib</h1>
We can plot the contents of our pandas DataFrame with matplotlib's <i>plot</i> function:

<b>Simple plot of one column of data</b>

In [None]:
fig, ax1 = plt.subplots()

# plot data and set colour
ax1.plot(myData['date'], myData['sensor_power'], color='g')
ax1.set_ylabel('Power (W)', color='g')  # axis label
ax1.tick_params(axis='x', labelrotation=45)  # rotate x-axis labels to make dates fit

plt.title("Sensor 4E67")
plt.show()

<b>More advanced plot, with both columns of data</b>

In [None]:
fig, ax1 = plt.subplots()

# plot data and set colour
ax1.plot(myData['date'], myData['input_voltage'], color='m')
ax1.set_ylabel('Voltage (V)',color='m') # axis label

ax1.tick_params(axis='x', labelrotation=45)  # rotate x-axis labels to make dates fit

ax2 = ax1.twinx() # setup a second axis, sharing a common x-axis

ax2.plot(myData['date'], myData['sensor_power'], color='g')
ax2.set_ylabel('Power (W)',color='g')

plt.title("Sensor 4E67")
plt.show()
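When two lines share one plot, a legend can help the reader tell them apart. The sketch below is one possible way to build a single legend covering both axes. It uses a small made-up DataFrame (the values are illustrative, not real sensor readings) so that it runs without the data file, and the variable names <i>demo</i>, <i>line_v</i> and <i>line_p</i> are chosen here for illustration.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Made-up values for illustration only (not real sensor readings)
demo = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "input_voltage": [12.1, 12.3, 12.0, 11.8, 12.2],
    "sensor_power": [1.9, 2.0, 1.8, 1.7, 2.0],
})

fig, ax1 = plt.subplots()

# plot() returns a list of line objects; keep them so we can label them later
line_v = ax1.plot(demo["date"], demo["input_voltage"], color="m", label="Voltage (V)")
ax1.tick_params(axis="x", labelrotation=45)

ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
line_p = ax2.plot(demo["date"], demo["sensor_power"], color="g", label="Power (W)")

# Join the two lists so one legend describes the lines from both axes
lines = line_v + line_p
ax1.legend(lines, [line.get_label() for line in lines], loc="upper left")
```

In a notebook the figure is normally displayed when the cell finishes; when running this as a script, add <i>plt.show()</i> at the end, as in the cells above.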

<h1 style="color:green">Summary</h1>

This notebook has shown examples of how Python can be used to read data and start exploring it, as the first steps of a machine learning process.

<h1 style="color:green">Next steps</h1>

This notebook is only the beginning! Try using the dir() function to list all the names (functions, classes and submodules) available in the modules we have looked at today. Remember they are already loaded within this notebook as pd and plt:

In [None]:
dir(pd)
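The full output of dir() is long, so it can help to filter it. As a minimal sketch, the cell below keeps only the public names and then narrows them to the ones starting with <i>read_</i>, which is the naming pattern pandas uses for its file readers. The variable names <i>public_names</i> and <i>readers</i> are chosen here for illustration.

```python
import pandas as pd

# Keep public names only: skip anything starting with an underscore
public_names = [name for name in dir(pd) if not name.startswith("_")]

# Narrow the list down to the functions that read data in
readers = [name for name in public_names if name.startswith("read_")]
print(readers)

# help() prints the documentation for any one of these names
help(pd.DataFrame.head)
```

The same approach works for matplotlib: replace <i>pd</i> with <i>plt</i> and change the prefix being searched for.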