
Data: https://github.com/vsbca/Data-Science/raw/master/Data/India_exchange_rate_dataset.xls
Tutorial reference: https://www.dataquest.io/blog/tutorial-time-series-analysis-with-pandas/

<h3>Time-Series Data</h3><br/>
A time series is a sequence of data points recorded in chronological order over a period of time. A time series that records a single feature or variable is called a univariate time series; one that records more than one feature or variable is called a multivariate time series. A time series can also be classified as continuous or discrete.
In a continuous time series, observations are made continuously throughout the period.
In a discrete time series, observations are made at specific, usually equally spaced, points in time, as with temperature readings, currency exchange rates, air pressure data, etc.
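
As a small illustration of the univariate/multivariate distinction, the sketch below builds both kinds of series in pandas. The dates and values are made up for illustration and do not come from any dataset used in this notebook.

```python
import pandas as pd

# Hypothetical daily observations, equally spaced in time (a discrete series).
dates = pd.date_range("2020-01-01", periods=4, freq="D")

# Univariate: one variable recorded at each time step.
temperature = pd.Series([21.5, 22.1, 20.8, 19.9], index=dates, name="temperature")

# Multivariate: several variables recorded at each time step.
weather = pd.DataFrame(
    {
        "temperature": [21.5, 22.1, 20.8, 19.9],
        "pressure": [1012.0, 1009.5, 1011.2, 1015.3],
    },
    index=dates,
)

print(temperature.ndim, weather.shape)
```

Both objects share the same DatetimeIndex; the only difference is how many variables are attached to each timestamp.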



**Trend** <br/>
A trend is a pattern observed over a period of time that represents the mean rate of change with respect to time. A trend usually shows the tendency of the data to increase (uptrend) or decrease (downtrend) over the long run. The increase or decrease need not be in the same direction throughout the given period. Trend lines are also commonly drawn on candlestick charts.

For example, you may have heard about rises or falls in the prices of gold, silver, stocks, gas, diesel, etc., or in the interest rates on bank deposits or home loans. These are all market conditions that may increase or decrease over time and so show a trend in the data.
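
One simple way to see a trend as a "mean rate of change" is to smooth the series and look at its average step. The sketch below is only an illustration on synthetic data (the drift of 0.5 per day and the weekly wiggle are assumptions, not values from the notebook's datasets), and it uses a rolling mean rather than the filter discussed next.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: a steady upward drift plus a weekly wiggle (made-up values).
dates = pd.date_range("2021-01-01", periods=60, freq="D")
t = np.arange(60)
series = pd.Series(100 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 7), index=dates)

# A centered 7-day rolling mean averages out the weekly wiggle and leaves the trend.
trend = series.rolling(window=7, center=True).mean()

# Average day-to-day change of the smoothed series: the mean rate of change.
mean_slope = trend.diff().mean()
print(round(mean_slope, 3))
```

Because the window length matches the period of the wiggle, the smoothed series recovers the underlying drift, so the mean slope comes out at the 0.5 per day that was built into the data.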

**Detecting Trend Using a Hodrick-Prescott Filter** <br/>
The Hodrick-Prescott (HP) filter has become a benchmark for removing trend movements from data. It is widely used in econometrics and applied macroeconomic research. The technique is nonparametric and decomposes a time series into a trend component and a cyclical component, unaided by economic theory or prior trend specification. Like all nonparametric methods, the HP filter depends heavily on a tuning parameter that controls the degree of smoothing.
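
The decomposition described above can be sketched with `hpfilter` from statsmodels. This is a minimal example on a synthetic quarterly series (the growth rate and cycle are assumptions chosen for illustration), not an analysis of the notebook's data. The `lamb` argument is the smoothing parameter: larger values give a smoother trend, and 1600 is the value conventionally used for quarterly data.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.filters.hp_filter import hpfilter

# Synthetic quarterly series (made-up): linear growth plus a five-year cycle.
index = pd.date_range("2000-01-01", periods=40, freq="QS")
t = np.arange(40)
series = pd.Series(50 + 0.8 * t + 4 * np.sin(2 * np.pi * t / 20), index=index)

# hpfilter returns the cyclical component first, then the trend component.
cycle, trend = hpfilter(series, lamb=1600)

# The two components add back up to the original series.
print(np.allclose(cycle + trend, series))
```

Rerunning with a different `lamb` is a quick way to see how strongly the result depends on the smoothing parameter.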

In [1]:
import pandas as pd
from statsmodels.tsa.filters.hp_filter import hpfilter

  


In [2]:
df = pd.read_csv("https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv")
df.head()

,Date,Consumption,Wind,Solar,Wind+Solar
0,2006-01-01,1069.184,,,
1,2006-01-02,1380.521,,,
2,2006-01-03,1442.533,,,
3,2006-01-04,1457.217,,,
4,2006-01-05,1477.131,,,


Before we dive into the OPSD data, let’s briefly introduce the main pandas data structures for working with dates and times. In pandas, a single point in time is represented as a Timestamp. We can use the to_datetime() function to create Timestamps from strings in a wide variety of date/time formats. With pandas already imported, let’s convert a couple of date strings to Timestamps.

In [3]:
pd.to_datetime('2006-01-01')

Timestamp('2006-01-01 00:00:00')

In [4]:
pd.to_datetime('7/8/1984', dayfirst=True)

Timestamp('1984-08-07 00:00:00')
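
`to_datetime()` also accepts a list of strings, in which case it returns a DatetimeIndex rather than a single Timestamp. The sketch below parses two day-first strings with an explicit `format`, which avoids any ambiguity between day and month; the second date is an arbitrary example value.

```python
import pandas as pd

# Several day-first strings parsed at once with an explicit format.
stamps = pd.to_datetime(["7/8/1984", "25/12/2017"], format="%d/%m/%Y")

print(type(stamps).__name__)
print(stamps[0])
```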

In [5]:
df.shape

(4383, 5)

In [6]:
df.tail(3)

,Date,Consumption,Wind,Solar,Wind+Solar
4380,2017-12-29,1295.08753,584.277,29.854,614.131
4381,2017-12-30,1215.44897,721.247,7.467,728.714
4382,2017-12-31,1107.11488,721.176,19.98,741.156


In [7]:
df.dtypes

Date            object
Consumption    float64
Wind           float64
Solar          float64
Wind+Solar     float64
dtype: object

In [8]:
#Change the Date to datetime
df['Date'] = pd.to_datetime(df['Date'])

In [9]:
df.dtypes

Date           datetime64[ns]
Consumption           float64
Wind                  float64
Solar                 float64
Wind+Solar            float64
dtype: object

In [10]:
#Create a new dataframe with index date
df2 = df.set_index('Date')

In [11]:
df2.head()

Date,Consumption,Wind,Solar,Wind+Solar
2006-01-01,1069.184,,,
2006-01-02,1380.521,,,
2006-01-03,1442.533,,,
2006-01-04,1457.217,,,
2006-01-05,1477.131,,,
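
The conversion and `set_index` steps can also be done while reading the file, using the `parse_dates` and `index_col` arguments of `read_csv`. The sketch below shows this on an inline copy of the first five `Consumption` values from the output above, so that it runs without a network connection; the names `csv_text`, `daily` and `subset` are introduced here for illustration only. With a DatetimeIndex in place, rows can be selected by date strings.

```python
import io
import pandas as pd

# Inline copy of the first five rows (Date and Consumption only).
csv_text = """Date,Consumption
2006-01-01,1069.184
2006-01-02,1380.521
2006-01-03,1442.533
2006-01-04,1457.217
2006-01-05,1477.131
"""

# Parse the Date column and use it as the index in one step.
daily = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Label-based slicing on a DatetimeIndex includes both end points.
subset = daily.loc["2006-01-02":"2006-01-04"]
print(subset)
```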


In [24]:
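#Create columns year, month, weekday
#Note: index.weekday returns integers (Monday=0 ... Sunday=6), not names; index.day_name() returns names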
import datetime as dt
df2["Year"] = df2.index.year
df2["Month"] = df2.index.month
df2["Weekday Name"] = df2.index.weekday

df2.sample(5, random_state=0)

Date,Consumption,Wind,Solar,Wind+Solar,Year,Month,Weekday Name
2008-08-23,1152.011,,,,2008,8,5
2013-08-08,1291.984,79.666,93.371,173.037,2013,8,3
2009-08-27,1281.057,,,,2009,8,3
2015-10-02,1391.05,81.229,160.641,241.87,2015,10,4
2009-06-02,1201.522,,,,2009,6,1
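
The `Weekday Name` column above holds integers, because `index.weekday` numbers the days from Monday=0 to Sunday=6. If actual names are wanted, `DatetimeIndex.day_name()` provides them. The sketch below applies both to the five sampled dates shown above; the frame name `calendar` is introduced here for illustration only.

```python
import pandas as pd

# Dates taken from the five sampled rows.
idx = pd.DatetimeIndex(
    ["2008-08-23", "2013-08-08", "2009-08-27", "2015-10-02", "2009-06-02"]
)

# Integer weekday (Monday=0) next to the corresponding day name.
calendar = pd.DataFrame(
    {"Weekday": idx.weekday, "Weekday Name": idx.day_name()},
    index=idx,
)
print(calendar)
```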


In [25]:
import matplotlib.pyplot as plt
import seaborn as sns
