A time series is a sequence of data points recorded at successive points in time, usually at equally spaced intervals. Time series data appears in fields such as finance, economics, and weather forecasting. Analyzing it reveals trends and recurring patterns, which supports forecasting and helps explain the underlying dynamics of a system.

The code at the end of this section works through a first analysis of such a series. Its steps are:

Loading and Preparing Data:
The code starts by loading a time series dataset of air passengers from a CSV file.
The 'Month' column is converted to datetime format and set as the index of the DataFrame.
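
As a minimal sketch of this step that avoids the network download, the snippet below builds a three-row stand-in for the CSV and applies the same conversion. The `sample` frame and its values are illustrative assumptions, not the real file. Once the index is a `DatetimeIndex`, rows can be looked up by timestamp.

```python
import pandas as pd

# Illustrative stand-in for the CSV; column names follow the notebook's code
sample = pd.DataFrame({
    "Month": ["1949-01", "1949-02", "1949-03"],
    "Passengers": [112, 118, 132],
})

# Same two steps as the notebook: parse the strings, then index by date
sample["Month"] = pd.to_datetime(sample["Month"])
sample = sample.set_index("Month")

# A DatetimeIndex allows label-based lookups by timestamp
feb_value = sample.loc[pd.Timestamp("1949-02-01"), "Passengers"]
print(type(sample.index).__name__, feb_value)
```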

Visualizing the Time Series:
The time series data is plotted using plt.plot() to visualize the trend of air passengers over the years. This helps us understand the overall pattern or behavior of the data.

Visualizing Seasonality:
Seasonality refers to fluctuations that recur at a fixed interval within a time series (e.g., daily, weekly, monthly). The code uses seaborn's lineplot() function to plot the number of passengers against the month of the year, drawing one line per year. If the lines share a similar shape, the same months are busy or quiet every year, which indicates a seasonal pattern.
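
The same month-by-year view can also be inspected as a table. The sketch below uses synthetic data (an assumption made so the example runs offline; the `toy` frame and its `year` and `month` columns are not part of the notebook) in which the level rises each year while the within-year shape repeats.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=36, freq="MS")
years = idx.year.to_numpy()
months = idx.month.to_numpy()

# Synthetic data: a yearly step in level plus a repeating 12-month shape
level = 100 + 20 * (years - 2000)
shape = 10 * np.sin(2 * np.pi * (months - 1) / 12)
toy = pd.DataFrame(
    {"Passengers": level + shape, "year": years, "month": months},
    index=idx,
)

# Rows are months, columns are years: one column per line in the seasonal plot
table = toy.pivot(index="month", columns="year", values="Passengers")

# The month holding the maximum is the same in every year
peak_month = table.idxmax()
print(peak_month)
```

When the peak falls in the same month in every column, the pattern is seasonal rather than incidental.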

Decomposing Time Series:
Time series decomposition separates a series into its constituent components: trend, seasonality, and residuals (what remains after removing trend and seasonality). The code decomposes the series using the seasonal_decompose() function from statsmodels with model='additive', which means the three components sum to the observed series. Each component is then plotted in its own subplot, which shows how much trend, seasonality, and residuals each contribute to the overall behavior.
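
To make the additive idea concrete, here is a hand-rolled sketch of a classical additive decomposition using only pandas. It is an illustration on synthetic data and is not claimed to match the statsmodels implementation line for line; the variable names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
# Synthetic monthly series: linear trend plus a 12-month sine cycle
series = pd.Series(50 + 0.5 * t + 5 * np.sin(2 * np.pi * t / 12), index=idx)

# Trend: centered 2x12 moving average (NaN for the first and last 6 months)
trend = series.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal: average detrended value per calendar month, centered on zero
detrended = series - trend
month_means = detrended.groupby(detrended.index.month).mean()
month_means = month_means - month_means.mean()
seasonal = pd.Series(month_means.loc[series.index.month].to_numpy(), index=idx)

# Residual: whatever trend and seasonality do not explain
resid = series - trend - seasonal
print(resid.dropna().abs().max())
```

Because the synthetic series contains nothing except a linear trend and a fixed cycle, the residual is zero up to floating-point error. On real data the residual carries the irregular variation.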

Checking Stationarity:
A time series is stationary when its statistical properties (e.g., mean, variance) remain constant over time. Many forecasting models assume this property. The code defines a function stationarity_test() that checks it in two ways: by plotting rolling statistics (rolling mean and rolling standard deviation) and by running the augmented Dickey-Fuller test through adfuller() from statsmodels. The null hypothesis of that test is that the series has a unit root, i.e., is non-stationary, so a small p-value is evidence of stationarity. Together these show whether the series is stationary or has a trend or seasonality that needs to be addressed.
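
The rolling-statistics check can be illustrated without plotting. In the sketch below, the helper `rolling_mean_drift` and both synthetic series are assumptions made for this example: one series is plain noise, the other is the same noise with a linear trend added.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2000-01-01", periods=120, freq="MS")
noise = pd.Series(rng.normal(0, 1, size=120), index=idx)  # constant mean
trending = noise + 0.5 * np.arange(120)                   # mean grows over time

def rolling_mean_drift(series, window=12):
    """Range covered by the rolling mean (hypothetical helper for this sketch)."""
    rolling_mean = series.rolling(window=window).mean().dropna()
    return rolling_mean.max() - rolling_mean.min()

drift_noise = rolling_mean_drift(noise)
drift_trending = rolling_mean_drift(trending)
print(f"noise: {drift_noise:.2f}, trending: {drift_trending:.2f}")
```

A rolling mean that wanders far from its starting level, as it does for the trending series, is the visual signal of non-stationarity in the rolling-statistics plot.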

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.stattools import adfuller
import statsmodels.api as sm
from matplotlib import rcParams  # pylab is discouraged; matplotlib exposes rcParams directly

rcParams['figure.figsize'] = 10, 6

# Load the data
df = pd.read_csv('https://raw.githubusercontent.com/satishgunjal/datasets/master/Time_Series_AirPassengers.csv')

# Convert 'Month' column to datetime and set it as index
df['Month'] = pd.to_datetime(df['Month'])
df.set_index('Month', inplace=True)

# Plot the time series
plt.figure(figsize=(10, 6))
plt.plot(df, color='blue')
plt.xlabel('Years')
plt.ylabel('No of Air Passengers')
plt.title('Trend of the Time Series')
plt.show()

# Plot the seasonality using seaborn's lineplot
plt.figure(figsize=(10, 6))
sns.lineplot(data=df, x=df.index.month, y='Passengers', hue=df.index.year, palette='viridis')
plt.title('Seasonality of the Time Series')
plt.xlabel('Month')
plt.ylabel('No of Air Passengers')
plt.legend(title='Year')
plt.show()

# Decompose the time series (additive model, 12-month seasonal period) and visualize using subplots
decomposition = sm.tsa.seasonal_decompose(df, model='additive', period=12)
fig, axes = plt.subplots(4, 1, figsize=(10, 10), sharex=True)
axes[0].plot(df, color='blue', label='Original')
axes[0].legend()
axes[1].plot(decomposition.trend, color='green', label='Trend')
axes[1].legend()
axes[2].plot(decomposition.seasonal, color='red', label='Seasonality')
axes[2].legend()
axes[3].plot(decomposition.resid, color='purple', label='Residuals')
axes[3].legend()
axes[3].set_xlabel('Years')
plt.tight_layout()
plt.show()

# Function for checking stationarity and visualizing
def stationarity_test(timeseries):
    # Get rolling statistics for window = 12 (monthly data)
    rolling_mean = timeseries.rolling(window=12).mean()
    rolling_std = timeseries.rolling(window=12).std()

    # Plot rolling statistics
    plt.figure(figsize=(10, 6))
    plt.plot(timeseries, color='blue', label='Original')
    plt.plot(rolling_mean, color='green', label='Rolling Mean')
    plt.plot(rolling_std, color='red', label='Rolling Std')
    plt.xlabel('Years')
    plt.ylabel('No of Air Passengers')
    plt.title('Stationarity Test: Rolling Mean and Standard Deviation')
    plt.legend()
    plt.show()

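    # Augmented Dickey-Fuller test (null hypothesis: the series has a unit root)
    print('Results of Dickey-Fuller Test')
    # squeeze() turns a single-column DataFrame into the 1-D series adfuller expects
    df_test = adfuller(timeseries.squeeze(), autolag='AIC')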
    df_output = pd.Series(df_test[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in df_test[4].items():
        df_output['Critical Value (%s)' % key] = value
    print(df_output)

# Perform stationarity test
stationarity_test(df)
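
The printed test output still has to be interpreted. The sketch below shows one way to turn the test statistic, p-value, and critical values into a verdict. The function `interpret_adf`, its 0.05 default, and the numbers passed to it are hypothetical; they only mimic the shape of what adfuller() returns and are not results from the air passengers data.

```python
def interpret_adf(test_statistic, p_value, critical_values, alpha=0.05):
    """Summarize an ADF result (hypothetical helper; alpha=0.05 is a convention)."""
    # Null hypothesis: unit root (non-stationary). A small p-value rejects it.
    stationary = p_value < alpha
    # Levels at which the statistic is more negative than the critical value
    passed_levels = [
        level for level, critical in critical_values.items()
        if test_statistic < critical
    ]
    return {"stationary": stationary, "passed_levels": passed_levels}

# Made-up inputs for illustration only
critical = {"1%": -3.5, "5%": -2.9, "10%": -2.6}
weak = interpret_adf(-1.2, 0.67, critical)
strong = interpret_adf(-4.0, 0.001, critical)
print(weak)
print(strong)
```

With real output, the first two elements of the tuple returned by adfuller() are the test statistic and the p-value, and the fifth is the dictionary of critical values, as the stationarity_test() function above already assumes.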
