# Introduction

## Different types of data
Business analysts and data scientists come across many different types of data in their analytics projects. Most data commonly found in academic and industrial projects can be broadly classified into the following categories:

- Cross-sectional data
- Time series data
- Panel data

### Cross-sectional data
Cross-sectional data, or a cross-section of a population, is obtained by taking observations
from multiple individuals at the same point in time. Cross-sectional data can also comprise
observations taken at different points in time; in such cases, however, time itself does not
play any significant role in the analysis.

Examples:
- SAT scores of high school students in a particular year.
- Gross domestic product (GDP) of countries in a given year.
- Data for customer churn analysis.

Analysis of cross-sectional data typically starts with plotting the variables to visualize their statistical properties, such as central tendency, dispersion, skewness, and kurtosis.
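
Alongside the plots, the same four properties can be summarized numerically. The sketch below uses only Python's standard library and a small set of hypothetical scores (made-up values, not real SAT data). Skewness and excess kurtosis are computed with the population moment formulas, which is only one of several conventions; libraries such as pandas apply small-sample corrections and will report slightly different numbers.

```python
import math
import statistics


def describe(values):
    """Central tendency, dispersion, skewness and excess kurtosis.

    Uses population (biased) central moments m2, m3, m4.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical cross-section: one score per student, all from the same year
scores = [1020, 1100, 1150, 1180, 1200, 1230, 1260, 1310, 1400, 1550]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name:>16}: {value:10.3f}")
```

A positive skewness here reflects the long right tail created by the few high scores.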

### Time series data
A time series is made up of quantitative observations on one or more measurable characteristics of an individual entity, taken at multiple points in time.

Time series data is typically characterized by several interesting internal structures such as trend, seasonality,
stationarity, autocorrelation, and so on. 
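
Of these structures, autocorrelation is the easiest to pin down with a formula: it measures how strongly a series is correlated with a lagged copy of itself. A minimal sketch of the common sample estimator at lag $k$, $r_k = \sum_{t=1}^{n-k}(x_t-\bar{x})(x_{t+k}-\bar{x}) \big/ \sum_{t=1}^{n}(x_t-\bar{x})^2$, follows; the function name `acf` and the toy series are illustrative choices, not part of the text.

```python
def acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag.

    Every lag's cross-product sum is divided by the same lag-0 sum of
    squares, so r_0 is always 1.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# A toy series that repeats every 4 steps, so lag 4 stands out
series = [1, 3, 5, 3] * 6
print([round(r, 2) for r in acf(series, 4)])  # [1.0, 0.0, -0.92, 0.0, 0.83]
```

The strong positive value at lag 4 is the kind of signature that periodic structure leaves in the autocorrelation.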

### Panel data
When we observe multiple entities over multiple points in time, we get panel data, also known as longitudinal data.
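
The three categories can be read as different slices of the same kind of table. In the sketch below, a small hypothetical panel (made-up values for three entities over three years, keyed by entity and year) is sliced once by fixing the year, which yields a cross-section, and once by fixing the entity, which yields a time series.

```python
# Hypothetical panel: (entity, year) -> value. All figures are made up.
panel = {
    ("A", 2019): 1.8, ("A", 2020): 1.6, ("A", 2021): 1.9,
    ("B", 2019): 3.1, ("B", 2020): 2.9, ("B", 2021): 3.3,
    ("C", 2019): 0.7, ("C", 2020): 0.6, ("C", 2021): 0.8,
}


def cross_section(data, year):
    """All entities observed at one point in time."""
    return {entity: v for (entity, t), v in data.items() if t == year}


def time_series(data, entity):
    """One entity observed at every point in time, as (year, value) pairs."""
    return sorted((t, v) for (e, t), v in data.items() if e == entity)


print(cross_section(panel, 2020))  # {'A': 1.6, 'B': 2.9, 'C': 0.6}
print(time_series(panel, "B"))     # [(2019, 3.1), (2020, 2.9), (2021, 3.3)]
```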

## Internal structures of time series
In this section, we will conceptually explain the following special characteristics of time series data that require special mathematical treatment:
- General trend
- Seasonality
- Cyclical movements
- Unexpected variations

### General trend
When a time series exhibits an upward or downward movement in the long run, it is said to
have a general trend.
![image.png](attachment:image.png)

A general trend might not be evident over a short run of the series. Short-run effects such as seasonal fluctuations and irregular variations cause the time series to revisit lower or higher values observed in the past and hence can temporarily obscure any general trend.
![image.png](attachment:image.png)

The general trend in the time series is due to fundamental shifts or systemic changes of the
process or system it represents. 

- A general trend is commonly modeled by setting up the time series as a regression against time and other known factors as explanatory variables. 
- The regression or trend line can then be used as a prediction of the long run movement of the time series. 
- The residuals left by the trend line are further analyzed for other interesting properties such as seasonality, cyclical behavior, and irregular variations.
![image.png](attachment:image.png)
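
In the simplest case, with time as the only explanatory variable, the trend line is an ordinary least-squares fit of $x_t = \beta_0 + \beta_1 t$. The sketch below computes the closed-form slope and intercept and returns the residuals that would then be examined for seasonality and cycles. Other known factors mentioned above are left out, and the toy series is hypothetical.

```python
def fit_linear_trend(x):
    """Least-squares fit of x_t = b0 + b1 * t for t = 0, 1, ..., n-1.

    Returns (b0, b1, residuals).
    """
    n = len(x)
    t = range(n)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    sxy = sum((ti - t_mean) * (xi - x_mean) for ti, xi in zip(t, x))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b1 = sxy / sxx                      # slope of the trend line
    b0 = x_mean - b1 * t_mean           # intercept
    residuals = [xi - (b0 + b1 * ti) for ti, xi in zip(t, x)]
    return b0, b1, residuals


# Hypothetical series: upward drift of 0.5 per step plus an alternating +/-1 term
x = [10 + 0.5 * t + (1 if t % 2 else -1) for t in range(20)]
b0, b1, res = fit_linear_trend(x)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

The estimated slope lands close to the underlying 0.5, and the residuals retain the alternating pattern for further analysis.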

### Seasonality
Seasonality manifests as repetitive and periodic variations in a time series. In most cases, its presence is revealed by exploratory data analysis.

A practical way to detect seasonality is to inspect the following plots:
- Run sequence plot
- Seasonal sub series plot
- Multiple box plots

#### Run sequence plot
A simple run sequence plot of the original time series, with time on the x-axis and the variable on the y-axis, is useful for revealing the following properties of the time series:
- Movements in mean of the series
- Shifts in variance
- Presence of outliers

Consider, for example, the run sequence plot of a hypothetical time series obtained from the formulation $x_t = (At + B)\sin(t) + \epsilon(t)$, which has a time-dependent mean and an error $\epsilon(t)$ drawn from a normal distribution $N(0, at + b)$ whose variance $at + b$ changes with time. Additionally, a few exceptionally high and low observations are included as outliers.
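
The hypothetical series above can be simulated in a few lines. The constants `A`, `B`, `a`, `b`, the series length, and the positions and sizes of the outliers are not given in the text, so the values below are assumptions chosen only for illustration. Instead of drawing the plot, the sketch prints the variance of successive non-overlapping windows, which makes the shift in variance visible numerically; plotting `x` against `t` (for example with matplotlib) would give the run sequence plot itself.

```python
import math
import random


def simulate(n=300, A=0.02, B=1.0, a=0.01, b=0.5, seed=42):
    """x_t = (A*t + B) * sin(t) + eps_t with eps_t ~ N(0, a*t + b).

    A, B, a, b and the outlier positions are illustrative assumptions.
    random.gauss takes a standard deviation, so the variance is square-rooted.
    """
    rng = random.Random(seed)
    x = [
        (A * t + B) * math.sin(t) + rng.gauss(0.0, math.sqrt(a * t + b))
        for t in range(n)
    ]
    # Inject a few exceptionally high and low observations as outliers
    for pos, sign in [(50, 1), (150, -1), (250, 1)]:
        x[pos] += sign * 15.0
    return x


def window_variance(x, width):
    """Population variance of consecutive, non-overlapping windows."""
    out = []
    for start in range(0, len(x) - width + 1, width):
        w = x[start:start + width]
        m = sum(w) / width
        out.append(sum((v - m) ** 2 for v in w) / width)
    return out


x = simulate()
print([round(v, 1) for v in window_variance(x, 50)])
```

Both the growing amplitude $(At + B)$ and the growing error variance push the later windows' variances upward.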

#### Seasonal sub series plot
For a known periodicity of seasonal variations, a seasonal sub series plot redraws the original series over batches of successive time periods. To visualize seasonality in the residuals, we compute quarterly means and standard deviations.

A seasonal sub series plot reveals two properties:
- Variations within seasons, that is, within a batch of successive months
- Variations between seasons, that is, between batches of successive months
![image.png](attachment:image.png)

![image.png](attachment:image.png)
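
The quarterly statistics mentioned above amount to grouping the residuals by calendar quarter and summarizing each group. A minimal sketch, assuming monthly residuals that start in January; the function name `quarterly_stats` and the residual values are hypothetical.

```python
import statistics


def quarterly_stats(residuals, start_month=1):
    """Group monthly residuals by calendar quarter and return
    (mean, sample standard deviation) for each quarter."""
    groups = {q: [] for q in (1, 2, 3, 4)}
    for i, r in enumerate(residuals):
        month = (start_month - 1 + i) % 12 + 1   # 1..12
        quarter = (month - 1) // 3 + 1           # 1..4
        groups[quarter].append(r)
    return {
        q: (statistics.mean(v), statistics.stdev(v))
        for q, v in groups.items()
        if len(v) >= 2                           # stdev needs two points
    }


# Two years of hypothetical monthly residuals with a third-quarter peak
residuals = [-1.0, -0.8, -1.2, 0.1, 0.3, -0.1,
             2.0, 2.4, 1.8, -0.9, -1.1, -1.0] * 2
for q, (m, s) in quarterly_stats(residuals).items():
    print(f"Q{q}: mean={m:+.2f}, std={s:.2f}")
```

Differences between the quarterly means reflect variation between seasons, while each quarter's standard deviation reflects variation within that season.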

#### Multiple box plots

A box plot displays both the central tendency and the dispersion of the seasonal data over a batch of time units. In addition, the separation between two adjacent box plots reveals the between-season variations:
![image.png](attachment:image.png)
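
Each box in such a plot is drawn from a handful of order statistics. The sketch below computes, for each season of a hypothetical multi-year quarterly series, the minimum, quartiles, median and maximum that a box plot would show. `statistics.quantiles` is used with its default `'exclusive'` method; plotting libraries may apply a slightly different quartile rule, and whisker rules for outliers are not modeled here.

```python
import statistics


def seasonal_boxes(values, period):
    """Return (min, Q1, median, Q3, max) for each season.

    values[i] is assumed to belong to season i % period, and each season
    needs at least two observations for statistics.quantiles.
    """
    boxes = {}
    for season in range(period):
        group = sorted(values[season::period])
        q1, _, q3 = statistics.quantiles(group, n=4)
        boxes[season] = (group[0], q1, statistics.median(group), q3, group[-1])
    return boxes


# Five years of hypothetical quarterly sales; Q4 is systematically higher
sales = [20, 22, 21, 35,
         21, 23, 22, 37,
         19, 24, 23, 36,
         22, 22, 20, 38,
         21, 25, 22, 40]
for season, box in seasonal_boxes(sales, period=4).items():
    print(f"Q{season + 1}: {box}")
```

Here the whole Q4 box lies above the other three, a pattern that is immediately visible when the boxes are drawn side by side.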

### Cyclical changes
Cyclical changes are movements observed after every few units of time, but they occur less
frequently than seasonal fluctuations. Unlike seasonality, cyclical changes might not have a
fixed period of variation. Moreover, the average periodicity of cyclical changes is
larger (most commonly measured in years), whereas seasonal variations are observed within the same
year and correspond to annual divisions of time such as seasons, quarters, and periods of
festivity and holidays.

- A long-run plot of the time series is required to identify cyclical changes that can occur, for example, every few years and manifest as repetitive crests and troughs.
- In this regard, time series related to economics and business often show cyclical changes that correspond to the usual business and macroeconomic cycles, such as periods of recession followed by periods of boom, separated by a span of a few years.
- Similar to general trends, identifying cyclical movements might require data that extend significantly back into the past.

![image.png](attachment:image.png)

### Unexpected variations
Referring to our model that expresses a time series as a sum of four components, it is noteworthy that in spite of being able to account for the three other components, we might still be left with an irreducible error component that is random and does not exhibit systematic dependency on the time index. 
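
Written out, the additive four-component model referred to here takes the following form; the symbols are a notation chosen for this sketch rather than one fixed elsewhere in the text:

$$
x_t = f_t + s_t + c_t + e_t
$$

where $f_t$ is the general trend, $s_t$ the seasonal component, $c_t$ the cyclical component, and $e_t$ the irregular component of unexpected variations.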

This fourth component reflects unexpected variations in the time series. Unexpected variations are stochastic and cannot be framed in a mathematical model for a definitive future prediction. 

This type of error is due to a lack of information about explanatory variables that could model these variations, or to the presence of random noise.

## Models for time series analysis
- The purpose of time series analysis is to develop a mathematical model that can explain the observed behavior of a time series and possibly forecast the future state of the series. 
- The chosen model should be able to account for one or more of the internal structures that might be present. 
- To this end, we will give an overview of the following general models that are often used as building blocks of time series analysis:
    - Zero mean models
    - Random walk
    - Trend models
    - Seasonality models

### Zero mean models
- Zero-mean models have a constant mean and constant variance and show no predictable trends or seasonality.
- Observations from a zero-mean model are assumed to be independent and identically distributed (iid) and represent the random noise around a fixed mean, which has been subtracted from the time series as a constant term.
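
A minimal sketch of a zero-mean model: iid Gaussian noise is added to a constant level, the sample mean is subtracted as the constant term, and the remainder is checked for a near-zero mean and a similar variance in both halves of the series. The level, noise scale, length, seed and the function name `zero_mean_series` are arbitrary assumptions for illustration.

```python
import random
import statistics


def zero_mean_series(n=1000, level=5.0, sigma=1.0, seed=0):
    """Constant level plus iid N(0, sigma^2) noise, with the sample mean removed."""
    rng = random.Random(seed)
    raw = [level + rng.gauss(0.0, sigma) for _ in range(n)]
    mu = statistics.fmean(raw)          # estimate of the constant term
    return [v - mu for v in raw]


z = zero_mean_series()
half = len(z) // 2
print(f"mean={statistics.fmean(z):.2e}")
print(f"variance first half={statistics.variance(z[:half]):.3f}, "
      f"second half={statistics.variance(z[half:]):.3f}")
```

Because the observations are iid, both halves should show roughly the same variance (close to `sigma**2`), and no trend or seasonality remains to be modeled.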