# Missing Value Imputation in Time Series

## **Interpolation**

- Interpolation is a commonly used technique for time series missing value imputation. 
- It helps in estimating the missing data-point using the two surrounding known data points.

-   Time-Based Interpolation
-   Spline Interpolation
-   Linear Interpolation
> Imputations from these methods make more sense when **the missing value window (width of missing data) is small**. For instance, if several consecutive values are missing, it becomes harder for these methods to estimate them.
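
A minimal sketch of linear versus time-based interpolation with pandas. The series and its timestamps are made up for illustration; the gap is deliberately placed on an irregular index so the two methods give different answers. Spline interpolation is also available through `Series.interpolate(method="spline", order=...)`, but it needs SciPy, so it is left out here.

```python
import numpy as np
import pandas as pd

# Hypothetical series: observations on Jan 1, Jan 2 (missing) and Jan 6.
idx = pd.DatetimeIndex(["2024-01-01", "2024-01-02", "2024-01-06"])
s = pd.Series([0.0, np.nan, 10.0], index=idx)

# Linear interpolation ignores the index and treats points as equally spaced.
linear = s.interpolate(method="linear")

# Time-based interpolation weights by the actual time gaps:
# Jan 2 is 1 day into a 5-day gap, so it gets 1/5 of the way from 0 to 10.
timed = s.interpolate(method="time")

print(linear.tolist())
print(timed.tolist())
```

On a regularly spaced index both methods return the same values.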

## **Denoising a Time Series**

-   **Rolling means**
	- The rolling mean is simply the mean over a window of previous observations.
	- This can greatly help in minimizing the noise in time series data.

-   **Fourier Transform**
	- Fourier Transform can help remove the noise by converting the time series data into the frequency domain, and from there, we can filter out the noisy frequencies. 
	- Then, we can apply the inverse Fourier transform to obtain the filtered time series. 
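
Both denoising ideas can be sketched with NumPy alone. The signal, the noise level, the window size, and the frequency cutoff are all assumptions chosen for illustration; in practice the cutoff depends on which frequencies count as noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
clean = np.sin(2 * np.pi * 4 * t / n)            # 4 cycles over the series
noisy = clean + 0.5 * rng.standard_normal(n)

# Rolling mean: average of each run of `window` consecutive observations.
window = 8
kernel = np.ones(window) / window
rolling = np.convolve(noisy, kernel, mode="valid")   # length n - window + 1

# Fourier low-pass filter: go to the frequency domain, zero the high
# frequencies, then apply the inverse transform.
spectrum = np.fft.rfft(noisy)
cutoff = 10                                      # assumed cutoff, in frequency bins
spectrum[cutoff:] = 0
filtered = np.fft.irfft(spectrum, n=n)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

print("error before:", rmse(noisy, clean))
print("error after: ", rmse(filtered, clean))
```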

#### **_Non-Stationarity_**

- Non-stationarity is when the statistical properties of a series, e.g. the mean, variance, and covariance (or the process generating the series), change over time.
- A proven method of stationarizing a non-stationary series is differencing.

![enter image description here](https://miro.medium.com/max/750/0*JRvNGPeRl4RF9lRS.webp)
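
A small sketch of first-order differencing on a made-up trending series. Comparing the mean of the first and second half is only a rough indication of stationarity, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
series = 0.5 * np.arange(n) + rng.standard_normal(n)   # mean grows over time

# First-order differencing: y[t] - y[t-1]
diff1 = np.diff(series)

# Gap between the mean of the second half and the mean of the first half.
half = n // 2
gap_before = abs(series[half:].mean() - series[:half].mean())
gap_after = abs(diff1[half:].mean() - diff1[:half].mean())

print("mean gap before differencing:", gap_before)
print("mean gap after differencing: ", gap_after)
```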


# **_ARIMA_**

ARIMA is one example of a traditional method of forecasting time series. ARIMA stands for **Auto-Regressive Integrated Moving Average** and is divided into 3 parts:

-   AR(p)
	- The **auto-regressive** part represents the number of time periods by which we lag our data.
	- A p term of 2 means we consider the two time-steps before each observation as the explanatory variables for that observation in the autoregressive portion of the calculation. **The observation itself becomes the target variable.**

-   I(d)
	- For the **Integrated** portion, d represents the number of differencing transformations applied to the series to transform a non-stationary time series to a stationary one.

-   MA(q): A time series is thought to be a combination of 3 components:
	- **Trend** refers to the gradual increase or decrease of the time series over time,
	- **Seasonality** is a repeating short-term cycle in the series, and
	- **Error** refers to noise not explained by seasonality or trend.

	**Moving average** therefore refers to the number of lags of the error component to consider.

AR and MA are quite similar; the difference is:
- AR considers lagged time series values from previous time periods, while
- MA considers errors from previous periods.
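
To make the AR(p) idea concrete, here is a sketch that simulates an AR(2) series and recovers its coefficients with ordinary least squares. The coefficients 0.6 and -0.3 are arbitrary assumptions, and least squares is used here only for illustration; it is not how a full ARIMA library estimates its parameters.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
phi1, phi2 = 0.6, -0.3                 # assumed "true" AR(2) coefficients
eps = rng.standard_normal(n)

y = np.zeros(n)
for t in range(2, n):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]

# p = 2: the two previous values are the explanatory variables,
# and the observation itself is the target variable.
X = np.column_stack([y[1:-1], y[:-2]])     # columns: y[t-1], y[t-2]
target = y[2:]
coef, *_ = np.linalg.lstsq(X, target, rcond=None)

print("estimated coefficients:", coef)
```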

**Lagged Correlation**

- With time series, we don’t only consider error metrics like MSE, MAE, etc. This is because such metrics don’t capture lag.
-  To get a better picture of how well our model is performing, we also look at **Lag Correlation (Cross-Correlation) between the true and predicted values**.
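
A sketch of lag correlation between true and predicted values. The helper `lag_correlation` is hypothetical, and the "forecast" is a deliberately naive one that just repeats the previous observation: it can look fine on MSE or MAE, but its best correlation with the truth sits at lag 1 instead of lag 0.

```python
import numpy as np

def lag_correlation(y_true, y_pred, max_lag=5):
    """Pearson correlation between y_true[t] and y_pred[t + lag] for each lag."""
    n = len(y_true)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = y_true[: n - lag], y_pred[lag:]
        else:
            a, b = y_true[-lag:], y_pred[: n + lag]
        out[lag] = float(np.corrcoef(a, b)[0, 1])
    return out

rng = np.random.default_rng(0)
y_true = np.cumsum(rng.standard_normal(300))    # made-up random-walk series

# Naive forecast: the prediction for time t is the value observed at t - 1.
y_pred = np.roll(y_true, 1)
y_pred[0] = y_true[0]

corr = lag_correlation(y_true, y_pred)
best_lag = max(corr, key=corr.get)
print("lag with the highest correlation:", best_lag)
```

A model whose best lag is 0 tracks the series in time; a best lag of 1 or more suggests the predictions trail the true values.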

#### Stationary Time Series
If a time-series is stationary, its mean and standard deviation stay constant over time. This implies that the time-series has no trend and no cyclic variability.

Thus, to apply classical models, a time-series usually should be decomposed into different components.

1.  Test for stationarity
2.  Differencing [if non-stationarity detected]
3.  Fit method and forecast
4.  Add back the trend and seasonality

> Most classical models are **linear**, which means they **assume linearity** in the dependencies between values at the same time and **between values at different time steps.**

**Stationarity**  is defined by three characteristics:

1.  Finite variation
2.  Constant mean
3.  Constant variation

- Constant variation means that the variation of the time-series in a window between two points is constant over time: ![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781801819626/files/Images/B17577_05_006.png), although it can change with the size of the window.


The ARMA model consists of two types of **lagged values**:
> - an autoregressive component
> - a moving average component

Therefore, we write _ARMA(p, q)_, with
> - _p_, the order of the autoregression, and
> - _q_, the order of the moving average.
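
For reference, the usual textbook form of ARMA(p, q) is given below. The symbols are assumed notation, not taken from these notes: $c$ is a constant, $\varphi_i$ are the autoregressive coefficients, $\theta_j$ are the moving average coefficients, and $\varepsilon_t$ is white noise.

$$
X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
$$

The first sum is the AR part (lagged values of the series) and the second is the MA part (lagged errors).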

`ARMA assumes that the series is stationary. In practice, to ensure stationarity, preprocessing has to be applied.`

##### ARIMA

**Auto-regressive (AR) process:** a time series is said to be AR when the present value of the series can be obtained from previous values of the same series.

**Moving average (MA) process:** a process where the present value of the series is defined as a linear combination of past errors. We assume the errors to be independently and normally distributed.
	- Order q of the MA process is read from the ACF plot: it is the lag after which the ACF first falls inside the confidence band (the ACF "cuts off" after lag q).


**ARIMA(p, d, q)**
	- includes a data preprocessing step, called **integration**, to make the time-series stationary. It **replaces each value with its difference from the immediately preceding value,** a transformation called **differencing**.


### ACF
> **ACF** is a (complete) auto-correlation function which gives us values of auto-correlation of any series with its lagged values.

> It describes how well the present value of the series is related to its past values.

A time series can have components like
- trend
- seasonality
- cyclic and residual

ACF considers all these components while finding correlations, hence **it’s a ‘complete auto-correlation plot’.**

#### PACF

**PACF** is a partial auto-correlation function. Instead of finding correlations of the present value with its lags like ACF, it **finds the correlation of the residuals** with the next lag value, hence ‘partial’ and not ‘complete’: we **remove the variations already explained before we find the next correlation**.

> So if there is any **hidden information in the residual which can be modeled** by the next lag, we might get a good correlation, and we will keep that next lag as a feature while modeling.

> It removes variations explained by earlier lags so we get only the relevant features.
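
A dependency-free sketch of both functions. The helper names `acf` and `pacf` are hypothetical, and the PACF here is estimated by successive least-squares regressions (the coefficient on the k-th lag once lags 1..k-1 are already in the model), which is one of several standard estimators. The AR(1) test series with coefficient 0.7 is an assumption for illustration.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def pacf(x, max_lag):
    """Partial autocorrelation: coefficient on x[t-k] when x[t] is
    regressed on x[t-1], ..., x[t-k]."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    out = [1.0]
    for k in range(1, max_lag + 1):
        X = np.column_stack([x[k - j : n - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
        out.append(float(coef[-1]))
    return np.array(out)

# Made-up AR(1) series: each value depends only on the previous one.
rng = np.random.default_rng(2)
n = 3000
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()

a = acf(y, 3)
p = pacf(y, 3)
print("ACF: ", np.round(a, 2))
print("PACF:", np.round(p, 2))
```

For an AR(1) series the ACF decays gradually across lags, while the PACF is large at lag 1 and close to zero afterwards, because lag 2 adds nothing once lag 1 is accounted for.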

## Types of time series

There are two types of time series data, as outlined here:

-   **Regular time series**: This is the most common type of time series, where we have observations coming in at **regular intervals of time, such as every hour or every month.**
-   **Irregular time series**: There are a few time series where we do not have observations at a regular interval of time.
	- For example, consider a sequence of readings from lab tests of a patient. We see an observation in the time series only when the patient heads to the clinic and carries out the lab test, and this may not happen at regular intervals of time.

## Main areas of application for time series analysis

There are broadly three important areas of application for time series analysis, outlined as follows:

-   **Time series forecasting**: Predicting the future values of a time series, given the past values. For example, predict the next day's temperature using the last 5 years of temperature data.
-   **Time series classification**: Sometimes, instead of predicting the future value of the time series, we may want to predict an action based on past values. For example, given a history of an **electroencephalogram** (**EEG**; tracking electrical activity in the brain) or an **electrocardiogram** (**EKG**; tracking electrical activity in the heart), we need to predict whether the result of an EEG or an EKG is normal or abnormal.
-   **Interpretation and causality**: Understand the whats and whys of the time series based on the past values, understand the interrelationships among several related time series, or derive causal inference based on time series data.

> An extreme case of a stochastic process that generates a time series is a **white noise** process. It has a sequence of random numbers with zero mean and constant standard deviation. This is also one of the most popular assumptions of noise in a time series.

**Red noise**, on the other hand, has zero mean and constant variance but is serially correlated in time. This serial correlation or redness is parameterized by a correlation coefficient _r_, such that:


![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781803246802/files/image/Formula_01_001.png)
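
A sketch contrasting white and red noise. The formula image above is not reproduced here, so the recursion below is an assumption: it uses a common form of red noise, `x[t] = r * x[t-1] + sqrt(1 - r**2) * w[t]`, where `w` is white noise and the square-root factor keeps the variance constant.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
white = rng.standard_normal(n)          # zero mean, constant standard deviation

r = 0.8                                 # assumed correlation coefficient
red = np.zeros(n)
red[0] = white[0]
for t in range(1, n):
    red[t] = r * red[t - 1] + np.sqrt(1 - r**2) * white[t]

def lag1_corr(x):
    """Correlation between each value and the one before it."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])

print("white noise lag-1 correlation:", lag1_corr(white))
print("red noise lag-1 correlation:  ", lag1_corr(red))
```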


**Autoregressive (AR) signal**: an AR signal is one where *the value of a time series for the current timestep is dependent on the values of the time series in the previous timesteps*. This serial correlation is a key property of the **AR signal**, and it is parametrized by the order of the correlation (how many previous timesteps the value depends on) and the coefficients used to combine those previous timesteps.

**Seasonal signals** are patterns in a time series that repeat over a fixed period of time. For example, a time series of monthly sales may have a seasonal signal that shows higher sales in December and lower sales in January every year. 

**Cyclical signals** are similar to seasonal signals, but they do not have a fixed and known frequency. For example, a time series of economic indicators may have a cyclical signal that shows periods of growth and recession, but the length and timing of these cycles may vary.


# Stationary and Non-Stationary Time Series

A **stationary time series** is one whose statistical properties do not depend on the time at which the series is observed. This means that the **mean, variance, and autocorrelation** of the series are constant over time. A **non-stationary time series** is one whose statistical properties do change over time. This can be due to a **trend, seasonality, or other factors**.

Here are some examples of (approximately) stationary and non-stationary time series:

**Stationary Time Series**

 - White noise, such as sensor measurement error around a fixed value
 - The day-to-day changes (returns) of a stock price
 - The amount of rainfall in a city each year, assuming a stable climate

**Non-Stationary Time Series**
- The number of people who visit a growing website each month (upward trend)
- The price of a stock over time (random-walk-like behavior)
- Monthly rainfall in a city with distinct wet and dry seasons (seasonality)

Stationary time series can be analyzed using a variety of methods, such as **linear regression** and **ARIMA models**. 

Non-stationary time series can be analyzed using methods that account for the time-varying nature of the series, such as **exponential smoothing** and **state space models**.

There are two ways the **stationarity assumption can be broken,** as outlined here:

-   Change in mean over time
	- If there is an upward/downward trend in the time series, the mean across two windows of time would not be the same.
-   Change in variance over time
	- If the time series starts off with low variance and, as time progresses, the variance keeps getting bigger and bigger, we have a non-stationary time series.




### **2. What are the different types of stationarity?**
There are three main types of stationarity:

- **Strict Stationarity:** A time series is said to be strictly stationary **if the joint distribution of any set of observations is unchanged when all of their time points are shifted by the same amount**.
	-  This means that the mean, variance, and autocorrelation structure (when they exist) are constant over time.

- **Weak Stationarity:** Weak stationarity, also known as **second-order stationarity,** requires only that the **mean and variance of the time series remain constant over time**.
	- In addition, the autocovariance between two observations must depend only on the time lag between them, not on when they occur.

- **Trend Stationarity:** In some cases, a time series may exhibit a deterministic trend component while still being stationary around that trend.
	- This is called trend stationarity. In this case, **the mean varies with time along the trend, but the series becomes stationary once the trend is removed.**

### **3. How can you test for stationarity?**
To test for stationarity, various statistical tests can be employed:

- **Visual Inspection:** Plot the time series data and look for any obvious trends or seasonality. A stationary series should appear flat and not show any systematic patterns.

- **Summary Statistics:** Calculate the mean and variance of the series over different time intervals. If these statistics remain relatively constant, it indicates stationarity.

- **Augmented Dickey-Fuller (ADF) Test:** This statistical test assesses whether a time series is stationary by examining the presence of **a unit root in the data**. A p-value below a certain significance level indicates stationarity. `p < significance_value`

- **Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test:** This test complements the ADF test. Its null hypothesis is that the series is stationary, either around a constant level or **around a deterministic trend (trend-stationary)**, so here a small p-value indicates non-stationarity.
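
The summary-statistics check can be sketched with NumPy alone. The helper `window_stats` and both example series are hypothetical. For the formal tests, `statsmodels.tsa.stattools` provides `adfuller` and `kpss`; they are not called here so that the sketch stays dependency-free.

```python
import numpy as np

def window_stats(x, n_windows=4):
    """Mean and variance of consecutive, roughly equal chunks of x."""
    chunks = np.array_split(np.asarray(x, dtype=float), n_windows)
    return [(float(c.mean()), float(c.var())) for c in chunks]

def mean_spread(x, n_windows=4):
    """Difference between the largest and smallest window mean."""
    means = [m for m, _ in window_stats(x, n_windows)]
    return max(means) - min(means)

rng = np.random.default_rng(3)
stationary = rng.standard_normal(400)
trending = 0.05 * np.arange(400) + rng.standard_normal(400)

print("stationary spread of window means:", mean_spread(stationary))
print("trending spread of window means:  ", mean_spread(trending))
```

A small spread is consistent with a constant mean; a large one points to a trend. The same comparison can be made on the window variances.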

### **4. What are the different methods for analyzing stationary and nonstationary time series?**
Analyzing time series data involves different methods depending on its stationarity:

**For Stationary Time Series:**
1. **Autocorrelation and Partial Autocorrelation:** These functions help identify the order of autoregressive (AR) and moving average (MA) terms in an ARIMA model.

2. **Spectral Analysis:** Tools like the periodogram and spectral density estimation help identify frequency components in stationary time series.

3. **ARIMA Modeling:** Autoregressive Integrated Moving Average (ARIMA) models are **suitable for modeling and forecasting stationary time series data** (with d = 0, ARIMA reduces to ARMA).

**For Nonstationary Time Series:**
1. **Differencing:** Transforming a nonstationary series into a stationary one by taking *differences between consecutive observations*. This is often done before applying *ARIMA modeling*.

2. **Decomposition:** Decompose the time series into **trend, seasonality, and residual components** using methods like Seasonal and Trend decomposition using Loess (STL).

3. **Cointegration Analysis:** In cases of multiple nonstationary time series, cointegration analysis helps identify long-term relationships among them.

4. **State-Space Models:** Techniques like the Kalman filter and structural time series modeling can capture complex dynamics in nonstationary data.


### **ARIMA (AutoRegressive Integrated Moving Average):**

1.  **AutoRegressive (AR):** The "AR" component of ARIMA refers to the autoregressive part of the model. *It captures the relationship between the current value of a time series and its past values*. Specifically, it models how the current value depends on previous values with a linear combination. **The order of the autoregressive component, denoted as "p," represents the number of past observations used for prediction.**
    
2.  **Integrated (I):** The "I" component signifies differencing, which is employed to make a nonstationary time series stationary. `Differencing involves subtracting the previous (lagged) value from the current value`, and it is applied "d" times. The order of differencing, "d," is `the number of differences required to stabilize the mean and make the series stationary`.
    
3.  **Moving Average (MA):** The "MA" component deals with the moving average part of the model. `It models how the current value depends on past white noise or error terms`. The order of the moving average component, denoted as "q," indicates `the number of past white noise terms considered in the model.`
    

### **ARIMA Plus (ARIMA+):**

"ARIMA Plus" is not a standard term but can be used to describe an extended version of the ARIMA model that incorporates additional components or features to enhance its forecasting capabilities. Here are some possible extensions:

1.  **Seasonal Component (S):** In many time series, there are recurring patterns or seasonal effects. ARIMA can be extended to SARIMA (Seasonal ARIMA) by including seasonal autoregressive and moving average terms. The seasonal component accounts for periodic patterns within the data.
    
-   **Multivariate forecasting**

Multivariate time series consist of `more than one time series variable, each of which is not only dependent on its past values but also has some dependency on the other variables`.

For example, a set of macroeconomic indicators such as **gross domestic product** (**GDP**), inflation, and so on of a particular country can be considered as a multivariate time series. The aim of multivariate forecasting is to come up with a model that captures the interrelationship between the different variables, along with each variable's relationship with its own past, and forecasts all the time series together.

-   **Explanatory forecasting**

In addition to the past values of a time series, we might use some other information to predict the future values of a time series. For example, for predicting retail store sales, information regarding promotional offers (both historical and future ones) is usually helpful.


## Time series preprocessing and feature engineering

Time series preprocessing and feature engineering are two important steps in building predictive models for time series data.

Time series preprocessing is the process of cleaning and transforming the raw time series data into a format that is suitable for modeling. Some common preprocessing steps are:

1.  **Handling missing values**: Missing values can occur in time series data due to various reasons, such as sensor failures, data entry errors, or irregular sampling intervals. Missing values can affect the quality and accuracy of the models, so they need to be handled appropriately. Some common methods for handling missing values are:
    
    -   **Imputation**: Replacing the missing values with some estimated values, such as the mean, median, or mode of the available data, or using more sophisticated methods, such as interpolation, regression, or machine learning.
    -   **Deletion**: Removing the observations with missing values from the data, either by dropping the entire rows or columns, or by using a sliding window approach that only considers a fixed number of observations at a time.
    -   **Flagging**: Marking the missing values with a special indicator, such as zero, negative one, or NaN, and treating them as a separate category in the models.
2. **Resampling**: `Resampling is the process of changing the frequency or granularity of the time series data`.
    - For example, we can resample daily data into weekly, monthly, or yearly data, or vice versa.

    > Resampling can help reduce noise, smooth out fluctuations, and capture long-term trends or seasonal patterns in the data. Some common methods for resampling are:

    -   **Aggregation**: Aggregating the data over a larger time interval by applying a summary statistic, such as **sum, average, minimum, maximum, or count**. For example, we can aggregate daily sales data into monthly sales data by taking the sum of sales for each month.
    -   **Disaggregation**: Disaggregating the data over a smaller time interval by applying a splitting rule, such as equally dividing, proportionally allocating, or randomly assigning. For example, we can disaggregate monthly sales data into daily sales data by dividing the monthly sales by the number of days in each month.

3. **Interpolation**: Interpolating the data over a different time interval by using a mathematical function, such as **linear, polynomial, spline, or exponential.**
    - For example, we can interpolate hourly temperature data into minute-level temperature data by using a linear function that connects the adjacent hourly observations.
4. **Normalization**: Normalization is the process of scaling or transforming the time series data to have a common range or distribution. Normalization can help reduce the effect of outliers, improve the comparability of different time series, and enhance the performance of some models that assume certain properties of the data. Some common methods for normalization are:
    
    -   **Min-max scaling**: Scaling the data to have a fixed minimum and maximum value, usually between zero and one. This method preserves the shape and proportion of the data but is sensitive to outliers.
    -   **Standardization**: Scaling the data to have zero mean and unit variance. This method changes the location and scale of the data but not the shape of its distribution, so it does not by itself make the data normally distributed.
    -   **Log transformation**: Transforming the data by taking the natural logarithm of each value. This method reduces the skewness and variance of the data but cannot handle zero or negative values.
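
The resampling, interpolation, and normalization steps above can be sketched with pandas. The daily "sales" series is made up, and the 7-day and 12-hour target frequencies are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=28, freq="D")
daily = pd.Series(np.arange(1.0, 29.0), index=idx)     # hypothetical daily sales

# Aggregation: daily values -> totals over consecutive 7-day bins.
weekly = daily.resample("7D").sum()

# Interpolation: daily values -> a finer 12-hour grid, filled linearly.
half_daily = daily.resample("12h").interpolate(method="linear")

# Min-max scaling to the range [0, 1].
min_max = (daily - daily.min()) / (daily.max() - daily.min())

# Standardization to zero mean and unit variance.
standardized = (daily - daily.mean()) / daily.std(ddof=0)

# Log transformation (only valid for strictly positive values).
logged = np.log(daily)

print(weekly)
```

When these transforms feed a forecasting model, the scaling statistics (min, max, mean, standard deviation) are usually computed on the training period only and then reused on later data.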

Time series feature engineering is the process of creating new variables or features from the existing time series data that can capture useful information or patterns for modeling. Some common feature engineering techniques are:

- **Fourier Transforms:**
	- Apply Fourier transforms to capture frequency domain information in the time series, which is especially useful for detecting periodic patterns. They `can also filter out the noisy frequencies`.

-   **Date-related features**: `Extracting features from the date-time column that can reflect temporal aspects of the data, such as year, month, day, hour, minute, second, weekday, weekend, holiday, season, quarter, etc`. These features can help capture cyclical or seasonal patterns in the data that may affect the target variable.
-   **Time-based features**: `Creating features based on the elapsed time or duration between different events or observations in the data`. 
	- For example, we can create features such as time since last purchase, time until next purchase, average time between purchases, etc. These features can help capture trends or changes in customer behavior over time.
-   **Lag features:** `Creating features based on previous values or observations of the target variable or other variables in the data`. 
	- For example, we can create features such as lagged sales, lagged temperature, lagged demand, etc. These features can help capture autocorrelation or dependency in the data that may affect future outcomes.
-   **Rolling window features:** Creating features based on statistics calculated over a fixed window of previous observations in the data. For example, we can create features such as rolling mean, rolling standard deviation, rolling minimum, rolling maximum, etc. These features can help capture `local patterns or variations in the data that may affect short-term outcomes`.

-   **Expanding window features**: Creating features based on statistics calculated over an expanding window of all previous observations in the data. For example, we can create features such as `cumulative sum, cumulative count, cumulative average, cumulative maximum,` etc. These features can help capture **global patterns or trends in the data that may affect long-term outcomes.**
-   **Domain-specific features**: Creating features based on domain knowledge or external information that can provide additional context or insight for the data. For example, we can create features such as weather conditions, economic indicators, social media sentiment, etc. These features can help capture exogenous factors or influences that may affect the target variable.
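
A pandas sketch of several of these feature types on a made-up daily `sales` column. The column names are hypothetical. The rolling and expanding features are computed on the series shifted by one step, an assumption made here so that each row only uses information from earlier days.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=10, freq="D")   # starts on a Monday
df = pd.DataFrame({"sales": np.arange(10, 20, dtype=float)}, index=idx)

# Date-related features
df["month"] = df.index.month
df["weekday"] = df.index.dayofweek            # Monday = 0, Sunday = 6
df["is_weekend"] = df["weekday"] >= 5

# Lag feature: the previous day's value
df["lag_1"] = df["sales"].shift(1)

# Rolling window feature: mean of the previous 3 days
df["roll_mean_3"] = df["sales"].shift(1).rolling(window=3).mean()

# Expanding window feature: mean of all previous days
df["expanding_mean"] = df["sales"].shift(1).expanding().mean()

print(df)
```

The first rows of the lag and rolling columns are NaN because there is not enough history yet; they are typically dropped or imputed before modeling.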

### **Fourier Transform:**

The Fourier transform is a mathematical transformation `used to analyze the frequency components of a time series or signal`. It decomposes a signal into its constituent sinusoidal components, `revealing the frequencies and amplitudes present in the data.`

Why we need it:

-   **Frequency Analysis:** Fourier transform helps in understanding the periodic patterns, cycles, and oscillations present in time series data. For example, it's used in signal processing to analyze sound waves and in analyzing the seasonality of economic data.
    
-   **Filtering and Denoising:** By identifying the dominant frequencies, you can filter out noise or unwanted components from the signal, leaving behind the essential information.
    
-   **Feature Extraction:** In some cases, Fourier transform can be used as a feature engineering technique to extract relevant frequency-domain features for machine learning models.
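
As an illustration of frequency analysis, the sketch below finds the dominant period of a made-up daily series with a weekly cycle. The series length, amplitude, and noise level are assumptions.

```python
import numpy as np

n = 728                                   # 104 weeks of hypothetical daily data
t = np.arange(n)
rng = np.random.default_rng(5)
y = 3 * np.sin(2 * np.pi * t / 7) + rng.standard_normal(n)   # weekly cycle + noise

# Amplitude of each frequency component (mean removed first).
amplitude = np.abs(np.fft.rfft(y - y.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)         # in cycles per day

# Strongest component, skipping the zero-frequency bin.
peak = int(np.argmax(amplitude[1:])) + 1
dominant_period = 1.0 / freqs[peak]

print("dominant period in days:", dominant_period)
```

The amplitude at the peak (or at a few known seasonal frequencies) can itself be used as a frequency-domain feature.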

Any time series can contain some or all of the following components:

-   **Trend**
-   **Seasonal**
-   **Cyclical**
-   **Irregular**

_additive_ (_Y = Trend + Seasonal + Cyclical + Irregular_)
> Additive decomposition is appropriate when the magnitude of the `seasonal fluctuations is relatively constant across different levels of the time series`. In other words, if the amplitude of seasonality doesn't depend on the overall level of the data, an additive decomposition is suitable.

_multiplicative_ (_Y = Trend * Seasonal * Cyclical * Irregular_)

> Multiplicative decomposition is suitable `when the magnitude of seasonal fluctuations varies with the overall level of the time series`. If the amplitude of seasonality increases or decreases as the level of the data changes, a multiplicative decomposition is often more appropriate.

> The **trend** is a long-term change in the mean of a time series. It is the smooth and steady movement of a time series in a particular direction.

> When a time series `exhibits regular, repetitive, up-and-down fluctuations`, we call that **seasonality**. For instance, retail sales typically shoot up during the holidays.

> The cyclical component also `exhibits a similar up-and-down pattern around the trend line`, but instead of repeating the pattern every period, the cyclical component is irregular.
>
> - A good example of this is economic recession, which tends to recur over a roughly 10-year cycle.

> The irregular component

This component is what is left after removing the `trends, seasonality, and cyclicity` from a time series.
Traditionally, this component is considered _unpredictable_ and is also called the _residual_ or _error term_.

`Autocorrelation is the correlation between the values of a time series in successive periods.`

> Partial autocorrelation removes any `indirect correlation that may be present before presenting the correlations`.

If _t_ is the current time step:
- Let’s assume _t-1_ is highly correlated with _t_.
- By extending this logic, _t-2_ will be highly correlated with _t-1_, and `because of this correlation, the autocorrelation between _t_ and _t-2_ would be high.`

However, `partial autocorrelation corrects this and extracts the correlation that can be purely attributed to _t-2_ and _t_`.


### Decomposing a time series
1.  **Detrending**: Here, we estimate the **trend component** (which is the smooth change in the time series) and remove it from the time series, giving us a **detrended time series**.

2.  **Deseasonalizing**: Here, we estimate the seasonality component from the `detrended time series`. After removing the seasonal component, what is left is the residual.
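
The two steps can be sketched as a simple additive decomposition. This is an illustrative version, not a specific library's algorithm: the trend is estimated with a centered moving average over one full period, and the seasonality as the average detrended value at each position in the cycle. The period of 7 and the synthetic series are assumptions.

```python
import numpy as np
import pandas as pd

period = 7
n = period * 20
t = np.arange(n)
rng = np.random.default_rng(11)
seasonal_true = 5 * np.sin(2 * np.pi * t / period)
y = pd.Series(0.3 * t + seasonal_true + rng.standard_normal(n))

# 1. Detrending: centered moving average over one period, then subtract it.
trend = y.rolling(window=period, center=True).mean()
detrended = y - trend

# 2. Deseasonalizing: average the detrended values at each cycle position.
seasonal_means = detrended.groupby(t % period).mean()
seasonal = pd.Series(seasonal_means.loc[t % period].to_numpy(), index=y.index)

# What is left is the residual (NaN at the edges, where the trend is undefined).
residual = y - trend - seasonal

print(seasonal_means)
```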

**Log transforms** are typically known to reduce the variance of the data and thereby remove heteroscedasticity in the data. Intuitively, we can think of a log transform as something that _pulls in_ the extreme values on the right of the histogram, at the same time stretching back the very low values on the left of the histogram.



### **Deterministic Trend:**

A deterministic trend in a time series refers to `a predictable and systematic long-term pattern or movement in the data that can be modeled and forecasted with a high degree of certainty`. 
- This trend is driven by known, fixed factors or variables, and it follows a clear, consistent path over time. In other words, a deterministic trend has a well-defined and deterministic relationship with time.

**Example of Deterministic Trend:**

Consider the monthly sales data of a popular smartphone brand over several years. If the company has been expanding at a steady, planned pace, so that baseline sales grow by roughly the same amount every month, the sales data is likely to exhibit a deterministic upward trend: a fixed function of time (here, roughly a straight line) with random fluctuations around it.

In this case, the deterministic trend is driven by a known factor (the planned expansion), and it follows a clear pattern, making it possible to extrapolate the trend into future months with a high level of confidence.

### **Stochastic Trend:**

A stochastic trend `refers to a random and unpredictable long-term pattern or movement that cannot be precisely modeled or forecasted`.
 - Unlike a deterministic trend, a stochastic trend does not follow a fixed and known path over time. Instead, it exhibits fluctuations and variations that are not driven by identifiable, fixed factors.

**Example of Stochastic Trend:**

Consider a daily `stock price of a company known for its volatility`. While there may be periods where the stock price generally increases due to positive news or strong financial performance, the long-term movement of the stock price is often characterized by random fluctuations, influenced by a multitude of unpredictable factors such as market sentiment, economic events, and investor behavior.






**The sine and cosine transformation is a way of encoding the timestamp of time series data into two new features that can capture the cyclical or seasonal patterns in the data.**

- The idea is to map the timestamp to a point on a unit circle, where the x-coordinate is the cosine value and the y-coordinate is the sine value. This way, the timestamp can be represented by an angle that varies from 0 to 2π over a fixed period of time, such as a day, a week, or a year.

`The effect of this transformation is that it preserves the cyclical nature of the timestamp using only two features, instead of one indicator column per hour, weekday, or month.`

`It can also help improve the performance of models such as linear regression or neural networks, because timestamps that are close on the cycle (for example, 23:00 and 00:00) get similar feature values.`
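
A minimal sketch of the encoding for the hour of the day, assuming a 24-hour period. The helper `circular_distance` is hypothetical and only there to show that the encoded values respect the wrap-around at midnight.

```python
import numpy as np

hours = np.arange(24)
angle = 2 * np.pi * hours / 24            # 0 to 2*pi over one day
hour_sin = np.sin(angle)                  # y-coordinate on the unit circle
hour_cos = np.cos(angle)                  # x-coordinate on the unit circle

def circular_distance(h1, h2):
    """Euclidean distance between two hours in (sin, cos) space."""
    return float(np.hypot(hour_sin[h1] - hour_sin[h2],
                          hour_cos[h1] - hour_cos[h2]))

print("23:00 vs 00:00:", circular_distance(23, 0))
print("00:00 vs 01:00:", circular_distance(0, 1))
print("00:00 vs 12:00:", circular_distance(0, 12))
```

With the raw hour as a feature, 23 and 0 are 23 units apart; in the encoded space they are exactly as close as 0 and 1.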