# Masoscience (preparation)

**Author:** Mir Yasin Zeinaliyan

**Email:** yasinprodebian@gmail.com  

**Github:** https://github.com/yasin-pro/masoscience

**Description:** In this notebook, we implement the data-preparation part of the Masoscience project, which turns raw prices into features for model training.
The data are EURUSD candles on a one-hour time frame (EURUSD is a currency pair traded on the foreign-exchange market). This project was prepared in 2024, and the prices cover the ten years up to that year.


### Import libraries

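This cell imports the libraries used across the project. The preparation steps below only need pandas and NumPy (plus `google.colab` to read from Google Drive); the TensorFlow, scikit-learn and Matplotlib imports are not used in this notebook and are presumably kept for the later modeling step.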

In [41]:
import datetime as dt

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import GRU, LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2

### Read data

In the code below, I load my data from my Google Drive. If your data is stored somewhere else, change how the CSV file is loaded (for example, use the commented-out local path).

I read the CSV file, sort it by time, drop rows with missing values, and keep each candle's open, high, low and close prices.

In [43]:
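from google.colab import drive

# Mount Google Drive (Colab only) and read the raw hourly candles
drive.mount('/content/drive')
df = pd.read_csv("/content/drive/My Drive/eurusd.csv")

# To load a local copy instead:
# df = pd.read_csv("eurusd.csv")

# Parse timestamps and put the candles in chronological order
df['time'] = pd.to_datetime(df['time'])

df = df.sort_values(by='time')

df = df.dropna()

# Keep only the OHLC prices; note that the 'time' column is dropped here
df = df[['open', 'high', 'low', 'close']].copy()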

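If the CSV is not on Google Drive, the same loading steps can be wrapped in a small helper that accepts any path or file-like object. This is a minimal sketch: the name `load_candles` is hypothetical, and it assumes the file has `time`, `open`, `high`, `low` and `close` columns, as the cell above does.

```python
import io

import pandas as pd


def load_candles(source):
    """Hypothetical helper: read OHLC candles, sort by time, drop missing rows."""
    df = pd.read_csv(source)
    df['time'] = pd.to_datetime(df['time'])
    df = df.sort_values(by='time').dropna()
    # Keep only the OHLC prices, as the notebook does
    return df[['open', 'high', 'low', 'close']].copy()


# Tiny in-memory CSV (made-up prices) with rows out of order and one missing value
sample_csv = io.StringIO(
    "time,open,high,low,close\n"
    "2024-01-01 02:00,1.1030,1.1040,1.1020,1.1035\n"
    "2024-01-01 00:00,1.1000,1.1020,1.0990,1.1010\n"
    "2024-01-01 01:00,1.1010,,1.1000,1.1030\n"
)
candles = load_candles(sample_csv)
print(candles)
```

With a local file, the call would simply be `load_candles("eurusd.csv")`.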


### Calculation of RSI for periods of 14, 16, 18 and 20 candles

The Relative Strength Index (RSI) is a momentum oscillator used in technical analysis to measure the speed and change of price movements. It was developed by J. Welles Wilder and is designed to identify overbought or oversold conditions in a market.
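In the form computed by `calculate_rsi` below, with $\Delta_t = C_t - C_{t-1}$ the change in close price between consecutive candles and $n$ the period, gains and losses are averaged with a simple $n$-period mean:

$$
\bar{G}_t = \frac{1}{n}\sum_{i=0}^{n-1} \max(\Delta_{t-i}, 0), \qquad
\bar{L}_t = \frac{1}{n}\sum_{i=0}^{n-1} \max(-\Delta_{t-i}, 0)
$$

$$
RS_t = \frac{\bar{G}_t}{\bar{L}_t}, \qquad RSI_t = 100 - \frac{100}{1 + RS_t}
$$

Since $RS_t \ge 0$, the RSI always lies between 0 and 100.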

In [44]:
def calculate_rsi(data, period=14):
    """Calculate the RSI of the 'close' column.

    Average gain and loss use a simple rolling mean (sometimes called
    Cutler's RSI), not Wilder's exponential smoothing.
    """

    close_prices = data['close']

    # Candle-to-candle price changes (hourly here, not daily)
    price_changes = close_prices.diff()

    gain = price_changes.where(price_changes > 0, 0)

    loss = -price_changes.where(price_changes < 0, 0)

    average_gain = gain.rolling(window=period).mean()

    average_loss = loss.rolling(window=period).mean()

    rs = average_gain / average_loss

    rsi = 100 - (100 / (1 + rs))

    return rsi

df["rsi_14"] = calculate_rsi(df, 14)

df["rsi_16"] = calculate_rsi(df, 16)

df["rsi_18"] = calculate_rsi(df, 18)

df["rsi_20"] = calculate_rsi(df, 20)

df = df.dropna()
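`calculate_rsi` averages gains and losses with a simple rolling mean. Wilder's original RSI smooths them recursively, which behaves like an exponential moving average with $\alpha = 1/n$. The sketch below shows that variant for comparison only; it is not what the notebook stores in the `rsi_*` columns, and `calculate_rsi_wilder` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def calculate_rsi_wilder(close, period=14):
    """RSI with Wilder-style smoothing (EMA with alpha = 1 / period).

    ewm(adjust=False) seeds the average with the first value instead of
    Wilder's initial simple average, so the earliest values differ slightly.
    """
    price_changes = close.diff()
    gain = price_changes.clip(lower=0)
    loss = (-price_changes).clip(lower=0)

    # min_periods leaves the warm-up rows as NaN, like the rolling version
    average_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    average_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()

    rs = average_gain / average_loss
    return 100 - (100 / (1 + rs))


# A steadily rising series has no losses, so the RSI saturates at 100
rising = pd.Series(np.linspace(1.10, 1.12, 30))
rsi_wilder = calculate_rsi_wilder(rising, period=14)
print(rsi_wilder.dropna().unique())
```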

### Calculation of Bollinger Bands for periods of 18, 20, 22 and 24 candles

Bollinger Bands are a technical analysis tool developed by John Bollinger, used to measure market volatility and identify potential overbought or oversold conditions. Bollinger Bands consist of three lines, typically plotted on a price chart:

- **Middle band:** usually a simple moving average (SMA) of the price, typically over 20 periods.
- **Upper band:** the middle band plus a certain number of standard deviations (usually 2). The standard deviation measures how widely prices are dispersed around the average.
- **Lower band:** the middle band minus the same number of standard deviations (usually 2).
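With $C_t$ the close, $n$ the period and $k = 2$ (the multiplier used in `calculate_bollinger_bands`):

$$
\text{SMA}_t = \frac{1}{n}\sum_{i=0}^{n-1} C_{t-i}, \qquad
\sigma_t = \sqrt{\frac{1}{n-1}\sum_{i=0}^{n-1}\left(C_{t-i} - \text{SMA}_t\right)^2}
$$

$$
\text{Upper}_t = \text{SMA}_t + k\,\sigma_t, \qquad \text{Lower}_t = \text{SMA}_t - k\,\sigma_t
$$

The $n-1$ denominator matches pandas' default rolling standard deviation (`ddof=1`). Some references use the population standard deviation (denominator $n$) instead, which gives slightly narrower bands.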


In [None]:
def calculate_bollinger_bands(data, period=20):
    """Return the upper band, middle band (SMA) and lower band of 'close'.

    Uses 2 standard deviations; pandas' rolling std is the sample
    standard deviation (ddof=1).
    """
    close_prices = data['close']

    sma = close_prices.rolling(window=period).mean()

    std = close_prices.rolling(window=period).std()

    upper_band = sma + 2 * std

    lower_band = sma - 2 * std

    return upper_band, sma, lower_band

# Add upper_band_N, sma_N and lower_band_N columns for each period N
for period in (18, 20, 22, 24):
    upper_band, sma, lower_band = calculate_bollinger_bands(df, period)
    df[f'upper_band_{period}'] = upper_band
    df[f'sma_{period}'] = sma
    df[f'lower_band_{period}'] = lower_band

df = df.dropna()
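A quick check of the band arithmetic on a toy series. This sketch repeats the same formula inline so it runs on its own: with period 3 and closes 1, 2, 3, the SMA is 2 and the sample standard deviation is 1, so the bands should be 2 ± 2.

```python
import pandas as pd

close = pd.Series([1.0, 2.0, 3.0])
period = 3

sma = close.rolling(window=period).mean()
std = close.rolling(window=period).std()  # sample std, ddof=1
upper_band = sma + 2 * std
lower_band = sma - 2 * std

# The first period - 1 rows are NaN because the window is not full yet
print(upper_band.iloc[-1], sma.iloc[-1], lower_band.iloc[-1])
```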

### Calculation of ATR for periods of 14, 16, 18 and 20 candles

ATR (Average True Range) is a technical analysis indicator that measures market volatility by analyzing the range of an asset's price over a specific period. It was developed by J. Welles Wilder and introduced in his 1978 book "New Concepts in Technical Trading Systems."
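By definition, the true range of candle $t$ compares the current high $H_t$ and low $L_t$ with the previous close $C_{t-1}$, so a gap between candles counts as volatility. This notebook then averages the true range with a simple $n$-period rolling mean (Wilder's original ATR uses recursive smoothing instead):

$$
TR_t = \max\left(H_t - L_t,\; |H_t - C_{t-1}|,\; |L_t - C_{t-1}|\right), \qquad
ATR_t = \frac{1}{n}\sum_{i=0}^{n-1} TR_{t-i}
$$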


In [None]:
def calculate_atr(df, atr_period=14, atr_col=None):
    """Add an ATR column (simple rolling mean of the true range) to df in place.

    True range = max(high - low, |high - previous close|, |low - previous close|).
    """
    if atr_col is None:
        atr_col = f'atr_{atr_period}'

    # The true range must use the PREVIOUS candle's close, not the current one
    prev_close = df['close'].shift(1)

    true_range = pd.concat(
        [
            df['high'] - df['low'],
            (df['high'] - prev_close).abs(),
            (df['low'] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)

    df[atr_col] = true_range.rolling(window=atr_period).mean()

    return df

# Adds atr_14, atr_16, atr_18 and atr_20
for period in (14, 16, 18, 20):
    calculate_atr(df, period)

df = df.dropna()
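Why the previous close matters: in this toy example (made-up prices), the second candle gaps up from the first candle's close. Measured by its own high and low only, its range is 0.005; measured against the previous close, the true range is 0.02.

```python
import pandas as pd

# Two made-up hourly candles; the second gaps up from the first close
candles = pd.DataFrame({
    'high':  [1.1010, 1.1200],
    'low':   [1.0990, 1.1150],
    'close': [1.1000, 1.1180],
})

prev_close = candles['close'].shift(1)
ranges = pd.concat(
    [
        candles['high'] - candles['low'],
        (candles['high'] - prev_close).abs(),
        (candles['low'] - prev_close).abs(),
    ],
    axis=1,
)
high_low_only = ranges.iloc[:, 0]
true_range = ranges.max(axis=1)  # NaN entries (first row) are skipped by max

print(high_low_only.round(4).tolist(), true_range.round(4).tolist())
```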

### Calculation of MACD

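MACD (Moving Average Convergence Divergence) is a popular technical analysis indicator used to identify changes in the strength, direction, momentum, and duration of a trend in an asset's price (here, the EURUSD exchange rate). The MACD line is the difference between a short (12-period) and a long (26-period) exponential moving average (EMA) of the close.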

In [None]:
def calculate_macd(df, short_window=12, long_window=26):
    """Add the MACD line (short EMA minus long EMA of 'close') to df.

    Only the MACD line is computed; no signal line or histogram is added.
    """
    short_ema = df['close'].ewm(span=short_window, adjust=False).mean()

    long_ema = df['close'].ewm(span=long_window, adjust=False).mean()

    df['macd'] = short_ema - long_ema

    return df


df = calculate_macd(df)

df = df.dropna()
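The standard MACD indicator also has a signal line, conventionally a 9-period EMA of the MACD line, and a histogram equal to the MACD line minus the signal line. `calculate_macd` stores only the MACD line; the sketch below shows how the other two could be derived if they were wanted as extra features. The function name `macd_components` is hypothetical.

```python
import numpy as np
import pandas as pd


def macd_components(close, short_window=12, long_window=26, signal_window=9):
    """Return the MACD line, signal line and histogram for a close series."""
    short_ema = close.ewm(span=short_window, adjust=False).mean()
    long_ema = close.ewm(span=long_window, adjust=False).mean()
    macd_line = short_ema - long_ema
    # Signal line: EMA of the MACD line itself
    signal_line = macd_line.ewm(span=signal_window, adjust=False).mean()
    histogram = macd_line - signal_line
    return macd_line, signal_line, histogram


# In a steady uptrend the fast EMA stays above the slow one, so MACD > 0
close = pd.Series(np.linspace(1.10, 1.12, 60))
macd_line, signal_line, histogram = macd_components(close)
print(macd_line.iloc[-1], signal_line.iloc[-1], histogram.iloc[-1])
```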

### Calculation of ADX for periods of 14, 16, 18, 20, 22, 24 and 26 candles


ADX (Average Directional Index) is a technical indicator used to quantify the strength of a trend, regardless of its direction. It is part of the Directional Movement System developed by J. Welles Wilder, which also includes the Positive Directional Indicator (+DI) and Negative Directional Indicator (-DI).
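The usual definitions, with $H_t$, $L_t$ the high and low, $TR_t$ the true range, $n$ the period, and a bar denoting an $n$-period average:

$$
+DM_t = \begin{cases} H_t - H_{t-1} & \text{if } H_t - H_{t-1} > L_{t-1} - L_t \text{ and } H_t - H_{t-1} > 0 \\ 0 & \text{otherwise} \end{cases}
$$

$$
-DM_t = \begin{cases} L_{t-1} - L_t & \text{if } L_{t-1} - L_t > H_t - H_{t-1} \text{ and } L_{t-1} - L_t > 0 \\ 0 & \text{otherwise} \end{cases}
$$

$$
+DI_t = 100\,\frac{\overline{+DM}_t}{\overline{TR}_t}, \qquad
-DI_t = 100\,\frac{\overline{-DM}_t}{\overline{TR}_t}, \qquad
DX_t = 100\,\frac{\left|{+DI_t} - {-DI_t}\right|}{{+DI_t} + {-DI_t}}
$$

The ADX is then a smoothed average of $DX_t$ (Wilder used his recursive smoothing, equivalent to an EMA with $\alpha = 1/n$). Because it is built from $|{+DI} - {-DI}|$, the ADX measures trend strength but not direction.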

In [None]:
def calculate_adx(df, timeperiod=14, high_col='high', low_col='low', close_col='close', adx_col='adx_14'):
    """Add an ADX column to df in place.

    DM and TR are averaged with simple rolling means; DX is then smoothed
    with Wilder's smoothing (EMA with alpha = 1 / timeperiod).
    """
    prev_close = df[close_col].shift(1)
    true_range = pd.concat(
        [
            df[high_col] - df[low_col],
            (df[high_col] - prev_close).abs(),
            (df[low_col] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)

    # Directional movement: only the larger positive move counts
    up_move = df[high_col] - df[high_col].shift(1)
    down_move = df[low_col].shift(1) - df[low_col]
    plus_dm = pd.Series(np.where((up_move > down_move) & (up_move > 0), up_move, 0.0), index=df.index)
    minus_dm = pd.Series(np.where((down_move > up_move) & (down_move > 0), down_move, 0.0), index=df.index)

    atr = true_range.rolling(window=timeperiod).mean()

    # Average DM divided by average TR, in percent
    plus_di = 100 * plus_dm.rolling(window=timeperiod).mean() / atr
    minus_di = 100 * minus_dm.rolling(window=timeperiod).mean() / atr

    # DX measures the spread between +DI and -DI; ADX is its smoothed average
    dx = 100 * (plus_di - minus_di).abs() / (plus_di + minus_di)
    df[adx_col] = dx.ewm(alpha=1 / timeperiod, adjust=False).mean()

# Adds adx_14, adx_16, ..., adx_26
for period in (14, 16, 18, 20, 22, 24, 26):
    calculate_adx(df, timeperiod=period, adx_col=f'adx_{period}')

df = df.dropna()
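A worked example of the DX step with made-up values: when +DI is 30 and -DI is 10, one direction clearly dominates and DX = 100 × |30 - 10| / (30 + 10) = 50; when the two are equal, DX is 0 (no dominant direction). `directional_index` is a hypothetical helper used only for illustration.

```python
def directional_index(plus_di, minus_di):
    """DX = 100 * |+DI - -DI| / (+DI + -DI); both inputs are percentages."""
    return 100 * abs(plus_di - minus_di) / (plus_di + minus_di)


print(directional_index(30, 10))  # → 50.0 (strong one-sided movement)
print(directional_index(20, 20))  # → 0.0 (balanced movement, no trend)
```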

### Calculation of vix (historical volatility) for periods of 14, 16, 18 and 20 candles


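The VIX (Volatility Index), often referred to as the "Fear Gauge" or "Fear Index," is a popular measure of the stock market's expectation of volatility based on S&P 500 index options. It is calculated and published by the Chicago Board Options Exchange (CBOE) and represents the market's expectations for volatility over the next 30 days. The real VIX cannot be computed from EURUSD prices, so the `vix_*` columns below are a VIX-like proxy: the annualized historical (realized) volatility of log returns over the last N candles, in percent. The annualization factor 252 is the number of trading days in a year, so on hourly candles the values are proportional to, rather than equal to, annualized volatility.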

In [None]:
def calculate_vix(df, timeperiod=20, close_col='close', vix_col='vix_20'):
    """Add a realized-volatility column (a VIX-like proxy, not the CBOE VIX)."""

    # Log returns between consecutive candles
    log_returns = np.log(df[close_col] / df[close_col].shift(1))

    # Rolling sum of squared log returns over the window
    sum_squared_returns = log_returns.pow(2).rolling(window=timeperiod).sum()

    # Annualize with 252 (a daily-data convention) and express in percent
    df[vix_col] = 100 * np.sqrt(sum_squared_returns * (252 / timeperiod))


# Adds vix_14, vix_16, vix_18 and vix_20
for period in (14, 16, 18, 20):
    calculate_vix(df, timeperiod=period, vix_col=f'vix_{period}')

df = df.dropna()
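For hourly candles, the annualization factor can be made explicit as a number of bars per year. In this sketch, the default 6240 (24 hours × 5 days × 52 weeks) is an assumption about forex trading hours, and `realized_volatility` is a hypothetical name. Because the factor is a constant, switching from 252 only rescales the feature; it does not change its shape.

```python
import numpy as np
import pandas as pd


def realized_volatility(close, window=20, bars_per_year=24 * 5 * 52):
    """Annualized realized volatility in percent.

    bars_per_year is an assumption (hourly forex candles, ~24h x 5 days x 52 weeks).
    """
    log_returns = np.log(close / close.shift(1))
    # Mean squared log return per bar, scaled to a year, then square-rooted
    mean_squared = log_returns.pow(2).rolling(window=window).mean()
    return 100 * np.sqrt(mean_squared * bars_per_year)


# Constant 0.1% log return per bar, so every squared return is 1e-6
close = pd.Series(1.10 * np.exp(0.001 * np.arange(50)))
vol = realized_volatility(close, window=20)
print(vol.dropna().round(2).unique())
```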

### Differencing

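Differencing is commonly used in time series analysis to remove trends: subtracting the previous value from the current value turns a series of price levels into a series of changes between consecutive data points. First differences remove a linear trend; removing seasonality requires differencing at the seasonal lag. The main motivations are:

1. Removing trends and seasonality
2. Focusing on relative changes
3. Improving model stability and learning
4. Better alignment with data characteristics
5. Reducing noise

The code below computes a percentage change that looks one candle ahead: `change_percent` at row t is the percentage change from the close of candle t to the close of candle t+1, i.e. the next hour's return. This is presumably the value the model will learn to predict; because it contains future information, it should not also be used as an input feature.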

In [None]:
# Percentage change from this candle's close to the next candle's close (looks one step ahead)
df['change_percent'] = (
    (df['close'].shift(-1) - df['close']) / df['close']
) * 100
df = df.dropna()
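The same next-candle percentage change can be written with pandas' `pct_change`, shifted back by one row so that row t holds the move from t to t+1. This small sketch on made-up prices checks that the two forms agree and that the last row has no value, which is the row `dropna()` removes.

```python
import numpy as np
import pandas as pd

close = pd.Series([1.1000, 1.1110, 1.1000, 1.1055])

# Formula used in the notebook: (next close - close) / close * 100
forward_change = (close.shift(-1) - close) / close * 100

# Equivalent form: backward percentage change, moved one row earlier
via_pct_change = close.pct_change().shift(-1) * 100

print(forward_change.round(3).tolist())
```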

### Save

Finally, we save the prepared data so that it can be passed to the next step, in which a deep learning model learns from it.

I saved the data in my Google Drive; to save it somewhere else, change the output path in the code below.

In [45]:
# output_file_path = "processed_eurusd.csv"
output_file_path = "/content/drive/My Drive/processed_eurusd.csv"
df.to_csv(output_file_path, index=False)