# Masoscience

**Author:** Mir Yasin Zeinaliyan

**Email:** yasinprodebian@gmail.com  

**Github:** https://github.com/yasin-pro/masoscience

**Description:** Masoscience is a project that uses data analysis, deep learning, and concepts borrowed from stock-market technical analysis to build a model that predicts whether the price will rise or fall. It is still a work in progress.


### import libraries

In [None]:
import pandas as pd
import numpy as np
import datetime as dt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, GRU, Dense, Dropout
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.regularizers import l2
from sklearn.metrics import mean_absolute_error, mean_squared_error
import matplotlib.pyplot as plt

### read data (this data was collected from MetaTrader 5)

##### EURUSD 1H



In [None]:
df = pd.read_csv("data.csv")

df['time'] = pd.to_datetime(df['time'])

df = df.sort_values(by='time')

df = df.dropna()

df = df[['open', 'high', 'low', 'close']].copy()

### calculate RSI for periods 14, 16, 18, 20

The Relative Strength Index (RSI) is a momentum oscillator used in technical analysis to measure the speed and change of price movements. It was developed by J. Welles Wilder and is designed to identify overbought or oversold conditions in a market.
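
As a point of comparison, Wilder's original RSI smooths gains and losses with a recursive running average rather than a plain rolling mean. The sketch below is one common way to express that in pandas. The names `rsi_wilder` and `toy_close` are illustrative and not part of this notebook, and the prices are made up.

```python
import pandas as pd

def rsi_wilder(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI with Wilder-style smoothing (hypothetical helper, not the notebook's function)."""
    delta = close.diff()
    gain = delta.clip(lower=0)        # upward moves, zero elsewhere
    loss = (-delta).clip(lower=0)     # magnitude of downward moves, zero elsewhere
    # alpha = 1 / period matches Wilder's recursion; the seed value differs
    # from his simple-average start, so early values can deviate slightly.
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Made-up close prices, for illustration only.
toy_close = pd.Series([1.100, 1.102, 1.101, 1.105, 1.103, 1.107, 1.106, 1.110])
toy_rsi = rsi_wilder(toy_close, period=3)
print(toy_rsi.round(2))
```

Values near 100 indicate that recent moves were mostly gains, and values near 0 that they were mostly losses.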

In [None]:
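def calculate_rsi(data, period=14):
    """Return the RSI of the close price, using simple rolling means of gains and losses."""

    close_prices = data['close']

    # Bar-to-bar change in the close price.
    price_change = close_prices.diff()

    # Keep upward moves as gains and the magnitude of downward moves as losses.
    gain = price_change.where(price_change > 0, 0)

    loss = -price_change.where(price_change < 0, 0)

    average_gain = gain.rolling(window=period).mean()

    average_loss = loss.rolling(window=period).mean()

    # Relative strength (a window with gains but no losses gives rs = inf and RSI = 100).
    rs = average_gain / average_loss

    rsi = 100 - (100 / (1 + rs))

    return rsi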

df["rsi_14"] = calculate_rsi(df, 14)

df["rsi_16"] = calculate_rsi(df, 16)

df["rsi_18"] = calculate_rsi(df, 18)

df["rsi_20"] = calculate_rsi(df, 20)

df = df.dropna()

### calculate Bollinger Bands for 18, 20, 22, 24 period

Bollinger Bands are a technical analysis tool developed by John Bollinger, used to measure market volatility and identify potential overbought or oversold conditions. Bollinger Bands consist of three lines, typically plotted on a price chart:

- **Middle Band**: usually a simple moving average (SMA) of the price, typically over 20 periods.
- **Upper Band**: the middle band plus a set number of standard deviations (usually 2). The standard deviation measures how far prices spread around the average.
- **Lower Band**: the middle band minus the same number of standard deviations (usually 2).
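
As a small worked illustration of the three bands, the sketch below computes them for a short made-up price series. The names `bollinger` and `toy_close` and the `ddof` parameter are assumptions for illustration. pandas' `rolling().std()` defaults to the sample standard deviation (`ddof=1`), while some charting tools use the population form (`ddof=0`), so values can differ slightly between tools.

```python
import pandas as pd

def bollinger(close: pd.Series, period: int = 20, num_std: float = 2.0, ddof: int = 1):
    """Return (upper, middle, lower) bands; hypothetical helper for illustration."""
    middle = close.rolling(window=period).mean()
    std = close.rolling(window=period).std(ddof=ddof)
    return middle + num_std * std, middle, middle - num_std * std

# Made-up close prices, for illustration only.
toy_close = pd.Series([1.10, 1.12, 1.11, 1.13, 1.15, 1.14])
upper, middle, lower = bollinger(toy_close, period=3)

# The first period - 1 rows have no full window and are dropped.
bands = pd.DataFrame({"lower": lower, "middle": middle, "upper": upper}).dropna()
print(bands.round(4))
```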


In [None]:
def calculate_bollinger_bands(data, period=20):
    """
    Return the upper band, middle band (SMA), and lower band of the close price.
    """
    close_prices = data['close']

    sma = close_prices.rolling(window=period).mean()

    std = close_prices.rolling(window=period).std()

    upper_band = sma + 2 * std

    lower_band = sma - 2 * std

    return upper_band, sma, lower_band

for period in (18, 20, 22, 24):
    upper_band, sma, lower_band = calculate_bollinger_bands(df, period)

    df[f'upper_band_{period}'] = upper_band

    df[f'sma_{period}'] = sma

    df[f'lower_band_{period}'] = lower_band

df = df.dropna()

### calculate ATR for periods (14, 16, 18, 20)

ATR (Average True Range) is a technical analysis indicator that measures market volatility by analyzing the range of an asset's price over a specific period. It was developed by J. Welles Wilder and introduced in his 1978 book "New Concepts in Technical Trading Systems."
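
True range is normally measured against the previous bar's close, so that gaps between bars count as volatility. The sketch below shows this on three made-up bars. The names `true_range` and `bars` are illustrative and not part of the notebook.

```python
import pandas as pd

def true_range(bars: pd.DataFrame) -> pd.Series:
    """Largest of high-low, |high - prev close| and |low - prev close| (illustrative helper)."""
    prev_close = bars["close"].shift(1)
    candidates = pd.concat(
        [
            bars["high"] - bars["low"],
            (bars["high"] - prev_close).abs(),
            (bars["low"] - prev_close).abs(),
        ],
        axis=1,
    )
    # max() skips NaN, so the first bar falls back to high - low.
    return candidates.max(axis=1)

# Made-up bars: the third bar gaps up well above the second bar's close.
bars = pd.DataFrame(
    {
        "high": [1.25, 1.50, 2.25],
        "low": [1.00, 1.25, 2.00],
        "close": [1.125, 1.375, 2.125],
    }
)
tr = true_range(bars)
atr_2 = tr.rolling(window=2).mean()  # ATR as a simple moving average of the true range
print(tr.tolist())  # [0.25, 0.375, 0.875]
```

The third bar's own high-low range is only 0.25, but its true range is 0.875 because of the gap from the previous close.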


In [None]:
def calculate_atr(df, atr_period=14):
    """
    Add an `atr_<period>` column holding the simple moving average of the true range.
    """
    # The true range compares the current bar with the previous close.
    # Using the current close would always reduce it to high - low.
    prev_close = df['close'].shift(1)

    true_range = pd.concat(
        [
            df['high'] - df['low'],
            (df['high'] - prev_close).abs(),
            (df['low'] - prev_close).abs(),
        ],
        axis=1
    ).max(axis=1)

    df[f'atr_{atr_period}'] = true_range.rolling(window=atr_period).mean()

    return df

for period in (14, 16, 18, 20):
    calculate_atr(df, period)

df = df.dropna()

### calculate MACD

MACD (Moving Average Convergence Divergence) is a popular technical analysis indicator used in stock trading to identify changes in the strength, direction, momentum, and duration of a trend in a stock's price.
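
The MACD line is usually read together with a signal line (an EMA of the MACD line) and a histogram (their difference). The sketch below adds both under the common 12/26/9 defaults. The helper `macd_with_signal` and the column names `macd_signal` and `macd_hist` are illustrative and not used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

def macd_with_signal(close: pd.Series, short_window=12, long_window=26, signal_window=9) -> pd.DataFrame:
    """MACD line, signal line and histogram (hypothetical helper for illustration)."""
    short_ema = close.ewm(span=short_window, adjust=False).mean()
    long_ema = close.ewm(span=long_window, adjust=False).mean()
    macd = short_ema - long_ema
    signal = macd.ewm(span=signal_window, adjust=False).mean()
    return pd.DataFrame({"macd": macd, "macd_signal": signal, "macd_hist": macd - signal})

# Made-up, steadily rising prices: the fast EMA stays above the slow EMA.
toy_close = pd.Series(1.10 + 0.001 * np.arange(60))
toy_macd = macd_with_signal(toy_close)
print(toy_macd.tail(3).round(5))
```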

In [None]:
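def calculate_macd(df, short_window=12, long_window=26, signal_window=9):
    """Add a `macd` column: fast EMA minus slow EMA of the close price.

    `signal_window` is accepted but not used yet; no signal line is computed.
    """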
    short_ema = df['close'].ewm(span=short_window, adjust=False).mean()

    long_ema = df['close'].ewm(span=long_window, adjust=False).mean()

    df['macd'] = short_ema - long_ema

    return df


df = calculate_macd(df)

df = df.dropna()

### calculate ADX (14, 16, 18, 20, 22, 24, 26)

ADX (Average Directional Index) is a technical indicator used to quantify the strength of a trend, regardless of its direction. It is part of the Directional Movement System developed by J. Welles Wilder, which also includes the Positive Directional Indicator (+DI) and Negative Directional Indicator (-DI).
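
For reference, the usual definitions in Wilder's system can be written as follows, where $\text{TR}$ is the true range, $\text{DM}^{+}$ and $\text{DM}^{-}$ are the positive and negative directional movements, and $S_n(\cdot)$ stands for smoothing over $n$ periods (the exact smoothing method is left open here and varies between implementations):

$$
\text{DI}^{+}_n = 100 \cdot \frac{S_n(\text{DM}^{+})}{S_n(\text{TR})},
\qquad
\text{DI}^{-}_n = 100 \cdot \frac{S_n(\text{DM}^{-})}{S_n(\text{TR})}
$$

$$
\text{DX}_n = 100 \cdot \frac{\left|\text{DI}^{+}_n - \text{DI}^{-}_n\right|}{\text{DI}^{+}_n + \text{DI}^{-}_n},
\qquad
\text{ADX}_n = S_n(\text{DX}_n)
$$

Because $\text{DX}_n$ uses the absolute difference of the two indicators, ADX lies between 0 and 100 and measures trend strength without regard to direction.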

In [None]:
def calculate_adx(df, timeperiod=14, high_col='high', low_col='low', close_col='close', adx_col='adx_14'):
    """Add an ADX column to df in place (rolling sums and an EMA, not Wilder's smoothing)."""
    prev_close = df[close_col].shift(1)

    # True range: largest of high-low, |high - prev close| and |low - prev close|.
    tr = pd.concat([
        df[high_col] - df[low_col],
        (df[high_col] - prev_close).abs(),
        (df[low_col] - prev_close).abs(),
    ], axis=1).max(axis=1)

    # Directional movement: only the larger, positive move counts on each bar.
    up_move = df[high_col].diff()
    down_move = -df[low_col].diff()
    plus_dm = up_move.where((up_move > down_move) & (up_move > 0), 0.0)
    minus_dm = down_move.where((down_move > up_move) & (down_move > 0), 0.0)

    # Directional indicators: smoothed DM relative to smoothed true range.
    tr_sum = tr.rolling(window=timeperiod).sum()
    plus_di = 100 * plus_dm.rolling(window=timeperiod).sum() / tr_sum
    minus_di = 100 * minus_dm.rolling(window=timeperiod).sum() / tr_sum

    # DX = 100 * |+DI - -DI| / (+DI + -DI); ADX is a smoothed DX.
    di_sum = (plus_di + minus_di).replace(0, np.nan)
    dx = 100 * (plus_di - minus_di).abs() / di_sum
    df[adx_col] = dx.ewm(span=timeperiod, adjust=False).mean()

    return None

calculate_adx(df, timeperiod = 14, adx_col='adx_14')

calculate_adx(df, timeperiod = 16, adx_col='adx_16')

calculate_adx(df, timeperiod = 18, adx_col='adx_18')

calculate_adx(df, timeperiod = 20, adx_col='adx_20')

calculate_adx(df, timeperiod = 22, adx_col='adx_22')

calculate_adx(df, timeperiod = 24, adx_col='adx_24')

calculate_adx(df, timeperiod = 26, adx_col='adx_26')

df = df.dropna()

### calculate vix (14, 16, 18, 20)

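The VIX (Volatility Index), often referred to as the "Fear Gauge" or "Fear Index," is a popular measure of the stock market's expectation of volatility based on S&P 500 index options. It is calculated and published by the Chicago Board Options Exchange (CBOE) and represents the market's expectations for volatility over the next 30 days.

The function below does not read option prices. It computes a VIX-style proxy instead: the realized volatility of log returns over a rolling window, scaled by the factor 252 (the usual number of trading days per year). Because the data here are 1H bars, that factor rescales the values rather than giving a true annualized figure.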
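
A minimal sketch of a rolling realized-volatility calculation with the annualization factor exposed as a parameter. The helper name `realized_volatility` and the figure of 6,240 hourly bars per year (24 hours x 5 days x 52 weeks, a rough assumption for FX) are illustrative and not taken from this notebook.

```python
import numpy as np
import pandas as pd

def realized_volatility(close: pd.Series, window: int = 20, periods_per_year: int = 252) -> pd.Series:
    """Annualized realized volatility in percent (hypothetical helper for illustration)."""
    log_returns = np.log(close / close.shift(1))
    # Mean squared log return over the window, scaled up to one year.
    mean_sq = log_returns.pow(2).rolling(window=window).mean()
    return 100 * np.sqrt(mean_sq * periods_per_year)

# Made-up prices with a constant log return of 0.001 per bar.
toy_close = pd.Series(1.10 * np.exp(np.cumsum(np.full(50, 0.001))))

vol_daily = realized_volatility(toy_close, window=20, periods_per_year=252)
vol_hourly = realized_volatility(toy_close, window=20, periods_per_year=6240)  # assumed bar count
print(vol_daily.iloc[-1], vol_hourly.iloc[-1])
```

The two results differ only by the constant factor `sqrt(6240 / 252)`, so the choice changes the scale of the feature, not its shape.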

In [None]:
def calculate_vix(df, timeperiod=20, high_col='high', low_col='low', close_col='close', vix_col='vix_20'):

    # Calculate log returns
    df['LogReturns'] = np.log(df[close_col] / df[close_col].shift(1))

    # Calculate squared returns
    df['SquaredReturns'] = df['LogReturns'].pow(2)

    # Calculate rolling sum of squared returns
    df['SumSquaredReturns'] = df['SquaredReturns'].rolling(window=timeperiod).sum()

    # Calculate VIX
    df[vix_col] = 100 * np.sqrt(df['SumSquaredReturns'] * (252 / timeperiod))

    # Drop temporary columns
    df.drop(['LogReturns', 'SquaredReturns', 'SumSquaredReturns'], axis=1, inplace=True)

    return None


calculate_vix(df, timeperiod = 14, vix_col='vix_14')

calculate_vix(df, timeperiod = 16, vix_col='vix_16')

calculate_vix(df, timeperiod = 18, vix_col='vix_18')

calculate_vix(df, timeperiod = 20, vix_col='vix_20')

df = df.dropna()

### calculate target (next close - close)

In [None]:
df['target'] = df['close'].shift(-1) - df['close']
df = df.dropna()
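
To make the alignment concrete, the toy example below shows what `shift(-1)` does: each row's target is the next row's close minus its own close, and the final row has no next bar, so it is dropped. The prices are made up.

```python
import pandas as pd

toy = pd.DataFrame({"close": [1.0, 1.5, 1.25, 2.0]})

# Next close minus current close; the last row becomes NaN.
toy["target"] = toy["close"].shift(-1) - toy["close"]
print(toy)  # target column: 0.5, -0.25, 0.75, NaN

toy = toy.dropna()
```

A positive target means the next bar closes higher than the current one.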

# visualize data

In [None]:
import matplotlib.pyplot as plt

# Train Model

### scale, split, and reshape the data

In [None]:
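X = df.drop("target", axis=1)

y = df["target"]

# Split chronologically (no shuffling) so the test set lies strictly after the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X.values,
    y.values.reshape(-1, 1),
    test_size=0.2,
    shuffle=False
)

# Fit the scalers on the training split only, so no test-set statistics leak into training.
scaler_X = StandardScaler()

X_train = scaler_X.fit_transform(X_train)

X_test = scaler_X.transform(X_test)

scaler_y = StandardScaler()

y_train = scaler_y.fit_transform(y_train)

y_test = scaler_y.transform(y_test)

# Reshape to (samples, n_features, 1): each feature is treated as one timestep.
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)

X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)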
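
The reshape above gives each sample a single row of indicators, laid out as `n_features` timesteps of one value each. A common alternative for recurrent models is to feed a window of consecutive bars, shaped `(samples, lookback, n_features)`. The sketch below shows one way to build such windows with NumPy. `make_windows` and `lookback` are hypothetical names, and this is not what the notebook currently does.

```python
import numpy as np

def make_windows(features: np.ndarray, targets: np.ndarray, lookback: int):
    """Stack `lookback` consecutive rows per sample; the label is the target of the window's last row."""
    n_samples = len(features) - lookback + 1
    X = np.stack([features[i:i + lookback] for i in range(n_samples)])
    y = targets[lookback - 1:]
    return X, y

# Toy data: 10 bars with 2 features each.
toy_features = np.arange(20, dtype=float).reshape(10, 2)
toy_targets = np.arange(10, dtype=float)

X_win, y_win = make_windows(toy_features, toy_targets, lookback=4)
print(X_win.shape, y_win.shape)  # (7, 4, 2) (7,)
```

With this layout the recurrent layer steps through time rather than through the feature columns.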

### What is LSTM?

LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN) specifically designed to overcome the limitations of traditional RNNs in processing and predicting time series and sequential data. Unlike standard RNNs, which can struggle with issues like vanishing and exploding gradients, LSTMs are capable of maintaining information over long periods.

### Structure of LSTM

LSTM networks include memory units known as "cells," which control the flow of information through three different gates:
1. **Forget Gate**: Decides how much of the past information should be discarded.
2. **Input Gate**: Determines how much of the new information should be added to the memory.
3. **Output Gate**: Decides how much of the memory information should be used to produce the output.

This structure allows LSTMs to retain information for longer periods and is therefore well-suited for learning long-term patterns in sequential data.
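
The three gates can be made concrete with a single LSTM time step written in NumPy. This is a sketch of the standard LSTM equations, not Keras' internal implementation. The weight layout, the names `lstm_step`, `W`, `U`, `b`, and the random toy weights are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4*units, n_in), U: (4*units, units), b: (4*units,)."""
    units = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:units])                # forget gate: how much old memory to keep
    i = sigmoid(z[units:2 * units])        # input gate: how much new information to write
    o = sigmoid(z[2 * units:3 * units])    # output gate: how much memory to expose
    g = np.tanh(z[3 * units:4 * units])    # candidate values for the cell state
    c_t = f * c_prev + i * g               # updated cell state (the memory)
    h_t = o * np.tanh(c_t)                 # hidden state (the output)
    return h_t, c_t

# Random toy weights and inputs, for illustration only.
rng = np.random.default_rng(0)
n_in, units = 3, 4
W = rng.normal(size=(4 * units, n_in))
U = rng.normal(size=(4 * units, units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(5, n_in)):     # a sequence of 5 timesteps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The cell state `c_t` is updated additively, which is what lets information persist across many steps.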

### Applications of LSTM

1. **Time Series Forecasting**: LSTMs are widely used for predicting future values in time series, such as stock prices, product demand, and weather forecasting.

2. **Natural Language Processing (NLP)**: In NLP applications such as sentiment analysis, machine translation, and text generation, LSTMs can model long-term dependencies between words.

3. **Speech Recognition**: In speech-to-text conversion and speech analysis, LSTMs perform well because they can handle long sequences of audio data.

4. **Human Activity Recognition**: For analyzing sensor data to recognize human activities or predict behavior based on motion data.

5. **Financial Series Prediction**: For analyzing and predicting asset prices and other financial metrics due to the presence of complex and long-term patterns.

### Advantages of LSTM

- **Long-Term Information Retention**: Unlike traditional RNNs, LSTMs can hold onto information for extended periods.
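- **Mitigation of Vanishing Gradients**: The gated, additive cell-state update lets gradients flow across many time steps, which eases the vanishing-gradient problem. Exploding gradients can still occur and are usually handled with gradient clipping.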

### compile model

In [None]:
# Create LSTM model
def create_lstm_model():
    model = Sequential()
    model.add(LSTM(units=64, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2]),
                   kernel_regularizer=l2(0.01), recurrent_regularizer=l2(0.01)))
    model.add(Dropout(0.2))
    model.add(LSTM(units=32, kernel_regularizer=l2(0.01), recurrent_regularizer=l2(0.01)))
    model.add(Dropout(0.2))
    model.add(Dense(units=1, activation='linear', kernel_regularizer=l2(0.01)))
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_squared_error'])
    return model

# Create GRU model
def create_gru_model():
    model = Sequential()
    model.add(GRU(units=64, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2]),
                  kernel_regularizer=l2(0.01), recurrent_regularizer=l2(0.01)))
    model.add(Dropout(0.2))
    model.add(GRU(units=32, kernel_regularizer=l2(0.01), recurrent_regularizer=l2(0.01)))
    model.add(Dropout(0.2))
    model.add(Dense(units=1, activation='linear', kernel_regularizer=l2(0.01)))
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_squared_error'])
    return model

# Train models
def train_model(model, X_train, y_train, X_test, y_test):
    early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
    history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test), callbacks=[early_stopping])
    return history

# Create and train LSTM and GRU models
lstm_model = create_lstm_model()
gru_model = create_gru_model()

print("Training LSTM model...")
lstm_history = train_model(lstm_model, X_train, y_train, X_test, y_test)
print("Training GRU model...")
gru_history = train_model(gru_model, X_train, y_train, X_test, y_test)

# Evaluate models
def evaluate_model(model, X_test, y_test):
    y_pred = model.predict(X_test)
    y_test_original = scaler_y.inverse_transform(y_test)
    y_pred_original = scaler_y.inverse_transform(y_pred)
    mae = mean_absolute_error(y_test_original, y_pred_original)
    rmse = np.sqrt(mean_squared_error(y_test_original, y_pred_original))
    return mae, rmse

print("Evaluating LSTM model...")
lstm_mae, lstm_rmse = evaluate_model(lstm_model, X_test, y_test)
print(f'LSTM Mean Absolute Error (MAE): {lstm_mae}')
print(f'LSTM Root Mean Squared Error (RMSE): {lstm_rmse}')

print("Evaluating GRU model...")
gru_mae, gru_rmse = evaluate_model(gru_model, X_test, y_test)
print(f'GRU Mean Absolute Error (MAE): {gru_mae}')
print(f'GRU Root Mean Squared Error (RMSE): {gru_rmse}')

# Improved auto-regressive prediction using ensemble of models
def auto_regressive_prediction(models, initial_input, n_steps=10):
    predictions = np.zeros((n_steps, len(models)))

    for i, model in enumerate(models):
        current_input = initial_input.copy()
        for step in range(n_steps):
            pred = model.predict(current_input[np.newaxis, :, :])
            predictions[step, i] = pred[0, 0]
            current_input = np.roll(current_input, shift=-1, axis=0)
            current_input[-1] = pred

    avg_predictions = np.mean(predictions, axis=1)
    return avg_predictions

# Generate improved auto-regressive predictions using ensemble of models
n_steps = 10  # Number of steps to predict
models = [lstm_model, gru_model]  # Add more models to the list for a true ensemble approach
auto_regressive_preds = auto_regressive_prediction(models, X_test[0], n_steps=n_steps)

# Convert predictions back to original scale
auto_regressive_preds_original = scaler_y.inverse_transform(np.array(auto_regressive_preds).reshape(-1, 1))

# Print auto-regressive predictions
print(f'Auto-regressive Predictions: {auto_regressive_preds_original.flatten()}')




Training LSTM model...
Epoch 1/50
656/656 ━━━━━━━━━━━━━━━━━━━━ 20s 24ms/step - loss: 1.5902 - mean_squared_error: 1.0636 - val_loss: 0.9776 - val_mean_squared_error: 0.9683
Epoch 2/50
656/656 ━━━━━━━━━━━━━━━━━━━━ 19s 23ms/step - loss: 1.0370 - mean_squared_error: 1.0314 - val_loss: 0.9699 - val_mean_squared_error: 0.9691
Epoch 3/50
656/656 ━━━━━━━━━━━━━━━━━━━━ 21s 24ms/step - loss: 1.0407 - mean_squared_error: 1.0402 - val_loss: 0.9689 - val_mean_squared_error: 0.9688
Epoch 4/50
656/656 ━━━━━━━━━━━━━━━━━━━━ 15s 23ms/step - loss: 0.9817 - mean_squared_error: 0.9816 - val_loss: 0.9688 - val_mean_squared_error: 0.9688
Epoch 5/50
656/656 ━━━━━━━━━━━━━━━━━━━━ 20s 23ms/step - loss: 0.9955 - mean_squared_error: 0.9955 - val_loss: 0.9688 - val_mean_squared_error: 0.9688
Epoch 6/50
656/656 ━━━━━━━━━━━━━━━━━━━━