# Trading Strategies (RSI, Stochastic, MACD, Bollinger Bands and Keltner Channel)

### The project's main focus is on implementing the most accurate and profitable trading strategies and automating the process of trading. In this project we would be using Bitcoin cryptocurrency as it is known for its volatility in the market. Almost all of 5 strategies presented below are built upon measuring volatility and resistance levels of the price.

#### Bibliography: 
1. **Investopedia** https://www.investopedia.com
2. **RBC Journal (РБК)** https://www.rbc.ru
3. **Technical Analysis Library in Python** https://technical-analysis-library-in-python.readthedocs.io/en/latest/index.html
4. **GitHub** https://github.com
5. **Livermore –– "How to trade in stocks" (pdf)** https://www.trendfollowing.com/pdfs/Jesse_Livermore-How_To_Trade_In_Stocks_(1940_original)-EN.pdf
6. **Steve Nison –– Japanese Candlestick (pdf)** https://dl.kohanfx.com/pdf/stevie-nison-candlestick-(KohanFx.com).pdf 
7. **Algorithmic Trading with Python (YouTube)** https://www.youtube.com/playlist?list=PLwEOixRFAUxZmM26EYI1uYtJG39HDW1zm
8. **ML-Quantitative Finance** https://www.ml-quant.com

## 0. Importing libraries and getting access to Binance account

In [1]:
# For trading:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
import plotly.graph_objects as go
from yahooquery import Ticker
from binance.client import Client
import pandas_ta as ta
from ta.volatility import BollingerBands
from ta.volatility import KeltnerChannel



# For ML: 
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.utils import shuffle
from sklearn.metrics import (roc_curve,
                             roc_auc_score,
                             f1_score,
                             confusion_matrix,
                             precision_score,
                             recall_score,
                             accuracy_score)
from imblearn.over_sampling import SMOTE

In [2]:
pd.options.mode.chained_assignment = None #avoids getting "SettingWithCopyWarning" type of error

In [3]:
key = ''

rsa =''

In [4]:
client = Client(key,rsa) #initializing the session in binance account

In [5]:
#getting data within the given interval and provided ticker of cryptocurrency
def get_data(ticker_, interval,start, end):
    frame = pd.DataFrame(client.get_historical_klines(ticker_,interval, start, end))
    frame = frame.iloc[:,:6]
    frame.columns = ['time','open', 'high' , 'low' , 'close' , 'volume']
    frame= frame.set_index('time')
    frame.index = pd.to_datetime(frame.index,unit = 'ms')
    frame = frame.astype(float)
    return frame

In [6]:
ticker_btc = 'BTCUSDT'
start =  '10 MAR 2022'
end = '11 MAR 2023'

data_btc = get_data(ticker_btc,'5m',start,end)
data_btc = data_btc.reset_index(drop = True)
data_btc

Unnamed: 0,open,high,low,close,volume
0,41941.70,41984.41,41853.99,41972.65,201.44320
1,41972.65,42011.00,41851.57,41891.17,146.49099
2,41891.18,41936.47,41846.27,41886.02,106.92714
3,41886.02,42039.63,41872.13,42021.32,150.15897
4,42021.33,42031.21,41941.00,41951.83,120.97185
...,...,...,...,...,...
105404,20180.33,20208.66,20164.49,20205.51,1224.17009
105405,20205.51,20240.40,20195.75,20209.57,1551.02788
105406,20208.71,20210.50,20096.01,20135.35,2357.42902
105407,20135.37,20167.21,20127.62,20150.69,1153.00136


## 1. RSI

According to Investopedia (https://www.investopedia.com/terms/r/rsi.asp): 

**RSI (Relative Strength Index)** is a momentum indicator used in technical analysis. RSI measures the speed and magnitude of a security's recent price changes to evaluate overvalued or undervalued conditions in the price of that security.

* **Calculation**: RSI is calculated by *pandas-ta* library according to the following formula:

    ${RSI} = \displaystyle100 - \frac{100}{1+\displaystyle\frac{EMA(Up)}{EMA(Down)}}$

    EMA is the notation for Exponential Moving Average. The source: https://www.investopedia.com/ask/answers/122314/what-exponential-moving-average-ema-formula-and-how-ema-calculated.asp


* **Strategy for RSI Index** is as follows: If the RSI index is lower that 30, then the price will go up. If the RSI index is above 70, then the price will go down. Basically, 30 and 70 are boundary levels: the stock is considered oversold when RSI < 30 and considered overbought when RSI > 70



![](rsii.jpeg)

### 1.01 Calculating RSI

In [7]:
data_btc['RSI_14'] = ta.rsi(data_btc['close'], timeperiod = 14)
# data_btc

### 1.02 RSI Strategy

In [8]:
rsi_oversold = 30
rsi_overbought = 70

# Creating a new column for signals. 1 for the price to go up, -1 for the price to go down
data_btc['RSI_Signal'] = 0

In [9]:
# data_btc['RSI_14'].plot(kind = 'line');

In [10]:
def rsi_strategy(df):
    for i in range(1, len(df) - 1): 
        if (df['RSI_14'][i] < rsi_overbought and
            df['RSI_14'][i+1] > rsi_overbought):
            df['RSI_Signal'][i+1] = -1 #the price will go down
            
        if (df['RSI_14'][i] > rsi_oversold and
            df['RSI_14'][i+1] < rsi_oversold):
            df['RSI_Signal'][i+1] = 1 #the price will go up 
        
            
rsi_strategy(data_btc)
data_btc['RSI_Signal'].value_counts() #checking the number of signals

 0    103023
 1      1222
-1      1164
Name: RSI_Signal, dtype: int64

#### The percentage of signals is very low, which is obvioius – RSI is mostly sensitive to global trend reversals and cannot perform better than that. Nevertheless, we still need it to make conclusions about the trend dynamics

## 2. Stochastic Oscillator

According to RBC Journal (https://quote.rbc.ru/news/article/628ccd3a9a79474ed43db40f):

**Stochastic Oscillator** is a momentum indicator comparing a particular closing price of a security to a range of its prices over a certain period of time. The sensitivity of the oscillator to market movements is reducible by adjusting that time period or by taking a moving average of the result. It is used to generate overbought and oversold trading signals, utilizing a 0–100 bounded range of values.

**Calculation of Stochastic Oscillator** consists of two components: *%K* and *%D*. 
* *%K* is called "fast" stochastic oscillator, which is calculated as follows:

    ${\%K} = \displaystyle \left( \frac{C - L14}{H14 - L14} \right) \times 100$ , where

    $C$ –– the most recent closing price

    $L14$ –– the lowest price traded within the last 14-day period 

    $H14$ –– the highest price traded within the last 14-day period



* *%D* is called "slow" stochastic oscillator, which is calculated as 3-day moving average of %K




**Stochastic Oscillator Strategy** that we use is one of the safest when trading on the market. The price must go down when %K crosses %D from above and both of them are above 70. We obtain a bullish signal when %K crosses %D from below and both of them are below 30. The strategy is similar to RSI Indicator strategy. When stochastic oscillator is above 80 –– the stock is considered overbought, below 20 –– oversold. Alternative strategy can be not taking into account oversold and overbought levels, but it is more risky

![](stoch.jpeg)

### 2.01 Calculating Stochastic Oscillator

In [11]:
data_btc['%K'] = tech.momentum.stoch(data_btc.high, data_btc.low, data_btc.close, window = 14, smooth_window = 3)
data_btc['%D'] = data_btc['%K'].rolling(window = 3).mean()

# data_btc.reset_index(drop = True).head(16) check to see where %K and %D starts

### 2.02 Stochastic Oscillator Strategy 

In [12]:
data_btc['Stoch_Signal'] = 0
def stoch_osc_1(df):
    for i in range(len(df)-1):
        if (df['%K'][i] < df['%D'][i] and
            df['%K'][i+1] > df['%D'][i+1] and
            df['%K'][i+1] < 30):
            
            df['Stoch_Signal'][i] = 1 #the price will go up
            
            
        if (df['%K'][i] > df['%D'][i] and
            df['%K'][i+1] < df['%D'][i+1] and
            df['%K'][i+1] > 70):
            
            df['Stoch_Signal'][i] = -1 #the price will go down
            
stoch_osc_1(data_btc)

data_btc['Stoch_Signal'].value_counts()

 0    94090
-1     5801
 1     5518
Name: Stoch_Signal, dtype: int64

#### 5.19%

In [13]:
# data_btc['%K'].plot(kind = 'line');

## 3. Bollinger Bands 

According to Investopedia (https://www.investopedia.com/terms/b/bollingerbands.asp):

**Bollinger Bands** is a technical analysis tool defined by a set of trendlines. They are plotted as two standard deviations, both positively and negatively, away from a simple moving average (SMA) of a security's price and can be adjusted to user preferences.

* **Bollinger Bands Indicator** is constructed by calculating 20-day Simple Moving Average (SMA) –– Middle Bollinger Band. In our project we used **ta.volatility** library. The function **ta.volatility.BollingerBands** have 3 arguments: close price, window (period for which the bollinger bands are calculated) and window_dev (the amount of standard deviations from 20-day SMA). Upper and lower bands are calculated as follows:

    $Up = SMA + 2\sigma \left[TP, n \right] $

    $Down = SMA - 2\sigma \left[TP, n \right] $, where

    $TP = \displaystyle \frac{High + Low + Close}{3}$ –– typical price 

    $\sigma \left[ TP, n \right]$ –– the standard deviation of typical price within $n$ days
    
    
    
* **Bollinger Bands Strategy**. The strategy is based upon the so called "support and resistance levels": when the upper bollinger band is crossed from below, the price must return to the range between upper and lower band. The signal to short stock is obtained when the price crosses upper band from above. Same with the lower band: when the price crosses the lower band from above, we expect the price to bounce off of the lower bollinger band or recover within short time. The signal to buy is obtained when the lower band is crossed by the price from below. The signals are generated with the function **bollinger_hband_indicator()** and **bollinger_lband_indicator()**. The first one returns 1 when price crosses the higher band from below (sell) and the latter returns 1 when price crosses the lower band from above (buy)

![](boll.png)

### 3.01 Calculating Bollinger Bands

In [14]:
indicator_bb = BollingerBands(close = data_btc["close"], window=20, window_dev=2)

### 3.02 Bollinger Bands Strategy

In [15]:
def bollinger(df, window = 20, window_dev=2):
    df['bb_h'] = -(indicator_bb.bollinger_hband_indicator()) #we receive a signal -1 (the price will go down)
    df['bb_l'] = indicator_bb.bollinger_lband_indicator() #we receive a signal 1 (the price will go up)
    
    return df


bollinger(data_btc)

Unnamed: 0,open,high,low,close,volume,RSI_14,RSI_Signal,%K,%D,Stoch_Signal,bb_h,bb_l
0,41941.70,41984.41,41853.99,41972.65,201.44320,,0,,,0,-0.0,0.0
1,41972.65,42011.00,41851.57,41891.17,146.49099,,0,,,0,-0.0,0.0
2,41891.18,41936.47,41846.27,41886.02,106.92714,,0,,,0,-0.0,0.0
3,41886.02,42039.63,41872.13,42021.32,150.15897,,0,,,0,-0.0,0.0
4,42021.33,42031.21,41941.00,41951.83,120.97185,,0,,,0,-0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...
105404,20180.33,20208.66,20164.49,20205.51,1224.17009,66.256279,0,98.315688,91.628218,-1,-1.0,0.0
105405,20205.51,20240.40,20195.75,20209.57,1551.02788,66.674705,0,85.906930,91.848152,0,-0.0,0.0
105406,20208.71,20210.50,20096.01,20135.35,2357.42902,53.591807,0,51.979338,78.733985,0,-0.0,0.0
105407,20135.37,20167.21,20127.62,20150.69,1153.00136,55.533861,0,45.534576,61.140281,0,-0.0,0.0


#### 7.61%

## 4. MACD Strategy

According to Investopedia (https://www.investopedia.com/terms/m/macd.asp):

**MACD (Moving average Convergence/Divergence)** is a trend-following momentum indicator that shows the relationship between two exponential moving averages (EMAs) of a security’s price. 

* **Calculation**.The MACD line is calculated by subtracting the 26-period EMA from the 12-period EMA.
   
   **MACD** = EMA12 −  EMA26, where
   
   **EMA12** –– Exponential Moving Average, calculated within the last 12 days; **EMA26** –– Exponential Moving Average, calculated within the last 26 days
   
* **The MACD Strategy** is based upon the fact of MACD line and **Signal Line** cross. **Signal Line** is calculated as 9-day moving average of MACD. We expect the price to go up when MACD Line crosses Signal Line from below, and expect price to go down when MACD Line crosses Signal Line from above

![](macd.jpeg)

### 4.01 Calculating MACD

In [16]:
def get_macd(df):
    df['MACD'] = ta.ema(df.close, length = 12) - ta.ema(df.close, length = 26)
    df['MACD_EMA'] = ta.ema(df.MACD, length = 9)
    return df

get_macd(data_btc)

Unnamed: 0,open,high,low,close,volume,RSI_14,RSI_Signal,%K,%D,Stoch_Signal,bb_h,bb_l,MACD,MACD_EMA
0,41941.70,41984.41,41853.99,41972.65,201.44320,,0,,,0,-0.0,0.0,,
1,41972.65,42011.00,41851.57,41891.17,146.49099,,0,,,0,-0.0,0.0,,
2,41891.18,41936.47,41846.27,41886.02,106.92714,,0,,,0,-0.0,0.0,,
3,41886.02,42039.63,41872.13,42021.32,150.15897,,0,,,0,-0.0,0.0,,
4,42021.33,42031.21,41941.00,41951.83,120.97185,,0,,,0,-0.0,0.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
105404,20180.33,20208.66,20164.49,20205.51,1224.17009,66.256279,0,98.315688,91.628218,-1,-1.0,0.0,45.139079,41.266796
105405,20205.51,20240.40,20195.75,20209.57,1551.02788,66.674705,0,85.906930,91.848152,0,-0.0,0.0,46.695613,42.352559
105406,20208.71,20210.50,20096.01,20135.35,2357.42902,53.591807,0,51.979338,78.733985,0,-0.0,0.0,41.462292,42.174506
105407,20135.37,20167.21,20127.62,20150.69,1153.00136,55.533861,0,45.534576,61.140281,0,-0.0,0.0,38.113310,41.362267


### 4.02 MACD Strategy

In [17]:
data_btc['MACD_Signal'] = 0 
def macd_strategy(df):
    for i in range(len(df)-1):
        
        if (df['MACD'][i] < df['MACD_EMA'][i] and
            df['MACD'][i+1] > df['MACD_EMA'][i+1]):
            df['MACD_Signal'][i+1] = -1 #signal for a price to go down
        
        if (df['MACD'][i] > df['MACD_EMA'][i] and
            df['MACD'][i+1] < df['MACD_EMA'][i+1]):
            df['MACD_Signal'][i+1] = 1 #signal for a price to go up 
            
    return df
macd_strategy(data_btc)
data_btc['MACD_Signal'].value_counts()

 0    96839
-1     4285
 1     4285
Name: MACD_Signal, dtype: int64

#### 6.9%

## 5. Keltner Channel

According to Investopedia (https://www.investopedia.com/terms/k/keltnerchannel.asp):

**Keltner Channel** is a volatility-based technical indicator composed of three separate lines. The middle line is an exponential moving average (EMA) of the price. Additional lines are placed above and below the EMA. The upper band is typically set two times the ATR above the EMA, and the lower band is typically set two times the ATR below the EMA.

* **Calculation**. Keltner Channel is calculated in the same way as Bollinger Bands, via ta.volatility library. However, the calculation of Keltner Channel is based primarily upon 20-day Exponential Moving Averages, not on Simple Moving Averages, which makes it more sensitive to price changes and a little bit more risky. Moreover, instead of standard deviation we use **ATR** –- Average True Range (typically over 10 days). The formula for Keltner Channel is as follows:
    
    $Up = EMA_{20} + 2 \times ATR_{10}$
    
    $Down = EMA_{20} - 2 \times ATR_{10}$, where 
    
    $ATR_{10} = \left(\displaystyle \frac{1}{n} \right) \displaystyle \sum_{i}^{n} TR_i $
    
    $TR_i = max \left[ |H_i-L_i|, |H_i-C_{i-1}^p|, |L_i-C_{i-1}^p|  \right]$
    
    $H_i$ –– today's high price
    
    $L_i$ –– today's low price
    
    $C_{i-1}^p$ –– yesterday's closing price
    
    
* **The Strategy for Keltner Channel** follows the same logic as Bollinger Bands do: when the upper band is crossed from above (**keltner_channel_hband_indicator()**), we expect a bearish trend (down); when the lower band is crossed from below, we expect a bullish trend (up)

![](kelt.jpg)

### 5.01 Calculating Keltner Channel

In [18]:
indicator_kc = KeltnerChannel(data_btc.high, data_btc.low, data_btc.close, window = 20, window_atr = 10)

### 5.02 Keltner Channel Strategy

In [19]:
def keltner_channel(df):
    df['kc_h'] = -(indicator_kc.keltner_channel_hband_indicator()) #returns -1 when close price is higher than upper band (price will go down)
    df['kc_l'] = indicator_kc.keltner_channel_lband_indicator() #returns -1 when close price is lower than upper band (price will go up)
    df['kc_l'] = df['kc_l'].astype('int')
    df['kc_h'] = df['kc_h'].astype('int')
    return df

keltner_channel(data_btc)
# amount of bearish signals:
data_btc.kc_l.value_counts(normalize = True).round(2)

0    0.78
1    0.22
Name: kc_l, dtype: float64

#### 22.8%

# Machine Learning Applications to Trading Signals:
1. **Check the amount of true calls. It is important for us as the model will be trained on this data, and if target variable shows low accuracy, then the model will not perform better than 0.5 (maybe even lower)**
2. **Combining strategies with the most accurate rate of calls (> 70%)**
3. **Build a model, which is going to train on the strategy with the highest rate of true calls**
4. **Try to upgrade the performance of the model by using upsampling technique**

## 1. True calls

In [20]:
def true_calls(df, signal):
    count = 0
    for i in range(len(df)):
        if df[signal][i] == 1:
            if df['close'][i+1] > df['close'][i]:
                count += 1
    percentage = count / len(df[df[signal] == 1])
    return round(percentage, 2)

In [21]:
signals =  ['RSI_Signal', 'Stoch_Signal', 'bb_l', 'MACD_Signal', 'kc_l']

for signal in signals:
    print(signal, '––', true_calls(data_btc, signal))


RSI_Signal –– 0.59
Stoch_Signal –– 0.74
bb_l –– 0.57
MACD_Signal –– 0.53
kc_l –– 0.55


**We have measured the probability of true calls for each trading strategy for the year. Further work will be build upon top-3 strategies: 1. RSI, 2. Stochastic Oscillator, 3. Bollinger Bands. Next step is to check the combination of these strategies and come up with the higher probability of true calls**

## 2. Combinations of trading strategies
**A call of a combined strategy is a call, when two of the chosen strategies have made a call simultaneously**

### 2.1 RSI & Stochastic Oscillator

In [22]:
data_btc['RSI_Stoch_Signal'] = [1 if (data_btc['RSI_Signal'][i] == 1 and data_btc['Stoch_Signal'][i] == 1) else 0 for i in range(len(data_btc))]

data_btc.RSI_Stoch_Signal.value_counts()

0    105009
1       400
Name: RSI_Stoch_Signal, dtype: int64

In [23]:
true_calls(data_btc, 'RSI_Stoch_Signal')

0.75

In [24]:
#Percentage of calls
data_btc.RSI_Stoch_Signal.value_counts(normalize = True).round(4)

0    0.9962
1    0.0038
Name: RSI_Stoch_Signal, dtype: float64

**Disbalance is 1 to 263,5. We will need that in part 3**

### 2.2 RSI & Bollinger Bands

In [25]:
data_btc['RSI_Bollinger_Signal'] = [1 if (data_btc['RSI_Signal'][i] == 1 and data_btc['bb_l'][i] == 1) else 0 for i in range(len(data_btc))]
data_btc['RSI_Bollinger_Signal'].value_counts()

0    104503
1       906
Name: RSI_Bollinger_Signal, dtype: int64

In [26]:
true_calls(data_btc, 'RSI_Bollinger_Signal')

0.59

### 2.3 Stochastic Oscillator & Bollinger Bands

In [27]:
data_btc['Stoch_Bollinger_Signal'] = [1 if (data_btc['Stoch_Signal'][i] == 1 and data_btc['bb_l'][i] == 1) else 0 for i in range(len(data_btc))]
data_btc['Stoch_Bollinger_Signal'].value_counts()

0    103511
1      1898
Name: Stoch_Bollinger_Signal, dtype: int64

In [28]:
true_calls(data_btc, 'Stoch_Bollinger_Signal')

0.73

### 2.4 MACD & Stochastic Oscillator (Bonus)

In [29]:
data_btc['MACD_Stoch_Signal'] = [1 if (data_btc['Stoch_Signal'][i] == 1 and data_btc['MACD_Signal'][i] == 1) else 0 for i in range(len(data_btc))]
true_calls(data_btc, 'MACD_Stoch_Signal')

0.7

**To conclude, the best trading strategy is RSI & Stochastic Indicator. RSI is an indicator of global trend reversals and overall movements of the price, whereas Stochastic Indicator is a more volatile indicator (adopts to price movement quicker than RSI), that enables one to trade on a 5-min timeframe. Next step is to build a model which is going to train on the data, obtained by this strategy**

## 3. Building a model
**We will choose optimal model for RSI & Stochastic Oscillator. For features dataset we will need only *open, close, low, high and volume*, and for target variable we will choose *RSI_Stoch_Signal*. Dataset is to be divided in proportion 60/20/20 as we don't have a testing sample. 60% for train dataset, 20% for validation, 20% for testing**

### 3.0 Preparing the data

In [30]:
# data_btc

In [31]:
df = data_btc[['open', 'close', 'low', 'high', 'volume', 'RSI_14', 'RSI_Stoch_Signal']]
df = df[14:] #for RSI_14 column, because we cannot fit the model with NaN values
df

Unnamed: 0,open,close,low,high,volume,RSI_14,RSI_Stoch_Signal
14,41736.66,41790.01,41724.51,41810.20,207.97724,38.798444,0
15,41790.01,41844.67,41783.63,41888.40,91.08533,45.930703,0
16,41844.66,41892.19,41838.44,41898.48,83.53489,51.249741,0
17,41892.18,41896.89,41892.18,41950.00,114.71618,51.755261,0
18,41896.90,41835.31,41815.00,41919.79,126.44188,45.149260,0
...,...,...,...,...,...,...,...
105404,20180.33,20205.51,20164.49,20208.66,1224.17009,66.256279,0
105405,20205.51,20209.57,20195.75,20240.40,1551.02788,66.674705,0
105406,20208.71,20135.35,20096.01,20210.50,2357.42902,53.591807,0
105407,20135.37,20150.69,20127.62,20167.21,1153.00136,55.533861,0


**Dividing the dataset:**

In [32]:
df_train, df_valid = train_test_split(df, test_size = 0.40, random_state = 12345)

df_valid, df_test = train_test_split(df_valid, test_size = 0.50, random_state = 12345)

In [33]:
features_train = df_train.drop(['RSI_Stoch_Signal'], axis = 1)
target_train = df_train['RSI_Stoch_Signal']

features_valid = df_valid.drop(['RSI_Stoch_Signal'], axis = 1)
target_valid = df_valid['RSI_Stoch_Signal']

features_test = df_test.drop(['RSI_Stoch_Signal'], axis = 1)
target_test = df_test['RSI_Stoch_Signal']

**Checking the shapes of obtained datasets:**

In [34]:
features_train.shape, target_train.shape, features_valid.shape, target_valid.shape, features_test.shape, target_test.shape

((63237, 6), (63237,), (21079, 6), (21079,), (21079, 6), (21079,))

**The column *volume* have smaller and model will think that values in this column are less important that values in other columns, which is not true. We will need to standardize features using *ScalerTransform()* function**

In [35]:
numeric = ['open', 'close', 'low', 'high', 'close', 'volume']
scaler = StandardScaler()
scaler.fit(features_train[numeric]) 

features_train[numeric] = scaler.transform(features_train[numeric])
features_valid[numeric] = scaler.transform(features_valid[numeric])
features_test[numeric] = scaler.transform(features_test[numeric])

In [36]:
def stats(predictions, target_valid, features_valid):
    print('Accuracy:', accuracy_score(predictions, target_valid))
    print('Recall:', recall_score(predictions, target_valid))
    print('Precision:',  precision_score(predictions, target_valid))
    print('F1:', f1_score(predictions, target_valid))
    
    probabilities_one_valid = model.predict_proba(features_valid)[:, 1]
    print('AUC-ROC:', roc_auc_score(target_valid, probabilities_one_valid))

## 4. Creating a model that predicts the truthfulness of a call

### 4.0 Preparing the dataset

In [77]:
df = df.reset_index(drop = True)
df['True'] = [1 if df['RSI_Stoch_Signal'][i] == 1 and df['close'][i+1] > df['close'][i] else 0 for i in range(len(df))]
dfs = df[df['RSI_Stoch_Signal'] == 1]
dfs['True'].value_counts()

1    299
0    101
Name: True, dtype: int64

In [78]:
dfs = dfs.drop('RSI_Stoch_Signal', axis = 1) # we won't need column RSI_Stoch_Signal as a feature, because it consists only of "1"
dfs

Unnamed: 0,open,close,low,high,volume,RSI_14,True
84,39104.28,38957.11,38931.51,39145.39,467.04180,27.864183,1
291,38953.99,38683.44,38636.36,38993.80,675.37292,23.446349,1
296,38800.01,38605.16,38570.64,38800.01,459.04508,28.534311,1
300,38600.01,38303.73,38248.37,38600.01,1037.04007,24.782087,1
523,38592.08,38364.53,38347.98,38622.21,644.49354,27.046748,0
...,...,...,...,...,...,...,...
104595,22077.59,22039.43,22038.81,22083.95,1367.49554,25.236896,1
105116,20105.84,20068.25,20040.01,20109.70,2315.53183,28.946120,1
105161,19937.47,19880.48,19866.41,19940.52,2578.62017,26.332535,1
105164,19898.36,19864.90,19861.53,19907.78,1067.02503,28.415032,1


In [79]:
dfs_train, dfs_valid = train_test_split(dfs, test_size = 0.20, random_state = rndst)

dfs_valid, dfs_test = train_test_split(dfs_valid, test_size = 0.50, random_state = rndst)

In [80]:
features_train = dfs_train.drop(['True'], axis = 1)
target_train = dfs_train['True']

features_valid = dfs_valid.drop(['True'], axis = 1)
target_valid = dfs_valid['True']

features_test = dfs_test.drop(['True'], axis = 1)
target_test = dfs_test['True']

### 4.1 Models
**We will use Random Forest Classifier and Logistic Regression with standardized features. Firstly we will try to train the model with imbalance of classes. Next step is to check the balanced weights.**

#### 4.1.1 Logistic Regression

In [90]:
model = LogisticRegression(max_iter = 10000)
model.fit(features_train, target_train)
predictions = model.predict(features_test)

stats(predictions, target_valid, features_test)

Accuracy: 0.725
Recall: 0.725
Precision: 1.0
F1: 0.8405797101449275
AUC-ROC: 0.5329153605015674


In [105]:
oversample = SMOTE(random_state = rndst)
features_train_up, target_train_up = oversample.fit_resample(features_train, target_train)
features_train_down, target_train_down = downsampling(features_train, target_train, 0.333) 

model = LogisticRegression(max_iter = 10000)
model.fit(features_train_up, target_train_up)
predictions = model.predict(features_test)

stats(predictions, target_valid, features_test)

Accuracy: 0.6
Recall: 0.76
Precision: 0.6551724137931034
F1: 0.7037037037037037
AUC-ROC: 0.5329153605015673


#### 4.1.2 Random Forest Classifier

In [56]:
best_f1 = 0
best_estimators = 0
best_depth = 0
best_criterion = ''

for estimator in range(1,101,10):
    for depth in range(1,21):
        for criterion in ['gini', 'entropy']:
            model = RandomForestClassifier(
                                            n_estimators = estimator, 
                                            max_features = 'sqrt', 
                                            max_depth = depth,
                                            criterion = criterion,
                                           )
            model.fit(features_train, target_train)

            predictions = model.predict(features_valid)
            f1 = f1_score(target_valid, predictions)
            if f1 > best_f1:
                best_f1 = f1
                best_estimators = estimator
                best_depth=depth
                best_criterion = criterion
                                
params = {'Best Estimator' : best_estimators,
          'Best Depth' : best_depth,
          'Best Criterion' : best_criterion
         }
                        
print('Best F1:', best_f1)
print('Best parameters:', params)

Best F1: 0.8857142857142858
Best parameters: {'Best Estimator': 31, 'Best Depth': 5, 'Best Criterion': 'entropy'}


In [113]:
model = RandomForestClassifier(n_estimators = 31, max_depth = 5, criterion = 'entropy')
model.fit(features_train_up, target_train_up)

predictions = model.predict(features_test)

stats(predictions, target_valid, features_test)

Accuracy: 0.575
Recall: 0.75
Precision: 0.6206896551724138
F1: 0.679245283018868
AUC-ROC: 0.5485893416927901


In [115]:
model = RandomForestClassifier(n_estimators = 31, max_depth = 5, criterion = '')
model.fit(features_train_down, target_train_down)

predictions = model.predict(features_valid)

stats(predictions, target_valid, features_valid)

Accuracy: 0.725
Recall: 0.725
Precision: 1.0
F1: 0.8405797101449275
AUC-ROC: 0.4341692789968652


## 5. Creating a model that predicts the truthfulness of a call (bigger dataset)

In [59]:
df

Unnamed: 0,open,close,low,high,volume,RSI_14,RSI_Stoch_Signal,True
0,41736.66,41790.01,41724.51,41810.20,207.97724,38.798444,0,0
1,41790.01,41844.67,41783.63,41888.40,91.08533,45.930703,0,0
2,41844.66,41892.19,41838.44,41898.48,83.53489,51.249741,0,0
3,41892.18,41896.89,41892.18,41950.00,114.71618,51.755261,0,0
4,41896.90,41835.31,41815.00,41919.79,126.44188,45.149260,0,0
...,...,...,...,...,...,...,...,...
105390,20180.33,20205.51,20164.49,20208.66,1224.17009,66.256279,0,0
105391,20205.51,20209.57,20195.75,20240.40,1551.02788,66.674705,0,0
105392,20208.71,20135.35,20096.01,20210.50,2357.42902,53.591807,0,0
105393,20135.37,20150.69,20127.62,20167.21,1153.00136,55.533861,0,0


In [60]:
dff = pd.DataFrame()      

for i in range(len(df)):
    if df['RSI_Stoch_Signal'][i] == 1:
        if df['close'][i+1] > df['close'][i]:
            
            dff = pd.concat([dff, df[i:i-4:-1]])
            
dff = dff.reset_index(drop = True)            
dff['True'].value_counts(normalize = True).round(3)

0    0.732
1    0.268
Name: True, dtype: float64

In [61]:
dff_train, dff_valid = train_test_split(dff, test_size = 0.20, random_state = rndst)

dff_valid, dff_test = train_test_split(dff_valid, test_size = 0.50, random_state = rndst)

In [62]:
features_train = dff_train.drop(['True'], axis = 1)
target_train = dff_train['True']

features_valid = dff_valid.drop(['True'], axis = 1)
target_valid = dff_valid['True']

features_test = dff_test.drop(['True'], axis = 1)
target_test = dff_test['True']

In [63]:
dff['True'].value_counts()

0    876
1    320
Name: True, dtype: int64

### Logistic Regression:

In [116]:
model = LogisticRegression(max_iter = 10000)

model.fit(features_train, target_train)
predictions = model.predict(features_valid)

stats(predictions, target_valid, features_test)

Accuracy: 0.725
Recall: 0.725
Precision: 1.0
F1: 0.8405797101449275
AUC-ROC: 0.5329153605015674


In [119]:
oversample = SMOTE(random_state = rndst)
features_train_up, target_train_up = oversample.fit_resample(features_train, target_train)

model.fit(features_train_up, target_train_up)
predictions = model.predict(features_test)

stats(predictions, target_valid, features_test)

Accuracy: 0.6
Recall: 0.76
Precision: 0.6551724137931034
F1: 0.7037037037037037
AUC-ROC: 0.5329153605015673


In [118]:
features_train_down, target_train_down = downsampling(features_train, target_train, 0.333) 

model.fit(features_train_down, target_train_down)
predictions = model.predict(features_valid)

stats(predictions, target_valid, features_test)

Accuracy: 0.725
Recall: 0.725
Precision: 1.0
F1: 0.8405797101449275
AUC-ROC: 0.4075235109717868


### Random Forest Classifier:

In [120]:
best_f1 = 0
best_estimators = 0
best_depth = 0
best_criterion = ''

for estimator in range(1,101,10):
    for depth in range(1,21):
        for criterion in ['gini', 'entropy']:
            model = RandomForestClassifier(
                                            n_estimators = estimator, 
                                            max_features = 'sqrt', 
                                            max_depth = depth,
                                            criterion = criterion,
                                           )
            model.fit(features_train, target_train)

            predictions = model.predict(features_valid)
            f1 = f1_score(target_valid, predictions)
            if f1 > best_f1:
                best_f1 = f1
                best_estimators = estimator
                best_depth=depth
                best_criterion = criterion
                                
params = {'Best Estimator' : best_estimators,
          'Best Depth' : best_depth,
          'Best Criterion' : best_criterion
         }
                        
print('Best F1:', best_f1)
print('Best parameters:', params)

Best F1: 0.8656716417910448
Best parameters: {'Best Estimator': 1, 'Best Depth': 7, 'Best Criterion': 'entropy'}


In [123]:
model = RandomForestClassifier(n_estimators = 1, max_depth = 7, criterion = 'entropy')

model.fit(features_train, target_train)
predictions = model.predict(features_test)
stats(predictions, target_valid, features_test)

Accuracy: 0.675
Recall: 0.7352941176470589
Precision: 0.8620689655172413
F1: 0.7936507936507937
AUC-ROC: 0.5501567398119123


### Gradient Boosting Classifier

In [125]:
best_f1
best_features = 0
best_depth = 0
best_estimator = 0
for estimator in range(1,101,10):
    for feature_ in range(1,5):
        for depth in range(1,8):
            grad_boost = GradientBoostingClassifier(n_estimators=estimator, learning_rate=1, max_features=feature_, max_depth=depth,random_state = 12345)
            
            model = grad_boost.fit(features_train, target_train)
            predictions = model.predict(features_valid)
            f1 = f1_score(target_valid, predictions)
            if f1 > best_f1:
                best_estimator = estimator
                best_features = feature_
                best_depth = depth
                
                
params = {'Best Estimator' : estimator,
          'Best Features' : feature_,
          'Best Depth' : depth,
          'Best F1' : best_f1}

params

{'Best Estimator': 91,
 'Best Features': 4,
 'Best Depth': 7,
 'Best F1': 0.8656716417910448}

In [133]:
model = GradientBoostingClassifier(n_estimators = 91, max_features = 4, max_depth = 7)
model.fit(features_train, target_train)

predictions = model.predict(features_test)

stats(predictions, target_valid, features_test)

Accuracy: 0.7
Recall: 0.7297297297297297
Precision: 0.9310344827586207
F1: 0.8181818181818181
AUC-ROC: 0.567398119122257


# Project Report:
**The work has been completed in a form of a group project. Group members are: Titov Georgy Alexandrovich and Frolov Anatolii Alexandrovich, 2 year students of ICEF HSE. During the work we have used the platform *Jupyter Notebook*. The project is divided into 5 parts: 0. Importing Libraries and getting access to Binance Account, 1. Calculating Trading Strategies, 2.Combinations of trading strategies, 3. Machine Learning Applications to trading, 4. Evaluation of best models (includes training and validation of models and also checking the model adequacy using the calculation of different metrics that show the performance of the model)**

**Results of the project:**
1. Imported necessary libraries for work
2. Successfully gained access to binance account
3. Successfully gained access to binance data (historical data of price movement and volumes). Obtained data is the following metrics: open price, close price, highest and lowest price, and also volume of trading (the most important one is close price)
4. Successfully calculated **RSI Index** strategy and determined the breakout points (where to sell and where to buy)
5. Calculated Stochastic Oscillator and determined the strategy of trading
6. Empirically proved, that **Stochastic Oscillator** and **RSI Index** work together pretty well as they're both a volatile indicator and reflect the reverse trend movement
7. Calculated **Bollinger Bands** and determined the trading strategy. Each band (lower, higher and middle) was calculated separately. It can be done either by built-in functions, or by hands. In our case we used built-in functions from ***ta*** or ***pandas.ta*** library
8. Calculated **MACD** trading strategy. The same logic of building works for this strategy and below
9. Successfully implemented the strategy of Keltner Channel. We have used our background knowledge to determine, whether the chosen trading strategy would be profitable (or useful) or not. In our case, **Keltner Channel** is a more volatile type of **Bollinger Bands**, therefore we would not take this strategy seriously.
10. Empirically proved, that **Keltner Channel** would not work better than **Bollinger Bands** and other strategies in our case
11. Tried all of the most profitable combinations of trading strategies. During this procedure we have accounted for the number of calls to *long* or *short*, meaning when to buy or sell (in 1 year) and the truthfullness of calls. The most truthfull trading strategy has appeared to be **Stochastic Oscillator, RSI Index** and **Bollinger Bands**. We have tried all combinations of these trading strategies and **reached the truthfulness of calls equal to 75%**, which is pretty valid result, that has been shown by the strategy **RSI Index and Stochastic Oscillator**.
12. Based on obtained data from the combination of RSI and Stochastic, we have tried to train different models that would determine the truthfulness of an obtained call of a particular trading strategy (here: RSI Index & Stochastic Oscillator): **Logistic Regression, RandomForestClassifier and GradientBoostingClassifier**
13. For each trained model we have measured the following metrics: **Accuracy, Precision, AUC-ROC, Recall, F1**. Our main goal is to maximize **F1** score and keep **AUC-ROC** higher than 0.5. This is necessary for our model to predict better than a fair coin (with probability higher that 50%, therefore there would be a profit in the long run)
14. With proper combinations of hyperparameters all models have passed validation test. The best score showed the model of **GradientBoostingClassifier** with **F1 equal to 81% and AUC-ROC equal to 56%**
15. Performance on validation and on training also was accompanied with imbalance of classes. On each step there was a change in training data **(upsampling and downsampling)**, as this is a useful method to increase the metric and adequacy of the model. By the end of training and validation test it has been concluded, that methods of upsampling and downsampling do not work in the way we need, therefore the best model **performs better with the imbalance of classes**
16. The best model has been tested and showed results that were approved by superviser of the project, **Vadim Artemov**