<a href="https://colab.research.google.com/github/vadaliah/CS5260/blob/master/VWAP%20ML%20Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Forex VWAP (Volume Weighted Average Price) ML Solution using scikit-learn**

This notebook demonstrates a machine learning solution that predicts the VWAP direction for a given currency pair from a historical volume dataset.
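
For reference, VWAP over a set of bars is the volume-weighted mean of price: sum(price × volume) / sum(volume). The sketch below computes it per currency pair and business date on a small made-up frame. Using `Close` as the bar price and the helper name `daily_vwap` are assumptions for illustration; the notebook itself may derive VWAP differently.

```python
import pandas as pd

def daily_vwap(bars: pd.DataFrame, price_col: str = "Close") -> pd.Series:
    """Return VWAP per (Ticker, BusinessDate): sum(price * volume) / sum(volume)."""
    # Price times volume for each bar (hypothetical helper column)
    tmp = bars.assign(notional=bars[price_col] * bars["Volume"])
    sums = tmp.groupby(["Ticker", "BusinessDate"])[["notional", "Volume"]].sum()
    return (sums["notional"] / sums["Volume"]).rename("VWAP")

# Made-up hourly bars, for illustration only (not taken from the real dataset)
bars = pd.DataFrame({
    "Ticker": ["EURUSD", "EURUSD", "USDJPY", "USDJPY"],
    "BusinessDate": [20180402, 20180402, 20180402, 20180402],
    "Close": [1.2300, 1.2320, 106.20, 106.40],
    "Volume": [100, 300, 200, 200],
})

vwap = daily_vwap(bars)
print(vwap)
```

Bars with higher volume pull the VWAP toward their price, which is why sparse or missing bars can distort the result.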

**Problem Formulation**

In this example, we use the historical currency pair price and volume dataset provided by the Forex Tester app, available here: https://forextester.com/data/datasources.

The dataset contains hourly pricing data for 6 currency pairs (EURUSD, GBPUSD, AUDUSD, NZDUSD, USDJPY and USDCHF) for April 2018.

EDA Validation:
1. Ensure the data is complete: verify that all 6 currency pairs are present in the volume dataset.
2. Verify the row count for each currency pair and business date combination.
   Any currency pair with insufficient data will skew the VWAP results and lead to inaccurate model outcomes.
3. Verify that the trading dataset is complete and contains the currency pairs and business dates in scope.
4. Ensure the join between the currency pair volume dataset and the trading dataset succeeds, with matching attributes in both datasets.
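
Checks 1 and 2 above could be automated along the following lines. This is a minimal sketch on made-up rows: the helper name `check_volume_coverage` and the `min_rows` threshold are assumptions, while the `Ticker` and `BusinessDate` column names follow the volume dataset.

```python
import pandas as pd

EXPECTED_PAIRS = {"EURUSD", "GBPUSD", "AUDUSD", "NZDUSD", "USDJPY", "USDCHF"}

def check_volume_coverage(df, expected=EXPECTED_PAIRS, min_rows=24):
    """Report missing/unexpected tickers and sparse (Ticker, BusinessDate) groups."""
    present = set(df["Ticker"].unique())
    counts = df.groupby(["Ticker", "BusinessDate"]).size()
    return {
        "missing": sorted(expected - present),      # expected pairs with no rows
        "unexpected": sorted(present - expected),   # tickers outside the expected set
        "sparse": counts[counts < min_rows],        # groups below the row threshold
    }

# Made-up sample rows, for illustration only
sample = pd.DataFrame({
    "Ticker": ["EURUSD"] * 3 + ["GBPUSD"] * 2 + ["*"],
    "BusinessDate": [20180402] * 3 + [20180402] * 2 + [20180426],
})

report = check_volume_coverage(sample, min_rows=3)
print(report["missing"])
print(report["unexpected"])
print(report["sparse"])
```

With hourly data, `min_rows=24` would flag any day that lacks a full set of hourly bars; a lower threshold may be more appropriate for partial trading days.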


In [None]:
# Data handling and plotting
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# scikit-learn: model selection, estimators, preprocessing and metrics
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, LabelBinarizer, StandardScaler
from sklearn import config_context
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
from google.colab import files

# Hourly price and volume bars per currency pair
fx_volume_file = 'https://raw.githubusercontent.com/vadaliah/CS5260/master/currencypair_volume_datset.csv'
fx_volume_df = pd.read_csv(fx_volume_file)
print(fx_volume_df.shape)  # (rows, columns)
print(fx_volume_df.size)   # rows * columns

# Trade records (time and price) per currency pair
fx_trade_file = 'https://raw.githubusercontent.com/vadaliah/CS5260/master/trade_dataset.csv'
fx_trade_df = pd.read_csv(fx_trade_file)
print(fx_trade_df.shape)
print(fx_trade_df.size)
# First five trade rows as a NumPy array
fx_trade_df.head().values

(3042, 8)
24336
(14327, 5)
71635


array([['USDJPY', 20180401, '21:00', 2101, 106.167],
       ['USDJPY', 20180401, '21:00', 2102, 106.189],
       ['USDJPY', 20180401, '21:00', 2105, 106.221],
       ['USDJPY', 20180401, '21:00', 2106, 106.219],
       ['USDJPY', 20180401, '21:00', 2107, 106.222]], dtype=object)

In [None]:
# Column dtypes and non-null counts for the volume dataset
fx_volume_df.info()
# Row count per (Ticker, BusinessDate); a complete day of hourly data has 24 rows
fx_volume_df[['Ticker','BusinessDate']].value_counts()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3042 entries, 0 to 3041
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Ticker        3042 non-null   object 
 1   BusinessDate  3042 non-null   int64  
 2   TimeBucket    3042 non-null   object 
 3   Open          3042 non-null   float64
 4   High          3042 non-null   float64
 5   Low           3042 non-null   float64
 6   Close         3042 non-null   float64
 7   Volume        3042 non-null   int64  
dtypes: float64(4), int64(2), object(2)
memory usage: 190.2+ KB


Ticker  BusinessDate
GBPUSD  20180430        24
USDCHF  20180405        24
        20180403        24
        20180402        24
NZDUSD  20180430        24
                        ..
USDCHF  20180422         3
GBPUSD  20180429         3
AUDUSD  20180422         3
USDCHF  20180429         3
*       20180426         1
Length: 157, dtype: int64
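
The counts above include groups with only 3 rows and a single row whose ticker is `*`. One possible way to exclude such groups before computing VWAP is sketched below on made-up rows. The helper name `drop_sparse_groups` and the threshold are assumptions, and whether dropping (rather than, say, imputing) is appropriate depends on the analysis.

```python
import pandas as pd

def drop_sparse_groups(df: pd.DataFrame, min_rows: int) -> pd.DataFrame:
    """Keep only rows whose (Ticker, BusinessDate) group has at least min_rows rows."""
    # Broadcast each group's row count back onto its rows
    group_size = df.groupby(["Ticker", "BusinessDate"])["Ticker"].transform("size")
    return df[group_size >= min_rows].reset_index(drop=True)

# Made-up sample: one full group, one sparse group, one placeholder row
sample = pd.DataFrame({
    "Ticker": ["USDCHF"] * 4 + ["USDCHF"] * 2 + ["*"],
    "BusinessDate": [20180403] * 4 + [20180422] * 2 + [20180426],
    "Volume": [10, 20, 30, 40, 5, 6, 1],
})

cleaned = drop_sparse_groups(sample, min_rows=4)
print(cleaned)
```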

In [None]:
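# Column dtypes and non-null counts for the trade dataset
fx_trade_df.info()
# Column names differ from the volume dataset (TICKER vs Ticker, Time_Bucket vs TimeBucket)
fx_trade_df.columns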

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14327 entries, 0 to 14326
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   TICKER        14327 non-null  object 
 1   BusinessDate  14327 non-null  int64  
 2   Time_Bucket   14327 non-null  object 
 3   TradeTime     14327 non-null  int64  
 4   TradePrice    14327 non-null  float64
dtypes: float64(1), int64(2), object(2)
memory usage: 559.8+ KB


Index(['TICKER', 'BusinessDate', 'Time_Bucket', 'TradeTime', 'TradePrice'], dtype='object')
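
Because the trade dataset uses `TICKER` and `Time_Bucket` while the volume dataset uses `Ticker` and `TimeBucket`, the join in check 4 needs the key names aligned first. The sketch below shows one way to do that and to list trades with no matching volume bar. The rows are made up, and it assumes both datasets encode the time bucket in the same `HH:MM` format.

```python
import pandas as pd

# Made-up rows that mirror the column layout of the two datasets
volume = pd.DataFrame({
    "Ticker": ["USDJPY", "USDJPY", "EURUSD"],
    "BusinessDate": [20180401, 20180401, 20180402],
    "TimeBucket": ["21:00", "22:00", "09:00"],
    "Volume": [500, 650, 800],
})
trades = pd.DataFrame({
    "TICKER": ["USDJPY", "USDJPY", "GBPUSD"],
    "BusinessDate": [20180401, 20180401, 20180402],
    "Time_Bucket": ["21:00", "21:00", "09:00"],
    "TradeTime": [2101, 2102, 901],
    "TradePrice": [106.167, 106.189, 1.4050],
})

# Align the trade column names with the volume dataset before joining
trades_aligned = trades.rename(columns={"TICKER": "Ticker", "Time_Bucket": "TimeBucket"})

joined = trades_aligned.merge(
    volume,
    on=["Ticker", "BusinessDate", "TimeBucket"],
    how="left",
    indicator=True,           # adds a _merge column: 'both' or 'left_only'
    validate="many_to_one",   # raises if a bucket appears more than once in volume
)

# Trades that found no matching volume bar
unmatched = joined[joined["_merge"] == "left_only"]
print(unmatched[["Ticker", "BusinessDate", "TimeBucket"]])
```

An empty `unmatched` frame would indicate that every trade maps to a volume bar; any rows listed point to currency pairs or time buckets missing from the volume dataset.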