### Data Collection and Filtering

This file contains the code that reads the data from the original Excel documents, filters it by removing unnecessary data
such as irrelevant table columns, and adds more financial data.
The added financial data includes:
* SP500 weekly, monthly, and 3-month percent change, and the daily SP500 gap percent change.
* Gap percent change of the given instrument.


In [1]:

import numpy as np
import pandas as pd
import fdata
import datetime

### Data Import:



In [5]:
dt1 = pd.read_excel(r'Live day trading 2 fixed.xlsx' , 'TABLE1')
dt2 = pd.read_excel(r'Live day trading 4.xlsx' , 'TABLE1')
dt3 = pd.read_excel(r'Live day trading 5.xlsx' , 'TABLE1')
dt4 = pd.read_excel(r'Live day trading 7.xlsx' , 'TABLE1')

### Data Wrangling:

1) Sign (tag) each table with its number, and remove the rows that have nulls in the "Instrument" column.

2) In dt3 and dt4, the more recent data tables, 'Potential Price' was split into 'Potential Price' and 'In Trade Potential Price'. The latter corresponds to 'Potential Price' in dt1 and dt2: the potential price a trade may reach without dropping back to the SL. The same applies to 'RRR Potential in Trade' in dt3 and dt4, which corresponds to 'RRR Potential' in dt1 and dt2.

3) Merge all 4 tables.


In [6]:
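#1)

dt_all = [dt1,dt2,dt3,dt4]

# sign the tables by enumeration and drop rows without an instrument
for table_num, table in enumerate(dt_all):

    table["Table Number"] = table_num
    # inplace=True so the change reaches dt1..dt4;
    # reassigning `table` would only rebind the loop variable
    table.dropna(subset=["Instrument"], inplace=True)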

#2)
dt3 =  dt3.drop(['Potential Price', 'Trade Potential'], axis = 1) 
dt4 = dt4.drop(['Potential Price', 'Trade Potential'], axis = 1) 


dt3.rename(columns = {'In Trade Potential Price':'Potential Price'}, inplace = True) 
dt4.rename(columns = {'In Trade Potential Price':'Potential Price'}, inplace = True)

dt3.rename(columns = {'RRR Potential in Trade':'RRR Potential'}, inplace = True) 
dt4.rename(columns = {'RRR Potential in Trade':'RRR Potential'}, inplace = True)

#3)
# DataFrame.append was removed in pandas 2.0; pd.concat stacks all four tables in one call
dt = pd.concat([dt1, dt2, dt3, dt4], ignore_index=True)

dt.sample(5)



Unnamed: 0,Instrument,No,Entry Date,Entry Time,Exit Time,Comissions,Gain/Loss,Quantity,Setup,Buy/Sell,...,Exit 3,lvl 2,Performance Grade,Pivot Description,SandR,Dilution,Highest Price2,Shares,Position Size,Time2
102,SEEL,7,2019-03-27 00:00:00,10:16:00,10:21:00,12.7,14.13,1250.0,BO,BUY,...,,,,,,,,,,
152,BIOC,58,2019-05-20 00:00:00,09:59:00,10:27:00,76.0,-300.0,7500.0,BO,BUY,...,,,,,,,,,,
175,DLTH,11,2019-12-05 00:00:00,09:54:00,09:56:00,,,,FPB,BUY,...,9.3,,,Exhausted,,False,0.333333,133.0,1266.84,00:02:00
89,SEEL,90,2019-03-07 00:00:00,09:40:00,09:49:00,2.02,-18.52,100.0,BO,BUY,...,,,,,,,,,,
16,AXSM,17,2019-01-07 00:00:00,09:48:00,09:49:00,2.02,-23.7,100.0,BO,SELL,...,,,,,,,,,,
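
As a minimal sketch of the sign, clean, and merge steps on toy data, the following can be run without the Excel files. The two small frames and the helper name `tag_and_merge` are made up for illustration and are not part of the notebook.

```python
import pandas as pd

def tag_and_merge(tables):
    """Tag each table with its position, drop rows without an instrument, and stack them."""
    cleaned = []
    for table_num, table in enumerate(tables):
        # assign returns a new frame, so the input tables are left untouched
        tagged = table.assign(**{"Table Number": table_num})
        cleaned.append(tagged.dropna(subset=["Instrument"]))
    return pd.concat(cleaned, ignore_index=True)

# toy stand-ins for the Excel sheets (hypothetical values)
a = pd.DataFrame({"Instrument": ["AAA", None], "Entry Price": [1.0, 2.0]})
b = pd.DataFrame({"Instrument": ["BBB"], "Entry Price": [3.0]})

merged = tag_and_merge([a, b])
print(merged)
```

Building a new list of cleaned frames avoids relying on in-place modification of the originals, at the cost of one extra copy per table.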


### Filtering:



1) Choose only the given features to see if there is a relationship between them and the outcome of a trade.

2) Select only long breakout-pattern setups. Short trades are dropped because this analysis only features long trades.

3) Drop the 'Buy/Sell' and 'Setup' columns, which no longer carry information after the previous step.





In [7]:
#1)
dt = dt[["Instrument", "Entry Date","Entry Time","Exit Time","Setup","Buy/Sell","Intended Entry","Entry Price"
,"SL Price", "Exit Price", "Highest Price", "Potential Price","Volume Exit","Wick Exit","Price Behaviour","Sector"
, "Catalyst", "Pattern", "Float","RRR in-trade","Negative RRR in-trade","Time","Outcome","Missed RRR on Entry"
,"Missed RRR","RRR Potential","Hard RRR Potential","RRR Difference","RRR Realized","RRR Volume Exit","RRR Wick Exit"
,"RRR Joint Wick and Volume Exit","Pause Num","VWAP Tag","VWAP","Table Number"]]

#2)
setups = ['BO','BOT','VBO','FPH','VF']

dt = dt[(dt['Buy/Sell'] == "BUY") & (dt['Setup'].isin(setups))].reset_index(drop=True)

#3)

# Buy/Sell column is now irrelevant because all positions in this dataset are Buy
# Setup column is also irrelevant because all setups are Break Outs or a subgroup of a Break Out trade.
dt =  dt.drop(['Buy/Sell','Setup'], axis =1)
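
The effect of the filter can be checked on a small frame before running it on the full data. The sketch below uses the same setup list and column names as the cell above; the four toy rows are made up.

```python
import pandas as pd

setups = ['BO', 'BOT', 'VBO', 'FPH', 'VF']

# hypothetical rows: one short trade and one non-breakout setup should be removed
toy = pd.DataFrame({
    "Instrument": ["AAA", "BBB", "CCC", "DDD"],
    "Buy/Sell": ["BUY", "SELL", "BUY", "BUY"],
    "Setup": ["BO", "BO", "FPB", "VF"],
})

# keep long trades whose setup is one of the breakout variants
mask = (toy["Buy/Sell"] == "BUY") & (toy["Setup"].isin(setups))
kept = toy[mask].reset_index(drop=True)

# both filter columns are now constant or redundant, so drop them
kept = kept.drop(["Buy/Sell", "Setup"], axis=1)
print(kept)
```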


### Extracting Additional Financial Data:

With the help of the fdata module which includes the financialData class, get the following from Yahoo Finance:

1) SP500 weekly, monthly, and 3-month percent change, and the daily gap.

2) The percent change gap of the given symbol at the date the trade took place.
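
The fdata module is not shown here, so the following is only a sketch of what the two quantities could mean, assuming daily `Open`/`Close` prices indexed by date. The gap is taken as the open relative to the previous session's close, and the N-day change as the close relative to the last close at or before N calendar days earlier. The function names `gap_percent` and `percent_change` and the price values are hypothetical; `financialData.getGap` and `financialData.percentChange` may be defined differently.

```python
import datetime
import pandas as pd

def gap_percent(prices, day):
    """Open of `day` relative to the previous session's close, in percent (assumed definition)."""
    pos = prices.index.get_loc(pd.Timestamp(day))
    if pos == 0:
        raise ValueError("no previous session available for " + str(day))
    prev_close = prices["Close"].iloc[pos - 1]
    return (prices["Open"].iloc[pos] - prev_close) / prev_close * 100

def percent_change(prices, day, days_back):
    """Close of `day` relative to the last close at or before `day - days_back`, in percent."""
    end = pd.Timestamp(day)
    start = end - pd.Timedelta(days=days_back)
    # label slicing on a sorted DatetimeIndex keeps every session up to `start`
    past_close = prices["Close"].loc[:start].iloc[-1]
    return (prices["Close"].loc[end] - past_close) / past_close * 100

# made-up daily prices, sorted by date
idx = pd.to_datetime(["2019-03-18", "2019-03-25", "2019-03-26", "2019-03-27"])
prices = pd.DataFrame({"Open": [99.0, 101.0, 101.0, 102.0],
                       "Close": [100.0, 101.0, 100.0, 105.0]}, index=idx)

entry = datetime.date(2019, 3, 27)
day_gap = gap_percent(prices, entry)
week_change = percent_change(prices, entry, 7)
print(day_gap, week_change)
```

Whether the change should be measured up to the entry day's close or only up to the previous close is a design choice; using the entry day's close includes information that was not available at entry time.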

In [8]:

# 1)

#create a new financialData object
SP500 = fdata.financialData('SPY')

week = 7
month = 30
three_months = 90


# apply the function from the fdata module and add the data to dt
dt['SPY Week Change'] = dt['Entry Date'].apply(lambda x: SP500.percentChange(x.date(), week) )
dt['SPY Month Change'] = dt['Entry Date'].apply(lambda x: SP500.percentChange(x.date(), month) )
dt['SPY 3 Month Change'] = dt['Entry Date'].apply(lambda x: SP500.percentChange(x.date(), three_months) )
dt['SPY Gap'] = dt['Entry Date'].apply(lambda x: SP500.getGap(x.date()) )



[*********************100%***********************]  1 of 1 completed
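
The download progress output suggests that the per-row `apply` fetches data once for every trade, even when several trades share an entry date. One possible way to cut this down is to cache the result per date. The sketch below uses a stand-in function instead of the real `SP500.percentChange` call; `slow_lookup`, `cached_lookup`, and the toy dates are hypothetical.

```python
from functools import lru_cache

import pandas as pd

calls = []

def slow_lookup(day):
    """Stand-in for an expensive per-date call; records each invocation."""
    calls.append(day)
    return float(day.day)

@lru_cache(maxsize=None)
def cached_lookup(day):
    # datetime.date objects are hashable, so they can serve as cache keys
    return slow_lookup(day)

toy = pd.DataFrame({"Entry Date": pd.to_datetime(
    ["2019-03-07", "2019-03-07", "2019-03-27", "2019-03-27", "2019-03-27"])})

# same apply pattern as in the notebook, but repeated dates hit the cache
toy["Value"] = toy["Entry Date"].apply(lambda x: cached_lookup(x.date()))
print(toy)
print("underlying calls:", len(calls))
```

With five rows and two distinct dates, the stand-in function runs twice instead of five times.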

In [9]:
# 2)

# import the info for delisted stocks from a manually collected
# local excel file, because delisted stocks are no longer available on Yahoo Finance.
delisted = pd.read_excel(r'Delisted.xlsx')
delisted['Date'] = delisted['Date'].apply(lambda x: x.date())

# get the gap from the fdata module (Yahoo Finance) with the 'financialData' class,
# or from the 'delisted' table when the symbol is delisted
def gap(instrument, entry_date):
    print(instrument)

    if instrument not in delisted['Instrument'].values:
        data = fdata.financialData(instrument)
        gap_pct = data.getGap(entry_date)
        print(gap_pct)
    else:
        print("delisted: " + instrument)
        rows = delisted[(delisted['Date'] == entry_date) & (delisted['Instrument'] == instrument)]['Gap']
        # scale by 100, as in the original code, to express the stored gap in percent
        gap_pct = rows.values[0] * 100

    return gap_pct

dt['Gap'] = dt.apply(lambda x: gap(x['Instrument'], x['Entry Date'].date()), axis=1)



YECO
[*********************100%***********************]  1 of 1 completed
9.865472050607176
YECO
[*********************100%***********************]  1 of 1 completed
9.865472050607176
ALQA
delisted: ALQA
INPX
[*********************100%***********************]  1 of 1 completed
38.79598347401169
DOVA
delisted: DOVA
ADIL
delisted: ADIL
CCCL
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
18.954245242640503
MRT
delisted: MRT
CCCL
[*********************100%***********************]  1 of 1 completed
-1.449274026921029
FLKS
delisted: FLKS
CCCL
[*********************100%***********************]  1 of 1 completed
-1.449274026921029
RHE
[*********************100%***********************]  1 of 1 completed
0.7812492724042223
INPX
[*********************100%***********************]  1 of 1 completed
29.87804134221927
APHA
[*********************100%***********************]  1 of 1 completed
-2.279636734143715
U

[*********************100%***********************]  1 of 1 completed
11.023621086415284
CTRM
[*********************100%***********************]  1 of 1 completed
0.190109346189201
PULM
[*********************100%***********************]  1 of 1 completed
-11.688311152148007
OPTT
[*********************100%***********************]  1 of 1 completed
72.19999313354492
TNXP
[*********************100%***********************]  1 of 1 completed
20.171677754665843
JMU
delisted: JMU
VTL
delisted: VTL
VTL
delisted: VTL
BLIN
[*********************100%***********************]  1 of 1 completed
10.0
GLG
[*********************100%***********************]  1 of 1 completed
43.84237031761466
GLG
[*********************100%***********************]  1 of 1 completed
43.84237031761466
FCSC
delisted: FCSC
IPWR
[*********************100%***********************]  1 of 1 completed
24.390244469764628
IPWR
[*********************100%***********************]  1 of 1 completed
24.390244469764628
RBZ
delisted: RBZ
AK
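
The delisted lookup indexes the first matching row with `.values[0]`, which raises an `IndexError` when the table has no row for a given symbol and date. A defensive variant, sketched here on a made-up `delisted` frame (the symbols, values, and the helper name `delisted_gap` are hypothetical), could return `NaN` instead so that missing entries stay visible in the result.

```python
import datetime

import numpy as np
import pandas as pd

# toy stand-in for Delisted.xlsx; Gap is assumed to be stored as a fraction
delisted = pd.DataFrame({
    "Instrument": ["XXXX", "YYYY"],
    "Date": [datetime.date(2019, 1, 7), datetime.date(2019, 3, 7)],
    "Gap": [0.25, -0.5],
})

def delisted_gap(instrument, entry_date):
    """Return the stored gap in percent, or NaN when no row matches."""
    match = delisted[(delisted["Date"] == entry_date)
                     & (delisted["Instrument"] == instrument)]["Gap"]
    if match.empty:
        return np.nan
    return match.values[0] * 100

found = delisted_gap("XXXX", datetime.date(2019, 1, 7))
missing = delisted_gap("XXXX", datetime.date(2019, 1, 8))
print(found, missing)
```

Rows that come back as `NaN` can then be listed with `dt[dt['Gap'].isna()]` and filled in by hand.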

### Round Floats:


Round all float values to at most two decimal places.


In [10]:
# list of columns with numeric values
num_columns = ["Intended Entry","Entry Price","SL Price", "Exit Price", "Highest Price", "Potential Price","Volume Exit"
,"Wick Exit","Price Behaviour", "Float","RRR in-trade","Negative RRR in-trade","Time","Missed RRR on Entry","Missed RRR"
,"RRR Potential","Hard RRR Potential","RRR Difference","RRR Realized","RRR Volume Exit","RRR Wick Exit"
,"RRR Joint Wick and Volume Exit","Table Number","Gap","SPY Week Change","SPY Month Change","SPY 3 Month Change"
, "VWAP"]


# round float values to 2 decimal places (to the nearest value, despite the name);
# non-float values such as times or strings are returned unchanged.
# The original string round-trip was a no-op: its rstrip result was discarded,
# and a float carries no trailing zeros anyway.
def roundUp(x):
    if isinstance(x, float):
        x = float(np.around(x, 2))
    return x

# round every numeric column
for col in num_columns:
    dt[col] = dt[col].apply(roundUp)
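
For columns that already have a float dtype, pandas can do the same rounding without a Python-level loop. A minimal sketch on toy data (the column names are borrowed from the notebook, the values are made up):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Entry Price": [1.23456, 2.5, np.nan],
    "Gap": [9.865472, -1.449274, 0.781249],
    "Sector": ["Tech", "Health", "Energy"],
})

# pick the float columns by dtype and round them in one vectorized call
float_cols = toy.select_dtypes(include="float").columns
toy[float_cols] = toy[float_cols].round(2)
print(toy)
```

Columns of mixed type are stored with `object` dtype and are not selected here, so the element-wise approach is still needed for those.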

### Export to xlsx:



In [16]:
dt.to_excel("data.xlsx")