# Financial Stock Price Prediction with Knowledge Graph Embeddings

## Introduction

This notebook is aimed to serve as an introduction to the creation of a not personalized recommendation algorithm operating on stock market prices in order to predict future profitability of those assets. The time series are converted into financial technical indicators, and it is enriched with knowledge graph information downloaded from WikiData. 

The notebook covers the processing steps, the calculation of the features fed to the prediction models, and provides the outcome of several profitability prediction models using the combination of both technical features and knowledge graph embeddings.

In [1]:
# Local Mode
#storageDIR = "HugeStockMarketDataset" # creates a dataset directory in the same folder as the notebook
#storageDIRNews = "NewsSentimentDataset"
# Container Mode
storageDIR = "/tmp/data/"


## Dataset

Different types of financial asset recommendation systems use different sources of data to produce their recommendations. The approach we introduce in this notebook is known as Profitability Prediction, where assets that are predicted to gain significant value over the following six months are recommended. This type of approach uses past pricing data, i.e. the price for different assets over time, to identify pricing trends and hence future profitable assets. Hence, as input, we need the price history over time for a range of assets. In addition, we enrich our recommendations with a knowledge graph representing relations between companies and other entities related to them (important people in the company like CEO or board members, products released, awards). 

### Pricing data

For illustration, in this notebook we will use open pricing data, available from Yahoo! Finance. In particular, it contains the historical price and volume data for US-based stocks and ETFs trading on the NYSE, NASDAQ and AMEX markets, and it runs up to the end of March 2022. Each entry of this dataset is comprised of: 
 - Date: The date of the pricing data 
 - Open: Opening price for that day
 - High: The maximum price for that day
 - Low: The minimum price for that day
 - Close: The closing price for that day
 - AdjClose: The adjusted closing price
 - Volume: The amount of the asset that is traded 
 
We introduce here three different ways to download the data. Along with this example, we provide the pricing information for ~ 1700 financial assets, stored in a single file. In case you want to use this file, go to the "Loading the data from a single file" section. If you have a single file per stock, go to the "Loading the data from files" section. Finally, if you want to download the data from an online source (Yahoo! Finance), go to the "Loading the data from Yahoo! Finance section".
 
#### Loading the data from files

In this example, we load the data into a single Pandas Dataframe, which acts like a large data table that makes raw data easier to analyse. In case we do already have the pricing information, it is enough to execute the following code snippet. It assumes that we store every asset in a separate file and combines them. If this is the case, you can skip the remaining steps until the "Knowledge graph" section. Otherwise, ignore this snippet and continue with the tutorial.

In [None]:
import pandas as pd
import numpy as np
import glob, os, random, math

directory = os.path.join(storageDIR, "stocks")
all_files = glob.glob(os.path.join(directory, "*.csv"))
dfs = []

tickers = []
# Iterating through files and only using non-empty files
for f in all_files:
    if os.path.getsize(f) > 0:
        df = pd.read_csv(f)
        ticker = f.split('/')[-1].split('.')[0]
        df['Stock'] = ticker
        tickers.append(ticker)
        dfs.append(df)
prices_df = pd.concat(dfs)

print("Dataset Extraction and Loading as Dataframe Complete")

#### Loading the data from a single file

If we have all the pricing data into a single file, we can use the following snippet. If this is the case, you can skip the following steps until the "Knowledge graph" section. Otherwise, ignore this snippet and continue with the tutorial.



In [2]:
import pandas as pd
import numpy as np
import glob, os, random, math

file_name = "timeseries.csv"
directory = os.path.join(storageDIR, "stocks")

file = os.path.join(directory, file_name)
prices_df = pd.read_csv(file)
prices_df["Date"] = pd.to_datetime(prices_df["Date"])

print("Dataset Extraction and Loading as Dataframe Complete")

Dataset Extraction and Loading as Dataframe Complete


#### Loading the data from Yahoo! Finance

The historical prices to use in this notebook can be also downloaded through <b><a href='https://finance.yahoo.com/'>Yahoo! Finance</a></b>. To download this data, we should first download the set of assets on the NASDAQ, AMEX and NYSE. In order to obtain this information, we can download the asset information from the <a href='https://www.nasdaq.com/market-activity/stocks/screener'>NASDAQ Stock Screener</a> webpage (just use the Download .csv button).

In [None]:
import zipfile
import pandas as pd
import numpy as np
import glob, os, random, math
import datetime

ticker_data = "./nasdaq_screener_1664986396364.csv"
data = pd.read_csv(ticker_data, sep=",")
tickers = data["Symbol"].tolist()
tickers = [ticker for ticker in tickers if not pd.isna(ticker)]
tickers

Once we have a list of tickers, we can then ask Yahoo! Finance for the pricing information. Yahoo! Finance provides a URL for each ticker, from which we can download the data. The URL has the following format:

https://query1.finance.yahoo.com/v7/finance/download/[TICKER]?period1=[START_DATE]&period2=[END_DATE]&interval=1[d,wk,mo]&events=[EVENT_TYPE]&includeAdjustedClose=true

where:
- [TICKER] represents the ticker we want to retrieve.
- [START_DATE] represents the UNIX timestamp of the first date we want to retrieve.
- [END_DATE] represents the UNIX timestamp of the last date we want to retrieve.
- [FREQ] indicates whether we want to retrieve daily (d), weekly (wk) or monthly (mo) information
- [EVENT_TYPE] identifies the type of event that we want to retrieve between: "history" (the pricing history), "div" (the dividends only history), "split" (stock split history) and "capital" (capital games).

In this example, we are taking the history data, and we want to collect all the possible information for each ticker (this meaning daily data from the farthest possible period until today). We define the following function for generating the Yahoo! Finance URLs:

In [None]:
import time
def get_url(ticker, start_date, end_date, freq, event_type):
    start_date_unix = int(time.mktime(start_date.timetuple()))
    end_date_unix = int(time.mktime(end_date.timetuple()))
    
    url = "https://query1.finance.yahoo.com/v7/finance/download/"
    url += ticker
    url += "?period1="
    url += str(start_date_unix)
    url += "&period2="
    url += str(end_date_unix)
    url += "&interval=1"
    url += freq
    url += "&events="
    url += event_type
    url += "&includeAdjustedClose=true"
    return url

Then, we can establish the parameters of the information we want, and collect and store the data. In our case, the parameters will be:
- [START_DATE]: 13/12/1901
- [END_DATE]: today (09/05/2023)
- [FREQ]: d (daily)
- [EVENT_TYPE]: history

In [None]:
start_time = datetime.datetime(1901,12,13)
end_time = datetime.datetime.now()
freq = "d"
event_type = "history"

directory = os.path.join(storageDIR, "stocks")

Then, using the get_url function, we can download the information we seek. Note that, in some cases, the time series for some of the tickers cannot be retrieved (when we tried this 7200 out of 8238 tickers had been retrieved). For the rest, there are many reason why this might happen:
- Yahoo! Finance does not contain the corresponding data. This might happen because the stocks are no longer traded on the market, or there might be some invalid information on them. These stock are invalid, and as such, should be discarded.
- The ticker is not the same as in Yahoo! Finance as in teh NASDAQ file. These assets can be fixed. The procedure to follow is:
    - Change "^" by "-P".
    - Change "\" by "-".
    - Remove extra blank spaces.

In the example below, we first try to collect information from the original tickers, and then, we apply the changes to the tickers to obtain some more. In case some assets could not be retrieved this way, we consider them impossible to retrieve.

We store the files in a directory (one file per stock)

In [None]:
unretrieved = set()

dfs = []

i = 0
for ticker in tickers:
    try:
        url = get_url(ticker, start_time, end_time, freq, event_type)
        ticker_data = pd.read_csv(url, sep=",")
        ticker_data.to_csv(directory + "/" + ticker + ".csv", index=False)
        dfs.append(ticker_data)
        i += 1
        if i%100 == 0:
            print("Retrieved " + str(i) + " tickers ( " + str(len(unretrieved)) + " failed)")
    except Exception as e:
        if e.code == 429:
            time.sleep(3600)
        unretrieved.add(ticker)
        
unretrieved_list = list(list(unretrieved)[1:-1])
unretrieved_list.sort()
unretrieved_list

i = 0
modified = 0
for ticker in unretrieved_list:
    if "^" in ticker:
        ticker = ticker.replace("^","-P")
        unretrieved_list[i] = ticker
        modified += 1
    elif "/" in ticker:
        ticker = ticker.replace("/","-")
        unretrieved_list[i] = ticker
        modified += 1
    elif " " in ticker:
        ticker = ticker.strip()
        unretrieved_list[i] = ticker
        modified += 1
    i = i + 1
print("Modified " + str(modified) + " tickers.")

unretrieved_2 = set()

i = 0
for ticker in unretrieved_list:
    try:
        ticker_data = pd.read_csv(get_url(ticker, start_time, end_time, freq, event_type), sep=",")
        ticker_data.to_csv(directory + "/" + ticker + ".csv", index=False)
        dfs.append(ticker_data)
        i += 1
        if i%100 == 0:
            print("Retrieved " + str(i) + " tickers ( " + str(len(unretrieved_2)) + " failed)")
    except Exception as e:
        if e.code == 429:
            time.sleep(3600)
        unretrieved_2.add(ticker)

prices_df = pd.concat(dfs)

In [3]:
prices_df.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Stock
0,1999-11-18,32.546494,35.765381,28.612303,31.473534,26.845928,62546380.0,A
1,1999-11-19,30.713518,30.758226,28.478184,28.880545,24.634195,15234146.0,A
2,1999-11-22,29.551144,31.473534,28.657009,31.473534,26.845928,6577870.0,A
3,1999-11-23,30.400572,31.205294,28.612303,28.612303,24.405386,5975611.0,A
4,1999-11-24,28.701717,29.998213,28.612303,29.372318,25.053663,4843231.0,A


#### Filtering the pricing data

Pandas allows us to perform manipulations on the pricing data so that we can extract only what we need for training the model. We will only use pricing data from 2018 to 2021. We shall consider data until July 2019 as the past, and we shall train models at different points of time.

Lets first filter the dataset to only hold data from the dates we care about:

In [4]:
prices_df['Date'] = pd.to_datetime(prices_df['Date'])
min_date = pd.to_datetime('2018-01-01')
max_date = pd.to_datetime('2021-01-10')
# Selecting only that data from either 2016 or 2017
prices_df = prices_df[prices_df['Date'] >= min_date]
prices_df = prices_df[prices_df['Date'] <= max_date]
print("Filtered the data prices")
prices_df

Filtered the data prices


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Stock
4558,2018-01-02,67.419998,67.889999,67.339996,67.599998,64.989258,1047800.0,A
4559,2018-01-03,67.620003,69.489998,67.599998,69.320000,66.642807,1698900.0,A
4560,2018-01-04,69.540001,69.820000,68.779999,68.800003,66.142906,2230700.0,A
4561,2018-01-05,68.730003,70.099998,68.730003,69.900002,67.200432,1632500.0,A
4562,2018-01-08,69.730003,70.330002,69.550003,70.050003,67.344604,1613400.0,A
...,...,...,...,...,...,...,...,...
6570695,2021-01-04,2.280000,2.300000,2.250000,2.250000,1.824063,1146900.0,DHY
6570696,2021-01-05,2.250000,2.280000,2.250000,2.250000,1.824063,1296300.0,DHY
6570697,2021-01-06,2.270000,2.280000,2.240000,2.260000,1.832170,1903600.0,DHY
6570698,2021-01-07,2.260000,2.290000,2.250000,2.270000,1.840278,2517800.0,DHY


After this step, we print below the number of stocks.

In [5]:
stocks = prices_df['Stock'].unique().tolist()
print("Num. stocks with data between 2018 and 2021: " + str(len(stocks)))
stocks

Num. stocks with data between 2018 and 2021: 1387


['A',
 'AA',
 'AACG',
 'AADI',
 'AAIC',
 'AAL',
 'AAME',
 'AAN',
 'AAOI',
 'AAON',
 'AAP',
 'AAPL',
 'AAT',
 'AATC',
 'AAU',
 'AAWW',
 'AB',
 'ABB',
 'ABBV',
 'ABC',
 'ABCB',
 'ABCL',
 'ABCM',
 'ABEO',
 'ABEV',
 'ABG',
 'ABIO',
 'ABM',
 'ABMD',
 'ABNB',
 'ABR',
 'ABST',
 'ABT',
 'ABUS',
 'ABVC',
 'AC',
 'ACA',
 'ACAD',
 'ACB',
 'ACCD',
 'ACCO',
 'ACEL',
 'ACER',
 'ACET',
 'ACGL',
 'ACGLO',
 'ACHC',
 'ACHR',
 'ACHV',
 'ACI',
 'ACIU',
 'ACIW',
 'ACLS',
 'ACM',
 'ACMR',
 'ACN',
 'ACNB',
 'ACNT',
 'ACOR',
 'ACP',
 'ACR',
 'ACRE',
 'ACRS',
 'ACRX',
 'ACST',
 'ACTG',
 'ACU',
 'ACV',
 'ADAP',
 'ADBE',
 'ADC',
 'ADCT',
 'ADEA',
 'ADES',
 'ADI',
 'ADIL',
 'ADM',
 'ADMA',
 'ADMP',
 'ADN',
 'ADNT',
 'ADOC',
 'ADP',
 'ADPT',
 'ADSK',
 'ADT',
 'ADTN',
 'ADTX',
 'ADUS',
 'ADV',
 'ADVM',
 'ADX',
 'ADXN',
 'AE',
 'AEE',
 'AEF',
 'AEFC',
 'AEG',
 'AEHL',
 'AEHR',
 'AEI',
 'AEIS',
 'AEL',
 'AEMD',
 'AENZ',
 'AEO',
 'AEP',
 'AEPPZ',
 'AER',
 'AES',
 'AEVA',
 'AEY',
 'AEYE',
 'AEZS',
 'AFB',
 'AFBI',
 'AF

### Knowledge graph

In addition to the pricing data, we use for this work a knowledge graph extracted from Wikidata. We share the different information available in the knowledge graph in the following files. A knowledge graph consists on different elements:
- **Entities:** Objects representing real life objects / people or concepts. For instance, a company is represented with an entity. In Wikidata, an entity contains: a) a unique identifier starting with Q (ex: Q312 for Apple), b) a label or name (e.g. "Apple Inc."), c) a list of alternative names or aliases (e.g. "Apple Computer Inc", "Apple", "Apple Incorporated", etc.), a description (e.g. "American technology company based in Cupertino, California") and an associated Wikipedia page (e.g. "https://en.wikipedia.org/wiki/Apple_Inc."). Entities are represented as nodes in the graph.  
  In our data, entities are shared in the *entities.txt* file. This contains a list of JSON objects (one per line) with each line representing a different entity. The format of an entity JSON is:  
```json
{"id": node_id, "labels": ["StockEntity"], "properties": { "alias" : [alias1, alias2,...aliasM], "description": entity description, "id": Wikidata ID, "label": Entity name, "wikipedia": Wikipedia page}}
```
- **Values:** Objects representing constants (either numerical, dates, strings). Values are represented as nodes in the graph.
  In our data, values are shared in the *values.txt* file. This contains a list of JSON objects (one per line) with each line representing a different value. Note that these values might be repeated. The format of a value JSON is:  
```json
{"id": node_id, "labels": ["StockValue"], "properties": { dictionary containing the necessary values}}
```
- **Properties:** Objects representing the possible types of connections between nodes in the knowledge graph, or the possible types of properties of the edges. In Wikidata, a property is defined by: a) a unique identifier starting with P (e.g.: P452), b) a label or name (e.g.: "industry"), c) a list of alternative names or aliases (e.g. "field of action", "sector", "branch", etc.), and d) a description (e.g.:  "specific industry of company or organization"). Properties are represented as nodes in the knowledge graph.  
  In our data, properties are shared in the *properties.txt* file. This contains a list of JSON objects (one per line) with the following format:
```json
{"id": node_id, "labels": ["StockProperty"], "properties": { "alias" : [alias1, alias2,...aliasM], "description": entity description, "id": Wikidata ID, "label": Entity name}}
```

- **Relations:**  Connections between different entities in the knowledge graph. They are quartets of the form `(head, relation type, tail, properties)` where:
    - head: the initial entity (e.g.: Amazon)
    - relation type: the type of relation (e.g.: has subsidiary)
    - tail: the end entity (e.g.: Twitch Interactive)
    - properties: additional information about the link (e.g.: start time: 25/08/2014)    
  
  Relations appear as directed edges in the knowledge graph.  
  In our data, they are included in the file *value_relations.txt*. This file contains a list of JSON objects (one per line) representing each relation. Every line has the following format:  
```json
{"source": head_node_id, "type": Wikidata relation type ID, "dest": tail_node_id, "properties": { property type 1: value1, ..., property type N: valueN}}
```

- **Property values:** Connections between entities and values in the knowledge graph. They are quartets of the form `(entity, property type, value, properties)` where:
    - entity: the entity who has a property (e.g.: Amazon)
    - property type: the type of property (e.g.: total revenue)
    - value: the value (e.g.: 513,983,000,000 USD)
    - properties: additional information about the property (e.g.: point in time: 2022)    
  
  These connections appear as directed edges in the knowledge graph.  
  In our data, they are included in the file *relations.txt*. This file contains a list of JSON objects (one per line) representing each relation. Every line has the following format:  
```json
{"source": entity_node_id, "type": Wikidata relation type ID, "dest": value_node_id, "properties": { property type 1: value1, ..., property type N: valueN}}
```

In addition to the previous information, we share an additional file, *mapping.txt* which contains, for every ticker, the corresponding Wikidata id of the entiies representing it in the knowledge graph. It has format:

```
ticker:entityID_1,...,entityID_n
```

We first download the information we need for the knowledge graph:

In [6]:
import json
import urllib.request

store = False # Set to true to store the file again
directory = os.path.join(storageDIR, "kg")

In [7]:
# Entities file
entities_file = os.path.join(directory, "entities.txt")
entities_url = "URL"

entities = []
text = ""

# If container mode:
with open(entities_file, "r") as f:
    lines = f.readlines()
# If collab mode / need to download from URL
    #lines = urllib2.urlopen(entities_url)
    for line in lines:
        dictionary = json.loads(line)
        entities.append({"nodeID" : dictionary["id"], 
                         "wikidataID": dictionary["properties"]["id"], 
                         "label" : dictionary["properties"]["label"]})
        if store:
            text += line + "\n"

if store:
    with open(entities_file, "w") as f:
        f.write(text)

entities_df = pd.DataFrame(entities)
entities_df

Unnamed: 0,nodeID,wikidataID,label
0,1,Q30268840,Celyad (Belgium)
1,5,Q1001788,Buenaventura
2,7,Q16858667,Kratos Defense & Security Solutions
3,9,Q6783802,Masonite International
4,10,Q846246,Neonode
...,...,...,...
102734,327123,Q181790,composite material
102735,327124,Q369820,music of Africa
102736,327125,Q11700058,folk-pop
102737,327126,Q42982,allergy


In [8]:
# Values file
values_file = os.path.join(directory, "values.txt")
values_url = "URL"

values = []
text = ""

# If container mode:
with open(values_file, "r") as f:
    lines = f.readlines()
# If collab mode / need to download from URL
    #lines = urllib2.urlopen(values_url)
    for line in lines:
        dictionary = json.loads(line)
        values.append({"nodeID" : dictionary["id"], 
                         "value": dictionary["properties"]})
        if store:
            text += line + "\n"

if store:
    with open(values_file, "w") as f:
        f.write(text)

values_df = pd.DataFrame(values)
values_df

Unnamed: 0,nodeID,value
0,0,"{'amount': 367771.0, 'unit': 'unit'}"
1,2,"{'amount': 125988209.0, 'unit': 'unit'}"
2,3,{'value': '+2004-01-01T00:00:00Z'}
3,4,"{'amount': 126004305.0, 'unit': 'unit'}"
4,6,{'value': '+1953-01-01T00:00:00Z'}
...,...,...
223053,318697,{'value': '+2021-00-00T00:00:00Z'}
223054,318699,{'value': '+1914-01-01T00:00:00Z'}
223055,318701,{'value': '+2016-06-30T00:00:00Z'}
223056,318702,{'value': '+1940-09-17T00:00:00Z'}


In [9]:
# Properties file
properties_file = os.path.join(directory, "properties.txt")
properties_url = "URL"

properties = []
text = ""

# If container mode:
with open(properties_file, "r") as f:
    lines = f.readlines()
# If collab mode / need to download from URL
    #lines = urllib2.urlopen(properties_url)
    for line in lines:
        dictionary = json.loads(line)
        properties.append({"nodeID" : dictionary["id"], 
                         "wikidataID": dictionary["properties"]["id"], 
                         "label" : dictionary["properties"]["label"]})
        if store:
            text += line + "\n"

if store:
    with open(properties_file, "w") as f:
        f.write(text)

properties_df = pd.DataFrame(properties)
properties_df

Unnamed: 0,nodeID,wikidataID,label
0,548,P246,element symbol
1,549,P1082,population
2,550,P2054,density
3,551,P3095,practiced by
4,552,P452,industry
...,...,...,...
109,657,P106,occupation
110,658,P2138,total liabilities
111,659,P2397,YouTube channel ID
112,660,P69,educated at


In [10]:
# Relations file
relations_file = os.path.join(directory, "relations.txt")
relations_url = "URL"

relations = []
text = ""

# If container mode:
with open(relations_file, "r") as f:
    lines = f.readlines()
# If collab mode / need to download from URL
    #lines = urllib2.urlopen(relations_url)
    for line in lines:
        dictionary = json.loads(line)
        relations.append({"source" : dictionary["source"], 
                           "dest": dictionary["dest"], 
                           "type" : dictionary["type"],
                           "properties": dictionary["properties"]})
        if store:
            text += line + "\n"

if store:
    with open(relations_file, "w") as f:
        f.write(text)

relations_df = pd.DataFrame(relations)
relations_df

Unnamed: 0,source,dest,type,properties
0,50283,24,P1889,{}
1,128884,47,P108,"{'P580': '+2011-00-00T00:00:00Z', 'P582': '+20..."
2,15073,47,P1830,{}
3,50321,64,P127,{}
4,50347,64,P127,{}
...,...,...,...,...
457753,327107,327123,P279,{}
457754,327117,327124,P279,{}
457755,327118,327125,P1889,{}
457756,327122,327126,P1889,{}


In [11]:
# Relations with values file
valuerelations_file = os.path.join(directory, "value_relations.txt")
valuerelations_url = "URL"

valuerelations = []
text = ""

# If container mode:
with open(relations_file, "r") as f:
    lines = f.readlines()
# If collab mode / need to download from URL
    #lines = urllib2.urlopen(relations_url)
    for line in lines:
        dictionary = json.loads(line)
        valuerelations.append({"source" : dictionary["source"], 
                           "dest": dictionary["dest"], 
                           "type" : dictionary["type"],
                           "properties": dictionary["properties"]})
        if store:
            text += line + "\n"

if store:
    with open(relations_file, "w") as f:
        f.write(text)

valuerelations_df = pd.DataFrame(valuerelations)
valuerelations_df

Unnamed: 0,source,dest,type,properties
0,50283,24,P1889,{}
1,128884,47,P108,"{'P580': '+2011-00-00T00:00:00Z', 'P582': '+20..."
2,15073,47,P1830,{}
3,50321,64,P127,{}
4,50347,64,P127,{}
...,...,...,...,...
457753,327107,327123,P279,{}
457754,327117,327124,P279,{}
457755,327118,327125,P1889,{}
457756,327122,327126,P1889,{}


In [12]:
mapping_file = os.path.join(directory, "mapping.txt")
mapping = dict()
with open(mapping_file, "r") as f:
    for line in f.readlines():
        aux = line.split(":")
        ticker = aux[0]
        entities = aux[1].strip().split(",")
        mapping[ticker] = entities
mapping

{'ALEX': ['Q135281'],
 'UMC': ['Q143616'],
 'BP': ['Q152057'],
 'BPMP': ['Q152057'],
 'BPT': ['Q4836297'],
 'GER': ['Q193326'],
 'GJS': ['Q193326'],
 'GMZ': ['Q193326'],
 'GS': ['Q193326'],
 'GSBD': ['Q193326'],
 'GSC': ['Q193326'],
 'AIG': ['Q212235'],
 'BCS': ['Q245343'],
 'FFEU': ['Q245343'],
 'FIYY': ['Q245343'],
 'GAZ': ['Q245343'],
 'GSP': ['Q245343'],
 'ACCO': ['Q288129'],
 'NOC': ['Q329953'],
 'AMBC': ['Q456563'],
 'WMT': ['Q483551'],
 'LEA': ['Q502344'],
 'IHIT': ['Q522617'],
 'IHTA': ['Q522617'],
 'IIM': ['Q522617'],
 'IQI': ['Q522617'],
 'IVZ': ['Q522617'],
 'OIA': ['Q522617'],
 'VBF': ['Q522617'],
 'VCV': ['Q522617'],
 'VGM': ['Q522617'],
 'VKI': ['Q522617'],
 'VKQ': ['Q522617'],
 'VLT': ['Q522617'],
 'VMO': ['Q522617'],
 'VPV': ['Q522617'],
 'VTA': ['Q522617'],
 'VTN': ['Q522617'],
 'VVR': ['Q522617'],
 'IVR': ['Q522617'],
 'BBY': ['Q533415'],
 'KGC': ['Q546880'],
 'CI': ['Q642271'],
 'ING': ['Q645708'],
 'STM': ['Q661845'],
 'MCO': ['Q675585'],
 'MTD': ['Q680186'],
 'MGA'

### Data cleaning

Now that we have retrieved and loaded the pricing information and the knowledge graph, we just keep all those stocks with both.

In [13]:
stocks = list(set(stocks) & mapping.keys())

Once we do have the intersection between the assets for which we do have time series and for which we do have knowledge graph information, we just clean the data by getting only the allowed tickers.

In [14]:
print("Number of stocks to consider: " + str(len(stocks)))

Number of stocks to consider: 819


In [15]:
stocks

['AQN',
 'ASB',
 'BCS',
 'BABA',
 'CBSH',
 'CPZ',
 'DEX',
 'AON',
 'AAL',
 'BRKR',
 'CPA',
 'AGRX',
 'ARVN',
 'CNA',
 'AIN',
 'CCM',
 'ACV',
 'DAN',
 'AVA',
 'ASTE',
 'CSX',
 'CALM',
 'CMRE',
 'BSAC',
 'CRL',
 'CALX',
 'BA',
 'AMZN',
 'BSL',
 'AP',
 'BTU',
 'AEF',
 'CAMP',
 'DBX',
 'CAPR',
 'CSCO',
 'BJ',
 'AMD',
 'CSSEP',
 'BLW',
 'ADSK',
 'CLS',
 'CRSP',
 'CACC',
 'CORT',
 'BNY',
 'CTRN',
 'BBAR',
 'AWI',
 'CASI',
 'CINF',
 'ANGO',
 'ATTO',
 'AYI',
 'ADPT',
 'BKT',
 'BIG',
 'BHE',
 'ACRX',
 'CII',
 'CSTR',
 'CSII',
 'CB',
 'CDE',
 'CRNT',
 'CVS',
 'BGB',
 'ACGLO',
 'AMTD',
 'CMSC',
 'CSGP',
 'CNTG',
 'CRON',
 'CSSE',
 'BHF',
 'BCRX',
 'BGFV',
 'AMRN',
 'AXP',
 'ATVI',
 'CCK',
 'CRAI',
 'AFT',
 'AIG',
 'CAF',
 'CLFD',
 'ABB',
 'AXR',
 'CHGG',
 'CNXN',
 'ABT',
 'BCE',
 'CWH',
 'CUK',
 'AGE',
 'BPT',
 'AXNX',
 'CE',
 'BBN',
 'CTHR',
 'CYBR',
 'BCOW',
 'CMCSA',
 'APA',
 'ALTR',
 'ATI',
 'CLPT',
 'ACM',
 'CNHI',
 'BRFS',
 'DHR',
 'CBRL',
 'BLFS',
 'DECK',
 'DDT',
 'CFG',
 'ARLO',
 'ALV',


In [16]:
import datetime as dt
pricedfs = []
i = 0
timea = dt.datetime.now()
for s in stocks:
    df = prices_df[prices_df['Stock'] == s]
    pricedfs.append(df)
    i += 1
    if i % 100 == 0:
        print("Processed " + str(i) + " stocks (" + str((dt.datetime.now() - timea).seconds) + " s)")
print("Dataset Filtering Complete")

Processed 100 stocks (4 s)
Processed 200 stocks (9 s)
Processed 300 stocks (13 s)
Processed 400 stocks (18 s)
Processed 500 stocks (22 s)
Processed 600 stocks (27 s)
Processed 700 stocks (32 s)
Processed 800 stocks (36 s)
Dataset Filtering Complete


In [17]:
pricedfs

[              Date       Open       High        Low      Close  Adj Close  \
 1528084 2018-01-02  11.160000  11.180000  11.060000  11.120000   8.552684   
 1528085 2018-01-03  11.140000  11.140000  10.770000  10.780000   8.291181   
 1528086 2018-01-04  10.780000  10.930000  10.580000  10.660000   8.198886   
 1528087 2018-01-05  10.730000  10.930000  10.710000  10.870000   8.360404   
 1528088 2018-01-08  10.900000  10.930000  10.790000  10.890000   8.375783   
 ...            ...        ...        ...        ...        ...        ...   
 1528840 2021-01-04  16.480000  16.600000  16.110001  16.250000  14.338088   
 1528841 2021-01-05  16.209999  16.360001  16.160000  16.340000  14.417500   
 1528842 2021-01-06  16.330000  16.959999  16.260000  16.580000  14.629262   
 1528843 2021-01-07  16.590000  16.930000  16.559999  16.809999  14.832203   
 1528844 2021-01-08  16.930000  17.240000  16.850000  17.170000  15.149846   
 
             Volume Stock  
 1528084    92300.0   AQN  
 15280

## Feature Creation for the Model

Now that we have collected the pricing data and the knowledge graph, we can craft the features we can use in our model. We distinguish to kind of features: price-based technical indicators and knowledge graph embeddings.

### Technical indicators

Now that we have the pricing data in a more useful form, we can now convert that data into additional indicators that a machine learned model can use for identifying patterns/trends. In effect, we want to capture how the price for an asset changed in the recent past, for use as indicators for future performance (of course past performance is not always a good indicator, and more advanced approaches may mix in other sources of evidence here). We convert the pricing data into 14 different indicator (feature) types:

**NOTE:** In the following equations, the sub-index $t$ indicates the time of computation of the metric. $t-1$ might indicate, then, the previous day, and so on.

1. <b>True range</b>: The average true range (ATR) is a market volatility indicator. The true range indicator is taken as the greatest of the following: current high less the current low; the absolute value of the current high less the previous close; and the absolute value of the current low less the previous close. The ATR is a moving average of the true ranges. Usually, it is computed over 14 days ($n=14$)

\begin{equation}
\text{TR}_t = \max{\left(\text{High}_t - \text{Low}_t, |\text{High}_t - \text{Close}_{t-1}|, |\text{Low}_t - \text{Close}_{t-1}|\right)}
\end{equation}

\begin{equation}
    \text{ATR}_t(n) = \frac{(n-1)\cdot\text{ATR}_{t-1}   + \text{TR}_t}{n}
\end{equation}

2. <b>Average directional index </b>: The average directional index (ADX) is a technical analysis indicator used by some traders to determine the strength of a trend. The ADX makes use of a positive (+DI) and negative (-DI) directional indicator in addition to the trendline. The ADX identifies a strong trend when it is over 25 and a weak trend when it is below 20. Crossovers of the -DI and +DI lines can be used to generate trade signals. Usually, it is computed over a period of 14 days ($n=14$)

\begin{equation}
\text{ADX}_t(n) = \frac{(n-1)\cdot\text{ADX}_{t-1}(n) + \text{DX}_{t}(n)}{n}
\end{equation}

\begin{equation}
\text{DX}_t(n) = 100\cdot\frac{\left|\text{+DI}_t(n) - \text{-DI}_t(n)\right|}{\left|\text{+DI}_t(n) + \text{-DI}_t(n)\right|}
\end{equation}

\begin{equation}
\text{(+/-)DI}_t(n) = 100\cdot\frac{\text{(+/-)SmDM}_t(n)}{\text{ATR}_t(n)}
\end{equation}

\begin{equation}
\text{(+/-)smDM}_t(n) = \sum_{i=1}^{n}\text{(+/-)DM}_{t-i} - \frac{1}{n}\sum_{i=1}^{n}\text{(+/-)DM}_{t-i} + \text{(+/-)DM}_{t}
\end{equation}

\begin{equation}
\text{+DM}_t = \begin{cases}
\text{High}_t - \text{High}_{t-1} & \text{if } \text{High}_t - \text{High}_{t-1} > \text{Low}_{t-1} - \text{Low}_{t}\\
0 & \text{otherwise}
\end{cases}
\end{equation}

\begin{equation}
\text{-DM}_t = \begin{cases}
\text{Low}_{t-1} - \text{Low}_{t} & \text{if } \text{High}_t - \text{High}_{t-1} < \text{Low}_{t-1} - \text{Low}_{t}\\
0 & \text{otherwise}
\end{cases}
\end{equation}

3. <b>Moving average convergence divergence</b>: Moving average convergence divergence (MACD) is a trend-following momentum indicator that shows the relationship between two moving averages of a security’s price. The MACD is calculated by subtracting the 26-period exponential moving average (EMA) from the 12-period EMA.
\begin{equation}
\text{EMA}_t(n) = \left(\text{Close}_t * \left(\frac{\alpha}{1 + n}\right)\right) + EMA_{t-1}(n) * \left(1 - \left(\frac{\alpha}{1 + n}\right)\right) \\
\end{equation}
where $\alpha$ is an smoothing factor (we take here as $\alpha=2$) and $n$ is the number of days in the period. Then:

\begin{equation}
\text{MACD}_t = \text{EMA}_t(12) - \text{EMA}_t(26)
\end{equation}

4. <b>Momentum</b>: Momentum is the rate of acceleration of a security's price. It refers to the inertia of a price trend to continue either rising or falling for a particular length of time, usually taking into account both price and volume information. Here we calculate momentum as the difference between the close prices over 1, 3, 5, 7, 14, 21, and 28 trading days respectively. If we denote by $n$ the number of trading days:

\begin{equation}
\text{Momentum}_t(n) = \text{Close}_t - \text{Close}_{t-n}
\end{equation}


5. <b>Rate of change</b>: The rate of change (ROC) is the speed at which a variable changes over a specific period of time. ROC is often used when speaking about momentum.

\begin{equation}
\text{ROC}_t(n) = \frac{\text{Momentum}_t(n)}{\text{Close}_t}
\end{equation}

6. <b>Relative strength index</b>: The relative strength index (RSI) is a momentum indicator that measures the magnitude of recent price changes to evaluate overbought or oversold conditions in the price of a stock or other asset. The RSI is displayed as an oscillator (a line graph that moves between two extremes) and can have a reading from 0 to 100. Here, again, the common period to use is 14 days ($n$ = 14).

\begin{equation}
\text{RSI}_t(n) = 100 - \left(\frac{100}{1 + \text{RS}_t(n)}\right)
\end{equation}

\begin{equation}
\text{RS}_t(n) = \frac{\text{EMAGain}_t(n)}{\text{EMALoss}_t(n)}
\end{equation}

\begin{equation}
\text{EMAGain}_t(n) = \frac{(n-1)\cdot\text{EMAGain}_t(n) + \text{Gain}_t}{n}
\end{equation}

\begin{equation}
\text{Gain}_t = \begin{cases}
                    \text{Close}_t - \text{Close}_{t-1} & \text{if } \text{Close}_t > \text{Close}_{t-1} \\
                    0 & \text{otherwise}
                    \end{cases}
\end{equation}

\begin{equation}
\text{EMALoss}_t(n) = \frac{(n-1)\cdot\text{EMALoss}_t(n) + \text{Loss}_t}{n}
\end{equation}

\begin{equation}
\text{Loss}_t = \begin{cases}
                    \text{Close}_{t-1} - \text{Close}_t & \text{if } \text{Close}_t < \text{Close}_{t-1} \\
                    0 & \text{otherwise}
                    \end{cases}
\end{equation}

7. <b>Vortex indicator</b>: A vortex indicator (VI) is an indicator composed of two lines - an uptrend line (VI+) and a downtrend line (VI-). These lines are typically colored green and red respectively. A vortex indicator is used to spot trend reversals and confirm current trends.

\begin{equation}
\text{VI+}_t(n) = \frac{\text{SumVM+}_t(n)}{\text{SumTR}_t(n)}
\end{equation}

\begin{equation}
\text{VI-}_t(n) = \frac{\text{SumVM-}_t(n)}{\text{SumTR}_t(n)}
\end{equation}

\begin{equation}
\text{SumTR}_t(n) = \sum_{i = 0}^{n-1} \text{TR}_{t-i}
\end{equation}

\begin{equation}
\text{SumVM(+/-)}_t(n) = \sum_{i = 0}^{n-1} \text{VM(+/-)}_{t-i}
\end{equation}

\begin{equation}
\text{VM+}_t = \left| \text{High}_t - \text{Low}_{t-1}\right|
\end{equation}
\begin{equation}
\text{VM-}_t = \left| \text{Low}_t - \text{High}_{t-1}\right|
\end{equation}

8. <b>Detrended close oscillator</b>: A detrended price oscillator, used in technical analysis, strips out price trends in an effort to estimate the length of price cycles from peak to peak or trough to trough. Unlike other oscillators, such as the MACD, the DPO is not a momentum indicator. It instead highlights peaks and troughs in price, which are used to estimate buy and sell points in line with the historical cycle.

\begin{equation}
\text{DCO}_t(n) = \text{Close}_{t-(n/2 + 1)} - \text{SMA}_t(n)
\end{equation}

\begin{equation}
\text{SMA}_t(n) = \frac{1}{n}\sum_{i=0}^{n-1} \text{Close}_{t-i}
\end{equation}

9. <b>Returns</b>: The returns on investment (ROI) represent the percentage change between close prices on different dates, across different periods.

\begin{equation}
\text{ROI}_t(n) = \frac{\text{Close}_t - \text{Close}_{t-n}}{\text{Close}_{t-n}}
\end{equation}

10. <b>Volatility</b>: Volatility represents the risk of a stock as expressed by its fluctuations, and is expressed as the standard deviation of the logarithmic returns of the stock. In this case, we take the daily returns.
\begin{equation}
\text{Volatility}_t(N,n) = \sqrt{\frac{1}{N-1} \sum_{i=0}^{N-1} \log^2(\text{ROI}_{t-i}(n)) - \left(\frac{1}{N-1} \sum_{i=0}^{N-1} \log(\text{ROI}_{t-i}(n))\right)^2} * \sqrt{n}
\end{equation}
Here, $N$ represents the number of periods we consider for measuring the Volatility (here, we take $N$ days), and $n$ represents the period of time for computing the ROI (here, we take $n = 1$ day). In the right square root, $n$ is the number of periods covered by the ROI calculation. For instance, if we took a monthly measure of ROI, we should measure $n$ in months. In this example, as each period is equal to a day, we take $n = 1$.


11. <b>Force index</b>: The force index (FI) is a technical indicator that measures the amount of power used to move the price of an asset. The force index uses price and volume to determine the amount of strength behind a price move. The index is an oscillator, fluctuating between positive and negative territory. It is unbounded meaning the index can go up or down indefinitely. It is used for trend and breakout confirmation, as well as spotting potential turning points by looking for divergences.

\begin{equation}
\text{FI}_t(1) = \left(\text{Close}_t - \text{Close}_{t-1}\right) \cdot \text{Volume}_t
\end{equation}

\begin{equation}
\text{FI}_t(n) = \left(\text{FI}_t(1) \cdot \left(\frac{\alpha}{1 + n}\right)\right) + \text{FI}_{t-1}(n) \cdot \left(1 - \left(\frac{\alpha}{1 + n}\right)\right)
\end{equation}

12. <b>Accumulation/Distribution index</b>: The accumulation/distribution indicator (A/D) is a cumulative indicator that uses volume and price to assess whether a stock is being accumulated or distributed. The A/D measure seeks to identify divergences between the stock price and the volume flow. This provides insight into how strong a trend is.
\begin{equation}
\text{A/D}_t = \text{A/D}_{t-1} + \text{MFV}_t
\end{equation}
where the Money Flow Volume (MFV) is
\begin{equation}
\text{MFV}_t = \text{MFM}_t \cdot \text{Volume}_t
\end{equation}
and the Money Flow Multiplier (MFM) is computed as:
\begin{equation}
\text{MFM}_t = \frac{(\text{Close}_t - \text{Low}_t)  - (\text{High}_t - \text{Close}_t)}{\text{High}_t - \text{Low}_t}
\end{equation}

13. <b>Chaikin oscillator</b>: This estimator measures the difference between the three day and ten day exponential moving averages of the accumulation/distribution index. It measures the momentum predicted by oscillations around the accumulation-distribution line.

\begin{equation}
\text{Chaikin}_t = \text{EMAA\D}_t(3) - \text{EMAA\D}_t(10)
\end{equation}

\begin{equation}
\text{EMAA\D}_t(n) = \left(\text{A\D}_t \cdot \left(\frac{\alpha}{1 + n}\right)\right) + \text{EMAA\D}_{t-1}(n) \cdot \left(1 - \left(\frac{\alpha}{1 + n}\right)\right)
\end{equation}

13. <b>Min-max</b>: This presents the minimum and maximum close price over a specific period.






In [18]:
def true_range(df, N=14):
    atr_name = 'atr_' + str(N)
    df['tr'] = np.maximum(df["High"], df["Close"].shift(1)) - np.minimum(df["Low"], df["Close"].shift(1))
    df[atr_name] = df['tr'].ewm(alpha=1/N, min_periods=N).mean()
    
    return df

def average_directional_index(df, N=14):
    adx_name = 'adx_' + str(N)
    atr_name = 'atr_' + str(N)
    
    if not atr_name in df.columns:
        true_range(df, N)

    upmove =  df['High'] - df['High'].shift(1)
    downmove = df['Low'].shift(1) - df['Low']

    df['plus_dm'] = np.where((upmove > downmove) & (upmove > 0), upmove, 0)
    df['down_dm'] = np.where((downmove > upmove) & (downmove > 0), downmove, 0)
    
    upi = 100 * df['plus_dm'].ewm(alpha=1/N, min_periods=N).mean() /  df[atr_name]
    downi = 100 * df['down_dm'].ewm(alpha=1/N, min_periods=N).mean() /  df[atr_name]
    df[adx_name] = 100 * (np.abs(upi - downi) / (upi + downi)).ewm(alpha=1/N, min_periods=14).mean()
    df =  df.drop(['plus_dm', 'down_dm'], axis=1)
    return df

def moving_average_convergence_divergence(df):
    close_EMA_26 = df['Close'].ewm(span=26, adjust=False).mean()
    close_EMA_12 = df['Close'].ewm(span=12, adjust=False).mean()

    df['MACD'] = close_EMA_12 - close_EMA_26
    return df

def momentum(df, periods=[1,3,5,7,14,21,28]):
    for t in periods:
        df[f"m_{t}"] = df['Close'].diff(t)
    return df

def rate_of_change(df, periods=[1,3,5,7,14,21,28]):
    for t in periods:
        df[f"roc_{t}"] = df[f"m_{t}"] / df['Close'].shift(t)
    return df

def relative_strength_index(df, N=14):
    u = df['Close'].diff()
    d = df['Close'].shift(1) - df['Close']
    df['up'] = np.where(u > 0, u, 0)
    df['down'] = np.where(d > 0, d, 0)
    rsi_name = 'rsi_' + str(N)
    df[rsi_name] = 100 - 100 / ( 1 + df['up'].ewm(span=N, adjust=False).mean() / df['down'].ewm(span=N, adjust=False).mean())

    df = df.drop(['up', 'down'], axis=1)
    return df

def vortex_indicator(df, N=14):
    if not 'tr' in df.columns:
        true_range(df, N)
    
    vm_up = np.abs(df['High'] - df['Low'].shift(1))
    vm_down = np.abs(df['Low'] - df['High'].shift(1))

    tr_14 = df['tr'].rolling(window=N).sum()
    vm_up_14 = vm_up.rolling(window=N).sum()
    vm_down_14 = vm_down.rolling(window=N).sum()

    df[f"vi_{N}_plus"] = vm_up_14 / tr_14
    df[f"vi_{N}_neg"] = vm_down_14 / tr_14

    return df

def detrended_close_oscillator(df, N=22):
    dco_name = 'dco_' + str(N)
    mid_index = int(N/2+1)
    df[dco_name] = df['Close'].shift(mid_index) - df['Close'].rolling(window=N).mean()
    return df

def returns(df, periods=[1,3,5,7,14,21,28,63,126]):
    for t in periods:
        df[f"return_{t}"] = (df['Close'] - df['Close'].shift(t)) / df['Close'].shift(t)
   # df['log_return_1'] = np.log(df['close'] / df['close'].shift(1))
    return df

def log_returns(df, periods=[1,3,5,7,14,21,28,63,126]):
    for t in periods:
        df[f"log_return_{t}"] = (df['Close'] - df['Close'].shift(t)) / df['Close'].shift(t)
    return df

def volatility(df, roi_periods = [1], periods=[3,5,7,14,21,28,63,126]):
    for n in roi_periods:
        name = f"log_return_{n}"
        if not name in df.columns:
            log_returns(df, roi_periods)
            break
    
    for t in periods:
        for n in roi_periods:
            df[f"volatility_{t}_{n}"] = df[f"log_return_{n}"].rolling(window=t).std()*np.sqrt(n)

    df['3_28_volatility_ratio'] = df['volatility_3_1'] / df['volatility_28_1']
    return df

def force_index(df):
    df['force_index'] = (df['Close'] - df['Close'].shift(1)) * df['Volume']
    return df

def accumulation_distribution_index(df):
    df['accdist'] = ((2 * df['Close'] - (df['Low'] + df['High'])) / (df['High'] - df['Low'])) * df['Volume']
    df['accdist'] = df['accdist'].expanding().sum()
    return df

def chaikin_oscillator(df):
    if not 'accdist' in df.columns:
        accumulation_distribution_index(df)
    df['chakin_oscillator'] = df['accdist'].ewm(span=3).mean() - df['accdist'].ewm(span=10).mean()
    
    return df

def min_max(df, periods=[3,5,7,14,21,28]):
    for t in periods:
        df[f"min_{t}"] = df['Close'].rolling(window=t).min()
        df[f"max_{t}"] = df['Close'].rolling(window=t).max()

        
        df[f'exp_mean_{t}'] = df['Close'].ewm(span=t).mean()

    return df

def mean_price(df, periods=[3,5,7,14,21,28,63,126]):
    for t in periods:
        df[f'mean_{t}'] = df['Close'].rolling(window=t).mean()
    
    return df


In [19]:
pd.options.mode.chained_assignment = None  # default='warn'

newpricedfs = []
i = 0
timea = dt.datetime.now()
for p in pricedfs:
    if not p.empty:
        p['tp'] = (p['High'] + p['Low'] + p['Close']) / 3
        p1 = true_range(p)
        p1 = average_directional_index(p1)
        p1 = moving_average_convergence_divergence(p1)
        p1 = momentum(p1)
        p1= rate_of_change(p1)
        p1 = relative_strength_index(p1)
        p1 = vortex_indicator(p1)
        p1 = detrended_close_oscillator(p1)
        p1 = returns(p1)
        p1 = volatility(p1)
        p1 = force_index(p1)
        p1 = accumulation_distribution_index(p1)
        p1 = chaikin_oscillator(p1)
        p1 = min_max(p1)
        p1 = mean_price(p1)
        newpricedfs.append(p1.dropna())
        i += 1
        if i % 100 == 0:
            timeb = dt.datetime.now()
            print("Processed " + str(i) + " stocks (" + str((dt.datetime.now() - timea).seconds) + " s)")
print ("Metrics calculated for all stocks")

Processed 100 stocks (3 s)
Processed 200 stocks (7 s)
Processed 300 stocks (10 s)
Processed 400 stocks (14 s)
Processed 500 stocks (17 s)
Processed 600 stocks (21 s)
Processed 700 stocks (24 s)
Processed 800 stocks (28 s)
Metrics calculated for all stocks


In [20]:
newpricedfs[0]

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Stock,tp,tr,...,max_28,exp_mean_28,mean_3,mean_5,mean_7,mean_14,mean_21,mean_28,mean_63,mean_126
1528210,2018-07-03,9.670000,9.710000,9.640000,9.650000,7.611537,66300.0,AQN,9.666667,0.090000,...,9.910000,9.656489,9.646667,9.626,9.638571,9.606429,9.603333,9.664286,9.752063,10.025873
1528211,2018-07-05,9.650000,9.680000,9.560000,9.640000,7.603651,150600.0,AQN,9.626667,0.120000,...,9.910000,9.655352,9.636667,9.626,9.634286,9.610000,9.600476,9.655000,9.746508,10.016825
1528212,2018-07-06,9.670000,9.720000,9.620000,9.650000,7.611537,97100.0,AQN,9.663333,0.100000,...,9.910000,9.654982,9.646667,9.646,9.631429,9.612143,9.601429,9.646071,9.742381,10.008810
1528213,2018-07-09,9.670000,9.680000,9.530000,9.570000,7.548435,286300.0,AQN,9.593333,0.150000,...,9.910000,9.649121,9.620000,9.626,9.621429,9.612143,9.603810,9.636071,9.736032,9.998492
1528214,2018-07-10,9.590000,9.630000,9.530000,9.590000,7.564211,223000.0,AQN,9.583333,0.100000,...,9.830000,9.645043,9.603333,9.620,9.627143,9.613571,9.604286,9.624643,9.730635,9.988175
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1528840,2021-01-04,16.480000,16.600000,16.110001,16.250000,14.338088,655500.0,AQN,16.320000,0.489999,...,16.610001,16.065604,16.410000,16.424,16.355714,16.204286,16.138095,16.014286,15.801587,14.794762
1528841,2021-01-05,16.209999,16.360001,16.160000,16.340000,14.417500,735700.0,AQN,16.286667,0.200001,...,16.610001,16.084528,16.350000,16.436,16.385714,16.231428,16.153809,16.047143,15.817936,14.821190
1528842,2021-01-06,16.330000,16.959999,16.260000,16.580000,14.629262,1599900.0,AQN,16.600000,0.699999,...,16.610001,16.118698,16.390000,16.430,16.434286,16.257143,16.179048,16.088214,15.838254,14.849841
1528843,2021-01-07,16.590000,16.930000,16.559999,16.809999,14.832203,1063400.0,AQN,16.766666,0.370001,...,16.809999,16.166374,16.576666,16.488,16.510000,16.302857,16.219524,16.132857,15.860476,14.883651


We finally compute the target of our recommendations: return at 6 months into the future (126 financial days)

In [21]:
for i in range(len(newpricedfs)):
    newpricedfs[i]["target"] = newpricedfs[i]["return_126"].shift(-126)
    newpricedfs[i] = newpricedfs[i].dropna()

In [22]:
newpricedfs[0]

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Stock,tp,tr,...,exp_mean_28,mean_3,mean_5,mean_7,mean_14,mean_21,mean_28,mean_63,mean_126,target
1528210,2018-07-03,9.67,9.71,9.64,9.65,7.611537,66300.0,AQN,9.666667,0.09,...,9.656489,9.646667,9.626,9.638571,9.606429,9.603333,9.664286,9.752063,10.025873,0.040415
1528211,2018-07-05,9.65,9.68,9.56,9.64,7.603651,150600.0,AQN,9.626667,0.12,...,9.655352,9.636667,9.626,9.634286,9.610000,9.600476,9.655000,9.746508,10.016825,0.047718
1528212,2018-07-06,9.67,9.72,9.62,9.65,7.611537,97100.0,AQN,9.663333,0.10,...,9.654982,9.646667,9.646,9.631429,9.612143,9.601429,9.646071,9.742381,10.008810,0.063212
1528213,2018-07-09,9.67,9.68,9.53,9.57,7.548435,286300.0,AQN,9.593333,0.15,...,9.649121,9.620000,9.626,9.621429,9.612143,9.603810,9.636071,9.736032,9.998492,0.084639
1528214,2018-07-10,9.59,9.63,9.53,9.59,7.564211,223000.0,AQN,9.583333,0.10,...,9.645043,9.603333,9.620,9.627143,9.613571,9.604286,9.624643,9.730635,9.988175,0.086548
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1528714,2020-07-06,13.17,13.22,13.00,13.22,11.431006,1146500.0,AQN,13.146667,0.22,...,13.472504,13.113333,13.022,12.978571,13.290714,13.561905,13.719286,13.706349,14.096746,0.229198
1528715,2020-07-07,13.12,13.21,12.96,13.01,11.249424,1206200.0,AQN,13.060000,0.26,...,13.440607,13.093333,13.058,12.971429,13.219286,13.494762,13.692143,13.695873,14.087619,0.255957
1528716,2020-07-08,13.03,13.24,12.96,12.97,11.214837,843700.0,AQN,13.056667,0.28,...,13.408151,13.066667,13.064,13.012857,13.155000,13.421905,13.652500,13.687937,14.078016,0.278335
1528717,2020-07-09,12.60,12.60,12.45,12.55,10.851675,1295300.0,AQN,12.533333,0.52,...,13.348969,12.843333,12.960,12.972857,13.061429,13.337619,13.600714,13.669683,14.066190,0.339442


In [23]:
full_kpis_df = pd.concat(newpricedfs)
full_kpis_df

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Stock,tp,tr,...,exp_mean_28,mean_3,mean_5,mean_7,mean_14,mean_21,mean_28,mean_63,mean_126,target
1528210,2018-07-03,9.670000,9.710000,9.640000,9.650000,7.611537,66300.0,AQN,9.666667,0.090000,...,9.656489,9.646667,9.626000,9.638571,9.606429,9.603333,9.664286,9.752063,10.025873,0.040415
1528211,2018-07-05,9.650000,9.680000,9.560000,9.640000,7.603651,150600.0,AQN,9.626667,0.120000,...,9.655352,9.636667,9.626000,9.634286,9.610000,9.600476,9.655000,9.746508,10.016825,0.047718
1528212,2018-07-06,9.670000,9.720000,9.620000,9.650000,7.611537,97100.0,AQN,9.663333,0.100000,...,9.654982,9.646667,9.646000,9.631429,9.612143,9.601429,9.646071,9.742381,10.008810,0.063212
1528213,2018-07-09,9.670000,9.680000,9.530000,9.570000,7.548435,286300.0,AQN,9.593333,0.150000,...,9.649121,9.620000,9.626000,9.621429,9.612143,9.603810,9.636071,9.736032,9.998492,0.084639
1528214,2018-07-10,9.590000,9.630000,9.530000,9.590000,7.564211,223000.0,AQN,9.583333,0.100000,...,9.645043,9.603333,9.620000,9.627143,9.613571,9.604286,9.624643,9.730635,9.988175,0.086548
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1146480,2020-07-06,260.029999,264.970001,253.899994,256.250000,235.653519,3460400.0,AMGN,258.373332,11.070007,...,234.798999,256.536662,247.405997,243.568569,237.806427,232.824283,230.730713,230.657618,223.867777,-0.115473
1146481,2020-07-07,253.600006,258.910004,252.089996,253.149994,232.802673,2365900.0,AMGN,254.716665,6.820008,...,236.064585,255.879995,251.723996,246.145711,239.683570,234.132855,231.742856,231.317460,223.969761,-0.100296
1146482,2020-07-08,253.149994,253.979996,249.250000,251.589996,231.368073,1993700.0,AMGN,251.606664,4.729996,...,237.135303,253.663330,254.869995,248.824282,241.513569,235.363331,232.723570,231.996983,224.077301,-0.072896
1146483,2020-07-09,250.250000,253.830002,248.839996,251.660004,231.432419,1739600.0,AMGN,251.443334,4.990006,...,238.137006,252.133331,254.177997,251.695711,243.037855,236.637617,233.507856,232.512380,224.183967,-0.070095


In [24]:
full_kpis_df.to_csv(os.path.join(storageDIR, "kpis.csv"), index=False)

In order to use the model for learning and testing, we thin the dataset by getting only the data at Mondays.

In [25]:
filtered_kpis_df = full_kpis_df.loc[full_kpis_df['Date'].dt.weekday == 1]
filtered_kpis_df

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Stock,tp,tr,...,exp_mean_28,mean_3,mean_5,mean_7,mean_14,mean_21,mean_28,mean_63,mean_126,target
1528210,2018-07-03,9.670000,9.710000,9.640000,9.650000,7.611537,66300.0,AQN,9.666667,0.090000,...,9.656489,9.646667,9.626000,9.638571,9.606429,9.603333,9.664286,9.752063,10.025873,0.040415
1528214,2018-07-10,9.590000,9.630000,9.530000,9.590000,7.564211,223000.0,AQN,9.583333,0.100000,...,9.645043,9.603333,9.620000,9.627143,9.613571,9.604286,9.624643,9.730635,9.988175,0.086548
1528219,2018-07-17,9.580000,9.620000,9.530000,9.570000,7.548435,99300.0,AQN,9.573333,0.090000,...,9.625766,9.580000,9.580000,9.580000,9.605714,9.601429,9.596071,9.710952,9.945635,0.102403
1528224,2018-07-24,9.820000,9.860000,9.770000,9.810000,7.737741,215300.0,AQN,9.813333,0.090000,...,9.673528,9.813333,9.780000,9.724286,9.660714,9.653333,9.633571,9.703968,9.897143,0.102956
1528229,2018-07-31,9.800000,9.920000,9.710000,9.820000,7.745626,166900.0,AQN,9.816667,0.210000,...,9.716171,9.803333,9.816000,9.811429,9.740714,9.695238,9.677500,9.711905,9.852778,0.124236
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1146462,2020-06-09,226.000000,228.309998,224.570007,224.899994,206.823288,2403600.0,AMGN,225.926666,3.739991,...,226.984055,225.439997,223.639999,224.118572,224.924999,228.787142,230.240714,220.496508,224.677222,0.006892
1146467,2020-06-16,222.830002,228.880005,222.619995,226.869995,208.634979,2581600.0,AMGN,226.123332,9.100006,...,225.527667,221.516663,222.113995,223.031424,223.802142,224.768569,227.973213,222.577460,224.178571,0.011416
1146472,2020-06-23,234.869995,239.169998,233.440002,235.750000,216.801224,2171500.0,AMGN,236.120000,5.729996,...,227.865968,236.213333,232.986001,230.225714,226.621427,225.969999,227.258927,225.510317,223.798015,-0.035801
1146477,2020-06-30,233.309998,237.320007,231.800003,235.860001,216.902390,2589000.0,AMGN,234.993337,5.760009,...,229.595666,233.419998,233.575998,233.974285,229.732856,227.871427,227.302142,228.157460,223.471190,-0.038116


### Knowledge graph embeddings

As a second group of features, we consider knowledge graph embeddings for every asset. There are multiple algorithms which summarize the information about entities in knowledge graphs as low-dimension vectors, also known as embeddings. These embeddings encode information about the node and the connections it has with other entities in the knowledge graph.

#### Knowledge graph filtering

As we plan to use embeddings as features for profitability prediction, we need knowledge graph embeddings for every possible date. In order to ensure that we are using correct information, we need to provide some filtering for the knowledge graphs: we need to ensure that, before computing the embeddings, we only have information prior to the date to consider.

In order to do this, we need first to filter the entities and then we need to filter the relations.

For filtering the entities, we take the whole set of entities. Then, we remove two sets of entities:
- **Entities created after the split date:** companies, products, etc. which first appeared after the split date. This is defined by the "inception" value relation (P571 in Wikidata).
- **People who were born after the split date or died before the split date:** This is defined by the "date of birth" relation (P569) and "date of death" relations (P570).

In [26]:
dates = ["+" + pd.to_datetime(str(date)).strftime("%Y-%m-%dT%H:%M:%S") + "Z" for date in filtered_kpis_df['Date'].unique()]

In [27]:
dates

['+2018-07-03T00:00:00Z',
 '+2018-07-10T00:00:00Z',
 '+2018-07-17T00:00:00Z',
 '+2018-07-24T00:00:00Z',
 '+2018-07-31T00:00:00Z',
 '+2018-08-07T00:00:00Z',
 '+2018-08-14T00:00:00Z',
 '+2018-08-21T00:00:00Z',
 '+2018-08-28T00:00:00Z',
 '+2018-09-04T00:00:00Z',
 '+2018-09-11T00:00:00Z',
 '+2018-09-18T00:00:00Z',
 '+2018-09-25T00:00:00Z',
 '+2018-10-02T00:00:00Z',
 '+2018-10-09T00:00:00Z',
 '+2018-10-16T00:00:00Z',
 '+2018-10-23T00:00:00Z',
 '+2018-10-30T00:00:00Z',
 '+2018-11-06T00:00:00Z',
 '+2018-11-13T00:00:00Z',
 '+2018-11-20T00:00:00Z',
 '+2018-11-27T00:00:00Z',
 '+2018-12-04T00:00:00Z',
 '+2018-12-11T00:00:00Z',
 '+2018-12-18T00:00:00Z',
 '+2019-01-08T00:00:00Z',
 '+2019-01-15T00:00:00Z',
 '+2019-01-22T00:00:00Z',
 '+2019-01-29T00:00:00Z',
 '+2019-02-05T00:00:00Z',
 '+2019-02-12T00:00:00Z',
 '+2019-02-19T00:00:00Z',
 '+2019-02-26T00:00:00Z',
 '+2019-03-05T00:00:00Z',
 '+2019-03-12T00:00:00Z',
 '+2019-03-19T00:00:00Z',
 '+2019-03-26T00:00:00Z',
 '+2019-04-02T00:00:00Z',
 '+2019-04-0

In [28]:
entities_date = dict()
for date in dates:
    entities_to_remove = entities_df
    
    rel_type = "P571" ## inception
    rel_df = valuerelations_df[valuerelations_df["type"] == rel_type][["source","dest","properties"]]
    rel_df = rel_df.merge(values_df, left_on="dest", right_on="nodeID")
    if rel_df.shape[0] > 0:
        rel_df = rel_df[rel_df["properties"]["value"] > date]
    
    entities_to_remove = entities_to_remove[entities_to_remove["nodeID"].isin(rel_df["source"])]
    
    rel_type = "P569" ## inception
    rel_df = valuerelations_df[valuerelations_df["type"] == rel_type][["source","dest","properties"]]
    rel_df = rel_df.merge(values_df, left_on="dest", right_on="nodeID")
    if rel_df.shape[0] > 0:
        rel_df = rel_df[rel_df["properties"]["value"] > date]
    
    entities_to_remove = entities_to_remove[entities_to_remove["nodeID"].isin(rel_df["source"])]
    
    rel_type = "P570" ## inception
    rel_df = valuerelations_df[valuerelations_df["type"] == rel_type][["source","dest", "properties"]]
    rel_df = rel_df.merge(values_df, left_on="dest", right_on="nodeID")
    if rel_df.shape[0] > 0:
        rel_df = rel_df[rel_df["properties"]["value"] < date]
    
    entities_to_remove = entities_to_remove[entities_to_remove["nodeID"].isin(rel_df["source"])]
    
    entities_date[date] = entities_df[~entities_df["nodeID"].isin(entities_to_remove)]

Now, for filtering the relations, we use the properties. We remove those edges that satisfy:
- The relation started after the date (P580)
- The relation ended before the date (P582)
- The relation is established by a time point after the date (P583)

We store the remaining relationships into files.

In [29]:
def contains_rel(x, rel, greater, value):
    if rel in x and greater:
        if x[rel] > value:
            return True
    elif rel in x and not greater:
        if x[rel] < value:
            return True
    return False

split_dir = os.path.join(os.path.join(storageDIR, "kg"), "splits")
if not os.path.exists(split_dir):
    os.makedirs(split_dir)

for date in dates:
    relations_def = relations_df[(relations_df["source"].isin(entities_date[date]["nodeID"])) & 
                                 (relations_df["dest"].isin(entities_date[date]["nodeID"]))]
    
    
    relations_with_start_date = relations_def[relations_def["properties"].apply(lambda x: contains_rel(x, "P580", True, date))]    
    relations_def = relations_def.drop(relations_with_start_date.index)
    
    relations_with_end_date = relations_def[relations_def["properties"].apply(lambda x: contains_rel(x, "P582", False, date))]
    relations_def = relations_def.drop(relations_with_end_date.index)
    
    relations_with_point_time = relations_def[relations_def["properties"].apply(lambda x: contains_rel(x, "P583", True, date))]
    relations_def = relations_def.drop(relations_with_point_time.index)

    relations_def = relations_def.merge(entities_date[date], left_on="source", right_on="nodeID")
    relations_def = relations_def[["wikidataID", "type", "dest"]]
    relations_def = relations_def.rename(columns={"wikidataID" : "Source"})
    relations_def = relations_def.merge(entities_date[date], left_on="dest", right_on="nodeID")
    relations_def = relations_def[["Source","type","wikidataID"]]
    relations_def = relations_def.rename(columns={"wikidataID" : "Target"})
    
    relations_def.to_csv(os.path.join(split_dir, "graph_" + pd.to_datetime(date, format="+%Y-%m-%dT%H:%M:%SZ").strftime("%Y-%m-%d") + ".csv"), index=False)

Once we have found the knowledge graphs for every date, we can then train the corresponding knowledge graph embeddings. We are using for that the <a href="https://github.com/pykeen/pykeen">PyKeen</a> library. This library contains multiple knowledge graph embedding methods which we can use. In particular, in this example, we use a knowledge graph embedding method known as TransH.

In [30]:
from pykeen.triples import TriplesFactory
from pykeen.pipeline import pipeline

  warn(f"Failed to load image Python extension: {e}")


In [31]:
def_nodes = []
node_list = []
for node in mapping:
    if len(mapping[node]) == 1:
        def_nodes.append(node)
        node_list.append(mapping[node][0])
node_list

['Q135281',
 'Q143616',
 'Q152057',
 'Q152057',
 'Q4836297',
 'Q193326',
 'Q193326',
 'Q193326',
 'Q193326',
 'Q193326',
 'Q193326',
 'Q212235',
 'Q245343',
 'Q245343',
 'Q245343',
 'Q245343',
 'Q245343',
 'Q288129',
 'Q329953',
 'Q456563',
 'Q483551',
 'Q502344',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q522617',
 'Q533415',
 'Q546880',
 'Q642271',
 'Q645708',
 'Q661845',
 'Q675585',
 'Q680186',
 'Q697311',
 'Q826526',
 'Q836040',
 'Q837982',
 'Q866972',
 'Q905806',
 'Q908324',
 'Q918206',
 'Q920037',
 'Q930919',
 'Q994153',
 'Q1046951',
 'Q1046951',
 'Q1053422',
 'Q1134746',
 'Q1134746',
 'Q1134746',
 'Q1138291',
 'Q1220078',
 'Q1275577',
 'Q1282130',
 'Q20858163',
 'Q1341590',
 'Q1341588',
 'Q1345971',
 'Q1374135',
 'Q2114414',
 'Q2114414',
 'Q2114414',
 'Q2114414',
 'Q1374135',
 'Q1472539',
 'Q1511043',
 'Q1539185',
 'Q

NOTE: This step might be take a while to execute. Also note that, in this case, we are configuring the TransH method with a single epoch: for production purposes, larger numbers of epochs should be considered (but, in this case, it would only increase execution time).

In [34]:
dates = filtered_kpis_df["Date"].unique().flatten()

additional_data = []

for date in dates:
    filename = os.path.join(split_dir, "graph_" + pd.to_datetime(date).strftime("%Y-%m-%d") + ".csv")
    print(filename)
    
    # We read the graph for the given date
    graph = pd.read_csv(filename)
    
    # We create the (head, type, tail) triple set using the graph.
    tf = TriplesFactory.from_labeled_triples(
        graph[['Source', 'type', 'Target']].values,
        create_inverse_triples=False,
        entity_to_id=None,
        relation_to_id=None,
        compact_id=True,
        filter_out_candidate_inverse_relations=True,
        metadata=None,
    )
    
    training, test = tf.split(ratios=0.999)
    
    # We configure the TransH model, with 1 epoch. This number is established for tutorial purposes:
    # A larger number of epochs should be considered for production.
    result = pipeline(
        training=training,
        testing=test,
        model='TransH',
        epochs=1,
        random_seed=0
    )
    
    single_data = dict()
    single_data["Date"] = pd.to_datetime(date)
    
    # We get the list of nodes for which we want to retrieve embeddings
    emb_list=[]
    for node in node_list:
        if node in result.training.entity_to_id:
            emb_list.append(result.training.entity_to_id[node])
        else:
            emb_list.append(-1)
    
    # We retrieve the embeddings
    for i in range(0, len(def_nodes)):
        single_data = dict()
        single_data["Date"] = pd.to_datetime(date)
        single_data["Stock"] = def_nodes[i]
        if emb_list[i] >= 0:
            emb = result.model.entity_representations[0].forward()[emb_list[i]].tolist()
            for j in range(0, 50):
                single_data["emb_" + str(j)] = emb[j]
        else:
            for j in range(0, 50):
                single_data["emb_" + str(j)] = 0.0
        additional_data.append(single_data)
        
embedding_data = pd.DataFrame(additional_data)

aux_embedding_df = aux_embedding_df = embedding_data.copy()
for i in range(0, 50):
    aux_embedding_df["emb_"+str(i)] = aux_embedding_df["emb_" + str(i)].apply(lambda x: float(x))
    if i % 10 == 0:
        print(i)

/nfs/notebooks/KGE/data2/kg/splits/graph_2018-07-03.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339485, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-07-10.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339487, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-07-17.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339489, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-07-24.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339491, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-07-31.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339496, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-08-07.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339497, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-08-14.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339499, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-08-21.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339498, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-08-28.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339498, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-09-04.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339498, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-09-11.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339502, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-09-18.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339509, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-09-25.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339514, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-10-02.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339520, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.72s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-10-09.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339523, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-10-16.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339528, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-10-23.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339525, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-10-30.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339524, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1727 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-11-06.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339534, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-11-13.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339533, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-11-20.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339537, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-11-27.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339535, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-12-04.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339544, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.66s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-12-11.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339542, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.74s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2018-12-18.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339543, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.71s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-01-08.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339600, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.71s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-01-15.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339606, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-01-22.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339608, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-01-29.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339609, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-02-05.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339616, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-02-12.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339616, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-02-19.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339617, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.72s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-02-26.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339619, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-03-05.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339625, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.67s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-03-12.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339632, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-03-19.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339630, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-03-26.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339630, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-04-02.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339635, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.72s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-04-09.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339638, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-04-16.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339637, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.67s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-04-23.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339638, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.65s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-04-30.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339641, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.72s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-05-07.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339642, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-05-14.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339643, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.71s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-05-21.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339643, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.67s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-05-28.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339644, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-06-04.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339646, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.71s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-06-11.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339650, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-06-18.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339654, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-06-25.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339654, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-07-02.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339665, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-07-09.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339666, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-07-16.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339665, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-07-23.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339663, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-07-30.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339663, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-08-06.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339672, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-08-13.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339672, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.72s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-08-20.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339673, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-08-27.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339673, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-09-03.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339687, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-09-10.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339688, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.66s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-09-17.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339689, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.72s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-09-24.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339691, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-10-01.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339694, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.71s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-10-08.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339695, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-10-15.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339696, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-10-22.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339702, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-10-29.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339703, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-11-05.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339707, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-11-12.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339707, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-11-19.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339708, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-11-26.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339709, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-12-03.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339711, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-12-10.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339712, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-12-17.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339714, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.72s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-12-24.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339711, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2019-12-31.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339712, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-01-07.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339710, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-01-14.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339709, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.70s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-01-21.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339712, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.72s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-01-28.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339715, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-02-04.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339710, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-02-11.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339713, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.74s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-02-18.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339713, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-02-25.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339713, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.71s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-03-03.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339731, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.71s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-03-10.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339732, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-03-17.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339735, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-03-24.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339733, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-03-31.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339735, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.71s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-04-07.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339739, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.71s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-04-14.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339738, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.71s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-04-21.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339737, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.71s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-04-28.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339741, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.75s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-05-05.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339750, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-05-12.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339752, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.73s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-05-19.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339756, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.73s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-05-26.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339756, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.72s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-06-02.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339754, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.68s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-06-09.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339751, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.71s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-06-16.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339752, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.69s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-06-23.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339753, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.74s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-06-30.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339757, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.71s seconds


/nfs/notebooks/KGE/data2/kg/splits/graph_2020-07-07.csv


INFO:pykeen.triples.splitting:done splitting triples to groups of sizes [339763, 443]
INFO:pykeen.pipeline.api:Using device: None


Training epochs on cuda:0:   0%|          | 0/1 [00:00<?, ?epoch/s]

Training batches on cuda:0:   0%|          | 0/1728 [00:00<?, ?batch/s]

INFO:pykeen.evaluation.evaluator:Starting batch_size search for evaluation now...
INFO:pykeen.evaluation.evaluator:Concluded batch_size search with batch_size=128.


Evaluating on cuda:0:   0%|          | 0.00/443 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 1.72s seconds


0
10
20
30
40


In [35]:
embedding_data

Unnamed: 0,Date,Stock,emb_0,emb_1,emb_2,emb_3,emb_4,emb_5,emb_6,emb_7,...,emb_40,emb_41,emb_42,emb_43,emb_44,emb_45,emb_46,emb_47,emb_48,emb_49
0,2018-07-03,ALEX,-0.006524,-0.026103,-0.044384,-0.019788,0.057481,0.020563,-0.021681,-0.052736,...,-0.038946,-0.052029,0.039808,0.032232,-0.053698,0.047106,0.009810,-0.072519,0.039703,-0.020752
1,2018-07-03,UMC,-0.012185,-0.018843,-0.084067,-0.039007,0.048284,0.026666,-0.071716,-0.039052,...,-0.040821,-0.019323,0.052418,0.013328,-0.033418,0.007900,0.004144,-0.091898,-0.049011,0.009453
2,2018-07-03,BP,-0.002416,-0.056357,-0.082249,-0.105943,0.091706,0.036275,-0.116427,-0.104529,...,-0.052068,-0.023477,0.098355,0.117395,-0.074105,-0.017603,0.018861,-0.084534,-0.012084,-0.032448
3,2018-07-03,BPMP,-0.002416,-0.056357,-0.082249,-0.105943,0.091706,0.036275,-0.116427,-0.104529,...,-0.052068,-0.023477,0.098355,0.117395,-0.074105,-0.017603,0.018861,-0.084534,-0.012084,-0.032448
4,2018-07-03,BPT,-0.022311,-0.041545,-0.027758,0.016261,0.022688,-0.011671,0.010785,-0.054173,...,-0.013111,0.005699,-0.015953,0.018891,0.017659,-0.001198,-0.001721,-0.039837,0.026102,-0.021808
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
387915,2020-07-07,XP,0.010071,0.003291,-0.019107,-0.040694,0.023292,-0.023060,-0.026578,-0.048145,...,0.006886,0.005121,0.036937,0.006838,-0.009337,-0.024992,0.021884,-0.009780,-0.003846,-0.017526
387916,2020-07-07,XRAY,0.028427,-0.039921,-0.025032,-0.009958,0.037584,0.037607,-0.025059,-0.066898,...,-0.041461,-0.026892,0.010143,0.065303,-0.039509,0.035102,-0.000813,-0.049818,0.025319,-0.039857
387917,2020-07-07,XVZ,-0.142858,-0.088120,-0.024537,-0.191846,0.245535,0.114878,-0.044984,0.035975,...,-0.141802,0.147015,0.090973,-0.147548,-0.117695,-0.009450,-0.164969,0.126821,-0.142927,0.219653
387918,2020-07-07,YGRN,0.007717,-0.002709,0.016964,0.024163,0.003132,0.006996,0.020115,0.000584,...,-0.032307,-0.037174,-0.020983,-0.015940,0.030629,-0.000684,0.026519,-0.012271,0.019070,0.021328


In [36]:
aux_embedding_df

Unnamed: 0,Date,Stock,emb_0,emb_1,emb_2,emb_3,emb_4,emb_5,emb_6,emb_7,...,emb_40,emb_41,emb_42,emb_43,emb_44,emb_45,emb_46,emb_47,emb_48,emb_49
0,2018-07-03,ALEX,-0.006524,-0.026103,-0.044384,-0.019788,0.057481,0.020563,-0.021681,-0.052736,...,-0.038946,-0.052029,0.039808,0.032232,-0.053698,0.047106,0.009810,-0.072519,0.039703,-0.020752
1,2018-07-03,UMC,-0.012185,-0.018843,-0.084067,-0.039007,0.048284,0.026666,-0.071716,-0.039052,...,-0.040821,-0.019323,0.052418,0.013328,-0.033418,0.007900,0.004144,-0.091898,-0.049011,0.009453
2,2018-07-03,BP,-0.002416,-0.056357,-0.082249,-0.105943,0.091706,0.036275,-0.116427,-0.104529,...,-0.052068,-0.023477,0.098355,0.117395,-0.074105,-0.017603,0.018861,-0.084534,-0.012084,-0.032448
3,2018-07-03,BPMP,-0.002416,-0.056357,-0.082249,-0.105943,0.091706,0.036275,-0.116427,-0.104529,...,-0.052068,-0.023477,0.098355,0.117395,-0.074105,-0.017603,0.018861,-0.084534,-0.012084,-0.032448
4,2018-07-03,BPT,-0.022311,-0.041545,-0.027758,0.016261,0.022688,-0.011671,0.010785,-0.054173,...,-0.013111,0.005699,-0.015953,0.018891,0.017659,-0.001198,-0.001721,-0.039837,0.026102,-0.021808
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
387915,2020-07-07,XP,0.010071,0.003291,-0.019107,-0.040694,0.023292,-0.023060,-0.026578,-0.048145,...,0.006886,0.005121,0.036937,0.006838,-0.009337,-0.024992,0.021884,-0.009780,-0.003846,-0.017526
387916,2020-07-07,XRAY,0.028427,-0.039921,-0.025032,-0.009958,0.037584,0.037607,-0.025059,-0.066898,...,-0.041461,-0.026892,0.010143,0.065303,-0.039509,0.035102,-0.000813,-0.049818,0.025319,-0.039857
387917,2020-07-07,XVZ,-0.142858,-0.088120,-0.024537,-0.191846,0.245535,0.114878,-0.044984,0.035975,...,-0.141802,0.147015,0.090973,-0.147548,-0.117695,-0.009450,-0.164969,0.126821,-0.142927,0.219653
387918,2020-07-07,YGRN,0.007717,-0.002709,0.016964,0.024163,0.003132,0.006996,0.020115,0.000584,...,-0.032307,-0.037174,-0.020983,-0.015940,0.030629,-0.000684,0.026519,-0.012271,0.019070,0.021328


## Dataset split

### Getting the training / test examples

In order to execute this, we need training and test examples. Basically, we shall use the training examples for our model, and the test examples 

Then, we are doing the following:
1. Choose a recommendation date. For instance, 2020-06-30.
2. We get 6 months for obtaining target values,
3. The previous 6 months are used as training examples. (Essentially from 2019-07-02 to 2019-12-31)
4. Then, we take the targets of the examples at that point as test targets.

In [37]:
train_data = filtered_kpis_df[(filtered_kpis_df["Date"] >= pd.to_datetime("2019-07-02")) &
                              (filtered_kpis_df["Date"] <= pd.to_datetime("2019-12-31"))]
test_data = filtered_kpis_df[filtered_kpis_df["Date"] == pd.to_datetime("2020-06-30")]

In [38]:
train_data.to_csv(os.path.join(storageDIR, "training.csv"), index=False)
test_data.to_csv(os.path.join(storageDIR, "test.csv"), index=False)

In [39]:
train_embeddings = aux_embedding_df[(aux_embedding_df["Date"] >= pd.to_datetime("2019-07-02")) &
                              (aux_embedding_df["Date"] <= pd.to_datetime("2019-12-31"))]
test_embeddings = aux_embedding_df[aux_embedding_df["Date"] == pd.to_datetime("2020-06-30")]

In [40]:
train_embeddings.to_csv(os.path.join(storageDIR, "train-embeddings.csv"), index=False)
test_embeddings.to_csv(os.path.join(storageDIR, "test-embeddings.csv"), index=False)

## Basic model
As a baseline, we provide here the training model. Considering the data, we train a random forest model using, as input, the different technical indicators. We provide two versions: the first one just uses the profitability, volatility and average price of the assets, whereas the second uses all the possible technical indicators.

In [41]:
basic_kpis = ["return_28","return_63","return_126", "volatility_28_1", "volatility_63_1", "volatility_126_1", "mean_28", "mean_63", "mean_126"]
adv_kpis = ["return_126", "volatility_126_1", "mean_126",
            "atr_14", "adx_14", "MACD", "m_14", "m_21", "m_28", "roc_14", "roc_21", "roc_28", "rsi_14", "vi_14_plus", "vi_14_neg",
            "dco_22", "force_index", "chakin_oscillator", "min_14", "min_21", "min_28", "max_14", "max_21", "max_28"]

In [42]:
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

We first read the training and test data from files.

In [43]:
train_data = pd.read_csv(os.path.join(storageDIR, "training.csv"))
test_data = pd.read_csv(os.path.join(storageDIR, "test.csv"))

And we separate the training features and the training targets the algorithm shall learn.

In [44]:
basic_training_data = train_data[basic_kpis]
basic_targets = train_data["target"]

Then, we create our model: a random forest regressor. Any regression algorithm might be configured at this point (for instance, a Linear Regression algorithm can also be tested)

In [45]:
model = RandomForestRegressor()

Finally, we call the function `fit` to train the model. This function receives, as imput, the training data features and the training targets. Once the algorithm is train, we use the `predict` method over the test features to generate the predictions, and we sort the values by descending scores to generate a recommendation ranking

In [46]:
model.fit(basic_training_data, basic_targets)
basic_test_data = test_data[["Stock", "target"]]
basic_test_data["prediction"] = model.predict(test_data[basic_kpis])
basic_ranking = basic_test_data.sort_values( by="prediction", ascending = False).head(10)
basic_ranking

Unnamed: 0,Stock,target,prediction
796,CAN,1.094737,1.121923
585,BLNK,6.113556,1.076205
417,AHPI,-0.567912,1.048614
732,ANY,-0.493007,1.01877
147,AIM,-0.245968,0.955585
516,ARCT,-0.09371,0.871956
212,BW,0.434211,0.870652
727,CFRX,-0.203443,0.730326
376,BLPH,-0.449402,0.724285
352,CGEN,-0.176431,0.718393


And we compute the average profitability of the model:

In [47]:
basic_prof = basic_ranking["target"].mean()

We do the same for the advanced technical indicators.

In [48]:
adv_training_data = train_data[adv_kpis]
adv_targets = train_data["target"]
adv_test_data = test_data[adv_kpis]

In [49]:
model = RandomForestRegressor()

In [50]:
model.fit(adv_training_data, adv_targets)
adv_test_data = test_data[["Stock", "target"]]
adv_test_data["prediction"] = model.predict(test_data[adv_kpis])
adv_test_data.sort_values( by="prediction", ascending = False).head(10)

Unnamed: 0,Stock,target,prediction
717,DHY,0.14,1.76531
739,CIF,0.181395,1.568055
417,AHPI,-0.567912,1.369878
99,CTHR,0.726027,1.343309
141,CRIS,5.280992,1.229393
643,ADAP,-0.481518,1.169614
796,CAN,1.094737,1.160609
516,ARCT,-0.09371,1.127056
147,AIM,-0.245968,1.075598
234,BLU,-0.713314,1.02154


In [51]:
adv_ranking = adv_test_data.sort_values( by="prediction", ascending = False).head(10)
adv_prof = adv_ranking["target"].mean()

We show here the results for our two algorithms, in terms of profitability (return on investment) at 6 months. As we can see, the model with advanced KPIs (6.4) is showing improvements with respect to the basic model (1.8) and the market average (0.37)

In [52]:
profitability_df = pd.DataFrame([{"Model" : "Basic KPIs", "RoI@10" : basic_prof}, {"Model" : "Advanced KPIs", "RoI@10" : adv_prof},
                                 {"Model" : "Market average", "RoI@10" : test_data["target"].mean()}])
profitability_df

Unnamed: 0,Model,RoI@10
0,Basic KPIs,0.541263
1,Advanced KPIs,0.532073
2,Market average,0.328342


## Knowledge graph embedding model

For our model with knowledge graph embeddings, we concatenate the knowledge graph embeddings to the feature information. Then, we, again, run a random forest regression algorithm. We consider three variants here:
- **Pure KGE**: We just consider here the knowledge graph embeddings as features.
- **Basic KPIS + KGE**: We concatenate the knowledge graph embeddings to the basic set of KPIs.
- **Adv. KPIS + KGE**: We concatenate the knowledge graph embeddings to the advanced set of KPIs.

In [53]:
train_embeddings = pd.read_csv(os.path.join(storageDIR, "train-embeddings.csv"))
test_embeddings = pd.read_csv(os.path.join(storageDIR, "test-embeddings.csv"))

In [54]:
train_embeddings["Date"] = pd.to_datetime(train_embeddings["Date"])
test_embeddings["Date"] = pd.to_datetime(test_embeddings["Date"])
train_data["Date"] = pd.to_datetime(train_data["Date"])
test_data["Date"] = pd.to_datetime(test_data["Date"])

In [55]:
emb_train = train_data.merge(aux_embedding_df, on=["Date", "Stock"])
emb_test = test_data.merge(test_embeddings, on=["Date","Stock"])

In [56]:
emb_train

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Stock,tp,tr,...,emb_40,emb_41,emb_42,emb_43,emb_44,emb_45,emb_46,emb_47,emb_48,emb_49
0,2019-07-02,12.150000,12.240000,12.050000,12.180000,10.080317,244100.0,AQN,12.156667,0.200000,...,-0.022007,0.016793,-0.001607,-0.003502,0.029241,-0.050825,0.011642,-0.049604,0.033425,-0.013161
1,2019-07-09,12.320000,12.390000,12.260000,12.350000,10.221010,356200.0,AQN,12.333333,0.130000,...,0.047396,-0.011730,0.013753,-0.038566,-0.004361,-0.016259,0.017535,-0.006330,0.034029,-0.018434
2,2019-07-16,12.390000,12.450000,12.310000,12.430000,10.287220,1042100.0,AQN,12.396667,0.140000,...,-0.005621,0.038773,-0.024181,0.014484,0.001894,-0.017361,-0.000112,-0.052542,-0.003507,-0.016277
3,2019-07-23,12.510000,12.520000,12.400000,12.500000,10.345153,191900.0,AQN,12.473333,0.120000,...,-0.020011,-0.012978,0.023941,0.062949,-0.038519,0.028674,0.022269,-0.059111,0.038498,-0.061962
4,2019-07-30,12.570000,12.630000,12.510000,12.540000,10.378257,171500.0,AQN,12.560000,0.120000,...,0.004806,-0.032123,0.029285,0.079971,-0.012026,0.001759,0.032440,-0.038817,0.020888,-0.093860
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21031,2019-12-03,231.479996,233.179993,230.779999,232.770004,211.122940,1825400.0,AMGN,232.243332,2.699997,...,0.029556,-0.005068,0.007728,0.025956,0.034219,0.016958,-0.015017,-0.054408,0.013953,-0.049529
21032,2019-12-10,231.919998,235.119995,231.509995,233.839996,212.093399,1494700.0,AMGN,233.489995,3.610000,...,-0.023706,-0.007840,0.018126,0.030357,-0.029743,0.019871,0.030797,-0.059287,-0.018930,-0.028992
21033,2019-12-17,243.580002,244.990005,241.279999,242.839996,220.256424,2382500.0,AMGN,243.036667,3.710006,...,-0.052367,0.024773,0.036267,0.034546,-0.022509,-0.007912,0.042483,-0.046954,0.015999,-0.044445
21034,2019-12-24,242.820007,243.100006,241.720001,242.330002,219.793869,612800.0,AMGN,242.383336,1.380005,...,0.052195,0.011874,-0.015224,0.031005,0.028517,0.002254,-0.017373,-0.040064,0.013459,-0.045875


In [57]:
emb_test

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Stock,tp,tr,...,emb_40,emb_41,emb_42,emb_43,emb_44,emb_45,emb_46,emb_47,emb_48,emb_49
0,2020-06-30,12.830000,12.990000,12.780000,12.940000,11.188896,710900.0,AQN,12.903333,0.210000,...,-0.015664,-0.048916,0.092334,0.015398,-0.008517,0.017235,0.092533,-0.069370,0.080700,-0.045376
1,2020-06-30,13.230000,13.760000,13.190000,13.680000,12.321792,1660700.0,ASB,13.543333,0.570000,...,-0.041450,-0.003741,-0.012581,0.043340,0.051862,-0.047080,0.042958,0.000271,0.049115,-0.001087
2,2020-06-30,5.560000,5.680000,5.540000,5.660000,5.215389,3130000.0,BCS,5.626667,0.140000,...,-0.018644,-0.007724,0.050545,0.027860,-0.012522,-0.027435,0.008788,-0.045211,0.000546,-0.004685
3,2020-06-30,215.740005,216.429993,212.889999,215.699997,215.699997,12933800.0,BABA,215.006663,3.539994,...,-0.070949,-0.069433,0.085232,0.059966,-0.114337,0.068149,0.001859,-0.074058,-0.042381,-0.031529
4,2020-06-30,52.380951,54.104309,52.299320,53.941044,51.691570,403184.0,CBSH,53.448224,1.804989,...,-0.007494,0.011216,0.010494,0.021742,0.020974,-0.037514,-0.015808,0.001298,0.025050,0.008886
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
799,2020-06-30,34.790001,35.619999,34.419998,35.299999,33.796753,2938800.0,BWA,35.113332,1.200001,...,-0.079407,-0.047022,0.115204,0.081749,-0.087108,0.001518,0.070251,-0.070315,0.030493,-0.084803
800,2020-06-30,77.510002,83.900002,77.360001,82.279999,82.279999,1290300.0,AXSM,81.180001,7.540001,...,0.005741,0.001742,-0.023270,-0.042275,0.032415,-0.071997,-0.011504,0.026996,-0.025855,0.046815
801,2020-06-30,36.360001,36.799999,36.020000,36.299999,34.346691,415500.0,ABM,36.373333,0.779999,...,-0.006010,-0.061017,0.061057,0.012548,-0.042969,0.082653,0.005060,-0.068639,0.019363,-0.001039
802,2020-06-30,0.570000,0.580000,0.510000,0.540000,0.540000,1736400.0,ADMP,0.543333,0.070000,...,-0.017581,-0.054840,-0.010290,-0.003979,0.017739,0.066956,0.003219,-0.025322,-0.015035,0.008073


In [58]:
emb_feats = []
for i in range(0,50):
    emb_feats.append("emb_"+str(i))

In [59]:
profit_list = []

#### Model 1: Pure embeddings:

In this model, we just take the embeddings: not any other feature:

In [60]:
model = RandomForestRegressor()

In [61]:
emb_train_data = emb_train[emb_feats]
emb_test_data = emb_test[emb_feats]
emb_train_targets = emb_train["target"]

In [62]:
model.fit(emb_train_data, emb_train_targets)
emb_test_data = emb_test[["Stock", "target"]]
emb_test_data["prediction"] = model.predict(emb_test[emb_feats])
emb_ranking = emb_test_data.sort_values(by="prediction", ascending = False).head(10)

In [63]:
emb_ranking

Unnamed: 0,Stock,target,prediction
255,AMAL,0.075949,1.371769
491,AVB,0.016619,1.145782
605,AGI,-0.077825,1.144775
585,APVO,3.173652,1.056074
272,CMPR,0.131648,0.7841
199,BR,0.19954,0.778569
690,CEMI,0.449231,0.659986
576,BWMX,2.311225,0.534906
528,AMRX,-0.054622,0.532212
634,AIV,-0.009556,0.499215


In [64]:
profit_list.append({"Model" : "Pure KGE", "RoI@10" : emb_ranking["target"].mean()})

#### Model 2: Basic KPIs + Embeddings

For this model, we take, as features, (a) the knowledge graph embeddings and (b) the basic technical indicators (RoI, average price and volatility).

In [65]:
model = RandomForestRegressor()

In [66]:
emb_train_data = emb_train[emb_feats + basic_kpis]
emb_train_targets = emb_train["target"]

In [67]:
model.fit(emb_train_data, emb_train_targets)
emb_test_data = emb_test[["Stock", "target"]]
emb_test_data["prediction"] = model.predict(emb_test[emb_feats + basic_kpis])
emb_basic_ranking = emb_test_data.sort_values(by="prediction", ascending = False).head(10)

In [68]:
emb_basic_ranking

Unnamed: 0,Stock,target,prediction
415,AHPI,-0.567912,1.254106
583,BLNK,6.113556,1.193845
11,AGRX,0.035971,1.150282
522,CIDM,-0.638743,1.075328
730,ANY,-0.493007,1.049598
682,BBW,0.853211,0.990734
597,ABUS,0.923077,0.965546
146,AIM,-0.245968,0.938094
210,BW,0.434211,0.882872
793,CAN,1.094737,0.876885


In [69]:
profit_list.append({"Model" : "Basic KPIs + KGE", "RoI@10" : emb_basic_ranking["target"].mean()})

#### Model 3: Advanced KPIs + Embeddings

For this model, we take, as features, (a) the knowledge graph embeddings and (b) the advanced technical indicators.

In [70]:
model = RandomForestRegressor()

In [71]:
emb_train_data = emb_train[emb_feats + adv_kpis]
emb_train_targets = emb_train["target"]

In [72]:
model.fit(emb_train_data, emb_train_targets)
emb_test_data = emb_test[["Stock", "target"]]
emb_test_data["prediction"] = model.predict(emb_test[emb_feats + adv_kpis])
emb_ranking = emb_test_data.sort_values(by="prediction", ascending = False).head(10)

In [73]:
profit_list.append({"Model" : "Advanced KPIs + KGE", "RoI@10" : emb_ranking["target"].mean()})

Finally, we show the performance of our algorithms in terms of RoI at six months of the top 10 recommended assets. 

In [74]:
adv_df = pd.DataFrame(profit_list)

In [75]:
pd.concat([profitability_df, adv_df]).reset_index(drop=True)

Unnamed: 0,Model,RoI@10
0,Basic KPIs,0.541263
1,Advanced KPIs,0.532073
2,Market average,0.328342
3,Pure KGE,0.621586
4,Basic KPIs + KGE,0.750913
5,Advanced KPIs + KGE,0.187828


As we can see, in this test, we show that the knowledge graph embeddings are useful for the recommendations -- increasing in an absolute 21% with respect to the "Basic KPIs" baseline when we add the embeddings to that method -- and therefore, illustrating the potential effectiveness of knowledge-enhanced models.

<!-- In the above table, we are showing the top 10 stocks that were predicted to be profitable. The last two columns report the actual return on investment after 9 months and asset volitility. Note that a return value of 1.5 means a 150% return on investment. 

We notice that predicted returns for the top stocks are exceedingly high, which is not ordinary. However, we can also see that the actual returns for these stocks are similar for several of these instances, i.e. the model is not wrong in predicting these as profitable investments in the short term. However, we can also see that the volitility fo these stocks is very high, i.e. these are 'high-risk' assets that may subsequently crash in price. 

We can also analyse the statistics for this predictions across the dataset. -->

<!-- The returns and volatility for the top stocks, ranked by predicted returns, are far higher than their averages across the test set. This indicates that ranking assets by their predicted returns can produce some highly profitable but risk-laden investment recommendations, which might be suitable for aggressive investors. However, it remains to be seen how much of this is owed to fluctuations and outliers in the data, and perhaps even if there are better ways to capture the returns and volatility of the dataset.

Next, we look at the differences between the actual and predicted returns. -->

<!-- Lastly, we can examine the mean absolute error and mean squared error of the predictions. As these can be quite dependent on the dataset and problem in question, we also assume a simple baseline, by taking the median of all stock returns from the test dataset. We then compare the results of applying these metrics to the baseline and our predictor model. -->

<!-- We can see from this that the random forest model presents an improvement (reduction) in both MAE and MSE. -->