# Run DLCC experiments

This notebook runs the DLCC Evaluation Framework with the DBpedia gold standards on a list of previously computed embedding files (produced by the notebook `get_vectors_pipeline.ipynb`). The results are analyzed in a separate notebook, `analyze_dlcc.ipynb`.

### Setup before running this notebook for the first time:

1. Clone the DL-TC-Generator repository (by Portisch et al.) into the folder `dlcc_testcollections`. The DBpedia gold standard files are located in `results/v1/dbpedia`.

```
git clone https://github.com/janothan/DL-TC-Generator.git dlcc_testcollections
```

2. Clone the DLCC evaluation framework into the folder `dlcc_evaluation_framework`.

```
git clone https://github.com/janothan/dl-evaluation-framework.git dlcc_evaluation_framework
```

3. Move this notebook, the complete `dbpedia` folder (from `dlcc_testcollections/results/v1/dbpedia`), and the complete `embeddings` folder (produced by `get_vectors_pipeline.ipynb`) into `dlcc_evaluation_framework`.

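4. Uncomment the embedding models and model variants to evaluate. Every selected model is combined with every selected variant, so uncommenting all 11 models and all 5 variants yields 55 runs, which may take 12 to 24 hours in total. Results are saved in a newly created `results` folder.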
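Because a full run can take many hours, it may be worth confirming that step 3 succeeded before starting. The following is a minimal sketch, assuming the notebook is executed from inside `dlcc_evaluation_framework` and that the two folders keep the names `dbpedia` and `embeddings` used in the evaluation cell; `missing_setup_dirs` is a hypothetical helper, not part of the DLCC framework.

```python
from pathlib import Path
from typing import List

# Hypothetical pre-flight check: the folder names mirror setup step 3
# ("dbpedia" and "embeddings" inside dlcc_evaluation_framework).
REQUIRED_DIRS = ("dbpedia", "embeddings")


def missing_setup_dirs(root: Path) -> List[str]:
    """Return the required sub-folders that do not exist under `root`."""
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Path.cwd() is the notebook's working directory when run in Jupyter.
    missing = missing_setup_dirs(Path.cwd())
    if missing:
        print(f"Missing folders: {', '.join(missing)}")
    else:
        print("Setup folders found.")
```

The check only tests that the folders exist; it does not validate their contents.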

In [1]:
from datetime import datetime
import warnings

from dl_evaluation_framework.evaluation_manager import EvaluationManager, VectorTuple

# Silence library warnings so the output of the long-running evaluation stays readable.
warnings.filterwarnings('ignore')


# Embedding algorithms: uncomment the ones to evaluate.
embedding_models = [
#     'rdf2vec-cbow',
#     'rdf2vec-cbow-oa',
#     'rdf2vec-sg',
#     'rdf2vec-sg-oa',
#     'non-rdf2vec-ComplEx',
#     'non-rdf2vec-DistMult',
#     'non-rdf2vec-RESCAL',
#     'non-rdf2vec-RotatE',
#     'non-rdf2vec-TransE-L1',
#     'non-rdf2vec-TransE-L2',
#     'non-rdf2vec-TransR',
]

# Variant suffixes appended to each model name: uncomment the ones to evaluate.
model_variants = [
#     '-200-original',
#     '-200-avgbin',
#     '-128-autoencoded',
#     '-256-autoencoded',
#     '-512-autoencoded',
]

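# Every combination of variant and model, e.g. "rdf2vec-cbow-200-original".
embedding_names = [model + variant for variant in model_variants for model in embedding_models]

# Each embedding is read from embeddings/<name>.txt.
vlist = [VectorTuple(vector_name=name, vector_path=f"embeddings/{name}.txt") for name in embedding_names]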

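# Stop early if nothing was uncommented above.
if not vlist:
    raise ValueError("No embeddings selected: uncomment at least one entry in embedding_models and model_variants.")

# Folder holding the DBpedia gold standard test cases (copied from dlcc_testcollections).
test_directory = "dbpedia/"

em = EvaluationManager(test_directory=test_directory)

# Evaluate every selected embedding and report the total wall-clock time.
start_time = datetime.now()
em.evaluate(vector_names_and_files=vlist)
end_time = datetime.now()

print(f"Finished DLCC for all embeddings. Took {end_time - start_time}")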
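Since a missing embedding file would only surface once the evaluation reaches it, listing absent files up front can save a wasted run. A minimal sketch, assuming the `embeddings/<name>.txt` naming used in the cell above; `expand_variant_names` and `missing_embedding_files` are hypothetical helpers, not part of the DLCC framework. The expansion uses `itertools.product` with the variants as the outer loop, which reproduces the order of the list comprehension in the evaluation cell. With all 11 models and 5 variants selected, it yields 11 × 5 = 55 names.

```python
from itertools import product
from pathlib import Path


def expand_variant_names(embedding_models, model_variants):
    """Combine every variant with every model, variants as the outer loop."""
    return [em + mv for mv, em in product(model_variants, embedding_models)]


def missing_embedding_files(names, embeddings_dir="embeddings"):
    """Return the names whose `<name>.txt` file is absent from `embeddings_dir`."""
    folder = Path(embeddings_dir)
    return [name for name in names if not (folder / f"{name}.txt").is_file()]


if __name__ == "__main__":
    # Illustrative selection; in the notebook, pass the uncommented lists instead.
    models = ["rdf2vec-cbow", "non-rdf2vec-TransE-L1"]
    variants = ["-200-original", "-128-autoencoded"]
    names = expand_variant_names(models, variants)
    print(names)
    print("Missing:", missing_embedding_files(names))
```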