Knowledge-driven AutoML

This repository contains the knowledge bases and experiments used to test Kdautoml.jl.

Installation

The steps below assume that Docker and Julia are installed on a Unix-like system such as GNU/Linux.
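
Before starting, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch, not part of the repository; the function name missing_tools is hypothetical, and git is included only because the first step below clones the repository.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools used by the installation steps.
missing=$(missing_tools git docker julia)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing >&2
else
    echo "All prerequisites found."
fi
```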

Installation steps:

  • clone the repository with git clone https://github.com/zgornel/knowledge-driven-automl/
  • get the Kdautoml sub-module with cd knowledge-driven-automl && git submodule init && git submodule update dev/Kdautoml
  • pull the neo4j Docker image with docker image pull neo4j:4.2.3
  • install/update all Julia package dependencies with julia --project=dev/Kdautoml -e "using Pkg; Pkg.add(url=\"https://github.com/dpsanders/SatisfiabilityInterface.jl\"); Pkg.update()"
  • start the container for the pipeline synthesis knowledge base (replace $FULL_PATH with the absolute path of the directory that contains the knowledge-driven-automl clone):
    docker run \
     --detach \
     --publish=7474:7474 \
     --publish=7687:7687 \
     --volume=$FULL_PATH/knowledge-driven-automl/data/db/neo4j_pipesynthesis_kb/data/:/data \
     --volume=$FULL_PATH/knowledge-driven-automl/data/db/neo4j_pipesynthesis_kb/logs:/logs \
     --volume=$FULL_PATH/knowledge-driven-automl/data/db/neo4j_pipesynthesis_kb/import:/var/lib/neo4j/import \
     --volume=$FULL_PATH/knowledge-driven-automl/data/db/neo4j_pipesynthesis_kb/conf:/conf \
     --env NEO4J_dbms_memory_pagecache_size=4G \
     --env NEO4J_AUTH=neo4j/test \
     --name neo4j_pipesynthesis_kb \
     neo4j:4.2.3
    
  • start the container for the feature synthesis knowledge base (replace $FULL_PATH with the absolute path of the directory that contains the knowledge-driven-automl clone):
    docker run \
     --detach \
     --publish=7475:7474 \
     --publish=7688:7687 \
     --volume=$FULL_PATH/knowledge-driven-automl/data/db/neo4j_featuresynthesis_kb/data/:/data \
     --volume=$FULL_PATH/knowledge-driven-automl/data/db/neo4j_featuresynthesis_kb/logs:/logs \
     --volume=$FULL_PATH/knowledge-driven-automl/data/db/neo4j_featuresynthesis_kb/import:/var/lib/neo4j/import \
     --volume=$FULL_PATH/knowledge-driven-automl/data/db/neo4j_featuresynthesis_kb/conf:/conf \
     --env NEO4J_dbms_memory_pagecache_size=4G \
     --env NEO4J_AUTH=neo4j/test \
     --name neo4j_featuresynthesis_kb \
     neo4j:4.2.3
    
  • NOTE: Containers can be restarted by name alone, e.g. docker start neo4j_pipesynthesis_kb
  • add the data to the neo4j graph database instances running in the containers with:
    julia ./dev/Kdautoml/scripts/fill_pipesynthesis_kb.jl ./data/knowledge/pipe_synthesis.toml &&
    julia ./dev/Kdautoml/scripts/fill_featuresynthesis_kb.jl ./data/knowledge/feature_synthesis.toml
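
The two docker run commands above differ only in the container name and the published host ports, so they can be generated from one template. The sketch below is an illustration under assumptions, not part of the repository: kb_run_cmd is a hypothetical helper, FULL_PATH defaults to the current directory, and the host ports passed in the two calls are assumed values chosen so that the containers do not share a host port. The helper only prints each command, so the output can be inspected before it is run.

```shell
#!/bin/sh
# Directory that contains the knowledge-driven-automl clone
# (assumption: defaults to the current directory if FULL_PATH is unset).
FULL_PATH=${FULL_PATH:-$(pwd)}

# Hypothetical helper: print (not run) the docker run command for one knowledge base.
# $1 = container name, $2 = host HTTP port, $3 = host Bolt port
kb_run_cmd() {
    db_dir="$FULL_PATH/knowledge-driven-automl/data/db/$1"
    printf 'docker run --detach'
    printf ' --publish=%s:7474 --publish=%s:7687' "$2" "$3"
    printf ' --volume=%s/data/:/data' "$db_dir"
    printf ' --volume=%s/logs:/logs' "$db_dir"
    printf ' --volume=%s/import:/var/lib/neo4j/import' "$db_dir"
    printf ' --volume=%s/conf:/conf' "$db_dir"
    printf ' --env NEO4J_dbms_memory_pagecache_size=4G'
    printf ' --env NEO4J_AUTH=neo4j/test'
    printf ' --name %s neo4j:4.2.3\n' "$1"
}

# Assumed host ports: each container gets its own HTTP and Bolt port.
kb_run_cmd neo4j_pipesynthesis_kb 7474 7687
kb_run_cmd neo4j_featuresynthesis_kb 7475 7688
```

Piping a printed line to sh would start the corresponding container.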
    

Reproducing the experiments for the KAIS (Knowledge and Information Systems) journal

The experiments need knowledge bases that differ from the 'generic' ones in data/knowledge.

'XOR' experiment

The 'XOR' problem experiment uses a different feature synthesis kb which has to be imported:

julia ./dev/Kdautoml/scripts/fill_featuresynthesis_kb.jl ./experiments/KAIS/xor-dataset-expriment-1/feature_synthesis_xor_usecase.toml

The pipeline synthesis knowledge base remains the one present in data/knowledge/pipe_synthesis.toml. One can run the experiment with:

./experiments/KAIS/xor-dataset-expriment-1/build_pipes_xor_usecase.jl

'Circles' experiment

The 'Circles' dataset experiment uses a different pipeline synthesis kb which has to be imported:

julia ./dev/Kdautoml/scripts/fill_pipesynthesis_kb.jl ./experiments/KAIS/circles-dataset-experiment-2/pipe_synthesis_circles_usecase.toml

The feature synthesis knowledge base remains the one present in data/knowledge/feature_synthesis.toml. One can run the experiment with:

./experiments/KAIS/circles-dataset-experiment-2/build_pipes_circles_usecase.jl
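
Each experiment imports exactly one experiment-specific knowledge base and then runs its build script. As a rough sketch, the two command sequences can be kept behind a single entry point; experiment_cmds is a hypothetical name, the paths are the ones listed above, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands for one KAIS experiment.
# $1 = experiment name: xor or circles
experiment_cmds() {
    scripts=./dev/Kdautoml/scripts
    case "$1" in
        xor)
            # XOR swaps in its own feature synthesis knowledge base.
            dir=./experiments/KAIS/xor-dataset-expriment-1
            echo "julia $scripts/fill_featuresynthesis_kb.jl $dir/feature_synthesis_xor_usecase.toml"
            echo "$dir/build_pipes_xor_usecase.jl"
            ;;
        circles)
            # Circles swaps in its own pipeline synthesis knowledge base.
            dir=./experiments/KAIS/circles-dataset-experiment-2
            echo "julia $scripts/fill_pipesynthesis_kb.jl $dir/pipe_synthesis_circles_usecase.toml"
            echo "$dir/build_pipes_circles_usecase.jl"
            ;;
        *)
            echo "unknown experiment: $1" >&2
            return 1
            ;;
    esac
}

experiment_cmds xor
```

Running experiment_cmds circles | sh from the repository root would execute the printed commands in order.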

Printing or accessing the experimental results

Once an experiment job finishes, the results are serialized in the current directory to a file named _results_xor_experiment.bin or _results_circles_experiment.bin, depending on the experiment. The print_results.jl script corresponding to each experiment can be run to print the pipeline space or pipeline space statistics extracted from the serialized data. The script also illustrates how to access the results. The .bin file must be in the same directory as print_results.jl.
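
Because the results are written to the directory the job was started from, the .bin file may need to be copied next to print_results.jl first. The helper below is a small sketch of that step; stage_results is a hypothetical name, and the script directory in the commented example is an assumption, since the location of print_results.jl is not spelled out here.

```shell
#!/bin/sh
# Hypothetical helper: copy a serialized results file next to print_results.jl.
# $1 = path to the .bin results file, $2 = directory holding print_results.jl
stage_results() {
    if [ ! -f "$1" ]; then
        echo "results file not found: $1" >&2
        return 1
    fi
    if [ ! -f "$2/print_results.jl" ]; then
        echo "print_results.jl not found in: $2" >&2
        return 1
    fi
    cp "$1" "$2/"
}

# Example (the script directory is an assumption):
#   stage_results ./_results_xor_experiment.bin ./experiments/KAIS/xor-dataset-expriment-1 &&
#       (cd ./experiments/KAIS/xor-dataset-expriment-1 && julia print_results.jl)
```

If the results file already sits in the target directory, cp reports that source and destination are the same file and nothing further is needed.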

Monitoring the pipeline space growth

Monitoring the pipeline building progress (pipeline space evolution) can be done with watch -n 0.1 cat __tree__ where __tree__ is a file that is iteratively updated with a printout of the state of the pipelines during synthesis.

License

This code is licensed under GPL v3; see LICENSE.md.

Publication

The associated paper is "A knowledge-driven AutoML architecture" (arxiv). Cite as:

@misc{cofaru2023knowledgedriven,
  title={A knowledge-driven AutoML architecture},
  author={Corneliu Cofaru and Johan Loeckx},
  year={2023},
  eprint={2311.17124},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

Reporting Bugs

At the moment the code is under heavy development and much of the API and features are subject to change ¯\(ツ)/¯. Please file an issue to report a bug or request a feature.
