GitHub - vkaul11/ruler

data
eval
README.md
phi-model-4k.txt
requirements.txt
run_all_tasks.sh
run_hashhop.sh
run_niah.sh
run_qa.sh
run_variable_tracking.sh


Steps for running the Ruler Eval

1. Install the packages listed in requirements.txt.
2. Edit the configuration file of each task as needed (default.yaml, or config.yaml for hash_hop). In particular, set max_seq_length to the context length you want to evaluate, for example 64k or 32k:
   a. https://github.com/vkaul11/ruler/blob/main/data/niah/conf/simulation/default.yaml
   b. https://github.com/vkaul11/ruler/blob/main/data/qa/conf/simulation/default.yaml
   c. https://github.com/vkaul11/ruler/blob/main/data/variable_tracking/conf/simulation/default.yaml
   d. https://github.com/vkaul11/ruler/blob/main/data/hash_hop/conf/config.yaml
3. Set the model_id, auth_key and url used for evaluation in https://github.com/vkaul11/ruler/blob/main/eval/conf/config.yaml
4. Run the bash scripts. https://github.com/vkaul11/ruler/blob/main/run_all_tasks.sh runs all 3 tasks and prints the metric per example and the average metric. To run a task individually and get its metrics, use one of:
   1) NIAH task: https://github.com/vkaul11/ruler/blob/main/run_niah.sh
   2) Variable tracking task: https://github.com/vkaul11/ruler/blob/main/run_variable_tracking.sh
   3) QA task: https://github.com/vkaul11/ruler/blob/main/run_qa.sh
   4) Hash hop, which has its own separate bash file: https://github.com/vkaul11/ruler/blob/main/run_hashhop.sh
5. After a run, the eval directory also contains the predictions and errors output for each task.
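
The max_seq_length change in the configuration files can be scripted rather than edited by hand. The following is only a minimal sketch, not part of this repository. It assumes that max_seq_length is stored as a plain integer on its own `max_seq_length: <value>` line and that "64k" means 64 * 1024 tokens; check both assumptions against the actual YAML files before relying on it. The function names and the `num_samples` key in the sample text are hypothetical.

```python
import re


def parse_context_length(label: str, k: int = 1024) -> int:
    """Convert a label such as '64k' or '4096' into a token count.

    Assumption: 'k' stands for 1024 tokens; pass k=1000 if the configs
    use decimal thousands instead.
    """
    text = label.strip().lower()
    if text.endswith("k"):
        return int(text[:-1]) * k
    return int(text)


def set_max_seq_length(yaml_text: str, label: str) -> str:
    """Return yaml_text with every 'max_seq_length: ...' line set to the new value.

    Works on the raw text so that comments and key order are preserved.
    Assumption: the key sits on a single line with a scalar value.
    """
    value = parse_context_length(label)
    pattern = re.compile(
        r"^([ \t]*max_seq_length[ \t]*:[ \t]*)\S+(.*)$", re.MULTILINE
    )
    updated, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(2)}", yaml_text
    )
    if count == 0:
        raise ValueError("no max_seq_length key found")
    return updated


# Hypothetical config fragment, used only to illustrate the call.
sample = "num_samples: 10\nmax_seq_length: 4096  # tokens\n"
print(set_max_seq_length(sample, "64k"))
```

To apply it to a real file, read the file contents, pass them through set_max_seq_length, and write the result back. Repeat for each of the four configuration files listed in step 2.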
