Code for the paper "Automated Leaderboard System for Hackathon Evaluation Using Large Language Models".
Overview of the system architecture, illustrating the processing pipeline for the Jupyter notebook submissions from initial raw data intake to the final predicted results.

The table from the Bland-Altman analysis [22] reveals a mean difference (bias) of 27.5 points: on average, the LLM scores are 27.5 points higher than the technical scores, which is roughly 6.9% of the maximum technical score. The 95% limits of agreement (-6.83 to 61.83) indicate that most differences fall within a 68.66-point range, which aligns with typical inter-rater variability in manual grading and supports the reliability of our hybrid evaluation approach.

Preparing Kaggle API credentials.
pip install kaggle
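Before running the download step, it can help to confirm that credentials are actually in place. In the classic setup, the Kaggle client reads them either from the KAGGLE_USERNAME and KAGGLE_KEY environment variables or from a kaggle.json file, by default under ~/.kaggle. The sketch below only checks for those two sources; the helper name find_kaggle_credentials is hypothetical and not part of this repository.

```python
import json
import os
from pathlib import Path


def find_kaggle_credentials(config_dir=None, environ=None):
    """Return where Kaggle credentials were found, or None if missing.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    environ = os.environ if environ is None else environ

    # Option 1: credentials supplied through environment variables.
    if environ.get("KAGGLE_USERNAME") and environ.get("KAGGLE_KEY"):
        return "environment variables"

    # Option 2: a kaggle.json file, by default under ~/.kaggle
    # (KAGGLE_CONFIG_DIR overrides the directory).
    directory = Path(
        config_dir or environ.get("KAGGLE_CONFIG_DIR") or Path.home() / ".kaggle"
    )
    token_file = directory / "kaggle.json"
    if not token_file.is_file():
        return None

    try:
        token = json.loads(token_file.read_text())
    except json.JSONDecodeError:
        return None

    # The file is expected to hold a "username" and a "key" entry.
    if isinstance(token, dict) and token.get("username") and token.get("key"):
        return str(token_file)
    return None
```

Calling find_kaggle_credentials() with no arguments inspects the current environment and the default directory; a return value of None suggests the credentials still need to be set up.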
Run the file retrieve-competition.py.
It will download all the submission files and convert them to .md files.
You need Node.js installed on your machine. You may also create a SQLite database file named results.db.
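One way to create the database file is sketched below in Python, assuming an empty database is sufficient as a starting point. The schema expected by the Node scripts is not described here, so no tables are created, and the helper name create_empty_database is hypothetical.

```python
import sqlite3
from pathlib import Path


def create_empty_database(path="results.db"):
    """Create an empty SQLite database file at `path` if it is missing.

    Hypothetical helper for illustration; no tables are created because
    the schema used by the Node scripts is not described here.
    """
    db_path = Path(path)
    # A zero-byte file is a valid, empty SQLite database.
    db_path.touch(exist_ok=True)

    # Open the file once to confirm that SQLite can read it.
    connection = sqlite3.connect(db_path)
    try:
        connection.execute("SELECT name FROM sqlite_master").fetchall()
    finally:
        connection.close()
    return db_path
```

An existing results.db is left untouched, so calling the helper again does not discard stored results.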
npm install
node index.js
node mark.js
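The Bland-Altman figures quoted in the overview (bias and 95% limits of agreement) can be recomputed from paired scores with a sketch like the following. It assumes the common convention of bias ± 1.96 times the sample standard deviation of the differences, which may differ in detail from what was used for the paper. The function name bland_altman is hypothetical and not part of this repository.

```python
import statistics


def bland_altman(llm_scores, technical_scores):
    """Return (bias, lower_limit, upper_limit) for paired scores.

    Assumes the common convention: limits = bias +/- 1.96 * sample SD
    of the paired differences (LLM score minus technical score).
    """
    if len(llm_scores) != len(technical_scores):
        raise ValueError("Both score lists must have the same length")
    if len(llm_scores) < 2:
        raise ValueError("At least two paired scores are required")

    # Difference for each submission: positive means the LLM scored higher.
    differences = [llm - tech for llm, tech in zip(llm_scores, technical_scores)]

    bias = statistics.mean(differences)
    spread = statistics.stdev(differences)  # sample standard deviation

    half_width = 1.96 * spread
    return bias, bias - half_width, bias + half_width
```

A positive bias, as reported above, means the LLM scores tend to be higher than the technical scores; the width of the returned interval corresponds to the range of the limits of agreement.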
@software{Li_Automated_Leaderboard_System_2025,
author = {Li, Bowen and Cheng, Bohan and Talyor, Patrick and Osborne, Dale and Han, Fengling and Shen, Robert and Gondal, Iqbal},
doi = {<>},
month = mar,
title = {{Automated Leaderboard System for Hackathon Evaluation Using Large Language Models}},
url = {https://github.com/SkywardAI/hackathon-leaderboard},
version = {1.0.0},
year = {2025}
}