
Add C-Eval inference scripts #601

Merged (3 commits into ymcui:main) on Jun 16, 2023
Conversation

@iyorke (Contributor) commented Jun 15, 2023

Description

This PR adds scripts for evaluating the models on the C-Eval benchmark.

The results presented in the technical report can be reproduced with these scripts.

wiki:

Example Usage

cd scripts/ceval
python eval.py \
    --model_path ${model_path} \
    --cot False \
    --few_shot False \
    --with_prompt True \
    --constrained_decoding True \
    --temperature 0.2 \
    --n_times 1 \
    --ntrain 5 \
    --do_save_csv False \
    --do_test True \
    --output_dir outputs

Related Issue

Closes #561 (C-Eval performance evaluation)

iyorke and others added 2 commits June 15, 2023 14:34
update do_test and do_save_csv options
@ymcui changed the title from "ceval evaluation" to "Add C-Eval inference scripts" on Jun 16, 2023
@ymcui (Owner) commented Jun 16, 2023

GPT-4 with Code Interpreter made some suggestions.
Please make any changes that seem valid and appropriate.


Here's a high-level review of your code.

Code Organization

You've done a good job of breaking down the code into different modules based on their purpose. For instance, eval.py is the main execution script, evaluator.py defines the evaluator base class, and llama_evaluator.py provides a concrete implementation of it.

Code Quality

The code is generally well-written and easy to understand, and it follows standard Python practices. However, a few improvements could be made:

1. Docstrings:

While the functions are named clearly, adding docstrings would help provide additional context on what each function does, what the arguments are, and what is returned. This is essential for maintaining the code in the long run and for anyone else who might need to understand the code.
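
For example, a docstring for the eval_subject method discussed later in this review might look like the following; the exact signature is assumed for illustration, not copied from the PR.

def eval_subject(self, subject_name, test_df, dev_df=None, few_shot=False, cot=False):
    """Evaluate the model on a single C-Eval subject.

    Args:
        subject_name: Name of the C-Eval subject to evaluate.
        test_df: DataFrame of questions for the chosen split.
        dev_df: Optional DataFrame of dev questions used as few-shot examples.
        few_shot: Whether to prepend few-shot examples to each prompt.
        cot: Whether to use chain-of-thought prompting.

    Returns:
        Accuracy on the subject as a float in [0, 1].
    """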

2. Type Hints:

Consider adding type hints to your function definitions. This will make the code easier to understand and reduce the likelihood of errors.
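
Continuing the hypothetical signature sketched above, with type hints it could read:

import pandas as pd
from typing import Optional

def eval_subject(
    self,
    subject_name: str,
    test_df: pd.DataFrame,
    dev_df: Optional[pd.DataFrame] = None,
    few_shot: bool = False,
    cot: bool = False,
) -> float:
    ...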

3. Variable Naming:

Some variable names, such as cot, k, and sA_id, could be more descriptive.

4. Handling Boolean Arguments:

For boolean command-line arguments, you could use action='store_true' or action='store_false'. This way, you won't need to convert them from strings to booleans later.
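
A sketch of what that could look like, assuming the flags keep their current names:

import argparse

parser = argparse.ArgumentParser()
# Each flag defaults to False and flips to True when present on the
# command line, so no manual str-to-bool conversion is needed.
parser.add_argument("--cot", action="store_true", help="use chain-of-thought prompting")
parser.add_argument("--few_shot", action="store_true", help="prepend few-shot examples")
parser.add_argument("--do_save_csv", action="store_true", help="save results as CSV")
args = parser.parse_args()

The invocation would then change from passing --cot False to simply omitting the flag, and from --cot True to passing a bare --cot.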

5. Hardcoded Parameters:

Some parameters are hardcoded in your scripts. Consider making these configurable via command-line arguments or a configuration file.
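
One option is a small JSON config file; eval_config.json and its keys here are hypothetical, with fallback values taken from those visible in this PR:

import json

with open("eval_config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

# Fall back to the current hardcoded values when a key is absent.
temperature = config.get("temperature", 0.2)
max_new_tokens = config.get("max_new_tokens", 20)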

6. Exception Handling:

There are some places in your code where an error could occur. Adding more robust error handling would make your code more resilient. For example, when loading JSON files or reading CSV files, the file might not exist or could be in an unexpected format.
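
For example, a guarded load of the subject_mapping.json file mentioned below (the helper name is illustrative):

import json
import sys

def load_subject_mapping(path):
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        sys.exit(f"Subject mapping file not found: {path}")
    except json.JSONDecodeError as e:
        sys.exit(f"Subject mapping file is not valid JSON: {path} ({e})")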

7. Use of Print Statements:

There are several print statements used for logging purposes. Consider using a dedicated logger such as the logging module in Python, which provides more flexibility and control.
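
A minimal sketch; subject and accuracy are placeholder variables standing in for the real values:

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("ceval")

subject, accuracy = "some_subject", 0.0  # placeholders
# Replaces ad-hoc print() calls; the level can later be raised to
# silence progress output without touching the call sites.
logger.info("Accuracy on %s: %.4f", subject, accuracy)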

8. Directories and File Paths:

The directories 'data/val', 'data/dev', 'data/test' and the file 'subject_mapping.json' are hardcoded. Make these configurable to improve the flexibility of the script.
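
For instance, deriving all three split directories from one configurable root (the argument names are suggestions, not the script's current flags):

import argparse
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument("--data_dir", type=Path, default=Path("data"),
                    help="root directory containing the val/, dev/ and test/ splits")
parser.add_argument("--subject_mapping", type=Path,
                    default=Path("subject_mapping.json"))
args = parser.parse_args()

val_dir, dev_dir, test_dir = (args.data_dir / s for s in ("val", "dev", "test"))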

9. Code Duplication:

There's some duplicated code in the eval_subject method of Llama_Evaluator. For instance, the separate correct_num += 1 update and the correct = 1 / correct = 0 branches could be simplified, as sketched below.
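
Assuming pred and answer hold the model's choice and the gold label, the branches could collapse to:

pred, answer, correct_num = "A", "A", 0  # placeholders
# One comparison yields both the per-question flag and the running count.
correct = int(pred == answer)
correct_num += correct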

10. Magic Numbers:

There are some magic numbers in your code (e.g., top_k=40, top_p=0.9, repetition_penalty=1.1, max_new_tokens=20). It would be good to either make these configurable parameters or define them as constants at the top of your file.
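
For example, lifted to module-level constants (the values are the ones quoted above):

# Generation defaults gathered in one place instead of scattered literals;
# these could also double as argparse defaults.
TOP_K = 40
TOP_P = 0.9
REPETITION_PENALTY = 1.1
MAX_NEW_TOKENS = 20

GENERATION_KWARGS = dict(
    top_k=TOP_K,
    top_p=TOP_P,
    repetition_penalty=REPETITION_PENALTY,
    max_new_tokens=MAX_NEW_TOKENS,
)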

Overall, it's a well-structured and clean codebase. With a few improvements, it can be even better!

@ymcui ymcui requested a review from airaria June 16, 2023 02:40
@ymcui ymcui merged commit c7e9782 into ymcui:main Jun 16, 2023