
Add C-Eval inference scripts #601

Merged (3 commits into ymcui:main) on Jun 16, 2023
Conversation

@iyorke (Contributor) commented Jun 15, 2023

Description

This PR adds scripts for evaluating the models on the C-Eval benchmark.

The results presented in the technical report can be reproduced with these scripts.

wiki:

Example Usage

cd scripts/ceval
python eval.py \
    --model_path ${model_path} \
    --cot False \
    --few_shot False \
    --with_prompt True \
    --constrained_decoding True \
    --temperature 0.2 \
    --n_times 1 \
    --ntrain 5 \
    --do_save_csv False \
    --do_test True \
    --output_dir outputs

Related Issue

Closes #561 (C-Eval performance evaluation)

iyorke and others added 2 commits June 15, 2023 14:34
update do_test and do_save_csv options
@ymcui changed the title from "ceval evaluation" to "Add C-Eval inference scripts" on Jun 16, 2023
@ymcui (Owner) commented Jun 16, 2023

GPT-4 with Code Interpreter made some suggestions.
Please make any changes that seem valid and appropriate.


Here's a high-level review of your code.

Code Organization

You've done a good job of breaking down the code into different modules based on their purpose. For instance, eval.py is the main execution script, evaluator.py defines the evaluator base class, and llama_evaluator.py provides a concrete implementation of it.

Code Quality

The code is generally well-written and easy to understand, and it follows standard Python practices. However, a few improvements could be made:

1. Docstrings:

While the functions are named clearly, adding docstrings would help provide additional context on what each function does, what the arguments are, and what is returned. This is essential for maintaining the code in the long run and for anyone else who might need to understand the code.
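
For example, a docstring for the eval_subject method discussed later in this review might look like the following; the exact signature is assumed for illustration, not copied from the PR.

def eval_subject(self, subject_name, test_df, dev_df=None, few_shot=False, cot=False):
    """Evaluate the model on a single C-Eval subject.

    Args:
        subject_name: Name of the C-Eval subject to evaluate.
        test_df: DataFrame of questions for the chosen split.
        dev_df: Optional DataFrame of dev questions used as few-shot examples.
        few_shot: Whether to prepend few-shot examples to each prompt.
        cot: Whether to use chain-of-thought prompting.

    Returns:
        Accuracy on the subject as a float in [0, 1].
    """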

2. Type Hints:

Consider adding type hints to your function definitions. This will make the code easier to understand and reduce the likelihood of errors.
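
Continuing the hypothetical signature sketched above, with type hints it could read:

import pandas as pd
from typing import Optional

def eval_subject(
    self,
    subject_name: str,
    test_df: pd.DataFrame,
    dev_df: Optional[pd.DataFrame] = None,
    few_shot: bool = False,
    cot: bool = False,
) -> float:
    ...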

3. Variable Naming:

Some variable names, such as cot, k, and sA_id, could be more descriptive.

4. Handling Boolean Arguments:

For boolean command-line arguments, you could use action='store_true' or action='store_false'. This way, you won't need to convert them from strings to booleans later.
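
A sketch of what that could look like, assuming the flags keep their current names:

import argparse

parser = argparse.ArgumentParser()
# Each flag defaults to False and flips to True when present on the
# command line, so no manual str-to-bool conversion is needed.
parser.add_argument("--cot", action="store_true", help="use chain-of-thought prompting")
parser.add_argument("--few_shot", action="store_true", help="prepend few-shot examples")
parser.add_argument("--do_save_csv", action="store_true", help="save results as CSV")
args = parser.parse_args()

The invocation would then change from passing --cot False to simply omitting the flag, and from --cot True to passing a bare --cot.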

5. Hardcoded Parameters:

Some parameters are hardcoded in your scripts. Consider making these configurable via command-line arguments or a configuration file.
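
One option is a small JSON config file; eval_config.json and its keys here are hypothetical, with fallback values taken from those visible in this PR:

import json

with open("eval_config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

# Fall back to the current hardcoded values when a key is absent.
temperature = config.get("temperature", 0.2)
max_new_tokens = config.get("max_new_tokens", 20)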

6. Exception Handling:

There are some places in your code where an error could occur. Adding more robust error handling would make your code more resilient. For example, when loading JSON files or reading CSV files, the file might not exist or could be in an unexpected format.
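
For example, a guarded load of the subject_mapping.json file mentioned below (the helper name is illustrative):

import json
import sys

def load_subject_mapping(path):
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        sys.exit(f"Subject mapping file not found: {path}")
    except json.JSONDecodeError as e:
        sys.exit(f"Subject mapping file is not valid JSON: {path} ({e})")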

7. Use of Print Statements:

There are several print statements used for logging purposes. Consider using a dedicated logger such as the logging module in Python, which provides more flexibility and control.
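
A minimal sketch; subject and accuracy are placeholder variables standing in for the real values:

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("ceval")

subject, accuracy = "some_subject", 0.0  # placeholders
# Replaces ad-hoc print() calls; the level can later be raised to
# silence progress output without touching the call sites.
logger.info("Accuracy on %s: %.4f", subject, accuracy)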

8. Directories and File Paths:

The directories 'data/val', 'data/dev', 'data/test' and the file 'subject_mapping.json' are hardcoded. Make these configurable to improve the flexibility of the script.
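
For instance, deriving all three split directories from one configurable root (the argument names are suggestions, not the script's current flags):

import argparse
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument("--data_dir", type=Path, default=Path("data"),
                    help="root directory containing the val/, dev/ and test/ splits")
parser.add_argument("--subject_mapping", type=Path,
                    default=Path("subject_mapping.json"))
args = parser.parse_args()

val_dir, dev_dir, test_dir = (args.data_dir / s for s in ("val", "dev", "test"))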

9. Code Duplication:

There's some duplicated code in the eval_subject method of Llama_Evaluator. For instance, the separate correct_num += 1 update and the correct = 1 / correct = 0 branches could be simplified, as sketched below.
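
Assuming pred and answer hold the model's choice and the gold label, the branches could collapse to:

pred, answer, correct_num = "A", "A", 0  # placeholders
# One comparison yields both the per-question flag and the running count.
correct = int(pred == answer)
correct_num += correct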

10. Magic Numbers:

There are some magic numbers in your code (e.g., top_k=40, top_p=0.9, repetition_penalty=1.1, max_new_tokens=20). It would be good to either make these configurable parameters or define them as constants at the top of your file.
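
For example, lifted to module-level constants (the values are the ones quoted above):

# Generation defaults gathered in one place instead of scattered literals;
# these could also double as argparse defaults.
TOP_K = 40
TOP_P = 0.9
REPETITION_PENALTY = 1.1
MAX_NEW_TOKENS = 20

GENERATION_KWARGS = dict(
    top_k=TOP_K,
    top_p=TOP_P,
    repetition_penalty=REPETITION_PENALTY,
    max_new_tokens=MAX_NEW_TOKENS,
)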

Overall, it's a well-structured and clean codebase. With a few improvements, it can be even better!

@ymcui ymcui requested a review from airaria June 16, 2023 02:40
@ymcui ymcui merged commit c7e9782 into ymcui:main Jun 16, 2023