
Add support for analyzing evaluators with custom cross-annotations #281

Merged: 1 commit merged into tatsu-lab:main on Apr 18, 2024

Conversation

@rdnfn (Contributor) commented on Apr 17, 2024

Firstly, thanks a lot for creating and sharing the AlpacaEval package! I am finding it very useful (and well documented).

This PR fixes a small bug when analyzing evaluators on a custom (new) cross-annotation dataset. I found that the main.analyze_evaluators function does not support this use case yet. In particular, the alpaca_eval.analyze.Analyzer class assumes that the default cross-annotation dataset is being used when computing the correlations. Because the generator column is not present in that default dataset, it is extracted/matched from the main annotation dataset (referring to this line), and this matching fails if you use a different cross-annotation dataset. I therefore updated the if-statement so that the matching only runs when no generator column is present (roughly as sketched below). If the generator column is already present in the custom cross-annotation dataset (as in the example below), the matching code is skipped and no error is thrown.
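For reference, here is a minimal sketch of the changed guard. The function and variable names below (ensure_generator_column, df_crossannotations, df_annotations) are illustrative placeholders, not the exact identifiers used in alpaca_eval.analyze:

# Sketch of the guard introduced by this PR; names are placeholders, not the
# exact identifiers in alpaca_eval.analyze.
import pandas as pd

def ensure_generator_column(df_crossannotations: pd.DataFrame,
                            df_annotations: pd.DataFrame) -> pd.DataFrame:
    """Only match the generator from the main annotations when it is missing."""
    if "generator" not in df_crossannotations.columns:
        # Previously this matching ran unconditionally and failed for custom
        # cross-annotation datasets not aligned with the main annotation dataset.
        df_crossannotations = df_crossannotations.merge(
            df_annotations[["instruction", "output_1", "output_2", "generator"]],
            on=["instruction", "output_1", "output_2"],
            how="left",
        )
    return df_crossannotations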

Let me know if this fix would be helpful to add.

Reproducing the use case

Simple code to test the default annotator on a custom cross-annotation dataset:

# note that this assumes that the OpenAI API key is set in client_configs
from alpaca_eval import main, constants

# Analyze the default annotator on a custom cross-annotation dataset
evaluator_leaderboard, all_crossannotations = main.analyze_evaluators(
    annotators_config=constants.DEFAULT_ANNOTATOR_CONFIG,
    is_return_instead_of_print=True,
    precomputed_leaderboard="tmp_leaderboard.csv",
    is_single_annotator=True,
    analyzer_kwargs={
        "gold_crossannotations": "test_custom_crossannotations.json",
        "gold_annotations": None,
    }
)
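Since is_return_instead_of_print=True is set, the leaderboard and cross-annotations are returned instead of printed, so they can be inspected directly afterwards (a quick check for illustration, not part of the fix):

# Inspect the returned results
print(evaluator_leaderboard)
print(all_crossannotations)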

It can be run with the following (toy) test_custom_crossannotations.json file:

[
    {
        "instruction": "Do you prefer cats or dogs?",
        "output_1": "Cats",
        "output_2": "Dogs",
        "preference": 1,
        "annotator_index": 15,
        "dataset": "custom_dataset",
        "datasplit": "eval",
        "generator": "dummy_model_01"
    },
    {
        "instruction": "Do you prefer cats or dogs?",
        "output_1": "Cats",
        "output_2": "Dogs",
        "preference": 1,
        "annotator_index": 0,
        "dataset": "custom_dataset",
        "datasplit": "eval",
        "generator": "dummy_model_01"
    },
    {
        "instruction": "Do you prefer cats or dogs?",
        "output_1": "Cats",
        "output_2": "Dogs",
        "preference": 1,
        "annotator_index": 9,
        "dataset": "custom_dataset",
        "datasplit": "eval",
        "generator": "dummy_model_01"
    },
    {
        "instruction": "Do you prefer cats or dogs?",
        "output_1": "Cats",
        "output_2": "Dogs",
        "preference": 2,
        "annotator_index": 7,
        "dataset": "custom_dataset",
        "datasplit": "eval",
        "generator": "dummy_model_01"
    },
    {
        "instruction": "Should I get the blue or green box?",
        "output_1": "Green",
        "output_2": "Blue",
        "preference": 1,
        "annotator_index": 10,
        "dataset": "custom_dataset",
        "datasplit": "eval",
        "generator": "dummy_model_02"
    },
    {
        "instruction": "Should I get the blue or green box?",
        "output_1": "Green",
        "output_2": "Blue",
        "preference": 2,
        "annotator_index": 15,
        "dataset": "custom_dataset",
        "datasplit": "eval",
        "generator": "dummy_model_02"
    },
    {
        "instruction": "Should I get the blue or green box?",
        "output_1": "Green",
        "output_2": "Blue",
        "preference": 1,
        "annotator_index": 0,
        "dataset": "custom_dataset",
        "datasplit": "eval",
        "generator": "dummy_model_02"
    },
    {
        "instruction": "Should I get the blue or green box?",
        "output_1": "Green",
        "output_2": "Blue",
        "preference": 2,
        "annotator_index": 4,
        "dataset": "custom_dataset",
        "datasplit": "eval",
        "generator": "dummy_model_02"
    }
]

@rdnfn changed the title from "Add support for adding evaluators with custom cross-annotations" to "Add support for analyzing evaluators with custom cross-annotations" on Apr 17, 2024
@YannDubs (Collaborator) commented on Apr 18, 2024

Great, thanks @rdnfn for the detailed PR and kind words! 💯

@YannDubs merged commit d1b3061 into tatsu-lab:main on Apr 18, 2024
1 check passed