t-test is two-tailed instead of one-tailed #104
Comments
I am adding a flag for t_test alternative and setting it by default to "less".
This will be available on the next release.
Perhaps for the sake of not breaking the API, setting the default to
@erip That's a good point, thanks! The next release will be 2.0, and we are also going to replace the default models with new ones. We will make it clear that scores (with default settings) won't be directly comparable to the previous version (1.1.3). There will be backward compatibility, but default options will probably change for all 3 commands:
I agree with you that people typically want to know whether the baseline mean score is less than sys1's mean score. I think it's a good call to change the t_test to [...]. At the moment this is only updated in the [...]
It seems very reasonable to me. I'm hoping there's not some nuance that I've overlooked here. I can look at sacrebleu to see what their alternative hypothesis is in their tests (for the sake of consistency more than correctness).
Perfect! Thanks!
@erip I looked a bit more into this, and indeed a two-sided t_test is more usual, and the results made more sense in my tests. Nonetheless, I am keeping the option to change that on the command line. I am going to merge v2.0 into master. The release was delayed, but at least master will contain the new changes.
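For readers wondering what a configurable alternative might look like on the command line, here is a minimal sketch using Python's argparse. The flag name `--t_test_alternative` and the helper `build_parser` are hypothetical and chosen only for illustration; the thread does not state the actual option name. The default follows the last comment above, which settles on a two-sided test.

```python
import argparse


def build_parser():
    """Build an illustrative parser; the flag name below is an assumption."""
    parser = argparse.ArgumentParser(
        description="Compare two systems with a paired t-test (illustrative)."
    )
    parser.add_argument(
        "--t_test_alternative",  # hypothetical flag name, not confirmed by the thread
        choices=["two-sided", "less", "greater"],
        default="two-sided",
        help="Alternative hypothesis for the paired t-test.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Using alternative hypothesis: {args.t_test_alternative}")
```

Restricting the value with `choices` means a typo in the alternative is rejected at parse time rather than surfacing later inside the statistical test.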
🐛 Bug
The code to perform paired t-test is two-tailed instead of one-tailed. The alternative hypothesis that users typically care about is that the baseline mean score is less than sys1's mean score, but that is not reflected in the test.
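To make the difference concrete, here is a minimal standard-library sketch of a paired t-test with a selectable alternative. It is an illustration under stated assumptions, not the project's code: the helper names (`t_pdf`, `t_cdf`, `paired_t_test`) and the score lists are made up, and the p-values come from a simple numerical integration of the t density. In practice a library routine such as SciPy's `ttest_rel`, which accepts an `alternative` argument, would normally be used instead.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=20000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def paired_t_test(x, y, alternative="two-sided"):
    """Paired t-test on x - y; 'less' means H1: mean(x) < mean(y)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    df = n - 1
    if alternative == "less":
        p = t_cdf(t, df)
    elif alternative == "greater":
        p = 1 - t_cdf(t, df)
    elif alternative == "two-sided":
        p = 2 * (1 - t_cdf(abs(t), df))
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    return t, p


# Made-up segment scores, for illustration only.
baseline = [0.61, 0.58, 0.70, 0.66, 0.59, 0.63]
sys1 = [0.64, 0.60, 0.71, 0.70, 0.62, 0.63]

t_stat, p_two = paired_t_test(baseline, sys1, "two-sided")
_, p_less = paired_t_test(baseline, sys1, "less")
print(f"t = {t_stat:.3f}, two-sided p = {p_two:.4f}, one-sided (less) p = {p_less:.4f}")
```

When the t statistic points in the direction of the alternative (here it is negative, since baseline scores are lower), the one-sided p-value is half the two-sided one, so the same data can cross a significance threshold under one test and not the other.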
To Reproduce
See here :-)
Expected behaviour
The test should probably use alternative="less" or otherwise be configurable.
Screenshots
Environment
OS: N/A
Packaging: N/A
Version: N/A
Additional context