Results differ from python library #74

Closed
daniel17903 opened this issue Feb 28, 2019 · 2 comments


daniel17903 commented Feb 28, 2019

Hi, while porting some Python code to Java I discovered that the token sort and token set ratios calculated by this library often do not match the ones calculated by the Python fuzzywuzzy library.

Here is an example:
Python Code:

from fuzzywuzzy import fuzz 
print(str(fuzz.token_sort_ratio("efwe fwef","wef wefwef"))) 
print(str(fuzz.token_set_ratio("efwe fwef","wef wefwef"))) 

Output:

53
53

Java Code:

import me.xdrop.fuzzywuzzy.FuzzySearch;

public class Main {
	public static void main(String[] args) {
		System.out.println(FuzzySearch.tokenSortRatio("efwe fwef","wef wefwef"));
		System.out.println(FuzzySearch.tokenSetRatio("efwe fwef","wef wefwef"));
	}
}

Output:

84
84

Where is this difference coming from? Shouldn't these two outputs be equal?

Owner

xdrop commented Feb 28, 2019

We only ported the python-Levenshtein module (chosen for speed), not Python's built-in difflib, so the results only match when the Python library is using python-Levenshtein as well. Are you using the Python library with the python-Levenshtein module installed?

i.e., instead of

pip install fuzzywuzzy

use

pip install fuzzywuzzy[speedup]
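
To see which case applies in a given environment, a small check like the following may help. It is only a sketch: it assumes python-Levenshtein is importable under the top-level module name `Levenshtein`, and the helper name `levenshtein_backend_available` is made up for illustration.

```python
import importlib.util


def levenshtein_backend_available() -> bool:
    """Return True if a top-level module named `Levenshtein` can be found.

    Assumption: python-Levenshtein installs itself under this module name.
    """
    return importlib.util.find_spec("Levenshtein") is not None


if levenshtein_backend_available():
    print("Levenshtein module found: the [speedup] code path should be in use")
else:
    print("Levenshtein module not found: expect the difflib-based results")
```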

@daniel17903
Author

Thanks. After installing fuzzywuzzy[speedup] the results match. I wasn't aware that the choice of underlying library affects the output.
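
For anyone hitting the same mismatch, here is a rough sketch of why the two backends disagree on this input. It compares `difflib.SequenceMatcher` from the standard library with a hand-written insert/delete similarity, 2 * LCS / (len(a) + len(b)). The second function is an assumption about how a Levenshtein-style ratio behaves, not code taken from python-Levenshtein or from this library; the helper names `difflib_ratio`, `lcs_length`, and `indel_ratio` are made up for illustration. The strings from the report are already in token-sorted order, so they are compared directly.

```python
from difflib import SequenceMatcher


def difflib_ratio(s1: str, s2: str) -> int:
    """Similarity from the standard library's SequenceMatcher, scaled to 0-100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(s1: str, s2: str) -> int:
    """Illustrative Levenshtein-style ratio: 2 * LCS / (len(s1) + len(s2)), scaled to 0-100.

    Assumption: this stands in for the speedup backend; it is not that library's code.
    """
    total = len(s1) + len(s2)
    if total == 0:
        return 100
    return int(round(100 * 2 * lcs_length(s1, s2) / total))


a, b = "efwe fwef", "wef wefwef"
print(difflib_ratio(a, b))  # 53
print(indel_ratio(a, b))    # 84
```

SequenceMatcher first locks onto the longest contiguous block ("efwe") and then only matches what is left on either side of it, which leaves 5 matched characters out of 19. The subsequence-based measure can skip the space and align "efwefwef", giving 8 matched characters, hence the higher score.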
