utils: fix autoDecode error for specific sequence #102

Merged
merged 1 commit into from
May 27, 2021


wwade
Owner

@wwade wwade commented May 27, 2021

Since we're probably going to be seeing either ASCII or UTF-8 input anyway, bump the
detection confidence requirement from 0.5 to 0.8. In the case of the first test input
shown in utils_test.py, it was coming in at 0.559 for Windows-1254 and just 0.505 for
UTF-8, when in fact, it's UTF-8.

This also makes me question using chardet at all, but it probably won't hurt at the
new confidence threshold.
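
For illustration, the threshold check described above might look roughly like the sketch below. This is not the project's actual `autoDecode` code: the name `auto_decode_sketch`, the injectable `detect` parameter, and the fallback to UTF-8 with replacement characters are all assumptions made for the example. The only real library call is `chardet.detect`, which returns a dict containing `encoding` and `confidence`.

```python
from typing import Any, Callable, Dict

MIN_CONFIDENCE = 0.8  # threshold after this PR; it was 0.5 before


def _chardet_detect(data: bytes) -> Dict[str, Any]:
    # Imported lazily so the sketch can be exercised without chardet installed.
    import chardet

    return chardet.detect(data)


def auto_decode_sketch(
    data: bytes,
    detect: Callable[[bytes], Dict[str, Any]] = _chardet_detect,
    min_confidence: float = MIN_CONFIDENCE,
    fallback: str = "utf-8",
) -> str:
    """Hypothetical helper: trust the detector only above min_confidence."""
    guess = detect(data)
    encoding = guess.get("encoding")
    confidence = guess.get("confidence") or 0.0
    if encoding and confidence >= min_confidence:
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: fall through to the fallback.
            pass
    # Assumed fallback: UTF-8, replacing undecodable bytes instead of raising.
    return data.decode(fallback, errors="replace")


def low_confidence_guess(_data: bytes) -> Dict[str, Any]:
    # Stand-in detector that mimics the misdetection reported in this PR.
    return {"encoding": "Windows-1254", "confidence": 0.559}


text = "café"
raw = text.encode("utf-8")
with_new_threshold = auto_decode_sketch(raw, detect=low_confidence_guess)
with_old_threshold = auto_decode_sketch(
    raw, detect=low_confidence_guess, min_confidence=0.5
)
```

With the stand-in detector, the 0.559 guess is rejected at the 0.8 threshold and the bytes are decoded as UTF-8, while at the old 0.5 threshold the same guess is accepted and the non-ASCII character is decoded incorrectly.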

@wwade
Owner Author

wwade commented May 27, 2021

Thanks to @tsiemens for the problem report.
@wwade wwade enabled auto-merge (rebase) May 27, 2021 23:48
@wwade wwade disabled auto-merge May 27, 2021 23:55
@wwade wwade merged commit 12e5949 into master May 27, 2021
@wwade wwade deleted the fix-decode branch May 27, 2021 23:55