👋 Hi, thanks for this neat little Python library!
I've been tinkering with it for a bit and noticed a couple of things that you might already be aware of. If you pass content in a language the classifier doesn't know, or if you pass something like null or an empty string, you get a misidentification. Here are some examples:
echo '' | guesslang
# The source code is written in Shell
echo '""' | guesslang
# The source code is written in Python
# This file is written in Assembly
cat fasm.asm | guesslang
# The source code is written in Python
A few questions:
Have you thought about returning null for guesses that don't meet a certain threshold?
Have you thought about returning the probability that a particular guess is correct and letting clients/consumers determine if the threshold is high enough to proceed?
I'm happy to see that you liked playing with this library.
You have raised some interesting points here:
Have you thought about returning null for guesses that don't meet a certain threshold?
At first I tried setting arbitrary thresholds (at least 10 words, or a minimum difference between the languages' probabilities, etc.) with no success.
When the guess_language method is called with abnormal or unknown text, it will return None.
If you have another solution in mind, feel free to share it.
Have you thought about returning the probability that a particular guess is correct and letting clients/consumers determine if the threshold is high enough to proceed?
That's a nice idea 👍, I hadn't thought about that.
By the way, I'm already using the probabilities to build the list of probable languages, so it will be quite simple to expose them to consumers.
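For illustration, a consumer-side check could look roughly like the sketch below. It assumes the library hands back (language, probability) pairs; the function name accept_guess and the min_probability parameter are hypothetical and not part of Guesslang.

```python
from typing import List, Optional, Tuple


def accept_guess(
    scored: List[Tuple[str, float]], min_probability: float
) -> Optional[str]:
    """Return the most probable language, or None if it is not probable enough.

    `scored` is an assumed shape: a list of (language, probability) pairs.
    """
    if not scored:
        return None
    # Pick the pair with the highest probability.
    language, probability = max(scored, key=lambda pair: pair[1])
    # The caller, not the library, decides how confident is confident enough.
    return language if probability >= min_probability else None
```

With this shape, each consumer can pick a cutoff that suits its own tolerance for wrong guesses.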
I've made a few changes to Guesslang for this issue:
Empty and blank source code is now detected.
Prediction probabilities are given by the guess.probabilities(source_code) function.
guess.language_name(source_code) returns None when the detected language's probability doesn't reach a certain threshold: threshold < 2 * stdev(all_probabilities)
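The thread does not spell out exactly how the threshold is applied, so the following is only a sketch of one plausible reading: the top probability has to stand more than two standard deviations above the mean of all probabilities. The mean offset, the use of the population standard deviation, and the name reliable_language are all assumptions made for illustration, not Guesslang's confirmed implementation.

```python
from statistics import mean, pstdev
from typing import Dict, Optional


def reliable_language(probabilities: Dict[str, float]) -> Optional[str]:
    """Return the top language only if it clearly stands out from the rest.

    Assumed rule (not confirmed by the thread): the top probability must be
    greater than mean + 2 * population standard deviation.
    """
    if not probabilities:
        return None
    values = list(probabilities.values())
    threshold = mean(values) + 2 * pstdev(values)
    # Language with the highest probability.
    language = max(probabilities, key=probabilities.get)
    if probabilities[language] <= threshold:
        return None  # no language stands out, e.g. a near-uniform guess
    return language
```

Under this reading, a flat distribution (where every language is about equally likely) yields None, while a single dominant language is returned. Note that with the population standard deviation the rule can only be satisfied when there are at least six candidate languages, so it suits classifiers with a long language list.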