Added function for detecting message lang #29

Open
wants to merge 4 commits into
base: master

Conversation

emills11

I went ahead and wrote a basic function that detects the language of a message, so the message can be classified as typed in either the user's target language (so other users can see it) or the user's native language (so it can be ignored).

I did run into an issue with the langdetect library: because its algorithm is probability-based, it occasionally misidentifies a message's language when the message contains spelling errors. For example, "Hello World!" returns English, while "Helo Woorld!" returns Dutch. I could use some help coming up with a solution for this problem.

@emills11
Author

I may have found a solution to the problem above: iterate through the Language objects returned by detect_langs() and check whether any of the probable languages matches the user's target language or native language. I'll push a second commit when I get home.
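To make the idea above concrete, here is a minimal sketch of that matching step. It is not the code from this PR: `Candidate` is a hypothetical stand-in for the Language objects that detect_langs() returns (it only mimics their `.lang` and `.prob` attributes), `match_user_language` is an illustrative name, and the probabilities are made up so the example runs without langdetect installed.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for a langdetect Language object (.lang, .prob)."""
    lang: str
    prob: float


def match_user_language(candidates: Iterable[Candidate],
                        user_target_lang: str,
                        user_native_lang: str) -> Optional[str]:
    """Return the first candidate language that is the user's target or
    native language, or None if none of the candidates match."""
    for item in candidates:
        if item.lang in (user_target_lang, user_native_lang):
            return item.lang
    return None


# Made-up probabilities for a misspelled English message: the top guess is
# Dutch, but English still appears further down the list.
guesses = [Candidate("nl", 0.57), Candidate("en", 0.43)]
result = match_user_language(guesses, user_target_lang="en", user_native_lang="es")
no_match = match_user_language(guesses, user_target_lang="fr", user_native_lang="de")
```

With real data, the list passed in would presumably be the return value of detect_langs(message_text); a message whose candidates include neither language yields None.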

@ykdojo
Owner

ykdojo commented Nov 16, 2018

Thank you. I'll take a look at this after I publish my next video about #22.

Owner

@ykdojo ykdojo left a comment

Just left a comment. Probably not going to use this for the initial version of this app, so I think we should leave the PR open for now.

init_time = time.time()
while True:
    # if native/target lang is not detected after five seconds, return None
    if (time.time()-init_time) >= 5:
Owner

Thank you for this. Just curious, why did you decide to go with the approach of limiting this with time? (As opposed to the number of loops, for example)
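For comparison, a loop-count limit might look roughly like the sketch below. This is only an illustration of the alternative, not code from the PR: `detect_with_retries`, `max_attempts`, and the fake detector are hypothetical names, and the detector is passed in as a plain function so the example runs without langdetect.

```python
from typing import Callable, Optional, Sequence


def detect_with_retries(detect_fn: Callable[[str], Sequence[str]],
                        message_text: str,
                        wanted_langs: Sequence[str],
                        max_attempts: int = 5) -> Optional[str]:
    """Call detect_fn at most max_attempts times and return the first
    detected language that is in wanted_langs, or None if none shows up."""
    for _ in range(max_attempts):
        for lang in detect_fn(message_text):
            if lang in wanted_langs:
                return lang
    return None


def make_fake_detector() -> Callable[[str], Sequence[str]]:
    """Hypothetical detector that gives a different answer on each call,
    imitating a non-deterministic library."""
    answers = iter([["nl"], ["af"], ["en", "nl"]])
    return lambda text: next(answers)


# English only appears on the third call, so five attempts find it
# and two attempts do not.
found = detect_with_retries(make_fake_detector(), "Helo Woorld!", ["en", "es"])
gave_up = detect_with_retries(make_fake_detector(), "Helo Woorld!", ["en", "es"],
                              max_attempts=2)
```

A count-based bound makes the amount of work the same on every machine, whereas a five-second bound allows more or fewer retries depending on how fast each detection call runs.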

# if native/target lang is not detected (due to langdetect probability algorithm) then generate new probabilities
# until native/target lang is detected
else:
    init_time = time.time()
Contributor

Suggested change
init_time = time.time()
prob_list = detect_langs(message_text)
for item in prob_list:
    if item.lang == user_target_lang or item.lang == user_native_lang:
        return item.lang
return None

The code you have below should work perfectly well by itself; I don't understand the purpose of the while loop in your algorithm.

3 participants