Many videos have the wrong language #93

Closed
lfaucon opened this issue May 7, 2021 · 3 comments
Labels
Backend (Back-end code of Tournesol), Bug (Something isn't working)

Comments

@lfaucon
Member

lfaucon commented May 7, 2021

For example:

  • "I-n2OU78Ck8" is French, but Tournesol thinks it is English
  • "Ms153kckhkQ" is French, but Tournesol thinks it is German

Would YouTube provide more reliable information here?
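
One way to frame the question is to prefer the language the platform reports and to fall back to text detection only when that is missing. A minimal sketch, assuming the video metadata is a dict shaped like the `snippet` part of a YouTube Data API v3 video resource, whose optional `defaultAudioLanguage` and `defaultLanguage` fields are set by the uploader and are often absent. `pick_language` and its `detect` parameter are hypothetical names for illustration, not Tournesol code:

```python
from typing import Callable, Mapping, Optional


def pick_language(
    snippet: Mapping[str, str],
    title: str,
    detect: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Return a two-letter language code, preferring platform metadata.

    `snippet` is assumed to look like the `snippet` part of a YouTube
    Data API v3 video resource; both language fields are optional.
    `detect` is any text-based fallback detector (hypothetical hook).
    """
    for key in ("defaultAudioLanguage", "defaultLanguage"):
        code = snippet.get(key)
        if code:
            # Reduce a tag such as "fr-FR" to its primary subtag "fr".
            return code.split("-")[0].lower()
    # No usable metadata: fall back to detecting from the title, if possible.
    return detect(title) if detect else None
```

This only helps for videos where the uploader filled in a language, so a text-based fallback would probably still be needed for the rest.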

@lfaucon lfaucon added the Bug (Something isn't working) label May 7, 2021
@aidanjungo
Collaborator

Yes, same for:

  • "ARAQUgkdIvQ" is French, but Tournesol thinks it is English
  • "F7YlkbhYq3s" is French, but Tournesol thinks it is English

@aidanjungo
Collaborator

aidanjungo commented May 7, 2021

Also, "langdetect" gives very strange, inconsistent results when the text mixes two languages.

With this code:

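from langdetect import detect, detect_langs

# Sentence that deliberately mixes English and French
t = 'This is fromage, baguette et croissant and also you can imagine ça pourrait être english or french'
# Repeat the detection: langdetect is not deterministic by default
for i in range(6):
    print('Test', i)
    print(detect_langs(t))  # candidate languages with probabilities
    print(detect(t))        # single most probable language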

It gives:

Test 0
[fr:0.5714279843168742, en:0.4285717922216604]
fr
Test 1
[en:0.7142837816924028, fr:0.28571557060660485]
en
Test 2
[fr:0.5714299506390116, en:0.42856986113627804]
fr
Test 3
[en:0.7142838481684115, fr:0.2857161439607298]
en
Test 4
[en:0.8571394586776105, fr:0.14286018540151038]
en
Test 5
[en:0.8571396261756059, fr:0.1428602973501344]
fr
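
Since the label flips between runs, one possible mitigation is to call the detector several times and keep the most frequent answer. A minimal sketch, not what Tournesol does: `majority_language` is a hypothetical helper, and `detect` stands for any function that maps a text to a language code, such as `langdetect.detect`:

```python
from collections import Counter
from typing import Callable, Tuple


def majority_language(
    text: str,
    detect: Callable[[str], str],
    runs: int = 11,
) -> Tuple[str, float]:
    """Call `detect` `runs` times; return the top label and its vote share."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Count how often each language code is returned.
    votes = Counter(detect(text) for _ in range(runs))
    label, count = votes.most_common(1)[0]
    return label, count / runs
```

A low vote share suggests the text is genuinely mixed and the label should not be trusted. The langdetect README also documents setting `DetectorFactory.seed = 0` before the first detection to make results reproducible, but that only makes the answer consistent, not necessarily correct.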

@sergeivolodin

I'll try to get the language from ytdl

@sergeivolodin sergeivolodin self-assigned this May 19, 2021
@sergeivolodin sergeivolodin removed their assignment May 27, 2021
@lfaucon lfaucon closed this as completed Oct 5, 2021
3 participants