Describe the bug
I was looking at the data from the Lite dataset this morning and noticed something odd in the 'ai_service_2_confidence' column of the keywords.tsv000 file.
When I computed some statistics on the ai_service columns, 'ai_service_2_confidence' turned out to contain extreme values exceeding 100, which I expected to be the maximum (taking ai_service_1_confidence as a reference, for example).
To Reproduce
Steps to reproduce the behavior:
Use a Python environment (3.6.13) with pandas 1.1.5 installed and compute summary statistics on the ai_service columns of keywords.tsv000.
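The code originally attached to this report did not survive, so here is a minimal sketch of how such statistics could be computed with pandas. It is an illustration, not the reporter's script: the helper name `confidence_stats`, the `photo_id` and `keyword` columns, and the two-row inline sample standing in for keywords.tsv000 are all assumptions made for the example.

```python
import io

import pandas as pd


def confidence_stats(source):
    """Summarise both AI-service confidence columns of a keywords TSV.

    `source` can be a file path or a file-like object.
    """
    df = pd.read_csv(source, sep="\t")
    cols = ["ai_service_1_confidence", "ai_service_2_confidence"]
    return df[cols].describe()


# Made-up sample standing in for keywords.tsv000 (hypothetical rows).
sample = io.StringIO(
    "photo_id\tkeyword\tai_service_1_confidence\tai_service_2_confidence\n"
    "a\tsea\t89.7\t96.5\n"
    "b\tsky\t55.0\t9657.65\n"
)

stats = confidence_stats(sample)
# A max far above 100 in ai_service_2_confidence is the symptom reported here.
print(stats.loc[["min", "max"]])
```

With the real file, passing its path instead of the inline sample should be enough, assuming the column names match.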
Expected behavior
I expect the values in the 'ai_service_2_confidence' column of keywords.tsv000 to be between 0 and 100. If that is not the case, the documentation should describe the column more precisely (for example, its range).
Additional context
I have attached a list of the keywords that seem to be affected by these extreme values: unsplash_extreme_value.zip
I hope it helps with your investigation 🕵️‍♀️ (and I hope it is not just me missing something).
PS: your dataset is great, by the way (I really hope to get access to the full version soon) 👍
@jeanmidevacc I've looked into it and it looks like you can divide the values that are > 100 by 100.
For example, if you see confidence = 9657.65, the actual confidence in a range 0-100 is 96.5765.
This is obviously an issue in the dataset and I'm adding this fix to the next release that's coming up this week.
Thank you for catching it and for describing the issue the way you did!
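Until the corrected release is available, the workaround suggested in the reply above (divide values greater than 100 by 100) could be applied locally along these lines. This is only a sketch: the helper name `rescale_confidence` and the sample values are made up for illustration, and it is not necessarily how the fix is implemented in the dataset itself.

```python
import pandas as pd


def rescale_confidence(values, upper=100.0):
    """Divide values above `upper` by 100 and leave the others unchanged.

    Missing values stay missing.
    """
    # Series.where keeps entries where the condition holds and takes the
    # replacement (values / 100) everywhere else.
    return values.where(values <= upper, values / 100.0)


# Hypothetical sample: one value affected by the scaling issue, one missing.
conf = pd.Series([96.5, 9657.65, 12.0, None], name="ai_service_2_confidence")
fixed = rescale_confidence(conf)
print(fixed)
```

On a full DataFrame, the same helper could be applied to the 'ai_service_2_confidence' column and the result assigned back to it.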