Explanation on the ai_service_2_confidence column in keywords.tsv000 (range seems weird) #39

jeanmidevacc · 2021-07-28T14:11:08Z

Describe the bug
Hello,

I was looking at the data from the Lite dataset this morning and noticed something odd in the 'ai_service_2_confidence' column of the keywords.tsv000 file.

When I computed summary statistics on the ai_service columns, 'ai_service_2_confidence' appeared to contain extreme values exceeding 100, which I would expect to be the maximum (taking 'ai_service_1_confidence' as a reference, for example).

To Reproduce

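Here is the code to build the stats:

import pandas as pd

# Load the tab-separated keywords file; the first row holds the column names
dfp_keywords_raw = pd.read_csv('keywords.tsv000', sep='\t', header=0)

# Summary statistics (count, mean, min, max, ...) for both confidence columns
print(dfp_keywords_raw[['ai_service_1_confidence', 'ai_service_2_confidence']].describe())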

Steps to reproduce the behavior:
Run the code above in a Python environment (3.6.13) with pandas 1.1.5 installed.

Expected behavior
I expect the values in the 'ai_service_2_confidence' column of the keywords.tsv000 file to lie between 0 and 100. If that is not the intended range, the documentation for 'ai_service_2_confidence' should describe the values more precisely (for example, by stating the range).

Additional context
I have a list of the keywords that seem to be affected by these extreme values:
unsplash_extreme_value.zip
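For readers without the attachment, a list like this could be rebuilt with a short filter. This is only a sketch: the helper name `find_out_of_range` and the tiny sample frame are made up for illustration, and the `keyword` column name is an assumption about the file layout, not something stated in this thread.

```python
import pandas as pd

def find_out_of_range(df, column="ai_service_2_confidence", low=0.0, high=100.0):
    """Return the rows whose confidence falls outside [low, high].

    Missing values (NaN) are not flagged, since comparisons with NaN are False.
    """
    outside = (df[column] < low) | (df[column] > high)
    return df[outside]

# Hypothetical sample standing in for keywords.tsv000
sample = pd.DataFrame(
    {
        "keyword": ["dog", "beach", "tree", "car"],
        "ai_service_2_confidence": [9657.65, 42.0, None, 100.0],
    }
)

affected = find_out_of_range(sample)
print(affected)
```

With the real file, the same helper could be applied to the frame returned by `pd.read_csv('keywords.tsv000', sep='\t', header=0)`.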

I hope this helps with your investigation 🕵️‍♀️ (and I hope it's not just me missing something).

PS: Your dataset is great, by the way (I really hope to get access to the full version soon) 👍


TimmyCarbone · 2021-07-29T15:13:58Z

@jeanmidevacc I've looked into it and it looks like you can divide the values that are > 100 by 100.
For example, if you see confidence = 9657.65, the actual confidence in a range 0-100 is 96.5765.
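Until the fixed release is available, the workaround described above could be applied locally along these lines. This is a minimal sketch, not the fix that ships in the release: the helper name `rescale_confidence` and the sample values (other than 9657.65) are illustrative assumptions.

```python
import pandas as pd

def rescale_confidence(df, column="ai_service_2_confidence", max_valid=100.0):
    """Return a copy of df in which values above max_valid are divided by 100.

    Values already within range, and missing values, are left untouched.
    """
    fixed = df.copy()
    too_large = fixed[column] > max_valid  # NaN compares as False, so it is skipped
    fixed.loc[too_large, column] = fixed.loc[too_large, column] / 100
    return fixed

# Hypothetical sample; 9657.65 is the example value from the comment above
sample = pd.DataFrame({"ai_service_2_confidence": [9657.65, 42.0, None, 100.0]})
fixed = rescale_confidence(sample)
print(fixed)
```

Working on a copy keeps the raw values available, so the corrected column can be compared against the original before anything is overwritten.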

This is obviously an issue in the dataset and I'm adding this fix to the next release that's coming up this week.

Thank you for catching it and for describing the issue the way you did!

jeanmidevacc · 2021-08-04T16:05:29Z

Great, thanks for the update (and for handling the issue so quickly).

jeanmidevacc added the bug label Jul 28, 2021

lukechesser assigned TimmyCarbone Jul 28, 2021

TimmyCarbone added this to To do in 1.2.0 Jul 29, 2021

TimmyCarbone mentioned this issue Jul 30, 2021

1.2.0 Release #40

Merged

TimmyCarbone moved this from To do to Done in 1.2.0 Jul 30, 2021

TimmyCarbone closed this as completed in #40 Jul 30, 2021
