Queries regarding ChartQA dataset #8

shabbie · 2022-12-22T08:59:02Z

We have a few queries and would appreciate your help.

In Section 4.1, you mention that gold data tables are not available, so you set up an extraction mechanism to obtain the underlying tables. We downloaded the dataset from (https://drive.google.com/file/d/17-aqtiq_KJ16PIGOp30W0y6OJNax6SVT/view), which includes table annotations. Are these extracted tables what the paper refers to as 'Gold Data Tables'?
If there is a separate set of 'Gold Data Tables' that is not available from the link above, could you also share it for reproducibility purposes?
And if the extracted tables are the same as the gold data tables, what do the results in Table 5 of the paper imply? How can TaPas predict answers if the tables themselves are not provided?

AhmedMasryKU · 2022-12-23T17:10:23Z

Hi @shabbie
Gold Data Tables refer to the ground truth data tables which we crawled with the chart images from different sources. These gold tables are provided in the dataset (the "tables" folder) in this repo. In our experiments, we considered two scenarios:

1. We used the gold tables as input to our model. However, the main issue with this setup is that it's not end-to-end: in general, chart images won't come with their data tables. That's why we also considered the second setup.
2. We automatically extracted the data tables from the chart images using the ChartOCR model and used these extracted tables as inputs to our models.

Let me know if you have additional questions.

shabbie · 2022-12-24T10:39:43Z

Thanks for the reply @AhmedMasryKU.

The Gold Data Tables in the tables folder have many extraction issues: all or most of the numerical values are zero, column names are wrong or incomplete, and floating-point values are not detected correctly.
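
For anyone who wants to check their own copy of the tables, here is a minimal sketch of how issues of this kind could be flagged. It is only an illustration: the helper names `numeric_cells` and `table_issues`, the 50% zero threshold, and the assumption that the first row of each CSV is the header are all hypothetical and not taken from the paper or the repo.

```python
import csv
import io


def numeric_cells(rows):
    """Yield every cell below the header row that parses as a float."""
    for row in rows[1:]:
        for cell in row:
            try:
                # Strip thousands separators and percent signs before parsing.
                yield float(cell.replace(",", "").replace("%", ""))
            except ValueError:
                continue  # non-numeric cell, e.g. a category label


def table_issues(csv_text, zero_threshold=0.5):
    """Return a list of suspected issues for one CSV table (hypothetical checks)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["empty table"]
    issues = []
    # A blank header cell suggests a missing or incomplete column name.
    if any(not name.strip() for name in rows[0]):
        issues.append("blank column name")
    values = list(numeric_cells(rows))
    if not values:
        issues.append("no numeric values")
    elif sum(v == 0 for v in values) / len(values) >= zero_threshold:
        issues.append("mostly zero values")
    return issues


noisy = "Country,\nUS,0\nUK,0\nFR,3.5\n"
clean = "Country,Share\nUS,12.5\nUK,7\n"
print(table_issues(noisy))
print(table_issues(clean))
```

The function takes the CSV text rather than a path, so it can be run over each file in the tables folder by reading the file contents first. A high zero fraction is only a hint, since some charts legitimately contain zeros.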

If these are also extracted tables (considering the noise present in the data), what are the gold or noise-free data tables?

AhmedMasryKU · 2022-12-27T20:25:58Z

Hi @shabbie,
Yes, the Pew chart images are not 100% clean due to issues crawling the data. However, the OWID, OECD, and Statista (the majority of the dataset) tables are very clean. For reproducibility, the "Gold Data Table" mentioned in the paper refers to the CSV files in the tables folder in the dataset.

Let me know if you have any questions.
