MAPLE dataset in graph format #2

HoytWen · 2023-03-22T01:22:07Z

Dear MAPLE authors,

Thanks for your amazing work! I think this dataset can be transformed into graph form and could benefit research in the graph learning community.
I can't wait to try this dataset for graph learning. Below is one question about it.

It seems each field can be regarded as a sub-graph of the Microsoft Academic Graph. I tried to transform the papers in each field into a citation graph and found that many of their references cannot be mapped to papers within the same field. Does this mean the referenced papers in a field may come from other fields? If so, why are there some papers without any reference information?

I am really looking forward to your help in resolving my question.

Best,
Qianlong

yuzhimanhua · 2023-03-22T02:37:03Z

Dear Qianlong,

Thank you very much for your interest in our work!

We agree with your comment that a graph format of MAPLE may increase its usability. We will try to work on that and release it in several weeks. Thanks for the suggestion!

Regarding your question about paper references, for each paper in MAPLE, we include all of its references (represented by IDs) in our dataset. A considerable proportion of these references may not appear as papers in MAPLE (e.g., they are not published in top journals / conferences); some others, as you said, may appear in MAPLE but in a different field. In our paper, the reference ID is used as an input feature to the paper classifier, so we no longer need to know other information about the reference (e.g., text and metadata). However, if you would like to construct a graph, you may need to remove those references not appearing in MAPLE.
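The filtering step described above can be sketched as follows. This is only a minimal illustration, not the code used to build MAPLE: it assumes the papers of one field have already been loaded into a dictionary mapping each paper ID to its list of reference IDs, and the function name `build_citation_edges` and the toy IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_citation_edges(
    references: Dict[str, List[str]]
) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Keep only citation edges whose target is also a paper in the collection.

    `references` maps a paper ID to the IDs it cites (assumed layout).
    """
    nodes = set(references)
    edges = [
        (paper_id, ref_id)
        for paper_id, refs in references.items()
        for ref_id in refs
        if ref_id in nodes  # drop references that are not papers in the collection
    ]
    return nodes, edges


# Toy data: "x7" and "x9" stand for references that are not in the collection.
toy = {"p1": ["p2", "x9"], "p2": ["x7"], "p3": []}
nodes, edges = build_citation_edges(toy)
print(len(nodes), edges)
```

Papers such as `p3`, which have no references at all, stay in the node set as isolated nodes unless they are removed separately.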

Please let us know if you have further questions.

Best,
Yu

HoytWen · 2023-03-22T03:50:14Z

Thanks for the further explanation; I really appreciate it.

Yes, removing the references not appearing in MAPLE is certainly an option, but the constructed graph could be overly sparse since a large portion of references will be removed (some fields might have 80-90% unmapped references according to my statistics). Since MAPLE is constructed from MAG, is there any possibility that we could directly use the graph structure in MAG and split it into sub-graphs by field, as MAPLE does?
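For reference, the share of unmapped references in a field can be measured with a sketch like the one below. It assumes the same hypothetical layout of a dictionary from paper ID to a list of reference IDs; the function name `unmapped_reference_ratio` and the toy data are illustrative only and do not reproduce the 80-90% figure.

```python
from typing import Dict, List


def unmapped_reference_ratio(references: Dict[str, List[str]]) -> float:
    """Fraction of reference entries that point outside the given paper set."""
    known = set(references)
    total = sum(len(refs) for refs in references.values())
    if total == 0:
        return 0.0  # no references at all, so nothing is unmapped
    unmapped = sum(
        1 for refs in references.values() for ref_id in refs if ref_id not in known
    )
    return unmapped / total


# Toy data: 3 reference entries in total, 2 of them outside the set.
toy_field = {"p1": ["p2", "x9"], "p2": ["x7"], "p3": []}
ratio = unmapped_reference_ratio(toy_field)
print(f"{ratio:.1%} of references are unmapped")
```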

Anyway, thanks again for your help, I look forward to you releasing the graph format of MAPLE!

yuzhimanhua · 2023-04-04T14:05:29Z

Hi Qianlong,

We have created a graph format of MAPLE. The data is available at https://zenodo.org/record/7797563.
You can refer to https://github.com/yuzhimanhua/MAPLE/blob/master/README_Graph.md for more details.

We removed the references not appearing in MAPLE to construct the graph. As you mentioned, in some fields (e.g., Art, History), the graph was sparse. We also tried to add all those missing references to the graph (by retrieving their text and metadata from MAG). In this case, the graph certainly became larger, but it did not become denser because the newly added papers brought even more unmapped neighbors.
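The effect described above, a larger graph that is not denser, can be illustrated on toy data. The sketch below is an assumption-laden illustration rather than the procedure actually used: `external` stands in for a hypothetical lookup of reference lists retrieved from MAG, density is taken as mapped edges over the N(N-1) possible directed edges, and all names and IDs are made up.

```python
from typing import Dict, List


def directed_density(references: Dict[str, List[str]]) -> float:
    """Mapped citation edges divided by the N*(N-1) possible directed edges."""
    nodes = set(references)
    n = len(nodes)
    if n < 2:
        return 0.0
    mapped = sum(
        1
        for paper_id, refs in references.items()
        for ref_id in set(refs)
        if ref_id in nodes and ref_id != paper_id
    )
    return mapped / (n * (n - 1))


def count_unmapped(references: Dict[str, List[str]]) -> int:
    """Number of reference entries that point outside the node set."""
    nodes = set(references)
    return sum(1 for refs in references.values() for r in refs if r not in nodes)


def add_missing_references(
    references: Dict[str, List[str]], external: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Add every cited-but-absent paper as a node, with its own references."""
    expanded = dict(references)
    for refs in references.values():
        for ref_id in refs:
            if ref_id not in expanded:
                # `external` is a hypothetical lookup of retrieved reference lists.
                expanded[ref_id] = list(external.get(ref_id, []))
    return expanded


core = {"p1": ["p2", "x1"], "p2": ["x2"]}
external = {"x1": ["y1", "y2"], "x2": ["y3"]}
expanded = add_missing_references(core, external)

print(len(core), directed_density(core), count_unmapped(core))
print(len(expanded), directed_density(expanded), count_unmapped(expanded))
```

In this toy case the node count doubles, while the newly added papers cite three further papers that are again outside the graph, so density falls rather than rises.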

We agree that directly splitting MAG may solve the problem. Thank you for the suggestion! We will explore it later.

HoytWen · 2023-04-04T21:58:09Z

Thanks for your work and contribution, I really appreciate it!

HoytWen · 2023-04-26T01:24:11Z

Dear MAPLE authors,

I recently ran some preliminary experiments on some fields (e.g., CSRankings and Art) of the MAPLE graph dataset and found an interesting phenomenon: MLPs easily outperformed GNNs with the same number of parameters, which was unexpected. Typically, removing the graph structure results in a 10-40% performance drop, but on this dataset the opposite was observed. This suggests that the graph structure used in this task may be detrimental to node classification performance.

Could you please help me resolve my question?

Best,
Qianlong
