# ResearchArcade Complete Tutorial

This tutorial demonstrates how to work with the ResearchArcade database, covering all node types and edge relationships.

## Table of Contents
1. [Setup](#Setup)
2. [Node Table Operations](#NodeTableOperations)

    2.1. [openreview_authors](#openreview_authors)

    2.2. [openreview_papers](#openreview_papers)

    2.3. [openreview_reviews](#openreview_reviews)

    2.4. [openreview_revisions](#openreview_revisions)

    2.5. [openreview_paragraphs](#openreview_paragraphs)

    2.6. [arxiv_papers](#arxiv_papers)

    2.7. [arxiv_authors](#arxiv_authors)

    2.8. [arxiv_categories](#arxiv_categories)

    2.9. [arxiv_figures](#arxiv_figures)

    2.10. [arxiv_tables](#arxiv_tables)

    2.11. [arxiv_sections](#arxiv_sections)

    2.12. [arxiv_paragraphs](#arxiv_paragraphs)

3. [Edge Table Operations](#EdgeTableOperations)

    3.1. [openreview_arxiv](#openreview_arxiv)

    3.2. [openreview_papers_authors](#openreview_papers_authors)

    3.3. [openreview_papers_reviews](#openreview_papers_reviews)

    3.4. [openreview_papers_revisions](#openreview_papers_revisions)

    3.5. [openreview_revisions_reviews](#openreview_revisions_reviews)

    3.6. [arxiv_citations](#arxiv_citations)

    3.7. [arxiv_papers_authors](#arxiv_papers_authors)

    3.8. [arxiv_papers_categories](#arxiv_papers_categories)

    3.9. [arxiv_papers_figures](#arxiv_papers_figures)

    3.10. [arxiv_papers_tables](#arxiv_papers_tables)

    3.11. [arxiv_paragraphs_references](#arxiv_paragraphs_references)

    3.12. [arxiv_paragraphs_citations](#arxiv_paragraphs_citations)

4. [Batch Processing](#BatchProcessing)

    4.1 [openreview conference](#batch_openreview_conference)

    4.2 [arxiv papers](#batch_arxiv_papers)

5. [Continuous Crawling](#ContinuousCrawling)

    5.1 [arxiv continuous crawling](#arxiv_continuous_crawling)

## Setup

In [3]:
import sys
from tqdm import tqdm
import os
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..')))
from research_arcade.research_arcade import ResearchArcade



### Choose Database Backend

#### CSV Based

In [4]:
db_type = "csv"
config = {
    "csv_dir": "./csv3",
}

research_arcade = ResearchArcade(db_type=db_type, config=config)

#### SQL Based

In [None]:
db_type = "sql"
config = {
    "host": "localhost",
    "dbname": "DATABASE_NAME",
    "user": "USER_NAME",
    "password": "PASSWORD",
    "port": "5432"
}

research_arcade = ResearchArcade(db_type=db_type, config=config)
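
The two backend cells differ only in the `config` dictionary, so the choice can be isolated in a small helper. The sketch below is one possible way to do that while keeping credentials out of the notebook. The helper name `build_config` and the `ARCADE_*` environment variable names are assumptions made for illustration; only the config keys come from the cells above.

```python
import os


def build_config(db_type, env=None):
    """Return the config dict for the chosen backend.

    The ARCADE_* variable names are hypothetical; the dict keys mirror
    the CSV and SQL cells above.
    """
    env = os.environ if env is None else env
    if db_type == "csv":
        return {"csv_dir": env.get("ARCADE_CSV_DIR", "./csv3")}
    if db_type == "sql":
        return {
            "host": env.get("ARCADE_DB_HOST", "localhost"),
            "dbname": env["ARCADE_DB_NAME"],        # required, no default
            "user": env["ARCADE_DB_USER"],          # required, no default
            "password": env["ARCADE_DB_PASSWORD"],  # required, no default
            "port": env.get("ARCADE_DB_PORT", "5432"),
        }
    raise ValueError(f"unsupported db_type: {db_type!r}")


db_type = "csv"
config = build_config(db_type, env={})
print(config)
```

The resulting `db_type` and `config` can then be passed to `ResearchArcade(db_type=db_type, config=config)` exactly as in the cells above.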

## NodeTableOperations

### openreview_authors

#### construct table from api

In [5]:
config = {"venue": "ICLR.cc/2025/Conference"}
research_arcade.construct_table_from_api("openreview_authors", config)

Crawling author data from OpenReview API...



#### construct table from csv

In [None]:
config = {"csv_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_authors.csv"}
research_arcade.construct_table_from_csv("openreview_authors", config)

Reading authors data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_authors.csv...
Inserting data into CSV file...


100%|██████████| 10/10 [00:00<00:00, 929.30it/s]
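
To try `construct_table_from_csv` without the author's local files, a small input file can be generated first. This is a minimal sketch that assumes the CSV columns match the node feature keys used in the `insert node` example for this table; that layout is an assumption, and the row below is a placeholder rather than real data.

```python
import csv
import os
import tempfile

# Assumed column layout: the same keys as the node_features dict for authors.
AUTHOR_COLUMNS = [
    "venue", "author_openreview_id", "author_full_name",
    "email", "affiliation", "homepage", "dblp",
]


def write_authors_csv(rows, path):
    """Write author rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=AUTHOR_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Placeholder row for illustration only.
rows = [{
    "venue": "ICLR.cc/2025/Conference",
    "author_openreview_id": "~example_author1",
    "author_full_name": "example author",
    "email": "",
    "affiliation": "",
    "homepage": "",
    "dblp": "",
}]
csv_path = os.path.join(tempfile.mkdtemp(), "openreview_authors.csv")
write_authors_csv(rows, csv_path)
config = {"csv_file": csv_path}
```

The `config` built here has the same shape as the one passed to `construct_table_from_csv` above.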


#### construct table from json

In [None]:
config = {"json_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_authors.json"}
research_arcade.construct_table_from_json("openreview_authors", config)

Reading authors data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_authors.json...
Inserting data into CSV file...


100%|██████████| 10/10 [00:00<00:00, 348.01it/s]
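
The absolute paths in these cells only resolve on the author's machine. One way to make them portable is to build them from the repository root. The sketch below assumes the `examples/csv_data` and `examples/json_data` layout seen in the paths above and a `<table>.<format>` file name, which holds for the authors and papers examples but not necessarily for every table; `example_data_path` is a hypothetical helper.

```python
from pathlib import Path


def example_data_path(table, fmt, repo_root):
    """Build the path of an example data file for a table.

    Assumes files are stored as <repo_root>/examples/<fmt>_data/<table>.<fmt>.
    """
    if fmt not in ("csv", "json"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return Path(repo_root) / "examples" / f"{fmt}_data" / f"{table}.{fmt}"


# Assumption: the notebook runs one directory below the repository root,
# matching the sys.path setup in the Setup section.
repo_root = Path.cwd().parent
config = {"json_file": str(example_data_path("openreview_authors", "json", repo_root))}
print(config["json_file"])
```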


#### insert node

In [None]:
new_author = {'venue': 'ICLR.cc/2025/Conference', 
              'author_openreview_id': '~ishmam_zabir1', 
              'author_full_name': 'ishmam zabir', 
              'email': '****@microsoft.com', 
              'affiliation': 'Microsoft', 
              'homepage': 'https://scholar.google.com/citations?user=X7bjzrUAAAAJ&hl=en&oi=ao', 
              'dblp': ''}
research_arcade.insert_node("openreview_authors", node_features=new_author)

('ICLR.cc/2025/Conference', '~ishmam_zabir1')

#### delete specific node by id

In [None]:
author_id = {"author_openreview_id": "~ishmam_zabir1"}
author_features = research_arcade.delete_node_by_id("openreview_authors", author_id)
print(author_features.to_dict(orient="records")[0])

Author with author_openreview_id ~ishmam_zabir1 deleted successfully.
{'venue': 'ICLR.cc/2025/Conference', 'author_openreview_id': '~ishmam_zabir1', 'author_full_name': 'ishmam zabir', 'email': '****@microsoft.com', 'affiliation': 'Microsoft', 'homepage': 'https://scholar.google.com/citations?user=X7bjzrUAAAAJ&hl=en&oi=ao', 'dblp': nan}


#### get all nodes

In [None]:
openreview_authors_df = research_arcade.get_all_node_features("openreview_authors")
print(len(openreview_authors_df))

10


#### get specific node by id

In [None]:
author_id = {"author_openreview_id": "~ishmam_zabir1"}
author_features = research_arcade.get_node_features_by_id("openreview_authors", author_id)
print(author_features.to_dict(orient="records")[0])

{'venue': 'ICLR.cc/2025/Conference', 'author_openreview_id': '~ishmam_zabir1', 'author_full_name': 'ishmam zabir', 'email': '****@microsoft.com', 'affiliation': 'Microsoft', 'homepage': 'https://scholar.google.com/citations?user=X7bjzrUAAAAJ&hl=en&oi=ao', 'dblp': nan}
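
Note that `dblp` was inserted as an empty string but is returned as `nan`. If downstream code expects strings, the record can be normalized after reading. A minimal sketch, where `clean_record` is a hypothetical helper and not part of ResearchArcade:

```python
import math


def clean_record(record):
    """Replace float NaN values in a record with empty strings."""
    return {
        key: "" if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }


record = {"author_openreview_id": "~ishmam_zabir1", "dblp": float("nan")}
print(clean_record(record))
# {'author_openreview_id': '~ishmam_zabir1', 'dblp': ''}
```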


#### update specific node by id

In [None]:
new_author = {'venue': 'ICLR.cc/2025/Conference', 
              'author_openreview_id': '~ishmam_zabir1', 
              'author_full_name': 'test', 
              'email': '****@microsoft.com', 
              'affiliation': 'Microsoft', 
              'homepage': 'https://scholar.google.com/citations?user=X7bjzrUAAAAJ&hl=en&oi=ao', 
              'dblp': ''}

research_arcade.update_node("openreview_authors", node_features=new_author)
author_id = {"author_openreview_id": "~ishmam_zabir1"}
author_features = research_arcade.get_node_features_by_id("openreview_authors", author_id)
print(author_features.to_dict(orient="records")[0])

Author with author_openreview_id ~ishmam_zabir1 updated successfully.
{'venue': 'ICLR.cc/2025/Conference', 'author_openreview_id': '~ishmam_zabir1', 'author_full_name': 'test', 'email': '****@microsoft.com', 'affiliation': 'Microsoft', 'homepage': 'https://scholar.google.com/citations?user=X7bjzrUAAAAJ&hl=en&oi=ao', 'dblp': nan}
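
The update cell passes a complete feature dictionary even though only `author_full_name` changes. When starting from a record that was just read back, the full dictionary can be derived by merging the changes into it. This is a sketch under the assumption that `update_node` expects every field, as in the cell above; `apply_changes` is a hypothetical helper.

```python
def apply_changes(existing, changes, key_fields=("venue", "author_openreview_id")):
    """Return a full feature dict with `changes` merged into `existing`.

    Identifier fields may not change, and unknown field names are rejected
    so that a typo does not silently add a new column.
    """
    for key in key_fields:
        if key in changes and changes[key] != existing.get(key):
            raise ValueError(f"{key} identifies the node and cannot be changed")
    unknown = set(changes) - set(existing)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    return {**existing, **changes}


existing = {
    "venue": "ICLR.cc/2025/Conference",
    "author_openreview_id": "~ishmam_zabir1",
    "author_full_name": "ishmam zabir",
}
new_author = apply_changes(existing, {"author_full_name": "test"})
print(new_author["author_full_name"])  # test
```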


### openreview_papers

#### construct table from api

In [None]:
config = {"venue": "ICLR.cc/2025/Conference"}
research_arcade.construct_table_from_api("openreview_papers", config)

#### construct table from csv

In [None]:
config = {"csv_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_papers.csv"}
research_arcade.construct_table_from_csv("openreview_papers", config)

Reading paper data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_papers.csv...
Inserting data into CSV file...



100%|██████████| 508/508 [00:05<00:00, 96.62it/s] 


#### construct table from json

In [None]:
config = {"json_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_papers.json"}
research_arcade.construct_table_from_json("openreview_papers", config)

Reading paper data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_papers.json...
Inserting data into CSV file...


100%|██████████| 10/10 [00:00<00:00, 77.70it/s]


#### insert node

In [None]:
paper_features = {'venue': 'ICLR.cc/2025/Conference', 
                  'paper_openreview_id': 'zGej22CBnS', 
                  'title': 'Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles', 
                  'abstract': "Tokenization is associated with many poorly understood shortcomings in language models (LMs), yet remains an important component for long sequence scaling purposes. This work studies  how tokenization impacts  model performance by analyzing and comparing the stochastic behavior of tokenized models with their byte-level, or token-free, counterparts. We discover that, even when the two models are statistically equivalent, their predictive distributions over the next byte can be substantially different, a phenomenon we term as ``tokenization bias''. To fully characterize this phenomenon, we  introduce the Byte-Token Representation Lemma, a framework that establishes a mapping between the learned token distribution and its equivalent byte-level distribution.  From this result, we develop a next-byte sampling algorithm  that eliminates tokenization bias without requiring further training or optimization. In other words, this enables zero-shot conversion of tokenized LMs into statistically equivalent token-free ones. We demonstrate its broad applicability with two use cases: fill-in-the-middle (FIM) tasks and model ensembles. In FIM tasks where input prompts may terminate mid-token, leading to out-of-distribution tokenization, our method mitigates performance degradation and achieves 18\\% improvement in FIM coding benchmarks, while consistently outperforming the standard token healing fix. For model ensembles where each model employs a distinct vocabulary, our approach enables seamless integration, resulting in improved performance up to 3.7\\% over individual models across various standard baselines in reasoning, knowledge, and coding. Code is available at:https: //github.com/facebookresearch/Exact-Byte-Level-Probabilities-from-Tokenized-LMs.", 
                  'paper_decision': 'ICLR 2025 Poster', 
                  'paper_pdf_link': '/pdf/cdd2212a20c4034029874cba11a05e081bfdb83e.pdf'}
research_arcade.insert_node("openreview_papers", node_features=paper_features)

('ICLR.cc/2025/Conference', 'zGej22CBnS')
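
The `paper_pdf_link` field stores a site-relative path. The download helper in the `get pdfs` cell of this tutorial prefixes it with `https://openreview.net`; the same step can be sketched with the standard library. `paper_pdf_url` is a hypothetical helper name.

```python
from urllib.parse import urljoin

OPENREVIEW_BASE = "https://openreview.net"


def paper_pdf_url(paper_pdf_link):
    """Turn a site-relative paper_pdf_link into an absolute URL."""
    return urljoin(OPENREVIEW_BASE, paper_pdf_link)


print(paper_pdf_url("/pdf/cdd2212a20c4034029874cba11a05e081bfdb83e.pdf"))
# https://openreview.net/pdf/cdd2212a20c4034029874cba11a05e081bfdb83e.pdf
```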

#### delete specific node by id

In [None]:
paper_id = {"paper_openreview_id": "zGej22CBnS"}
paper_features = research_arcade.delete_node_by_id("openreview_papers", paper_id)
print(paper_features.to_dict(orient="records")[0])

Paper with paper_openreview_id zGej22CBnS deleted successfully.
{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'zGej22CBnS', 'title': 'Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles', 'abstract': "Tokenization is associated with many poorly understood shortcomings in language models (LMs), yet remains an important component for long sequence scaling purposes. This work studies  how tokenization impacts  model performance by analyzing and comparing the stochastic behavior of tokenized models with their byte-level, or token-free, counterparts. We discover that, even when the two models are statistically equivalent, their predictive distributions over the next byte can be substantially different, a phenomenon we term as ``tokenization bias''. To fully characterize this phenomenon, we  introduce the Byte-Token Representation Lemma, a framework that establishes a mapping between the learned token distribution and its equivalent b

#### get all nodes

In [None]:
openreview_papers_df = research_arcade.get_all_node_features("openreview_papers")
print(len(openreview_papers_df))

10


#### get specific node by id

In [None]:
paper_id = {"paper_openreview_id": "zGej22CBnS"}
paper_features = research_arcade.get_node_features_by_id("openreview_papers", paper_id)
print(paper_features.to_dict(orient="records")[0])

{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'zGej22CBnS', 'title': 'Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles', 'abstract': "Tokenization is associated with many poorly understood shortcomings in language models (LMs), yet remains an important component for long sequence scaling purposes. This work studies  how tokenization impacts  model performance by analyzing and comparing the stochastic behavior of tokenized models with their byte-level, or token-free, counterparts. We discover that, even when the two models are statistically equivalent, their predictive distributions over the next byte can be substantially different, a phenomenon we term as ``tokenization bias''. To fully characterize this phenomenon, we  introduce the Byte-Token Representation Lemma, a framework that establishes a mapping between the learned token distribution and its equivalent byte-level distribution.  From this result, we develop a next-byt

#### update specific node by id

In [None]:
new_paper_features = {'venue': 'ICLR.cc/2025/Conference', 
                  'paper_openreview_id': 'zGej22CBnS', 
                  'title': 'Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles', 
                  'abstract': "Tokenization is associated with many poorly understood shortcomings in language models (LMs), yet remains an important component for long sequence scaling purposes. This work studies  how tokenization impacts  model performance by analyzing and comparing the stochastic behavior of tokenized models with their byte-level, or token-free, counterparts. We discover that, even when the two models are statistically equivalent, their predictive distributions over the next byte can be substantially different, a phenomenon we term as ``tokenization bias''. To fully characterize this phenomenon, we  introduce the Byte-Token Representation Lemma, a framework that establishes a mapping between the learned token distribution and its equivalent byte-level distribution.  From this result, we develop a next-byte sampling algorithm  that eliminates tokenization bias without requiring further training or optimization. In other words, this enables zero-shot conversion of tokenized LMs into statistically equivalent token-free ones. We demonstrate its broad applicability with two use cases: fill-in-the-middle (FIM) tasks and model ensembles. In FIM tasks where input prompts may terminate mid-token, leading to out-of-distribution tokenization, our method mitigates performance degradation and achieves 18\\% improvement in FIM coding benchmarks, while consistently outperforming the standard token healing fix. For model ensembles where each model employs a distinct vocabulary, our approach enables seamless integration, resulting in improved performance up to 3.7\\% over individual models across various standard baselines in reasoning, knowledge, and coding. Code is available at:https: //github.com/facebookresearch/Exact-Byte-Level-Probabilities-from-Tokenized-LMs.", 
                  'paper_decision': 'test', 
                  'paper_pdf_link': '/pdf/cdd2212a20c4034029874cba11a05e081bfdb83e.pdf'}
research_arcade.update_node("openreview_papers", node_features=new_paper_features)
paper_id = {"paper_openreview_id": "zGej22CBnS"}
paper_features = research_arcade.get_node_features_by_id("openreview_papers", paper_id)
print(paper_features.to_dict(orient="records")[0])

Paper with paper_openreview_id zGej22CBnS updated successfully.
{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'zGej22CBnS', 'title': 'Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles', 'abstract': "Tokenization is associated with many poorly understood shortcomings in language models (LMs), yet remains an important component for long sequence scaling purposes. This work studies  how tokenization impacts  model performance by analyzing and comparing the stochastic behavior of tokenized models with their byte-level, or token-free, counterparts. We discover that, even when the two models are statistically equivalent, their predictive distributions over the next byte can be substantially different, a phenomenon we term as ``tokenization bias''. To fully characterize this phenomenon, we  introduce the Byte-Token Representation Lemma, a framework that establishes a mapping between the learned token distribution and its equivalent b

### openreview_reviews

#### construct table from api

In [None]:
config = {"venue": "ICLR.cc/2013/conference"}
research_arcade.construct_table_from_api("openreview_reviews", config)

#### construct table from csv

In [None]:
config = {"csv_file": "/home/jingjunx/openreview_benchmark/Code/paper-crawler/examples/csv_data/csv_openreview_review_example.csv"}
research_arcade.construct_table_from_csv("openreview_reviews", config)

Reading review data from /home/jingjunx/openreview_benchmark/Code/paper-crawler/examples/csv_data/csv_openreview_review_example.csv...
Inserting data into CSV file...


100%|██████████| 1/1 [00:00<00:00, 69.43it/s]


#### construct table from json

In [None]:
config = {"json_file": "/home/jingjunx/openreview_benchmark/Code/paper-crawler/examples/json_data/json_openreview_review_example.json"}
research_arcade.construct_table_from_json("openreview_reviews", config)

Reading review data from /home/jingjunx/openreview_benchmark/Code/paper-crawler/examples/json_data/json_openreview_review_example.json...
Inserting data into CSV file...


100%|██████████| 1/1 [00:00<00:00, 139.39it/s]


#### insert node

In [None]:
review_features = {'venue': 'ICLR.cc/2025/Conference', 
                   'review_openreview_id': 'DHwZxFryth', 
                   'replyto_openreview_id': 'Yqbllggrmw', 
                   'writer': 'Authors', 
                   'title': 'Response by Authors', 
                   'content': {'Title': 'Response to Reviewer 7i95 (1/2)', 'Comment': '> The method does not improve much in the AlpacaEval 2.0 Score. The author should give a detailed explanation. And why not use metrics like length-controlled win rate?**Response:** Thank you for your careful observation and question. We would like to clarify that we are already using the length-controlled (LC) AlpacaEval 2.0 win-rate metric in our evaluations. We will make this clearer in the table header of Table 3.Regarding the fact that the AlpacaEval 2.0 scores on LLama-3 (8B) do not improve compared to the baselines, we believe this is because our base model, the instruction-finetuned LLama-3 (8B), is already trained to perform exceptionally well in terms of helpfulness, which is the focus of the AlpacaEval benchmark. Additionally, the preference dataset we used, UltraFeedback, may not provide significant further enhancement in the helpfulness aspect. This is supported by the slight decrease observed in the AlpacaEval score for the standard DPO baseline as well (see Table 3, results on LLama-3). Therefore, we think these AlpacaEval 2.0 results on LLama-3 (8B) may not indicate that SAIL is ineffective; it may be simply caused by an ill-suited combination of base model, finetuning dataset, and evaluation benchmark.We also further conducted experiments on the Zephyr (7B) model as the backbone, whose AlpacaEval 2.0 win-rate is lower. We still train on the UltraFeedback preference dataset and the other experiment setups are unchanged. \nIn this experiment, we see a larger improvement of the SAIL method compared to the standard DPO baseline (Zephyr-7B-Beta).|             | AlpacaEval 2.0 (LC) Win-Rate ||--------------------|------------------------------|| Base (Zephyr-7B-SFT-Full) | 6.4 %                        || DPO (Zephyr-7B-Beta)   | 13.2 %                       || SAIL-PP  | 15.9 %                       |> Authors should compare more advanced preference optimization algorithms like ORPO and SimPO. And current results are not impressive for the alignment community.**Response:** Thank you for raising this insightful point. We see ORPO and SimPO are two recent work which propose a different objective than the standard RLHF, and achieve remarkable improvements in terms of alignment performance and efficiency.Our work focus more on bringing standard RLHF to a bilevel optimization framework and propose an effective and efficient approximate algorithm on top of it. We can see some new preference optimization methods including ORPO and SimPO have one fundamental difference from our approach: they do not explicitly incorporate the KL regularization term. The absence of the KL regularization term allows these methods to optimize more aggressively for the reward function by deviating significantly from the reference model. In contrast, our approach is specifically grounded in the standard RLHF, where the KL regularization term ensures that the model remains aligned with the reference distribution while optimizing for the reward function. This distinction makes direct comparisons with ORPO or SimPO less meaningful theoretically, as those methods omit the KL regularization and adopt a fundamentally different optimization objective design.However, we think our work, although developed adhering to the standard RLHF setup, can be compatible and combined with some recent advanced preference optimization algorithms, despite their differences in optimization setups and objectives. \nThis is because we can reformulate their alignment problem as bilevel optimization, and go through the derivation as done in the paper. Taking SimPO as an example, we can treat their reward model definition (Equation (4) in '
                   'the SimPO paper) as the solution of the upper level optimization (replacing Equation (4) in our manuscript), and adopt their modified Bradley-Terry objective with reward margin (Equation (5) in the SimPO paper) to replace the standard one (Equation (10) in our manuscript). By applying these changes and rederiving the extra gradient terms, we can formulate an adaptation of our method to the SimPO objective. We will implement this combined algorithm, which adapt our methodology to the SimPO objective, and compare with the SimPO as a baseline.Recently many different alignment objectives and algorithms have emerged; it is an interesting question to discuss the compatibility and combination of our method with each objective. We will add more relevant discussions to the appendices, but due to the fact that the compatibility problem with each design is a non-trivial question, this process may incur considerably more work, and we hope the reviewer understands that this effort cannot be fully reflected by the rebuttal period. But we will continue to expand the discussion as the wide compatibility to other designs also strengthens our contribution to the community. We thank the reviewer for raising this insightful point.'}, 
                   'time': '2024-11-26 15:27:26'
}
research_arcade.insert_node("openreview_reviews", node_features=review_features)

('ICLR.cc/2025/Conference', 'DHwZxFryth')
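
The `time` field is stored as a plain string. For sorting or filtering reviews by date it can be parsed with the standard library. The sketch assumes every review uses the `YYYY-MM-DD HH:MM:SS` layout shown in the example above; `parse_review_time` is a hypothetical helper.

```python
from datetime import datetime

# Assumed layout, taken from the single example value in this tutorial.
REVIEW_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_review_time(value):
    """Parse a review `time` string into a datetime object."""
    return datetime.strptime(value, REVIEW_TIME_FORMAT)


posted = parse_review_time("2024-11-26 15:27:26")
print(posted.isoformat())  # 2024-11-26T15:27:26
```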

#### delete specific node by id

In [None]:
review_id = {"review_openreview_id": "DHwZxFryth"}
review_features = research_arcade.delete_node_by_id("openreview_reviews", review_id)
print(review_features.to_dict(orient="records")[0])

Review with review_openreview_id DHwZxFryth deleted successfully.
{'venue': 'ICLR.cc/2025/Conference', 'review_openreview_id': 'DHwZxFryth', 'replyto_openreview_id': 'Yqbllggrmw', 'writer': 'Authors', 'title': 'Response by Authors', 'content': {'Title': 'Response to Reviewer 7i95 (1/2)', 'Comment': '> The method does not improve much in the AlpacaEval 2.0 Score. The author should give a detailed explanation. And why not use metrics like length-controlled win rate?**Response:** Thank you for your careful observation and question. We would like to clarify that we are already using the length-controlled (LC) AlpacaEval 2.0 win-rate metric in our evaluations. We will make this clearer in the table header of Table 3.Regarding the fact that the AlpacaEval 2.0 scores on LLama-3 (8B) do not improve compared to the baselines, we believe this is because our base model, the instruction-finetuned LLama-3 (8B), is already trained to perform exceptionally well in terms of helpfulness, which is the f

#### get all nodes

In [None]:
openreview_reviews_df = research_arcade.get_all_node_features("openreview_reviews")
print(len(openreview_reviews_df))

12
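
Each review row carries a `replyto_openreview_id`, so the flat table can be regrouped into discussion threads. A minimal sketch, assuming the rows are obtained as dictionaries (for example via `to_dict(orient="records")` as in the other cells); the second and third rows below are placeholders for illustration.

```python
from collections import defaultdict


def group_by_parent(records):
    """Map each replyto_openreview_id to the IDs of its direct replies."""
    threads = defaultdict(list)
    for record in records:
        threads[record["replyto_openreview_id"]].append(record["review_openreview_id"])
    return dict(threads)


records = [
    {"review_openreview_id": "DHwZxFryth", "replyto_openreview_id": "Yqbllggrmw"},
    # Placeholder rows, not real OpenReview IDs.
    {"review_openreview_id": "reply_b", "replyto_openreview_id": "Yqbllggrmw"},
    {"review_openreview_id": "reply_c", "replyto_openreview_id": "DHwZxFryth"},
]
print(group_by_parent(records))
# {'Yqbllggrmw': ['DHwZxFryth', 'reply_b'], 'DHwZxFryth': ['reply_c']}
```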


#### get specific node by id

In [None]:
review_id = {"review_openreview_id": "DHwZxFryth"}
review_features = research_arcade.get_node_features_by_id("openreview_reviews", review_id)
print(review_features.to_dict(orient="records")[0])

{'venue': 'ICLR.cc/2025/Conference', 'review_openreview_id': 'DHwZxFryth', 'replyto_openreview_id': 'Yqbllggrmw', 'writer': 'Authors', 'title': 'Response by Authors', 'content': {'Title': 'Response to Reviewer 7i95 (1/2)', 'Comment': '> The method does not improve much in the AlpacaEval 2.0 Score. The author should give a detailed explanation. And why not use metrics like length-controlled win rate?**Response:** Thank you for your careful observation and question. We would like to clarify that we are already using the length-controlled (LC) AlpacaEval 2.0 win-rate metric in our evaluations. We will make this clearer in the table header of Table 3.Regarding the fact that the AlpacaEval 2.0 scores on LLama-3 (8B) do not improve compared to the baselines, we believe this is because our base model, the instruction-finetuned LLama-3 (8B), is already trained to perform exceptionally well in terms of helpfulness, which is the focus of the AlpacaEval benchmark. Additionally, the preference dat

#### update specific node by id

In [None]:
new_review_features = {'venue': 'ICLR.cc/2025/Conference', 
                   'review_openreview_id': 'DHwZxFryth', 
                   'replyto_openreview_id': 'Yqbllggrmw', 
                   'writer': 'test', 
                   'title': 'Response by Authors', 
                   'content': {'Title': 'Response to Reviewer 7i95 (1/2)', 'Comment': '> The method does not improve much in the AlpacaEval 2.0 Score. The author should give a detailed explanation. And why not use metrics like length-controlled win rate?**Response:** Thank you for your careful observation and question. We would like to clarify that we are already using the length-controlled (LC) AlpacaEval 2.0 win-rate metric in our evaluations. We will make this clearer in the table header of Table 3.Regarding the fact that the AlpacaEval 2.0 scores on LLama-3 (8B) do not improve compared to the baselines, we believe this is because our base model, the instruction-finetuned LLama-3 (8B), is already trained to perform exceptionally well in terms of helpfulness, which is the focus of the AlpacaEval benchmark. Additionally, the preference dataset we used, UltraFeedback, may not provide significant further enhancement in the helpfulness aspect. This is supported by the slight decrease observed in the AlpacaEval score for the standard DPO baseline as well (see Table 3, results on LLama-3). Therefore, we think these AlpacaEval 2.0 results on LLama-3 (8B) may not indicate that SAIL is ineffective; it may be simply caused by an ill-suited combination of base model, finetuning dataset, and evaluation benchmark.We also further conducted experiments on the Zephyr (7B) model as the backbone, whose AlpacaEval 2.0 win-rate is lower. We still train on the UltraFeedback preference dataset and the other experiment setups are unchanged. \nIn this experiment, we see a larger improvement of the SAIL method compared to the standard DPO baseline (Zephyr-7B-Beta).|             | AlpacaEval 2.0 (LC) Win-Rate ||--------------------|------------------------------|| Base (Zephyr-7B-SFT-Full) | 6.4 %                        || DPO (Zephyr-7B-Beta)   | 13.2 %                       || SAIL-PP  | 15.9 %                       |> Authors should compare more advanced preference optimization algorithms like ORPO and SimPO. And current results are not impressive for the alignment community.**Response:** Thank you for raising this insightful point. We see ORPO and SimPO are two recent work which propose a different objective than the standard RLHF, and achieve remarkable improvements in terms of alignment performance and efficiency.Our work focus more on bringing standard RLHF to a bilevel optimization framework and propose an effective and efficient approximate algorithm on top of it. We can see some new preference optimization methods including ORPO and SimPO have one fundamental difference from our approach: they do not explicitly incorporate the KL regularization term. The absence of the KL regularization term allows these methods to optimize more aggressively for the reward function by deviating significantly from the reference model. In contrast, our approach is specifically grounded in the standard RLHF, where the KL regularization term ensures that the model remains aligned with the reference distribution while optimizing for the reward function. This distinction makes direct comparisons with ORPO or SimPO less meaningful theoretically, as those methods omit the KL regularization and adopt a fundamentally different optimization objective design.However, we think our work, although developed adhering to the standard RLHF setup, can be compatible and combined with some recent advanced preference optimization algorithms, despite their differences in optimization setups and objectives. \nThis is because we can reformulate their alignment problem as bilevel optimization, and go through the derivation as done in the paper. Taking SimPO as an example, we can treat their reward model definition (Equation (4) in the SimPO paper) as the solution of the upper level optimization (replacing Equation (4) in our manuscript), and adopt their modified Bradley-Terry objective with reward margin (Equation (5) in the SimPO paper) to replace the standard one (Equation (10) in our manuscript). By applying these changes and rederiving the extra gradient terms, we can formulate an adaptation of our method to the SimPO objective. We will implement this combined algorithm, which adapt our methodology to the SimPO objective, and compare with the SimPO as a baseline.Recently many different alignment objectives and algorithms have emerged; it is an interesting question to discuss the compatibility and combination of our method with each objective. We will add more relevant discussions to the appendices, but due to the fact that the compatibility problem with each design is a non-trivial question, this process may incur considerably more work, and we hope the reviewer understands that this effort cannot be fully reflected by the rebuttal period. But we will continue to expand the discussion as the wide compatibility to other designs also strengthens our contribution to the community. We thank the reviewer for raising this insightful point.'}, 
                   'time': '2024-11-26 15:27:26'
}
research_arcade.update_node("openreview_reviews", node_features=new_review_features)
review_id = {"review_openreview_id": "DHwZxFryth"}
review_features = research_arcade.get_node_features_by_id("openreview_reviews", review_id)
print(review_features.to_dict(orient="records")[0])

Review with review_openreview_id DHwZxFryth updated successfully.
{'venue': 'ICLR.cc/2025/Conference', 'review_openreview_id': 'DHwZxFryth', 'replyto_openreview_id': 'Yqbllggrmw', 'writer': 'test', 'title': 'Response by Authors', 'content': {'Title': 'Response to Reviewer 7i95 (1/2)', 'Comment': '> The method does not improve much in the AlpacaEval 2.0 Score. The author should give a detailed explanation. And why not use metrics like length-controlled win rate?**Response:** Thank you for your careful observation and question. We would like to clarify that we are already using the length-controlled (LC) AlpacaEval 2.0 win-rate metric in our evaluations. We will make this clearer in the table header of Table 3.Regarding the fact that the AlpacaEval 2.0 scores on LLama-3 (8B) do not improve compared to the baselines, we believe this is because our base model, the instruction-finetuned LLama-3 (8B), is already trained to perform exceptionally well in terms of helpfulness, which is the focu

### openreview_revisions

#### construct table from api

##### get pdfs
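
The download cell that follows chooses a revision PDF endpoint from the venue string. That rule can be isolated as a pure function, which makes it easy to check without any network access. The sketch below mirrors the branch order of `get_revision_pdf`; the name `revision_pdf_url` and the `None` return for unmatched venues are assumptions, not part of the original helper.

```python
# Years handled by the older /references/pdf endpoint in get_revision_pdf.
LEGACY_YEARS = ("2013", "2014", "2017", "2018", "2019", "2020", "2021", "2022", "2023")


def revision_pdf_url(venue, note_id):
    """Return the revision PDF URL for a venue, or None if no rule matches."""
    base = "https://openreview.net"
    if "2024" in venue or "2025" in venue:
        return f"{base}/notes/edits/attachment?id={note_id}&name=pdf"
    if "EMNLP" in venue:
        return f"{base}/attachment?id={note_id}&name=pdf"
    if any(year in venue for year in LEGACY_YEARS):
        return f"{base}/references/pdf?id={note_id}"
    return None


# EXAMPLE_ID is a placeholder, not a real revision ID.
print(revision_pdf_url("ICLR.cc/2025/Conference", "EXAMPLE_ID"))
# https://openreview.net/notes/edits/attachment?id=EXAMPLE_ID&name=pdf
```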

In [None]:
import requests
import os
import openreview
import time

def get_paper_pdf(link, pdf_path, log_file):
    """Download a paper PDF from its OpenReview-relative link; log the link on failure."""
    pdf_url = "https://openreview.net"+link
    
    headers = {
        "User-Agent": "Mozilla/5.0"
    }
    try:
        response = requests.get(pdf_url, headers=headers, timeout=15)
        if response.status_code == 200:
            with open(pdf_path, "wb") as f:
                f.write(response.content)
            print(f"✅ PDF downloaded: {pdf_path}")
        else:
            print(f"❌ Download failed ({response.status_code}) for link: {link}")
            with open(log_file, "a") as log:
                log.write(f"{link}\n")
    except Exception as e:
        print(f"❌ Exception for link {link}: {e}")
        with open(log_file, "a") as log:
            log.write(f"{link}\n")
            
def get_revision_pdf(venue, id, pdf_path, log_file):
    """Download a revision PDF; the endpoint depends on the venue name and year."""
    if "2024" in venue or "2025" in venue:
        pdf_url = "https://openreview.net/notes/edits/attachment?id="+id+"&name=pdf"
    elif "EMNLP" in venue:
        pdf_url = "https://openreview.net/attachment?id="+id+"&name=pdf"
    elif any(year in venue for year in ("2023", "2022", "2021", "2020", "2019", "2018", "2017", "2014", "2013")):
        pdf_url = "https://openreview.net/references/pdf?id="+id
    
    headers = {
        "User-Agent": "Mozilla/5.0"
    }
    try:
        response = requests.get(pdf_url, headers=headers, timeout=15)
        if response.status_code == 200:
            with open(pdf_path, "wb") as f:
                f.write(response.content)
            print(f"✅ PDF downloaded: {pdf_path}")
        else:
            print(f"❌ Download failed ({response.status_code}) for ID: {id}")
            with open(log_file, "a") as log:
                log.write(f"{id}\n")
    except Exception as e:
        print(f"❌ Exception for ID {id}: {e}")
        with open(log_file, "a") as log:
            log.write(f"{id}\n")

client_v1 = openreview.Client(baseurl='https://api.openreview.net')
client_v2 = openreview.api.OpenReviewClient(baseurl='https://api2.openreview.net')

venue = 'ICLR.cc/2017/conference'
pdf_dir = "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/"
log_file = "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/log/download_failed_ids_2017.log"
start_idx = 0
end_idx = 5

In [None]:
if "2023" in venue or "2022" in venue or "2021" in venue or "2020" in venue or "2019" in venue or "2018" in venue or "2017" in venue or "2014" in venue or "2013" in venue:
    if "2023" in venue or "2022" in venue or "2021" in venue or "2020" in venue or "2019" in venue or "2018" in venue:
        submissions = client_v1.get_all_notes(invitation=f'{venue}/-/Blind_Submission', details='revisions')
    elif "2017" in venue or "2014" in venue or "2013" in venue:
        submissions = client_v1.get_all_notes(invitation=f'{venue}/-/submission', details='revisions')
        
    if submissions is None:
        print(f"No submissions found for venue: {venue}")
    else:
        for submission in tqdm(submissions[start_idx:end_idx]):
            # get paper openreview id
            paper_id = submission.id
            if "pdf" in submission.content:
                pdf_link = submission.content["pdf"]
                pdf_path = str(pdf_dir)+str(paper_id)+".pdf"
                if os.path.isfile(pdf_path):
                    continue
                else:
                    get_paper_pdf(pdf_link, pdf_path, log_file)
            
            revisions = client_v1.get_references(referent=paper_id, original=True)
            time.sleep(1)
            
            pdf_revisions_ids = []
            for revision in revisions:
                if "pdf" in revision.content:
                    pdf_revisions_ids.append(revision.id)
            
            if len(pdf_revisions_ids) <= 1:
                continue
            else:
                for pdf_revision_id in pdf_revisions_ids:
                    pdf_path = str(pdf_dir)+str(pdf_revision_id)+".pdf"
                    if os.path.isfile(pdf_path):
                        continue
                    else:
                        get_revision_pdf(venue, pdf_revision_id, pdf_path, log_file)
                        time.sleep(1)
else:
    submissions = client_v2.get_all_notes(invitation=f'{venue}/-/Submission', details='revisions')
    if submissions is None:
        print(f"No submissions found for venue: {venue}")
    else:
        for submission in tqdm(submissions[start_idx:end_idx]):
            decision = submission.content["venueid"]["value"].split('/')[-1]
            if decision == "Withdrawn_Submission":
                continue
            else:
                # get paper openreview id
                paper_id = submission.id
                if "pdf" in submission.content:
                    pdf_link = submission.content["pdf"]["value"]
                    pdf_path = str(pdf_dir)+str(paper_id)+".pdf"
                    if os.path.isfile(pdf_path):
                        continue
                    else:
                        get_paper_pdf(pdf_link, pdf_path, log_file)
                        
                revisions = client_v2.get_note_edits(note_id=paper_id)
                if len(revisions) <= 1:
                    continue
                else:
                    for revision in revisions:
                        pdf_revision_id = revision.id
                        pdf_path = str(pdf_dir)+str(pdf_revision_id)+".pdf"
                        if os.path.isfile(pdf_path):
                            continue
                        else:
                            time.sleep(1)
                            get_revision_pdf(venue, pdf_revision_id, pdf_path, log_file)
                            time.sleep(1)

  0%|          | 0/5 [00:00<?, ?it/s]

✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/ryxB0Rtxx.pdf


 20%|██        | 1/5 [00:01<00:05,  1.27s/it]

✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/rywUcQogx.pdf


 40%|████      | 2/5 [00:02<00:03,  1.32s/it]

✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/ryuxYmvel.pdf
✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/BJRNV4wue.pdf
✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/S1PHr1IIx.pdf
✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/SJdZSdr7e.pdf


 60%|██████    | 3/5 [00:07<00:06,  3.05s/it]

✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/ryrGawqex.pdf
✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/HkzJI95Yl.pdf
✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/S1nIl5DOx.pdf
✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/BJKAta8Ue.pdf
✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/r1Z7SpDWx.pdf
✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/SyvPXTwWx.pdf
✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/S1SO2N8Wx.pdf


 80%|████████  | 4/5 [00:16<00:05,  5.27s/it]

✅ PDF downloaded: /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/ryjp1c9xg.pdf


100%|██████████| 5/5 [00:17<00:00,  3.55s/it]
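
Both download helpers above append one entry per line to `log_file` when a download fails (a PDF link in `get_paper_pdf`, a revision ID in `get_revision_pdf`). The sketch below shows one way to collect those entries for a retry pass. `read_failed_ids` is a hypothetical helper written for this tutorial, not part of ResearchArcade.

```python
from pathlib import Path


def read_failed_ids(log_path):
    """Return the unique, non-empty lines of a failure log in first-seen order."""
    path = Path(log_path)
    if not path.is_file():
        return []  # no log file means nothing was recorded as failed
    seen = {}
    for line in path.read_text().splitlines():
        entry = line.strip()
        if entry:
            seen.setdefault(entry, None)  # dicts preserve insertion order
    return list(seen)
```

Entries that start with `/` are paper links and would go back through `get_paper_pdf`; the others are revision IDs for `get_revision_pdf`.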


##### construct the table

In [None]:
venue = "ICLR.cc/2017/conference"
filter_list = ["Under review as a conference paper at ICLR 2017", "Published as a conference paper at ICLR 2017"]
pdf_dir = "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/"
log_file = "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/log/failed_ids_2017.log"
config = {"venue": venue, "filter_list": filter_list, "pdf_dir": pdf_dir, "log_file": log_file}
research_arcade.construct_table_from_api("openreview_revisions", config)

#### construct table from csv

In [None]:
config = {"csv_file": "/home/jingjunx/openreview_benchmark/Code/paper-crawler/examples/csv_data/csv_openreview_revision_example.csv"}
research_arcade.construct_table_from_csv("openreview_revisions", config)

Reading revisions data from /home/jingjunx/openreview_benchmark/Code/paper-crawler/examples/csv_data/csv_openreview_revision_example.csv...
Inserting data into CSV file...


100%|██████████| 1/1 [00:00<00:00, 63.16it/s]


#### construct table from json

In [None]:
config = {"json_file": "/home/jingjunx/openreview_benchmark/Code/paper-crawler/examples/json_data/json_openreview_revision_example.json"}
research_arcade.construct_table_from_json("openreview_revisions", config)

Reading revisions data from /home/jingjunx/openreview_benchmark/Code/paper-crawler/examples/json_data/json_openreview_revision_example.json...
Inserting data into CSV file...


100%|██████████| 1/1 [00:00<00:00, 46.79it/s]


#### insert node

In [None]:
revision_feature = {'venue': 'ICLR.cc/2025/Conference', 
                    'original_openreview_id': 'pbTVNlX8Ig', 
                    'revision_openreview_id': 'yfHQOp5zWc', 
                    'content': [{'section': '1 INTRODUCTION', 
                                 'after_section': None, 
                                 'context_after': '2 RELATED WORK ', 
                                 'paragraph_idx': 9, 
                                 'before_section': None, 
                                 'context_before': 'Published as a conference paper at ICLR 2025 tograd system in PyTorch, specifically tailored for our experimental setup, which is available at ', 
                                 'modified_lines': 'https://github.com/stephane-rivaud/PETRA. ', 
                                 'original_lines': 'https://github.com/streethagore/PETRA. ', 
                                 'after_paragraph_idx': None, 
                                 'before_paragraph_idx': None}], 
                    'time': '2025-03-14 15:35:37'}
research_arcade.insert_node("openreview_revisions", node_features=revision_feature)

('ICLR.cc/2025/Conference', 'yfHQOp5zWc')
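
Each entry in a revision's `content` list pairs `original_lines` with `modified_lines` for one changed paragraph. As a rough sketch of how such a record could be summarized, the hypothetical helper below (not a ResearchArcade function) prints one line per change, using a trimmed copy of the record inserted above.

```python
def summarize_changes(revision):
    """Return one 'section: old -> new' string per entry in revision['content']."""
    lines = []
    for change in revision.get("content", []):
        # Either side may be None or empty when text was only added or removed
        old = (change.get("original_lines") or "").strip()
        new = (change.get("modified_lines") or "").strip()
        lines.append(f"{change.get('section')}: {old!r} -> {new!r}")
    return lines


example = {
    "revision_openreview_id": "yfHQOp5zWc",
    "content": [{
        "section": "1 INTRODUCTION",
        "original_lines": "https://github.com/streethagore/PETRA. ",
        "modified_lines": "https://github.com/stephane-rivaud/PETRA. ",
    }],
}
for line in summarize_changes(example):
    print(line)
```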

#### delete specific node by id

In [None]:
revision_id = {"revision_openreview_id": "yfHQOp5zWc"}
revision_feature = research_arcade.delete_node_by_id("openreview_revisions", revision_id)
print(revision_feature.to_dict(orient="records")[0])

Revision with revision_openreview_id yfHQOp5zWc deleted successfully.
{'venue': 'ICLR.cc/2025/Conference', 'original_openreview_id': 'pbTVNlX8Ig', 'revision_openreview_id': 'yfHQOp5zWc', 'content': [{'section': '1 INTRODUCTION', 'after_section': None, 'context_after': '2 RELATED WORK ', 'paragraph_idx': 9, 'before_section': None, 'context_before': 'Published as a conference paper at ICLR 2025 tograd system in PyTorch, specifically tailored for our experimental setup, which is available at ', 'modified_lines': 'https://github.com/stephane-rivaud/PETRA. ', 'original_lines': 'https://github.com/streethagore/PETRA. ', 'after_paragraph_idx': None, 'before_paragraph_idx': None}], 'time': '2025-03-14 15:35:37'}


#### get all nodes

In [None]:
openreview_revisions_df = research_arcade.get_all_node_features("openreview_revisions")
print(len(openreview_revisions_df))

11


#### get specific node by id

In [None]:
revision_id = {"revision_openreview_id": "yfHQOp5zWc"}
revision_feature = research_arcade.get_node_features_by_id("openreview_revisions", revision_id)
print(revision_feature.to_dict(orient="records")[0])

{'venue': 'ICLR.cc/2025/Conference', 'original_openreview_id': 'pbTVNlX8Ig', 'revision_openreview_id': 'yfHQOp5zWc', 'content': [{'section': '1 INTRODUCTION', 'after_section': None, 'context_after': '2 RELATED WORK ', 'paragraph_idx': 9, 'before_section': None, 'context_before': 'Published as a conference paper at ICLR 2025 tograd system in PyTorch, specifically tailored for our experimental setup, which is available at ', 'modified_lines': 'https://github.com/stephane-rivaud/PETRA. ', 'original_lines': 'https://github.com/streethagore/PETRA. ', 'after_paragraph_idx': None, 'before_paragraph_idx': None}], 'time': '2025-03-14 15:35:37'}


#### update specific node by id

In [None]:
new_revision_features = {'venue': 'ICLR.cc/2025/Conference', 
                    'original_openreview_id': 'pbTVNlX8Ig', 
                    'revision_openreview_id': 'yfHQOp5zWc', 
                    'content': [{'section': '1 INTRODUCTION', 
                                 'after_section': None, 
                                 'context_after': '2 RELATED WORK ', 
                                 'paragraph_idx': 9, 
                                 'before_section': None, 
                                 'context_before': 'Published as a conference paper at ICLR 2025 tograd system in PyTorch, specifically tailored for our experimental setup, which is available at ', 
                                 'modified_lines': 'https://github.com/stephane-rivaud/PETRA. ', 
                                 'original_lines': 'https://github.com/streethagore/PETRA. ', 
                                 'after_paragraph_idx': None, 
                                 'before_paragraph_idx': None}], 
                    'time': 'test'}
research_arcade.update_node("openreview_revisions", node_features=new_revision_features)
revision_id = {"revision_openreview_id": "yfHQOp5zWc"}
revision_feature = research_arcade.get_node_features_by_id("openreview_revisions", revision_id)
print(revision_feature.to_dict(orient="records")[0])

Revision with revision_openreview_id yfHQOp5zWc updated successfully.
{'venue': 'ICLR.cc/2025/Conference', 'original_openreview_id': 'pbTVNlX8Ig', 'revision_openreview_id': 'yfHQOp5zWc', 'content': [{'section': '1 INTRODUCTION', 'after_section': None, 'context_after': '2 RELATED WORK ', 'paragraph_idx': 9, 'before_section': None, 'context_before': 'Published as a conference paper at ICLR 2025 tograd system in PyTorch, specifically tailored for our experimental setup, which is available at ', 'modified_lines': 'https://github.com/stephane-rivaud/PETRA. ', 'original_lines': 'https://github.com/streethagore/PETRA. ', 'after_paragraph_idx': None, 'before_paragraph_idx': None}], 'time': 'test'}


### openreview_paragraphs

#### construct table from api

##### get pdfs

In [None]:
import requests
import os
import openreview
import time

def get_paper_pdf(link, pdf_path, log_file):
    """Download a paper PDF from its OpenReview-relative link; log the link on failure."""
    pdf_url = "https://openreview.net"+link
    
    headers = {
        "User-Agent": "Mozilla/5.0"
    }
    try:
        response = requests.get(pdf_url, headers=headers, timeout=15)
        if response.status_code == 200:
            with open(pdf_path, "wb") as f:
                f.write(response.content)
            print(f"✅ PDF downloaded: {pdf_path}")
        else:
            print(f"❌ Download failed ({response.status_code}) for link: {link}")
            with open(log_file, "a") as log:
                log.write(f"{link}\n")
    except Exception as e:
        print(f"❌ Exception for link {link}: {e}")
        with open(log_file, "a") as log:
            log.write(f"{link}\n")
            
def get_revision_pdf(venue, id, pdf_path, log_file):
    """Download a revision PDF; the endpoint depends on the venue name and year."""
    if "2024" in venue or "2025" in venue:
        pdf_url = "https://openreview.net/notes/edits/attachment?id="+id+"&name=pdf"
    elif "EMNLP" in venue:
        pdf_url = "https://openreview.net/attachment?id="+id+"&name=pdf"
    elif any(year in venue for year in ("2023", "2022", "2021", "2020", "2019", "2018", "2017", "2014", "2013")):
        pdf_url = "https://openreview.net/references/pdf?id="+id
    
    headers = {
        "User-Agent": "Mozilla/5.0"
    }
    try:
        response = requests.get(pdf_url, headers=headers, timeout=15)
        if response.status_code == 200:
            with open(pdf_path, "wb") as f:
                f.write(response.content)
            print(f"✅ PDF downloaded: {pdf_path}")
        else:
            print(f"❌ Download failed ({response.status_code}) for ID: {id}")
            with open(log_file, "a") as log:
                log.write(f"{id}\n")
    except Exception as e:
        print(f"❌ Exception for ID {id}: {e}")
        with open(log_file, "a") as log:
            log.write(f"{id}\n")

client_v1 = openreview.Client(baseurl='https://api.openreview.net')
client_v2 = openreview.api.OpenReviewClient(baseurl='https://api2.openreview.net')

venue = 'ICLR.cc/2017/conference'
pdf_dir = "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/"
log_file = "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/log/download_failed_ids_2017.log"
start_idx = 0
end_idx = 5

In [None]:
if "2023" in venue or "2022" in venue or "2021" in venue or "2020" in venue or "2019" in venue or "2018" in venue or "2017" in venue or "2014" in venue or "2013" in venue:
    if "2023" in venue or "2022" in venue or "2021" in venue or "2020" in venue or "2019" in venue or "2018" in venue:
        submissions = client_v1.get_all_notes(invitation=f'{venue}/-/Blind_Submission', details='revisions')
    elif "2017" in venue or "2014" in venue or "2013" in venue:
        submissions = client_v1.get_all_notes(invitation=f'{venue}/-/submission', details='revisions')
        
    if submissions is None:
        print(f"No submissions found for venue: {venue}")
    else:
        for submission in tqdm(submissions[start_idx:end_idx]):
            # get paper openreview id
            paper_id = submission.id
            if "pdf" in submission.content:
                pdf_link = submission.content["pdf"]
                pdf_path = str(pdf_dir)+str(paper_id)+".pdf"
                if os.path.isfile(pdf_path):
                    continue
                else:
                    get_paper_pdf(pdf_link, pdf_path, log_file)
            
            revisions = client_v1.get_references(referent=paper_id, original=True)
            time.sleep(1)
            
            pdf_revisions_ids = []
            for revision in revisions:
                if "pdf" in revision.content:
                    pdf_revisions_ids.append(revision.id)
            
            if len(pdf_revisions_ids) <= 1:
                continue
            else:
                for pdf_revision_id in pdf_revisions_ids:
                    pdf_path = str(pdf_dir)+str(pdf_revision_id)+".pdf"
                    if os.path.isfile(pdf_path):
                        continue
                    else:
                        get_revision_pdf(venue, pdf_revision_id, pdf_path, log_file)
                        time.sleep(1)
else:
    submissions = client_v2.get_all_notes(invitation=f'{venue}/-/Submission', details='revisions')
    if submissions is None:
        print(f"No submissions found for venue: {venue}")
    else:
        for submission in tqdm(submissions[start_idx:end_idx]):
            decision = submission.content["venueid"]["value"].split('/')[-1]
            if decision == "Withdrawn_Submission":
                continue
            else:
                # get paper openreview id
                paper_id = submission.id
                if "pdf" in submission.content:
                    pdf_link = submission.content["pdf"]["value"]
                    pdf_path = str(pdf_dir)+str(paper_id)+".pdf"
                    if os.path.isfile(pdf_path):
                        continue
                    else:
                        get_paper_pdf(pdf_link, pdf_path, log_file)
                        
                revisions = client_v2.get_note_edits(note_id=paper_id)
                if len(revisions) <= 1:
                    continue
                else:
                    for revision in revisions:
                        pdf_revision_id = revision.id
                        pdf_path = str(pdf_dir)+str(pdf_revision_id)+".pdf"
                        if os.path.isfile(pdf_path):
                            continue
                        else:
                            time.sleep(1)
                            get_revision_pdf(venue, pdf_revision_id, pdf_path, log_file)
                            time.sleep(1)
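
The cell above chooses an OpenReview API version and a submission invitation name by checking which year appears in `venue`. A minimal sketch of the same branching as a single function is shown here; `submission_invitation` is a hypothetical name introduced for illustration, and the year lists are copied from the conditions above.

```python
BLIND_SUBMISSION_YEARS = ("2023", "2022", "2021", "2020", "2019", "2018")
LEGACY_SUBMISSION_YEARS = ("2017", "2014", "2013")


def submission_invitation(venue):
    """Return (api_version, invitation) using the same year checks as the cell above."""
    if any(year in venue for year in BLIND_SUBMISSION_YEARS):
        return 1, f"{venue}/-/Blind_Submission"
    if any(year in venue for year in LEGACY_SUBMISSION_YEARS):
        return 1, f"{venue}/-/submission"
    # Every other venue is assumed to be served by API v2
    return 2, f"{venue}/-/Submission"


print(submission_invitation("ICLR.cc/2017/conference"))
```

With a helper like this, the calling code would pick `client_v1` or `client_v2` from the returned version number and pass the invitation string to `get_all_notes`.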

##### construct the table

In [None]:
venue = "ICLR.cc/2017/conference"
filter_list = ["Under review as a conference paper at ICLR 2017", "Published as a conference paper at ICLR 2017"]
pdf_dir = "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/pdfs/"
log_file = "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/log/failed_ids_2017.log"
config = {"venue": venue, "filter_list": filter_list, "pdf_dir": pdf_dir, "log_file": log_file, "is_paper": True, "is_revision": True, "is_pdf_delete": False}
research_arcade.construct_table_from_api("openreview_paragraphs", config)

#### construct table from csv

In [None]:
config = {"csv_file": "/home/jingjunx/openreview_benchmark/Code/paper-crawler/examples/csv_data/csv_openreview_paragraphs_example.csv"}
research_arcade.construct_table_from_csv("openreview_paragraphs", config)

Reading paragraph data from /home/jingjunx/openreview_benchmark/Code/paper-crawler/examples/csv_data/csv_openreview_paragraphs_example.csv...
Inserting data into CSV file...


100%|██████████| 1/1 [00:00<00:00, 107.64it/s]


#### construct table from json

In [None]:
config = {"json_file": "/home/jingjunx/openreview_benchmark/Code/paper-crawler/examples/json_data/json_openreview_paragraphs_example.json"}
research_arcade.construct_table_from_json("openreview_paragraphs", config)

Reading paper data from /home/jingjunx/openreview_benchmark/Code/paper-crawler/examples/json_data/json_openreview_paragraphs_example.json...
Inserting data into CSV file...


100%|██████████| 1/1 [00:00<00:00, 183.69it/s]


#### insert node

In [None]:
paragraph_feature = {'venue': 'xujj_test', 
                    'paper_openreview_id': 'xujj_test', 
                    'paragraph_idx': 1, 
                    'section': "xujj_test", 
                    'content': "xujj_test"}
research_arcade.insert_node("openreview_paragraphs", node_features=paragraph_feature)

('xujj_test', 'xujj_test', 1)

#### delete specific node by id

In [None]:
paper_id = {"paper_openreview_id": "xujj_test"}
paragraph_feature = research_arcade.delete_node_by_id("openreview_paragraphs", paper_id)
print(len(paragraph_feature))
print(paragraph_feature.to_dict(orient="records")[0])

Paragraphs with paper_openreview_id xujj_test deleted successfully.
1
{'venue': 'xujj_test', 'paper_openreview_id': 'xujj_test', 'paragraph_idx': 1, 'section': 'xujj_test', 'content': 'xujj_test'}


#### get all nodes

In [None]:
openreview_paragraphs_df = research_arcade.get_all_node_features("openreview_paragraphs")
print(len(openreview_paragraphs_df))

10


#### get specific node by id

In [None]:
paper_id = {"paper_openreview_id": "xujj_test"}
paragraph_feature = research_arcade.get_node_features_by_id("openreview_paragraphs", paper_id)
print(paragraph_feature.to_dict(orient="records")[0])

{'venue': 'xujj_test', 'paper_openreview_id': 'xujj_test', 'paragraph_idx': 1, 'section': 'xujj_test', 'content': 'xujj_test'}


### arxiv_papers

#### Table Schema

- `id` (SERIAL PK)
- `arxiv_id` (VARCHAR, unique) - e.g., 1802.08773v3
- `base_arxiv_id` (VARCHAR) - e.g., 1802.08773
- `version` (INT) - e.g., 3
- `title` (TEXT)
- `abstract` (TEXT)
- `submit_date` (DATE)
- `metadata` (JSONB)
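
The schema stores the versioned identifier (`arxiv_id`) alongside its two parts (`base_arxiv_id` and `version`). A small sketch of how the parts could be derived from the versioned ID follows; `split_arxiv_id` is a hypothetical helper for illustration, not a ResearchArcade function, and it assumes the ID always ends in `v<number>`.

```python
import re

# Everything before the trailing "v<digits>" is treated as the base ID
_VERSIONED_ID = re.compile(r"^(?P<base>.+?)v(?P<version>\d+)$")


def split_arxiv_id(arxiv_id):
    """Split '1802.08773v3' into ('1802.08773', 3)."""
    match = _VERSIONED_ID.match(arxiv_id)
    if match is None:
        raise ValueError(f"no version suffix in {arxiv_id!r}")
    return match.group("base"), int(match.group("version"))


print(split_arxiv_id("1802.08773v3"))  # ('1802.08773', 3)
```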

#### Construct Table from API

In [6]:
config = {"arxiv_ids": ["1806.08804v4", "1903.03894v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_papers", config)

Downloaded
Downloaded


#### Construct Table from CSV

In [7]:
config = {"csv_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/arxiv_papers.csv"}
research_arcade.construct_table_from_csv("arxiv_papers", config)

Error: CSV file /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/arxiv_papers.csv does not exist.


#### Construct Table from JSON

In [8]:
config = {"json_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/arxiv_papers.json"}
research_arcade.construct_table_from_json("arxiv_papers", config)

Error: JSON file /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/arxiv_papers.json does not exist.


#### Insert a Paper

In [9]:
# Example 1: Insert the famous "Attention is All You Need" paper
new_paper = {
    'arxiv_id': '1706.03762v7',
    'base_arxiv_id': '1706.03762',
    'version': 7,
    'title': 'Attention Is All You Need',
    'abstract': 'The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.',
    'submit_date': '2017-06-12',
    'metadata': {'venue': 'NeurIPS 2017', 'pdf_url': 'https://arxiv.org/pdf/1706.03762.pdf'}
}

research_arcade.insert_node("arxiv_papers", node_features=new_paper)
print("Paper inserted successfully!")

Paper inserted successfully!


In [10]:
# Example 2: Insert BERT paper
bert_paper = {
    'arxiv_id': '1810.04805v2',
    'base_arxiv_id': '1810.04805',
    'version': 2,
    'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding',
    'abstract': 'We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.',
    'submit_date': '2018-10-11',
    'metadata': {'venue': 'NAACL 2019', 'citations': 50000}
}

research_arcade.insert_node("arxiv_papers", node_features=bert_paper)
print("BERT paper inserted successfully!")

BERT paper inserted successfully!


#### Get All Papers

In [11]:
arxiv_papers_df = research_arcade.get_all_node_features("arxiv_papers")
print(f"Total papers in database: {len(arxiv_papers_df)}")
print("\nFirst 5 papers:")
print(arxiv_papers_df.head())

Total papers in database: 4

First 5 papers:
   id      arxiv_id base_arxiv_id  version  \
0   1  1806.08804v4    1806.08804        4   
1   2  1903.03894v4    1903.03894        4   
2   4  1810.04805v2    1810.04805        2   
3   5  1706.03762v7    1706.03762        7   

                                               title  \
0  Hierarchical Graph Representation Learning wit...   
1  GNNExplainer: Generating Explanations for Grap...   
2  BERT: Pre-training of Deep Bidirectional Trans...   
3                          Attention Is All You Need   

                                            abstract  \
0  Recently, graph neural networks (GNNs) have re...   
1  Graph Neural Networks (GNNs) are a powerful to...   
2  We introduce a new language representation mod...   
3  The dominant sequence transduction models are ...   

                 submit_date  \
0  2018-06-22 18:04:46+00:00   
1  2019-03-10 00:56:26+00:00   
2                 2018-10-11   
3                 2017-06-12   
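
In the output above, `submit_date` appears in two shapes: full UTC timestamps for the rows built from the API and date-only strings for the rows inserted by hand. If a uniform value is needed downstream, one option is to reduce both to a `datetime.date`. The helper below is a hypothetical sketch using only the standard library; it assumes the values are ISO-formatted strings or `datetime`/`date` objects.

```python
from datetime import date, datetime


def to_date(value):
    """Reduce a timestamp or date-only value to a datetime.date."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    # Handles both '2018-06-22 18:04:46+00:00' and '2018-10-11'
    return datetime.fromisoformat(str(value)).date()


print(to_date("2018-06-22 18:04:46+00:00"))
print(to_date("2018-10-11"))
```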

 

#### Get Specific Paper by ID

In [12]:
paper_id = {"arxiv_id": "1810.04805v2"}
paper_features = research_arcade.get_node_features_by_id("arxiv_papers", paper_id)
print("Paper details:")
print(paper_features.to_dict(orient="records")[0])

Paper details:
{'id': 4, 'arxiv_id': '1810.04805v2', 'base_arxiv_id': '1810.04805', 'version': 2, 'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding', 'abstract': 'We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.', 'submit_date': '2018-10-11', 'metadata': '{"venue": "NAACL 2019", "citations": 50000}'}
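
Note that `metadata` comes back in the record above as a JSON-encoded string, not as the dict that was passed to `insert_node`. A minimal sketch for decoding it is shown below; `decode_metadata` is a hypothetical helper, and the sample record is a trimmed copy of the output above.

```python
import json


def decode_metadata(value):
    """Return metadata as a dict whether it arrives as a dict or a JSON string."""
    if isinstance(value, dict):
        return value
    if value is None or value == "":
        return {}
    return json.loads(value)


record = {
    "arxiv_id": "1810.04805v2",
    "metadata": '{"venue": "NAACL 2019", "citations": 50000}',
}
metadata = decode_metadata(record["metadata"])
print(metadata["venue"])
```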


#### Update a Paper

In [13]:
# Update metadata for a paper
updated_paper = {
    'arxiv_id': '1706.03762v7',
    'metadata': {
        'venue': 'NeurIPS 2017',
        'pdf_url': 'https://arxiv.org/pdf/1706.03762.pdf',
        'citations': 75000,
        'influential': True
    }
}

research_arcade.update_node("arxiv_papers", node_features=updated_paper)
print("Paper updated successfully!")

Paper updated successfully!


#### Delete a Paper

In [14]:
# Delete a paper by ID
paper_id = {"arxiv_id": "1706.03762v7"}
deleted_paper = research_arcade.delete_node_by_id("arxiv_papers", paper_id)
print("Deleted paper:")
print(deleted_paper)

Deleted paper:
True


### arxiv_authors

#### Table Schema

- `id` (SERIAL PK)
- `semantic_scholar_id` (VARCHAR, unique)
- `name` (VARCHAR)
- `homepage` 

#### Construct Table from API

In [15]:
config = {"arxiv_ids": ["1903.03894v4", "1806.08804v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_authors", config)

base_arxiv_id: 1903.03894
INFO:httpx:HTTP Request: GET https://api.semanticscholar.org/graph/v1/paper/ARXIV:1903.03894?fields=abstract%2Cauthors%2Cauthors.affiliations%2Cauthors.authorId%2Cauthors.citationCount%2Cauthors.externalIds%2Cauthors.hIndex%2Cauthors.homepage%2Cauthors.name%2Cauthors.paperCount%2Cauthors.url%2CcitationCount%2CcitationStyles%2Ccitations%2Ccitations.abstract%2Ccitations.authors%2Ccitations.citationCount%2Ccitations.citationStyles%2Ccitations.corpusId%2Ccitations.externalIds%2Ccitations.fieldsOfStudy%2Ccitations.influentialCitationCount%2Ccitations.isOpenAccess%2Ccitations.journal%2Ccitations.openAccessPdf%2Ccitations.paperId%2Ccitations.publicationDate%2Ccitations.publicationTypes%2Ccitations.publicationVenue%2Ccitations.referenceCount%2Ccitations.s2FieldsOfStudy%2Ccitations.title%2Ccitations.url%2Ccitations.venue%2Ccitations.year%2CcorpusId%2Cembedding%2CexternalIds%2CfieldsOfStudy%2CinfluentialCitationCount%2CisOpenAccess%2Cjournal%2CopenAccessPdf%2CpaperId%2C

KeyboardInterrupt: 


#### Construct Table from CSV

In [None]:
config = {"csv_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/arxiv_authors.csv"}
research_arcade.construct_table_from_csv("arxiv_authors", config)

Error: CSV file /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/arxiv_authors.csv does not exist.


#### Construct Table from JSON

In [None]:
config = {"json_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/arxiv_authors.json"}
research_arcade.construct_table_from_json("arxiv_authors", config)

Error: JSON file /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/arxiv_authors.json does not exist.
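Both file-based loaders above report that the hard-coded path does not exist. One way to guard against this is to check the path before building the config, as in the minimal sketch below. The helper name `build_file_config` is hypothetical and not part of ResearchArcade; only the `csv_file` and `json_file` config keys come from the cells above.

```python
from pathlib import Path


def build_file_config(key, path):
    """Return {key: absolute path} when the file exists, otherwise None.

    Hypothetical helper: `key` is a config key such as "csv_file" or
    "json_file", as used in the cells above.
    """
    candidate = Path(path).expanduser()
    if not candidate.is_file():
        print(f"Skipping: {candidate} does not exist")
        return None
    return {key: str(candidate.resolve())}


# Usage sketch (the relative path is an assumption about your checkout layout):
# config = build_file_config("csv_file", "./examples/csv_data/arxiv_authors.csv")
# if config is not None:
#     research_arcade.construct_table_from_csv("arxiv_authors", config)
```

Resolving the path relative to the working directory also avoids tying the notebook to one user's home directory.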


#### Insert Authors

In [16]:
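# Insert five authors of the Transformer paper.
# The `ss_*` IDs and the homepage URL (identical for every entry) appear to be illustrative placeholders.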
authors = [
    {
        'semantic_scholar_id': 'ss_ashish_vaswani',
        'name': 'Ashish Vaswani',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    },
    {
        'semantic_scholar_id': 'ss_noam_shazeer',
        'name': 'Noam Shazeer',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    },
    {
        'semantic_scholar_id': 'ss_niki_parmar',
        'name': 'Niki Parmar',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    },
    {
        'semantic_scholar_id': 'ss_jakob_uszkoreit',
        'name': 'Jakob Uszkoreit',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    },
    {
        'semantic_scholar_id': 'ss_llion_jones',
        'name': 'Llion Jones',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    }
]

for author in authors:
    research_arcade.insert_node("arxiv_authors", node_features=author)
    print(f"Inserted author: {author['name']}")

Inserted author: Ashish Vaswani
Inserted author: Noam Shazeer
Inserted author: Niki Parmar
Inserted author: Jakob Uszkoreit
Inserted author: Llion Jones


#### Get All Authors

In [17]:
authors_df = research_arcade.get_all_node_features("arxiv_authors")
print(f"Total authors in database: {len(authors_df)}")
print("\nAll authors:")
print(authors_df)

Total authors in database: 16

All authors:
    id semantic_scholar_id                 name  \
0    1            83539859             Rex Ying   
1    2            40974349      Dylan Bourgeois   
2    3           145829303          Jiaxuan You   
3    4             2095762            M. Zitnik   
4    5             1702139          J. Leskovec   
5    6            83539859             Rex Ying   
6    7           145829303          Jiaxuan You   
7    8           143622465   Christopher Morris   
8    9           145201124            Xiang Ren   
9   10            49437682  William L. Hamilton   
10  11             1702139          J. Leskovec   
11  12   ss_ashish_vaswani       Ashish Vaswani   
12  13     ss_noam_shazeer         Noam Shazeer   
13  14      ss_niki_parmar          Niki Parmar   
14  15  ss_jakob_uszkoreit      Jakob Uszkoreit   
15  16      ss_llion_jones          Llion Jones   

                                             homepage  
0     https://www.semanticschola
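The listing above contains repeated `semantic_scholar_id` values (for example, `83539859` appears at ids 1 and 6). The sketch below uses only the standard library to count repeated keys in a list of row dictionaries, such as the one returned by `authors_df.to_dict(orient="records")`. The helper name `duplicated_keys` is an assumption for illustration; the sample rows are copied from the output above.

```python
from collections import Counter


def duplicated_keys(records, key):
    """Map each value of `key` that occurs more than once to its count."""
    counts = Counter(row[key] for row in records)
    return {value: n for value, n in counts.items() if n > 1}


# Sample rows copied from the listing above (IDs written as strings here).
records = [
    {"id": 1, "semantic_scholar_id": "83539859", "name": "Rex Ying"},
    {"id": 3, "semantic_scholar_id": "145829303", "name": "Jiaxuan You"},
    {"id": 6, "semantic_scholar_id": "83539859", "name": "Rex Ying"},
    {"id": 7, "semantic_scholar_id": "145829303", "name": "Jiaxuan You"},
    {"id": 12, "semantic_scholar_id": "ss_ashish_vaswani", "name": "Ashish Vaswani"},
]

print(duplicated_keys(records, "semantic_scholar_id"))
# {'83539859': 2, '145829303': 2}
```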

#### Get Specific Author by ID

In [18]:
author_id = {"semantic_scholar_id": "2375099373"}
author_features = research_arcade.get_node_features_by_id("arxiv_authors", author_id)
print("Author details:")
print(author_features)

Author details:
None
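The lookup above prints `None`, presumably because no author with that `semantic_scholar_id` is in the table. Other cells in this tutorial call `.to_dict(orient="records")[0]` directly on a lookup result, which would raise on `None`. A small guard is sketched below, under the assumption that a successful lookup returns a pandas DataFrame (as the other outputs suggest). `first_record` is a hypothetical helper name.

```python
def first_record(result):
    """Return the first row as a dict, or None when nothing was found.

    `result` is expected to be None or an object with a pandas-style
    `to_dict(orient="records")` method (an assumption about the return type).
    """
    if result is None:
        return None
    rows = result.to_dict(orient="records")
    return rows[0] if rows else None


# Usage sketch:
# author = first_record(
#     research_arcade.get_node_features_by_id("arxiv_authors", author_id)
# )
# if author is None:
#     print("Author not found")
```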


#### Update an Author

In [19]:
updated_author = {
    'semantic_scholar_id': 'ss_ashish_vaswani',
    'homepage': 'https://ashishvaswani.com'
}

research_arcade.update_node("arxiv_authors", node_features=updated_author)
print("Author updated successfully!")

Author updated successfully!


### arxiv_categories

#### Table Schema
- `id` (SERIAL PK)
- `name` (VARCHAR, unique)
- `description` (TEXT)

#### Insert From API

In [20]:
config = {"arxiv_ids": ["1903.03894v4", "1806.08804v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_categories", config)

{'id': '1903.03894v4', 'title': 'GNNExplainer: Generating Explanations for Graph Neural Networks', 'abstract': "Graph Neural Networks (GNNs) are a powerful tool for machine learning on graphs.GNNs combine node feature information with the graph structure by recursively passing neural messages along edges of the input graph. However, incorporating both graph structure and feature information leads to complex models, and explaining predictions made by GNNs remains unsolved. Here we propose GNNExplainer, the first general, model-agnostic approach for providing interpretable explanations for predictions of any GNN-based model on any graph-based machine learning task. Given an instance, GNNExplainer identifies a compact subgraph structure and a small subset of node features that have a crucial role in GNN's prediction. Further, GNNExplainer can generate consistent and concise explanations for an entire class of instances. We formulate GNNExplainer as an optimization task that maximizes the 

#### Construct Table from CSV

In [21]:
config = {"csv_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/arxiv_categories.csv"}
research_arcade.construct_table_from_csv("arxiv_categories", config)

Error: CSV file /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/arxiv_categories.csv does not exist.


#### Construct Table from JSON

In [22]:
config = {"json_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/arxiv_categories.json"}
research_arcade.construct_table_from_json("arxiv_categories", config)

Error: JSON file /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/arxiv_categories.json does not exist.


#### Insert Categories

In [23]:
categories = [
    {
        'name': 'cs.CL',
        'description': 'Computation and Language (Natural Language Processing)'
    },
    {
        'name': 'cs.LG',
        'description': 'Machine Learning'
    },
    {
        'name': 'cs.AI',
        'description': 'Artificial Intelligence'
    },
    {
        'name': 'cs.CV',
        'description': 'Computer Vision and Pattern Recognition'
    },
    {
        'name': 'stat.ML',
        'description': 'Machine Learning (Statistics)'
    }
]

for category in categories:
    research_arcade.insert_node("arxiv_categories", node_features=category)
    print(f"Inserted category: {category['name']}")

Inserted category: cs.CL
Inserted category: cs.LG
Inserted category: cs.AI
Inserted category: cs.CV
Inserted category: stat.ML


#### Get All Categories

In [24]:
categories_df = research_arcade.get_all_node_features("arxiv_categories")
print(f"Total categories: {len(categories_df)}")
print("\nAll categories:")
print(categories_df)

Total categories: 7

All categories:
   id     name                                        description
0   1    cs.LG                                                NaN
1   2  stat.ML                                                NaN
2   3    cs.NE                                                NaN
3   4    cs.SI                                                NaN
4   5    cs.CL  Computation and Language (Natural Language Pro...
5   6    cs.AI                            Artificial Intelligence
6   7    cs.CV            Computer Vision and Pattern Recognition
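Five categories were inserted above, yet the table holds only seven rows: `cs.LG` and `stat.ML` were apparently already present, and `name` is declared unique. One way to skip known rows before inserting is sketched below. `missing_by_key` is a hypothetical helper, and the sample names are taken from the output above.

```python
def missing_by_key(candidates, existing_values, key="name"):
    """Return the candidate rows whose `key` value is not already known."""
    seen = set(existing_values)
    fresh = []
    for row in candidates:
        if row[key] not in seen:
            fresh.append(row)
            seen.add(row[key])  # also skip repeats within `candidates`
    return fresh


existing = ["cs.LG", "stat.ML", "cs.NE", "cs.SI"]
candidates = [{"name": n} for n in ["cs.CL", "cs.LG", "cs.AI", "cs.CV", "stat.ML"]]

print([row["name"] for row in missing_by_key(candidates, existing)])
# ['cs.CL', 'cs.AI', 'cs.CV']
```

In the notebook, `existing` could be read from `categories_df["name"]` before the insert loop runs.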


### arxiv_figures

#### Table Schema

- `id` (SERIAL PK)
- `paper_arxiv_id` (VARCHAR FK → papers.arxiv_id)
- `path` (VARCHAR)
- `caption` (TEXT)
- `label` (TEXT)
- `name` (TEXT)

#### Construct Table from CSV

#### Construct Table from JSON

#### Insert From API

In [25]:
config = {"arxiv_ids": ["1903.03894v4", "1806.08804v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_figures", config)

#### Insert Figures

In [26]:
# Insert figures for the Transformer paper
figures = [
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/figures/transformer_architecture.png',
        'caption': 'The Transformer model architecture. The left side shows the encoder stack and the right side shows the decoder stack.',
        'label': 'fig:architecture',
        'name': 'Figure 1'
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/figures/scaled_dot_product_attention.png',
        'caption': 'Scaled Dot-Product Attention and Multi-Head Attention mechanisms.',
        'label': 'fig:attention',
        'name': 'Figure 2'
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/figures/positional_encoding.png',
        'caption': 'Positional encoding visualization showing sine and cosine functions of different frequencies.',
        'label': 'fig:positional',
        'name': 'Figure 3'
    }
]

for figure in figures:
    research_arcade.insert_node("arxiv_figures", node_features=figure)
    print(f"Inserted {figure['name']}")

Inserted Figure 1
Inserted Figure 2
Inserted Figure 3


#### Get All Figures

In [27]:
figures_df = research_arcade.get_all_node_features("arxiv_figures")
print(f"Total figures: {len(figures_df)}")
print("\nAll figures:")
print(figures_df[['name', 'caption', 'label']])

Total figures: 68

All figures:
   name                                            caption  \
0   NaN                             Updated figure caption   
1   NaN  \caption{\textbf{A.} \gnn computation graph $G...   
2   NaN  \caption{For $v$'s explanation $G_S$ (in green...   
3   NaN  \caption{For $v$'s explanation $G_S$ (in green...   
4   NaN  \caption{Evaluation of single-instance explana...   
..  ...                                                ...   
63  NaN                                         \caption{}   
64  NaN                                         \caption{}   
65  NaN                                         \caption{}   
66  NaN                                         \caption{}   
67  NaN                                         \caption{}   

                                   label  
0            \label{fig:explainer-intro}  
1   \label{fig:definition-node-features}  
2    \label{fig:including-node-features}  
3    \label{fig:including-node-features}  
4       

#### Delete Specific Node by ID

In [28]:
figure_id = {"id": 1}
research_arcade.delete_node_by_id("arxiv_figures", figure_id)

False

#### Update Specific Node by ID

In [29]:
updated_figure = {
    'id': 2,
    'paper_arxiv_id': '1453.1644',  # pass arXiv IDs as strings; the column is VARCHAR
    'path': 'path',
    'caption': 'Updated figure caption'
}
research_arcade.update_node("arxiv_figures", node_features=updated_figure)

True

#### Get Specific Node by ID

In [30]:
figure_id = {"id": 2}
figure = research_arcade.get_node_features_by_id("arxiv_figures", figure_id)
print(figure.to_dict(orient="records")[0])

{'id': 2, 'paper_arxiv_id': '1453.1644', 'path': 'path', 'caption': 'Updated figure caption', 'label': '\\label{fig:explainer-intro}', 'name': nan}
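`paper_arxiv_id` is a VARCHAR column, so arXiv identifiers are best passed as strings. A float can silently change the identifier, as this small sketch shows. The validator `is_bare_arxiv_id` is a hypothetical helper; its pattern covers only new-style identifiers (`YYMM.NNNN` or `YYMM.NNNNN`, optionally followed by `vN`) and is an assumption about what the table should hold.

```python
import re

# New-style arXiv identifier with an optional version suffix (assumed format).
_NEW_STYLE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def is_bare_arxiv_id(value):
    """True only for strings shaped like a new-style arXiv identifier."""
    return isinstance(value, str) and _NEW_STYLE.match(value) is not None


# A trailing zero is lost once the ID passes through a float.
print(str(float("1706.03760")))          # 1706.0376
print(is_bare_arxiv_id("1706.03762v7"))  # True
print(is_bare_arxiv_id(1706.03762))      # False
```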


### arxiv_tables

#### Table Schema

- `id` (SERIAL PK)
- `paper_arxiv_id` (VARCHAR FK → papers.arxiv_id)
- `path` (VARCHAR)
- `caption` (TEXT)
- `label` (TEXT)
- `table_text` (TEXT)

#### Insert From API

In [31]:
config = {"arxiv_ids": ["1903.03894v4", "1806.08804v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_tables", config)

#### Construct Table from CSV

In [32]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_tables_example.csv"}
research_arcade.construct_table_from_csv("arxiv_tables", config)

Error: CSV file ./examples/csv_data/csv_arxiv_tables_example.csv does not exist.


#### Construct Table from JSON

In [33]:
config = {"json_file": "./examples/json_data/json_arxiv_tables_example.json"}
research_arcade.construct_table_from_json("arxiv_tables", config)

Error: JSON file ./examples/json_data/json_arxiv_tables_example.json does not exist.


#### Insert Tables

In [34]:
tables = [
    {
        'paper_arxiv_id': '1706.03762v7',
        'caption': 'Performance comparison',
        'label': 'label',
        'table_text': 'Table content here'
    }
]

for table in tables:
    research_arcade.insert_node("arxiv_tables", node_features=table)
    print(f"Inserted table: {table}")

Inserted table: {'paper_arxiv_id': '1706.03762v7', 'caption': 'Performance comparison', 'label': 'label', 'table_text': 'Table content here'}


#### Delete Specific Node by ID

In [36]:
table_id = {"id": 1}
research_arcade.delete_node_by_id("arxiv_tables", table_id)

False

#### Update Specific Node by ID

In [37]:
updated_table = {
    'id': 2,
    'paper_arxiv_id': '1706.03762v7',
    'caption': 'Performance comparison',
    'label': 'label',
    'table_text': 'Table content here'
}
research_arcade.update_node("arxiv_tables", node_features=updated_table)

True

#### Get All Tables

In [38]:
tables_df = research_arcade.get_all_node_features("arxiv_tables")
print(f"Total tables: {len(tables_df)}")

Total tables: 12


#### Get Specific Node by ID

In [39]:
table_id = {"id": 2}
table = research_arcade.get_node_features_by_id("arxiv_tables", table_id)
print(table.to_dict(orient="records")[0])

{'id': 2, 'paper_arxiv_id': '1706.03762v7', 'path': nan, 'caption': 'Performance comparison', 'label': 'label', 'table_text': 'Table content here'}


### arxiv_sections

#### Table Schema

- `id` (SERIAL PK)
- `content` (TEXT)
- `title` (TEXT)
- `appendix` (BOOLEAN)
- `paper_arxiv_id` (VARCHAR FK → papers.arxiv_id)
- `section_in_paper_id` (INT)

#### Insert From API

In [40]:
config = {"arxiv_ids": ["1903.03894v4", "1806.08804v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_sections", config)

#### Construct Table from CSV

In [41]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_sections_example.csv"}
research_arcade.construct_table_from_csv("arxiv_sections", config)

Error: CSV file ./examples/csv_data/csv_arxiv_sections_example.csv does not exist.


#### Construct Table from JSON

In [42]:
config = {"json_file": "./examples/json_data/json_arxiv_sections_example.json"}
research_arcade.construct_table_from_json("arxiv_sections", config)

Error: JSON file ./examples/json_data/json_arxiv_sections_example.json does not exist.


#### Insert Sections

In [43]:
# Insert sections for the Transformer paper
sections = [
    {
        'content': 'The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder...',
        'title': 'Introduction',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 1
    },
    {
        'content': 'Most competitive neural sequence transduction models have an encoder-decoder structure. Here, the encoder maps an input sequence of symbol representations...',
        'title': 'Background',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 2
    },
    {
        'content': 'Most neural sequence transduction models have an encoder-decoder structure. The Transformer follows this overall architecture using stacked self-attention and point-wise, fully connected layers...',
        'title': 'Model Architecture',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 3
    },
    {
        'content': 'In this section we describe the training regime for our models...',
        'title': 'Training',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 4
    },
    {
        'content': 'On the WMT 2014 English-to-German translation task, the big transformer model outperforms the best previously reported models...',
        'title': 'Results',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 5
    },
    {
        'content': 'In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers...',
        'title': 'Conclusion',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 6
    }
]

for section in sections:
    research_arcade.insert_node("arxiv_sections", node_features=section)
    print(f"Inserted section: {section['title']}")

Inserted section: Introduction
Inserted section: Background
Inserted section: Model Architecture
Inserted section: Training
Inserted section: Results
Inserted section: Conclusion


#### Delete Specific Node by ID

In [44]:
section_id = {"id": 1}
research_arcade.delete_node_by_id("arxiv_sections", section_id)

False

#### Update Specific Node by ID

In [45]:
updated_section = {
    'id': 1,
    'title': 'Updated Section Title',
    'content': 'Updated content'
}
research_arcade.update_node("arxiv_sections", node_features=updated_section)

False

#### Get Specific Node by ID

In [46]:
section_id = {"id": 2}
section = research_arcade.get_node_features_by_id("arxiv_sections", section_id)
print(section.to_dict(orient="records")[0])

{'id': 2, 'content': "\n\\label{sec:related}\n\n\n\nAlthough the problem of explaining GNNs is not well-studied, the related problems of interpretability and neural debugging received substantial attention in machine learning. At a high level, we can group those interpretability methods for non-graph neural networks into two main families.  \n\nMethods in the first family formulate simple proxy models of full neural networks. This can be done in a model-agnostic way, usually by learning a locally faithful approximation around the prediction, for example through linear models~\\citep{ribeiro_why_2016} or sets of rules, representing sufficient conditions on the prediction~\\citep{augasta_reverse_2012,lakkaraju_interpretable_2017,calders_deepred_2016}. Methods in the second family identify important aspects of the computation, for example, through feature gradients~\\citep{Erhan2009VisualizingHF,fleet_visualizing_2014}, backpropagation of neurons' contributions to the input features~\\cit

#### Get All Sections

In [47]:
sections_df = research_arcade.get_all_node_features("arxiv_sections")
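# Note: this interpolates the whole DataFrame into the string; use len(sections_df) to print a row count instead.
print(f"Total sections: {sections_df}")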
print("\nAll sections:")
print(sections_df[['title', 'section_in_paper_id', 'appendix']])

Total sections:     id                                            content  \
0    2  \n\label{sec:related}\n\n\n\nAlthough the prob...   
1    3  \n\label{sec:explainer}\n\n\n\begin{figure*}[t...   
2    4  \n\label{sec:exp}\n\n\n\n\n\n\hide{\nResults i...   
3    5  \n\label{sec:conclusion}\n\nWe present \longna...   
4    6  \n\nThe problem of multi-instance explanations...   
5    7  \n\nIn the context of multi-instance explanati...   
6    8  \n\n\xhdr{Training details}Training details\nW...   
7    9  \n\label{sec:intro}\nIn recent years there has...   
8   10  \n\nOur work builds upon a rich line of recent...   
9   11  \n\label{sec:proposed}\n\nThe key idea of \nam...   
10  12  \n\label{sec:ex}\n\nWe evaluate the benefits o...   
11  13  \n\nWe introduced a differentiable pooling met...   
12  14  \nThis research has been supported in part by ...   
13  15  The dominant sequence transduction models are ...   
14  16  Most competitive neural sequence transduction ...   
15  17  

### arxiv_paragraphs

#### Table Schema

- `id` (SERIAL PK)
- `paragraph_id` (INT)
- `content` (TEXT)
- `paper_arxiv_id` (VARCHAR FK → papers.arxiv_id)
- `paper_section` (TEXT)
- `section_id` (INT)
- `paragraph_in_paper_id` (INT)

#### Insert From API

In [48]:
config = {"arxiv_ids": ["1903.03894v4", "1806.08804v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_paragraphs", config)

100%|██████████| 2/2 [00:00<00:00, 523.01it/s]
100%|██████████| 2/2 [00:00<00:00, 92.93it/s]

1903.03894v4
Key to References: {'fig:explainer-motivation': 'figures_3', 'fig:explainer-intro': 'figures_4', 'fig:definition-node-features': 'figures_5', 'fig:including-node-features': 'figures_7', 'fig:subgraph_node': 'figures_8', 'fig:subgraph_graph': 'figures_9', 'fig:prototype': 'figures_12', 'fig:my_label': 'figures_11', 'fig:synth_datasets': 'table_13', 'tab:results_pr': 'table_15'}
No paper found for  cho2011friendship
No paper found for  you2018graph
No paper found for  zitnik2018decagon
No paper found for  zhang_deep_2018
No paper found for  zhou_graph_2018
No paper found for  graphsage
No paper found for  kipf2016semi
No paper found for  ying2018hierarchical
No paper found for  zhang2018link
No paper found for  doshi-velez_towards_2017
No paper found for  lakkaraju_interpretable_2017
No paper found for  ribeiro_why_2016
No paper found for  schmitz_ann-dt:_1999
No paper found for  chen2018learning
No paper found for  Erhan2009VisualizingHF
No paper found for  lundberg_unified


  df = pd.concat([df, new_row], ignore_index=True)


#### Construct Table from CSV

In [49]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_paragraphs_example.csv"}
research_arcade.construct_table_from_csv("arxiv_paragraphs", config)

Error: CSV file ./examples/csv_data/csv_arxiv_paragraphs_example.csv does not exist.


#### Construct Table from JSON

In [50]:
config = {"json_file": "./examples/json_data/json_arxiv_paragraphs_example.json"}
research_arcade.construct_table_from_json("arxiv_paragraphs", config)

Error: JSON file ./examples/json_data/json_arxiv_paragraphs_example.json does not exist.


#### Insert Paragraphs

In [51]:
# Insert paragraphs from the Introduction section
paragraphs = [
    {
        'paragraph_id': 1,
        'content': 'Recurrent neural networks, long short-term memory and gated recurrent neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 1
    },
    {
        'paragraph_id': 2,
        'content': 'Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures. Recurrent models typically factor computation along the symbol positions of the input and output sequences.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 2
    },
    {
        'paragraph_id': 3,
        'content': 'Aligning the positions to steps in computation time, they generate a sequence of hidden states h_t, as a function of the previous hidden state h_{t-1} and the input for position t. This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 3
    },
    {
        'paragraph_id': 4,
        'content': 'Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 4
    },
    {
        'paragraph_id': 5,
        'content': 'In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 5
    }
]

for paragraph in paragraphs:
    research_arcade.insert_node("arxiv_paragraphs", node_features=paragraph)
    print(f"Inserted paragraph {paragraph['paragraph_id']} from {paragraph['paper_section']}")

Inserted paragraph 1 from Introduction
Inserted paragraph 2 from Introduction
Inserted paragraph 3 from Introduction
Inserted paragraph 4 from Introduction
Inserted paragraph 5 from Introduction


#### Delete Specific Node by ID

In [52]:
paragraph_id = {"id": 1}
research_arcade.delete_node_by_id("arxiv_paragraphs", paragraph_id)

False

#### Update Specific Node by ID

In [53]:
updated_paragraph = {
    'id': 2,
    'content': 'Updated paragraph content'
}
research_arcade.update_node("arxiv_paragraphs", node_features=updated_paragraph)

True

#### Get Specific Node by ID

In [54]:
paragraph_id = {"id": 2}
paragraph = research_arcade.get_node_features_by_id("arxiv_paragraphs", paragraph_id)
print(paragraph.to_dict(orient="records")[0])

{'id': 2, 'paragraph_id': 1, 'content': 'Updated paragraph content', 'paper_arxiv_id': '1903.03894v4', 'paper_section': 'Introduction', 'section_id': nan, 'paragraph_in_paper_id': nan}
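Missing values come back as `nan` in the record above (`section_id`, `paragraph_in_paper_id`). If the record will be serialized or compared, it can help to map those to `None` first. A minimal sketch follows, with `nan_to_none` as a hypothetical helper name. It relies on the fact that NumPy's `float64` is a subclass of Python's `float`.

```python
import math


def nan_to_none(record):
    """Return a copy of `record` with float NaN values replaced by None."""
    return {
        key: (None if isinstance(value, float) and math.isnan(value) else value)
        for key, value in record.items()
    }


record = {"id": 2, "paragraph_id": 1, "section_id": float("nan")}
print(nan_to_none(record))
# {'id': 2, 'paragraph_id': 1, 'section_id': None}
```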


#### Get All Paragraphs

In [55]:
paragraphs_df = research_arcade.get_all_node_features("arxiv_paragraphs")
print(f"Total paragraphs: {len(paragraphs_df)}")
print("\nFirst 3 paragraphs:")
print(paragraphs_df[['paragraph_id', 'paper_section', 'content']].head(3))

Total paragraphs: 255

First 3 paragraphs:
   paragraph_id paper_section  \
0             1  Introduction   
1             2  Introduction   
2             3  Introduction   

                                             content  
0                          Updated paragraph content  
1  Despite their strengths, {\gnn}\gnns lack tran...  
2  While currently there are no methods for expla...  


## EdgeTableOperations

### openreview_arxiv

#### construct table from api

In [56]:
config = {"venue": "ICLR.cc/2017/conference"}
research_arcade.construct_table_from_api("openreview_arxiv", config)

Crawling openreview arxiv data for venue: ICLR.cc/2017/conference...


  0%|          | 0/490 [00:04<?, ?it/s]


TypeError: expected string or bytes-like object, got 'NoneType'

#### construct table from csv

In [None]:
config = {"csv_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_arxiv.csv"}
research_arcade.construct_table_from_csv("openreview_arxiv", config)

Reading openreview arxiv data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_arxiv.csv...
Inserting data into CSV file...


100%|██████████| 10/10 [00:00<00:00, 367.36it/s]


#### construct table from json

In [None]:
config = {"json_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_arxiv.json"}
research_arcade.construct_table_from_json("openreview_arxiv", config)

Reading openreview arxiv data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_arxiv.json...
Inserting data into CSV file...


100%|██████████| 10/10 [00:00<00:00, 381.26it/s]


#### insert edge

In [None]:
openreview_arxiv = {
    'venue': 'ICLR.cc/2025/Conference',
    'paper_openreview_id': 'zkNCWtw2fd',
    'arxiv_id': '2408.10536v1',
    'title': 'Synergistic Approach for Simultaneous Optimization of Monolingual, Cross-lingual, and Multilingual Information Retrieval'
}
research_arcade.insert_edge("openreview_arxiv", openreview_arxiv)

#### delete specific edge by id

In [None]:
openreview_id = {"paper_openreview_id": "zkNCWtw2fd"}
openreview_arxiv_df = research_arcade.delete_edge_by_id("openreview_arxiv", openreview_id)
print(openreview_arxiv_df.to_dict(orient="records")[0])

Deleted 1 records from 'openreview_arxiv' with paper_openreview_id = zkNCWtw2fd.
{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'zkNCWtw2fd', 'arxiv_id': '2408.10536v1', 'title': 'Synergistic Approach for Simultaneous Optimization of Monolingual, Cross-lingual, and Multilingual Information Retrieval'}


In [None]:
arxiv_id = {"arxiv_id": "http://arxiv.org/abs/2408.10536v1"}
openreview_arxiv_df = research_arcade.delete_edge_by_id("openreview_arxiv", arxiv_id)
print(openreview_arxiv_df.to_dict(orient="records")[0])

Deleted 1 records from 'arxiv_id' with arxiv_id = http://arxiv.org/abs/2408.10536v1.
{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'zkNCWtw2fd', 'arxiv_id': 'http://arxiv.org/abs/2408.10536v1', 'title': 'Synergistic Approach for Simultaneous Optimization of Monolingual, Cross-lingual, and Multilingual Information Retrieval'}


In [None]:
openreview_arxiv_id = {"paper_openreview_id": "zkNCWtw2fd", "arxiv_id": "http://arxiv.org/abs/2408.10536v1"}
openreview_arxiv_df = research_arcade.delete_edge_by_id("openreview_arxiv", openreview_arxiv_id)
print(openreview_arxiv_df.to_dict(orient="records")[0])

Deleted 1 records from 'openreview_arxiv' with paper_openreview_id = zkNCWtw2fd and arxiv_id = http://arxiv.org/abs/2408.10536v1.
{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'zkNCWtw2fd', 'arxiv_id': 'http://arxiv.org/abs/2408.10536v1', 'title': 'Synergistic Approach for Simultaneous Optimization of Monolingual, Cross-lingual, and Multilingual Information Retrieval'}
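The edge inserted above stores `arxiv_id` as `2408.10536v1`, while two of the delete calls pass the full `http://arxiv.org/abs/...` URL. If the table compares IDs as plain strings (an assumption), normalizing to one form before insert and lookup avoids mismatches. A sketch, with the hypothetical helper `bare_arxiv_id`:

```python
def bare_arxiv_id(value):
    """Strip an arxiv.org/abs/ URL prefix, leaving only the identifier."""
    value = value.strip()
    marker = "arxiv.org/abs/"
    if marker in value:
        value = value.split(marker, 1)[1]
    return value


print(bare_arxiv_id("http://arxiv.org/abs/2408.10536v1"))  # 2408.10536v1
print(bare_arxiv_id("2408.10536v1"))                        # 2408.10536v1
```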


#### get all edges

In [None]:
openreview_arxiv_df = research_arcade.get_all_edge_features("openreview_arxiv")
print(len(openreview_arxiv_df))

10


#### get neighborhood by id

In [None]:
openreview_id = {"paper_openreview_id": "DnBjhWLVU1"}
openreview_arxiv_df = research_arcade.get_neighborhood("openreview_arxiv", openreview_id)
print(openreview_arxiv_df.to_dict(orient="records")[0])

arxiv_id = {"arxiv_id": "2507.04683v1"}
openreview_arxiv_df = research_arcade.get_neighborhood("openreview_arxiv", arxiv_id)
print(openreview_arxiv_df.to_dict(orient="records")[0])

{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'DnBjhWLVU1', 'arxiv_id': '2507.04683v1', 'title': 'Recovering Plasticity of Neural Networks via Soft Weight Rescaling'}
{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'DnBjhWLVU1', 'arxiv_id': '2507.04683v1', 'title': 'Recovering Plasticity of Neural Networks via Soft Weight Rescaling'}


### openreview_papers_authors

#### construct table from api

In [None]:
config = {"venue": "ICLR.cc/2025/Conference"}
research_arcade.construct_table_from_api("openreview_papers_authors", config)

#### construct table from csv

In [None]:
config = {"csv_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_papers_authors.csv"}
research_arcade.construct_table_from_csv("openreview_papers_authors", config)

Reading papers-authors data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_papers_authors.csv...
Inserting papers-authors data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_papers_authors.csv...


100%|██████████| 645/645 [00:01<00:00, 439.35it/s]


#### construct table from json

In [None]:
config = {"json_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_papers_authors.json"}
research_arcade.construct_table_from_json("openreview_papers_authors", config)

Reading papers-authors data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_papers_authors.json...
Inserting papers-authors data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_papers_authors.json...


100%|██████████| 10/10 [00:00<00:00, 271.20it/s]


#### insert edge

In [None]:
paper_authors = [{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Zaid_Khan1'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Jaemin_Cho1'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Mohit_Bansal2'}]
for item in paper_authors:
    research_arcade.insert_edge("openreview_papers_authors", item)

author_papers = [{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'Xbl6t6zxZs', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'fDcn3S8oAt', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'j9wBgcxa7N', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'zd0iX5xBhA', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, 
                 {'venue': 'ICLR.cc/2024/Conference', 'paper_openreview_id': 'L4nOxziGf9', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, 
                 {'venue': 'ICLR.cc/2024/Conference', 'paper_openreview_id': 'qL9gogRepu', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Elias_Stengel-Eskin1'}]
for item in author_papers:
    research_arcade.insert_edge("openreview_papers_authors", item)

paper_author = [{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Elias_Stengel-Eskin1'}]
for item in paper_author:
    research_arcade.insert_edge("openreview_papers_authors", item)
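
The three lists above all contain the edge between paper `00SnKBGTsz` and author `~Elias_Stengel-Eskin1`. Whether `insert_edge` skips repeated edges for this table is not shown here, so one cautious option is to deduplicate on the client side first. The sketch below assumes an edge is identified by the pair (`paper_openreview_id`, `author_openreview_id`); `unique_edges` is a hypothetical helper.

```python
def unique_edges(edges, key_fields=("paper_openreview_id", "author_openreview_id")):
    """Keep the first occurrence of each edge, identified by key_fields."""
    seen = set()
    result = []
    for edge in edges:
        key = tuple(edge[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            result.append(edge)
    return result


# Small sample: the first and last records describe the same edge.
edges = [
    {"venue": "ICLR.cc/2025/Conference", "paper_openreview_id": "00SnKBGTsz",
     "author_openreview_id": "~Elias_Stengel-Eskin1"},
    {"venue": "ICLR.cc/2025/Conference", "paper_openreview_id": "00SnKBGTsz",
     "author_openreview_id": "~Zaid_Khan1"},
    {"venue": "ICLR.cc/2025/Conference", "paper_openreview_id": "00SnKBGTsz",
     "author_openreview_id": "~Elias_Stengel-Eskin1"},
]
deduplicated = unique_edges(edges)
print(len(deduplicated))  # prints 2
```

The deduplicated list can then be passed to the same `insert_edge` loop used above.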

#### delete specific edge by id

In [None]:
paper_id = {"paper_openreview_id": "00SnKBGTsz"}
openreview_papers_authors = research_arcade.delete_edge_by_id("openreview_papers_authors", paper_id)
print(openreview_papers_authors.to_dict(orient="records"))

The connection for paper 00SnKBGTsz is deleted successfully.
[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Jaemin_Cho1'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Mohit_Bansal2'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Zaid_Khan1'}]


In [None]:
author_id = {'author_openreview_id': '~Elias_Stengel-Eskin1'}
openreview_papers_authors = research_arcade.delete_edge_by_id("openreview_papers_authors", author_id)
print(openreview_papers_authors.to_dict(orient="records"))

The connection for author ~Elias_Stengel-Eskin1 is deleted successfully.
[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'Xbl6t6zxZs', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'fDcn3S8oAt', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'j9wBgcxa7N', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'zd0iX5xBhA', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, {'venue': 'ICLR.cc/2024/Conference', 'paper_openreview_id': 'L4nOxziGf9', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, {'venue': 'ICLR.cc/2024/Conference', 'paper_openreview_id': 'qL9gogRepu', 'author_openreview_id': '~Elias_Stengel-Eskin1'}]


In [None]:
paper_author = {"paper_openreview_id": "00SnKBGTsz", 'author_openreview_id': '~Elias_Stengel-Eskin1'}
openreview_papers_authors = research_arcade.delete_edge_by_id("openreview_papers_authors", paper_author)
print(openreview_papers_authors.to_dict(orient="records"))

The connection between paper 00SnKBGTsz and author ~Elias_Stengel-Eskin1 is deleted successfully.
[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Elias_Stengel-Eskin1'}]
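
The three calls above pass a paper id, an author id, and both ids together, and each returns the rows that were removed. A minimal in-memory model of that partial-key matching is sketched below. It illustrates the observed behavior only and is not the library's implementation; `split_by_key` is a hypothetical name.

```python
def split_by_key(edges, key):
    """Partition edges into (matched, remaining) using a partial key.

    An edge matches when every field in `key` is present with an equal value.
    """
    matched, remaining = [], []
    for edge in edges:
        if all(edge.get(field) == value for field, value in key.items()):
            matched.append(edge)
        else:
            remaining.append(edge)
    return matched, remaining


edges = [
    {"paper_openreview_id": "00SnKBGTsz", "author_openreview_id": "~Zaid_Khan1"},
    {"paper_openreview_id": "00SnKBGTsz", "author_openreview_id": "~Elias_Stengel-Eskin1"},
    {"paper_openreview_id": "Xbl6t6zxZs", "author_openreview_id": "~Elias_Stengel-Eskin1"},
]

# A paper-only key matches every author edge of that paper.
by_paper, _ = split_by_key(edges, {"paper_openreview_id": "00SnKBGTsz"})
# A key with both ids matches exactly one edge.
by_pair, rest = split_by_key(
    edges,
    {"paper_openreview_id": "00SnKBGTsz", "author_openreview_id": "~Elias_Stengel-Eskin1"},
)
print(len(by_paper), len(by_pair), len(rest))  # prints 2 1 2
```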


#### get all edges

In [None]:
openreview_papers_authors = research_arcade.get_all_edge_features("openreview_papers_authors")
print(len(openreview_papers_authors))

651


#### get neighborhood by id

In [None]:
paper_id = {"paper_openreview_id": "00SnKBGTsz"}
openreview_papers_authors = research_arcade.get_neighborhood("openreview_papers_authors", paper_id)
print(openreview_papers_authors.to_dict(orient="records"))

author_id = {'author_openreview_id': '~Elias_Stengel-Eskin1'}
openreview_papers_authors = research_arcade.get_neighborhood("openreview_papers_authors", author_id)
print(openreview_papers_authors.to_dict(orient="records"))

[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Zaid_Khan1'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Jaemin_Cho1'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'author_openreview_id': '~Mohit_Bansal2'}]
[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'Xbl6t6zxZs', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'fDcn3S8oAt', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'j9wBgcxa7N', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': 'zd0iX5xBhA', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, {'venue': 'ICLR.cc/2024/Conference', 'paper_openreview_id': 'L4nOxziGf9', 'author_openreview_id': '~Elias_Stengel-Eskin1'}, {'venue': 'ICLR.cc/2024/Conferen
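
The records returned by `to_dict(orient="records")` are plain dictionaries, so they can be summarized without further library calls. As a small sketch, the snippet below counts an author's papers per venue; the three sample records are copied from the output above, and `papers_per_venue` is a hypothetical helper.

```python
from collections import Counter


def papers_per_venue(records):
    """Count edge records per venue."""
    return Counter(record["venue"] for record in records)


# Three records copied from the author neighborhood printed above.
records = [
    {"venue": "ICLR.cc/2025/Conference", "paper_openreview_id": "Xbl6t6zxZs",
     "author_openreview_id": "~Elias_Stengel-Eskin1"},
    {"venue": "ICLR.cc/2025/Conference", "paper_openreview_id": "fDcn3S8oAt",
     "author_openreview_id": "~Elias_Stengel-Eskin1"},
    {"venue": "ICLR.cc/2024/Conference", "paper_openreview_id": "L4nOxziGf9",
     "author_openreview_id": "~Elias_Stengel-Eskin1"},
]
counts = papers_per_venue(records)
print(counts["ICLR.cc/2025/Conference"], counts["ICLR.cc/2024/Conference"])  # prints 2 1
```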

### openreview_papers_reviews

#### construct table from api

In [None]:
config = {"venue": "ICLR.cc/2017/conference"}
research_arcade.construct_table_from_api("openreview_papers_reviews", config)

#### construct table from csv

In [None]:
config = {"csv_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_papers_reviews.csv"}
research_arcade.construct_table_from_csv("openreview_papers_reviews", config)

Reading paper-review connection data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_papers_reviews.csv...
Inserting data into CSV file...


100%|██████████| 183/183 [00:00<00:00, 538.84it/s]


#### construct table from json

In [None]:
config = {"json_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_papers_reviews.json"}
research_arcade.construct_table_from_json("openreview_papers_reviews", config)

Reading paper-review connection data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_papers_reviews.json...
Inserting data into CSV file...


100%|██████████| 10/10 [00:00<00:00, 309.29it/s]


#### insert edge

In [None]:
paper_review = {'venue': 'ICLR.cc/2025/Conference', 
                'paper_openreview_id': '00SnKBGTsz', 
                'review_openreview_id': '13mj0Rtn5W', 
                'title': 'Response by Authors', 
                'time': '2024-11-27 17:27:45'}
research_arcade.insert_edge("openreview_papers_reviews", paper_review)

paper_reviews = [{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': '7XT4kLWV2f', 'title': 'Official Review by Reviewer_wuGW', 'time': '2024-11-01 14:52:22'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'i3QgWgrJff', 'title': 'Official Review by Reviewer_rVo8', 'time': '2024-11-04 02:37:10'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'GMsjHLXdOx', 'title': 'Official Review by Reviewer_c5nB', 'time': '2024-11-04 09:59:14'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'r8ZflFk3T7', 'title': 'Official Review by Reviewer_VQ9Y', 'time': '2024-11-06 00:15:47'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': '4CnQpVCYkF', 'title': 'Response by Authors', 'time': '2024-11-20 22:48:42'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'h1qvpjhRP3', 'title': 'Response by Authors', 'time': '2024-11-20 22:51:07'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'pOR42YNLtU', 'title': 'Response by Authors', 'time': '2024-11-20 22:55:04'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'Aq2tBtB0lt', 'title': 'Response by Authors', 'time': '2024-11-20 22:57:18'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'm1iUqPHpwk', 'title': 'Response by Authors', 'time': '2024-11-20 22:58:29'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': '66buacQmRe', 'title': 'Response by Authors', 'time': '2024-11-20 23:02:21'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'Bgr7Ol90m7', 'title': 'Response by Authors', 'time': '2024-11-22 23:11:06'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'H2h2K6a8x5', 'title': 'Response by Reviewer', 'time': '2024-11-23 10:04:58'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'la5jPwJU4g', 'title': 'Response by Authors', 'time': '2024-11-24 19:17:22'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'DjVKsUoFN2', 'title': 'Response by Reviewer', 'time': '2024-11-25 04:00:18'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'C3MhCuKhTf', 'title': 'Response by Authors', 'time': '2024-11-25 19:44:38'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'ZqwAYtcmhv', 'title': 'Response by Authors', 'time': '2024-11-25 19:45:43'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': '9OQJoesINr', 'title': 'Response by Reviewer', 'time': '2024-11-25 20:07:51'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'wqTNtVDwef', 'title': 'Response by Authors', 'time': '2024-11-26 03:32:30'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'NEsxOTkkIV', 'title': 'Response by Reviewer', 'time': '2024-11-26 20:00:00'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': '13mj0Rtn5W', 'title': 'Response by Authors', 'time': '2024-11-27 17:27:45'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'hWat8aFBRw', 'title': 'Response by Reviewer', 'time': '2024-11-27 11:34:03'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'wnsiUkDh00', 'title': 'Response by Authors', 'time': '2024-11-27 17:28:35'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'zpboemkkjR', 'title': 'Meta Review of Submission11063 by Area_Chair_eoLd', 'time': '2024-12-20 15:14:25'}, 
                 {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'kokKFEn2fw', 'title': 'Paper Decision', 'time': '2025-01-22 05:35:00'}
]
for item in tqdm(paper_reviews):
    research_arcade.insert_edge("openreview_papers_reviews", item)

100%|██████████| 24/24 [00:00<00:00, 281.10it/s]
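
The `time` field in these records uses the `YYYY-MM-DD HH:MM:SS` layout, and the list above is not strictly chronological (`13mj0Rtn5W` at 17:27:45 is listed before `hWat8aFBRw` at 11:34:03 on the same day). If the discussion order matters, the records can be sorted before or after insertion. The sketch below parses the timestamps with the standard library; `sort_by_time` is a hypothetical helper.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # layout observed in the records above


def sort_by_time(records):
    """Return the records ordered by their parsed 'time' field."""
    return sorted(records, key=lambda r: datetime.strptime(r["time"], TIME_FORMAT))


# Three records copied from the list above, reduced to the relevant fields.
records = [
    {"review_openreview_id": "13mj0Rtn5W", "time": "2024-11-27 17:27:45"},
    {"review_openreview_id": "hWat8aFBRw", "time": "2024-11-27 11:34:03"},
    {"review_openreview_id": "7XT4kLWV2f", "time": "2024-11-01 14:52:22"},
]
ordered_ids = [r["review_openreview_id"] for r in sort_by_time(records)]
print(ordered_ids)  # prints ['7XT4kLWV2f', 'hWat8aFBRw', '13mj0Rtn5W']
```

Parsing rather than comparing raw strings also surfaces malformed timestamps early, since `strptime` raises `ValueError` on a mismatch.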


#### delete specific edge by id

In [None]:
paper_review_id = {"paper_openreview_id": "00SnKBGTsz", "review_openreview_id": "13mj0Rtn5W"}
openreview_papers_reviews = research_arcade.delete_edge_by_id("openreview_papers_reviews", paper_review_id)
print(openreview_papers_reviews.to_dict(orient="records"))

The connection between paper 00SnKBGTsz and review 13mj0Rtn5W is deleted successfully.
[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': '13mj0Rtn5W', 'title': 'Response by Authors', 'time': '2024-11-27 17:27:45'}]


In [None]:
review_id = {"review_openreview_id": "13mj0Rtn5W"}
openreview_papers_reviews = research_arcade.delete_edge_by_id("openreview_papers_reviews", review_id)
print(openreview_papers_reviews.to_dict(orient="records"))

The connection between review 13mj0Rtn5W is deleted successfully.
[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': '13mj0Rtn5W', 'title': 'Response by Authors', 'time': '2024-11-27 17:27:45'}]


In [None]:
paper_id = {"paper_openreview_id": "00SnKBGTsz"}
openreview_papers_reviews = research_arcade.delete_edge_by_id("openreview_papers_reviews", paper_id)
print(openreview_papers_reviews.to_dict(orient="records"))

The connection between paper 00SnKBGTsz is deleted successfully.
[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': '7XT4kLWV2f', 'title': 'Official Review by Reviewer_wuGW', 'time': '2024-11-01 14:52:22'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'i3QgWgrJff', 'title': 'Official Review by Reviewer_rVo8', 'time': '2024-11-04 02:37:10'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'GMsjHLXdOx', 'title': 'Official Review by Reviewer_c5nB', 'time': '2024-11-04 09:59:14'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'r8ZflFk3T7', 'title': 'Official Review by Reviewer_VQ9Y', 'time': '2024-11-06 00:15:47'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': '4CnQpVCYkF', 'title': 'Response by Authors', 'time': '2024-11-20 22:48:42'}, {'

#### get all edges

In [None]:
openreview_papers_reviews = research_arcade.get_all_edge_features("openreview_papers_reviews")
print(len(openreview_papers_reviews))

183


#### get neighborhood by id

In [None]:
paper_id = {"paper_openreview_id": "00SnKBGTsz"}
openreview_papers_reviews = research_arcade.get_neighborhood("openreview_papers_reviews", paper_id)
print(openreview_papers_reviews.to_dict(orient="records"))

review_id = {"review_openreview_id": "13mj0Rtn5W"}
openreview_papers_reviews = research_arcade.get_neighborhood("openreview_papers_reviews", review_id)
print(openreview_papers_reviews.to_dict(orient="records"))

[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': '13mj0Rtn5W', 'title': 'Response by Authors', 'time': '2024-11-27 17:27:45'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': '7XT4kLWV2f', 'title': 'Official Review by Reviewer_wuGW', 'time': '2024-11-01 14:52:22'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'i3QgWgrJff', 'title': 'Official Review by Reviewer_rVo8', 'time': '2024-11-04 02:37:10'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'GMsjHLXdOx', 'title': 'Official Review by Reviewer_c5nB', 'time': '2024-11-04 09:59:14'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'review_openreview_id': 'r8ZflFk3T7', 'title': 'Official Review by Reviewer_VQ9Y', 'time': '2024-11-06 00:15:47'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKB

### openreview_papers_revisions

#### construct table from api

In [None]:
config = {"venue": "ICLR.cc/2025/Conference"}
research_arcade.construct_table_from_api("openreview_papers_revisions", config)

Getting V2 Notes:  51%|█████▏    | 5994/11672 [00:14<00:01, 3320.94it/s]

#### construct table from csv

In [None]:
config = {"csv_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_papers_revisions.csv"}
research_arcade.construct_table_from_csv("openreview_papers_revisions", config)

Reading paper-revision data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_papers_revisions.csv...
Inserting paper-revision data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_papers_revisions.csv into CSV file...


100%|██████████| 10/10 [00:00<00:00, 359.89it/s]


#### construct table from json

In [None]:
config = {"json_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_papers_revisions.json"}
research_arcade.construct_table_from_json("openreview_papers_revisions", config)

Reading revisions data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_papers_revisions.json...
Inserting paper-revision data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_papers_revisions.json into CSV file...


100%|██████████| 10/10 [00:00<00:00, 343.89it/s]


#### insert edge

In [None]:
paper_revision = {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'revision_openreview_id': 'dzL3IRBnE4', 'title': 'Camera_Ready_Revision', 'time': '2025-03-01 03:36:55'}
research_arcade.insert_edge("openreview_papers_revisions", paper_revision)

paper_revisions = [{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'revision_openreview_id': 'oT4N28siLO', 'title': 'Camera_Ready_Revision', 'time': '2025-03-02 01:35:16'}, 
                   {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'revision_openreview_id': 'dzL3IRBnE4', 'title': 'Camera_Ready_Revision', 'time': '2025-03-01 03:36:55'}]
for item in tqdm(paper_revisions):
    research_arcade.insert_edge("openreview_papers_revisions", item)

100%|██████████| 2/2 [00:00<00:00, 208.16it/s]


#### delete specific edge by id

In [None]:
paper_revision_id = {"paper_openreview_id": "00SnKBGTsz", "revision_openreview_id": "dzL3IRBnE4"}
paper_revision = research_arcade.delete_edge_by_id("openreview_papers_revisions", paper_revision_id)
print(paper_revision.to_dict(orient="records"))

The connection between paper 00SnKBGTsz and revision dzL3IRBnE4 is deleted successfully.
[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'revision_openreview_id': 'dzL3IRBnE4', 'title': 'Camera_Ready_Revision', 'time': '2025-03-01 03:36:55'}]


In [None]:
revision_id = {"revision_openreview_id": "dzL3IRBnE4"}
paper_revision = research_arcade.delete_edge_by_id("openreview_papers_revisions", revision_id)
print(paper_revision.to_dict(orient="records"))

The connection for revision dzL3IRBnE4 is deleted successfully.
[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'revision_openreview_id': 'dzL3IRBnE4', 'title': 'Camera_Ready_Revision', 'time': '2025-03-01 03:36:55'}]


In [None]:
paper_id = {"paper_openreview_id": "00SnKBGTsz"}
paper_revision = research_arcade.delete_edge_by_id("openreview_papers_revisions", paper_id)
print(paper_revision.to_dict(orient="records"))

The connection for paper 00SnKBGTsz is deleted successfully.
[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'revision_openreview_id': 'oT4N28siLO', 'title': 'Camera_Ready_Revision', 'time': '2025-03-02 01:35:16'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'revision_openreview_id': 'dzL3IRBnE4', 'title': 'Camera_Ready_Revision', 'time': '2025-03-01 03:36:55'}]


#### get all edges

In [None]:
openreview_papers_revisions = research_arcade.get_all_edge_features("openreview_papers_revisions")
print(len(openreview_papers_revisions))

12


#### get neighborhood by id

In [None]:
paper_id = {"paper_openreview_id": "00SnKBGTsz"}
paper_revision = research_arcade.get_neighborhood("openreview_papers_revisions", paper_id)
print(paper_revision.to_dict(orient="records"))

revision_id = {"revision_openreview_id": "dzL3IRBnE4"}
paper_revision = research_arcade.get_neighborhood("openreview_papers_revisions", revision_id)
print(paper_revision.to_dict(orient="records"))

[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'revision_openreview_id': 'dzL3IRBnE4', 'title': 'Camera_Ready_Revision', 'time': '2025-03-01 03:36:55'}, {'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'revision_openreview_id': 'oT4N28siLO', 'title': 'Camera_Ready_Revision', 'time': '2025-03-02 01:35:16'}]
[{'venue': 'ICLR.cc/2025/Conference', 'paper_openreview_id': '00SnKBGTsz', 'revision_openreview_id': 'dzL3IRBnE4', 'title': 'Camera_Ready_Revision', 'time': '2025-03-01 03:36:55'}]


### openreview_revisions_reviews

#### construct table based on existing tables

In [None]:
papers_reviews_df = research_arcade.get_all_edge_features("openreview_papers_reviews")
print(len(papers_reviews_df))
papers_revisions_df = research_arcade.get_all_edge_features("openreview_papers_revisions")
print(len(papers_revisions_df))
config = {"papers_reviews_df": papers_reviews_df, "papers_revisions_df": papers_revisions_df}
research_arcade.construct_table_from_api("openreview_revisions_reviews", config)

207
12
Constructing revisions-reviews connections for 3 papers...


100%|██████████| 3/3 [00:00<00:00, 41.32it/s]

Revisions-reviews table construction completed.
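
This tutorial does not show how the revisions-reviews connections are derived from the two input tables. One simple possibility, sketched below purely as an assumption, is a join on the shared `paper_openreview_id`: every review of a paper is paired with every revision of the same paper. The library's actual rule may differ, for example by also comparing timestamps. `link_revisions_to_reviews` is a hypothetical function.

```python
from collections import defaultdict


def link_revisions_to_reviews(papers_revisions, papers_reviews):
    """Pair each review with every revision of the same paper (assumed rule)."""
    revisions_by_paper = defaultdict(list)
    for row in papers_revisions:
        revisions_by_paper[row["paper_openreview_id"]].append(row)

    links = []
    for review in papers_reviews:
        # .get avoids creating empty entries for papers without revisions.
        for revision in revisions_by_paper.get(review["paper_openreview_id"], []):
            links.append({
                "venue": review["venue"],
                "revision_openreview_id": revision["revision_openreview_id"],
                "review_openreview_id": review["review_openreview_id"],
            })
    return links


# Sample rows reduced from records shown earlier in this tutorial.
papers_revisions = [
    {"venue": "ICLR.cc/2025/Conference", "paper_openreview_id": "00SnKBGTsz",
     "revision_openreview_id": "oT4N28siLO"},
    {"venue": "ICLR.cc/2025/Conference", "paper_openreview_id": "00SnKBGTsz",
     "revision_openreview_id": "dzL3IRBnE4"},
]
papers_reviews = [
    {"venue": "ICLR.cc/2025/Conference", "paper_openreview_id": "00SnKBGTsz",
     "review_openreview_id": "7XT4kLWV2f"},
]
links = link_revisions_to_reviews(papers_revisions, papers_reviews)
print(len(links))  # prints 2
```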





#### construct table from csv

In [None]:
config = {"csv_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_revisions_reviews.csv"}
research_arcade.construct_table_from_csv("openreview_revisions_reviews", config)

Reading revisions-reviews data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_revisions_reviews.csv...
Inserting revisions-reviews data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/csv_data/openreview_revisions_reviews.csv...


100%|██████████| 34/34 [00:00<00:00, 553.12it/s]


#### construct table from json

In [None]:
config = {"json_file": "/home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_revisions_reviews.json"}
research_arcade.construct_table_from_json("openreview_revisions_reviews", config)

Reading revisions-reviews data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_revisions_reviews.json...
Inserting revisions-reviews data from /home/jingjunx/openreview_benchmark/Code/research-arcade/examples/json_data/openreview_revisions_reviews.json...


100%|██████████| 10/10 [00:00<00:00, 375.56it/s]


#### insert edge

In [None]:
revision_review = {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'wumckDPIQ3'}
research_arcade.insert_edge("openreview_revisions_reviews", revision_review)

revision_reviews = [{'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'wumckDPIQ3'}, 
                    {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': '138cOdBpgA'}, 
                    {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'yKh1fQYnUZ'}, 
                    {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'Pvt0OjNSp2'}, 
                    {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'MUhlEYyBD9'}, 
                    {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': '2mqiS3J8wC'}, 
                    {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'Er8QTorcyr'}, 
                    {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'AvtD9uxRtX'}, 
                    {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': '2tgxTGynNm'}, 
                    {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': '5MKJE3sFsd'}, 
                    {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'wViZ0H4ErF'}, 
                    {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': '0c1It75dTb'}, 
                    {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'PFwia9lcjP'}, 
                    {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'ygCqaGNPee'}]
for item in tqdm(revision_reviews):
    research_arcade.insert_edge("openreview_revisions_reviews", item)

100%|██████████| 14/14 [00:00<00:00, 477.83it/s]


#### delete specific edge by id

In [None]:
revision_review_id = {'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'wumckDPIQ3'}
revision_review = research_arcade.delete_edge_by_id("openreview_revisions_reviews", revision_review_id)
print(revision_review.to_dict(orient="records"))

The connection between revision cX02yuzwWI and review wumckDPIQ3 is deleted successfully.
[{'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'wumckDPIQ3'}]


In [None]:
review_id = {'review_openreview_id': 'wumckDPIQ3'}
revision_review = research_arcade.delete_edge_by_id("openreview_revisions_reviews", review_id)
print(revision_review.to_dict(orient="records"))

The connection for review wumckDPIQ3 is deleted successfully.
[{'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'wumckDPIQ3'}]


In [None]:
revision_id = {'revision_openreview_id': 'cX02yuzwWI'}
revision_review = research_arcade.delete_edge_by_id("openreview_revisions_reviews", revision_id)
print(revision_review.to_dict(orient="records"))

The connection for revision cX02yuzwWI is deleted successfully.
[{'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'wumckDPIQ3'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': '138cOdBpgA'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'yKh1fQYnUZ'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'Pvt0OjNSp2'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'MUhlEYyBD9'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': '2mqiS3J8wC'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'Er8QTorcyr'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'AvtD9uxRtX'}, {'venue

#### get all edges

In [None]:
openreview_revisions_reviews = research_arcade.get_all_edge_features("openreview_revisions_reviews")
print(len(openreview_revisions_reviews))

47


#### get neighborhood by id

In [None]:
revision_id = {'revision_openreview_id': 'cX02yuzwWI'}
revision_review = research_arcade.get_neighborhood("openreview_revisions_reviews", revision_id)
print(revision_review.to_dict(orient="records"))

review_id = {'review_openreview_id': 'wumckDPIQ3'}
revision_review = research_arcade.get_neighborhood("openreview_revisions_reviews", review_id)
print(revision_review.to_dict(orient="records"))

[{'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': '138cOdBpgA'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'yKh1fQYnUZ'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'Pvt0OjNSp2'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'MUhlEYyBD9'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': '2mqiS3J8wC'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'Er8QTorcyr'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': 'AvtD9uxRtX'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuzwWI', 'review_openreview_id': '2tgxTGynNm'}, {'venue': 'ICLR.cc/2025/Conference', 'revision_openreview_id': 'cX02yuz

### arxiv_citations

#### Insert Citation

In [57]:
citation = {
    'citing_arxiv_id': '1810.04805v2',
    'cited_arxiv_id': '1706.03762v7',
    'bib_title': 'attention is all you need',
    'bib_key': 'something',
    'citing_sections': 'citing_sections',
}
# Insert the same citation three times; "get all edges" below still reports a single row.
research_arcade.insert_edge("arxiv_citation", edge_features=citation)
research_arcade.insert_edge("arxiv_citation", edge_features=citation)
research_arcade.insert_edge("arxiv_citation", edge_features=citation)
print("Citation created!")

Citation created!


#### Construct Table from CSV

In [58]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_paper_citation_example.csv"}
research_arcade.construct_table_from_csv("arxiv_paper_citation", config)

Error: CSV file ./examples/csv_data/csv_arxiv_paper_citation_example.csv does not exist.
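
The error above reports that the file was not found. Relative paths such as `./examples/...` are resolved against the current working directory, so one possible cause (an assumption, not confirmed by this output) is that the notebook runs from a different directory than the path expects. A quick standalone check, using a hypothetical helper `resolve_data_file`:

```python
from pathlib import Path


def resolve_data_file(path_str):
    """Return the absolute path if it points to an existing file, else None."""
    path = Path(path_str).expanduser().resolve()
    return path if path.is_file() else None


# Show where relative paths are resolved from, then test a candidate path.
print("Working directory:", Path.cwd())
candidate = "./examples/csv_data/csv_arxiv_paper_citation_example.csv"
resolved = resolve_data_file(candidate)
print(resolved if resolved is not None else f"Not found: {candidate}")
```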


#### Construct Table from JSON

In [59]:
config = {"json_file": "./examples/json_data/json_arxiv_paper_citation_example.json"}
research_arcade.construct_table_from_json("arxiv_paper_citation", config)

Error: JSON file ./examples/json_data/json_arxiv_paper_citation_example.json does not exist.


#### get all edges

In [60]:
all_citations = research_arcade.get_all_edge_features("arxiv_citation")
print(len(all_citations))

1


#### get papers cited by a paper

In [61]:
citing_id = {"citing_paper_id": "1810.04805v2"}
cited_papers = research_arcade.get_neighborhood("arxiv_citation", citing_id)
print(cited_papers.to_dict(orient="records"))

[{'id': 1, 'citing_arxiv_id': '1810.04805v2', 'cited_arxiv_id': '1706.03762v7', 'bib_title': 'attention is all you need', 'bib_key': 'something', 'citing_sections': '"citing_sections"', 'citing_paragraphs': '[]'}]


#### get papers that cite a paper

In [62]:
cited_id = {"cited_paper_id": "1706.03762v7"}
citing_papers = research_arcade.get_neighborhood("arxiv_citation", cited_id)
print(citing_papers.to_dict(orient="records"))

[{'id': 1, 'citing_arxiv_id': '1810.04805v2', 'cited_arxiv_id': '1706.03762v7', 'bib_title': 'attention is all you need', 'bib_key': 'something', 'citing_sections': '"citing_sections"', 'citing_paragraphs': '[]'}]
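
The two queries above traverse the same edge in opposite directions. When many such lookups are needed, the records can be indexed once in both directions. The sketch below builds the two indexes from plain dictionaries with the `citing_arxiv_id` and `cited_arxiv_id` fields shown in the output; `build_citation_index` is a hypothetical helper.

```python
from collections import defaultdict


def build_citation_index(citations):
    """Build forward (paper -> cited) and reverse (paper -> citing) indexes."""
    cites = defaultdict(set)
    cited_by = defaultdict(set)
    for row in citations:
        cites[row["citing_arxiv_id"]].add(row["cited_arxiv_id"])
        cited_by[row["cited_arxiv_id"]].add(row["citing_arxiv_id"])
    return cites, cited_by


# The single citation record from the output above, reduced to its id fields.
citations = [{"citing_arxiv_id": "1810.04805v2", "cited_arxiv_id": "1706.03762v7"}]
cites, cited_by = build_citation_index(citations)
print(sorted(cites["1810.04805v2"]))     # prints ['1706.03762v7']
print(sorted(cited_by["1706.03762v7"]))  # prints ['1810.04805v2']
```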


#### Get All Citations

In [63]:
all_citations = research_arcade.get_all_edge_features("arxiv_citation")
print(f"Total citations: {len(all_citations)}")
print(all_citations.head())

Total citations: 1
   id citing_arxiv_id cited_arxiv_id                  bib_title    bib_key  \
0   1    1810.04805v2   1706.03762v7  attention is all you need  something   

     citing_sections citing_paragraphs  
0  "citing_sections"                []  


#### Get Cited Papers

In [64]:
citing_paper = {'citing_paper_id': '1810.04805v2'}
cited_papers = research_arcade.get_neighborhood("arxiv_citation", primary_key=citing_paper)
print("Papers cited:")
print(cited_papers)

Papers cited:
   id citing_arxiv_id cited_arxiv_id                  bib_title    bib_key  \
0   1    1810.04805v2   1706.03762v7  attention is all you need  something   

     citing_sections citing_paragraphs  
0  "citing_sections"                []  


#### Get Citing Papers

In [65]:
cited_paper = {'cited_paper_id': '1706.03762v7'}
citing_papers = research_arcade.get_neighborhood("arxiv_citation", primary_key=cited_paper)
print("Papers that cite:")
print(citing_papers)

Papers that cite:
   id citing_arxiv_id cited_arxiv_id                  bib_title    bib_key  \
0   1    1810.04805v2   1706.03762v7  attention is all you need  something   

     citing_sections citing_paragraphs  
0  "citing_sections"                []  


#### Delete Citation

In [66]:
citation_id = {
    'citing_paper_id': '1810.04805v2',
    'cited_paper_id': '1706.03762v7'
}
research_arcade.delete_edge_by_id("arxiv_citation", primary_key=citation_id)
print("Citation deleted!")

Deleted citation: 1810.04805v2 -> 1706.03762v7
Citation deleted!


#### delete by citing_paper_id

In [68]:
citing_id = {"citing_paper_id": "1810.04805v2"}
result = research_arcade.delete_edge_by_id("arxiv_citation", citing_id)
# print(result.to_dict(orient="records"))

#### delete by cited_paper_id

In [70]:
cited_id = {"cited_paper_id": "1706.03762v7"}
result = research_arcade.delete_edge_by_id("arxiv_citation", cited_id)
# print(result.to_dict(orient="records"))

### arxiv_papers_authors

#### Insert Paper-Author Relationships

In [71]:
paper_authors = [
    {'paper_arxiv_id': '1706.03762v7', 'author_id': 'ss_ashish_vaswani', 'author_sequence': 1, 'author_name': 'Ashish Vaswani'},
    {'paper_arxiv_id': '1706.03762v7', 'author_id': 'ss_noam_shazeer', 'author_sequence': 2, 'author_name': 'Noam Shazeer'},
    {'paper_arxiv_id': '1706.03762v7', 'author_id': 'ss_niki_parmar', 'author_sequence': 3, 'author_name': 'Niki Parmar'}
]
for relation in paper_authors:
    research_arcade.insert_edge("arxiv_paper_author", edge_features=relation)
    print(f"Linked author {relation['author_id']} (position {relation['author_sequence']})")

Linked author ss_ashish_vaswani (position 1)
Linked author ss_noam_shazeer (position 2)
Linked author ss_niki_parmar (position 3)
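
For longer author lists, the relation dictionaries can be generated from the ordered names instead of typed by hand. A minimal sketch, assuming `author_sequence` is the 1-based position in the author list; the `ss_` prefix and the lowercase-underscore rule are inferred from the three example IDs above, and `author_edges` is a hypothetical helper.

```python
def author_edges(paper_arxiv_id, author_names):
    """Build one paper-author edge per name, numbered from 1 in list order."""
    edges = []
    for position, name in enumerate(author_names, start=1):
        edges.append({
            "paper_arxiv_id": paper_arxiv_id,
            # Assumed ID scheme, inferred from the examples in this tutorial.
            "author_id": "ss_" + name.lower().replace(" ", "_"),
            "author_sequence": position,
            "author_name": name,
        })
    return edges

generated_edges = author_edges(
    "1706.03762v7",
    ["Ashish Vaswani", "Noam Shazeer", "Niki Parmar"],
)
for edge in generated_edges:
    print(edge["author_sequence"], edge["author_id"])
```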


#### Construct Table from CSV

In [72]:
config = {"csv_file": "./csv_data/arxiv_paper_authors.csv"}
research_arcade.construct_table_from_csv("arxiv_paper_author", config)

Successfully imported 10 paper-author relationships from ./csv_data/arxiv_paper_authors.csv


#### Construct Table from JSON

In [73]:
config = {"json_file": "./json_data/arxiv_paper_authors.json"}
research_arcade.construct_table_from_json("arxiv_paper_author", config)

No new paper-author relationships to import


#### Get All Paper-Author Relationships

In [74]:
all_relations = research_arcade.get_all_edge_features("arxiv_paper_author")
print(f"Total relationships: {len(all_relations)}")
print(all_relations.head(10))

Total relationships: 23
  paper_arxiv_id   author_id  author_sequence author_name
0     2502.13728  2166504589                1         NaN
1     2503.09712  2042303571                1         NaN
2     2506.12689  2367197357                4         NaN
3     2504.11669    50997909                1         NaN
4     2504.11367  2258720735                1         NaN
5     2502.07436  2345008510                3         NaN
6       2504.108     1795170                2         NaN
7      2502.1353  2258638253                6         NaN
8     2504.10800     1795170                2         NaN
9     2502.13530  2258638253                6         NaN
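
Rows 6 and 8 (`2504.108` and `2504.10800`) and rows 7 and 9 (`2502.1353` and `2502.13530`) share an author ID and sequence and differ only by a trailing zero. That pattern is what appears when an arXiv ID is parsed as a floating-point number somewhere in an import path; whether that is the cause here is an assumption. The sketch below reproduces the effect with the standard library and shows that keeping the column as text avoids it.

```python
import csv
import io

raw = "paper_arxiv_id,author_id\n2504.10800,1795170\n2502.13530,2258638253\n"

# Parsing the ID as a number silently drops the trailing zero.
as_float = [str(float(row["paper_arxiv_id"])) for row in csv.DictReader(io.StringIO(raw))]

# csv.DictReader yields strings, so leaving the value untouched keeps the ID intact.
as_text = [row["paper_arxiv_id"] for row in csv.DictReader(io.StringIO(raw))]

print(as_float)  # → ['2504.108', '2502.1353']
print(as_text)   # → ['2504.10800', '2502.13530']
```

With pandas, the equivalent precaution is to pass `dtype={"paper_arxiv_id": str}` to `read_csv`.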


#### Get Authors for a Paper

In [75]:
paper_id = {'paper_arxiv_id': '1706.03762v7'}
authors = research_arcade.get_neighborhood("arxiv_paper_author", primary_key=paper_id)
print("Authors:")
print(authors.sort_values('author_sequence'))

Authors:
  paper_arxiv_id          author_id  author_sequence     author_name
0   1706.03762v7  ss_ashish_vaswani                1  Ashish Vaswani
1   1706.03762v7    ss_noam_shazeer                2    Noam Shazeer
2   1706.03762v7     ss_niki_parmar                3     Niki Parmar


#### Get Papers by Author

In [76]:
author_id = {'author_id': 'ss_ashish_vaswani'}
papers = research_arcade.get_neighborhood("arxiv_paper_author", primary_key=author_id)
print("Papers by author:")
print(papers)

Papers by author:
  paper_arxiv_id          author_id  author_sequence     author_name
0   1706.03762v7  ss_ashish_vaswani                1  Ashish Vaswani


#### Delete Paper-Author Link

In [77]:
relation_id = {'paper_arxiv_id': '1706.03762v7', 'author_id': 'ss_ashish_vaswani'}
research_arcade.delete_edge_by_id("arxiv_paper_author", primary_key=relation_id)
print("Relationship deleted!")

Relationship deleted!


#### delete by paper_arxiv_id

In [78]:
paper_id = {"paper_arxiv_id": "2505.21249"}
result = research_arcade.delete_edge_by_id("arxiv_paper_author", paper_id)
# print(result)

#### delete by author_id

In [79]:
author_id = {"author_id": 3129798}
result = research_arcade.delete_edge_by_id("arxiv_paper_author", author_id)
# print(result.to_dict(orient="records"))


### arxiv_papers_categories

#### Insert Paper-Category Relationships

In [80]:
paper_categories = [
    {'paper_arxiv_id': '1706.03762v7', 'category_id': '1'},
    {'paper_arxiv_id': '1706.03762v7', 'category_id': '1'},
    {'paper_arxiv_id': '1706.03762v7', 'category_id': '2'}
]
for relation in paper_categories:
    research_arcade.insert_edge("arxiv_paper_category", edge_features=relation)
    print(f"Linked {relation['category_id']}")

Linked 1
Linked 1
Linked 2


#### Construct Table from CSV

In [81]:
config = {"csv_file": "./csv_data/arxiv_paper_category.csv"}
research_arcade.construct_table_from_csv("arxiv_paper_category", config)

No new paper-category relationships to import


#### Construct Table from JSON

In [82]:
config = {"json_file": "./json_data/arxiv_paper_category.json"}
research_arcade.construct_table_from_json("arxiv_paper_category", config)

Successfully imported 10 paper-category relationships from ./json_data/arxiv_paper_category.json


#### Get All Paper-Category Relationships

In [83]:
all_relations = research_arcade.get_all_edge_features("arxiv_paper_category")
print(f"Total relationships: {len(all_relations)}")
print(all_relations.head())

Total relationships: 33
  paper_arxiv_id  category_id
0     2402.07925           85
1     2505.14516          390
2     2501.18192           29
3     2504.07777           91
4   1912.03049v4          263


#### Get Categories for Paper

In [84]:
paper_id = {'paper_arxiv_id': '1706.03762v7'}
categories = research_arcade.get_neighborhood("arxiv_paper_category", primary_key=paper_id)
print("Categories:")
print(categories)

Categories:
  paper_arxiv_id  category_id
0   1706.03762v7            1
1   1706.03762v7            1
2   1706.03762v7            2
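
Category `1` appears twice because the insert cell submitted the same pair twice. A minimal sketch of dropping duplicate pairs before inserting, assuming (`paper_arxiv_id`, `category_id`) is meant to be unique; `unique_edges` is a hypothetical helper.

```python
def unique_edges(edges, key_fields):
    """Keep the first edge for each combination of `key_fields`, preserving order."""
    seen = set()
    result = []
    for edge in edges:
        key = tuple(edge[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            result.append(edge)
    return result

raw_edges = [
    {"paper_arxiv_id": "1706.03762v7", "category_id": "1"},
    {"paper_arxiv_id": "1706.03762v7", "category_id": "1"},
    {"paper_arxiv_id": "1706.03762v7", "category_id": "2"},
]

deduped = unique_edges(raw_edges, ("paper_arxiv_id", "category_id"))
print([edge["category_id"] for edge in deduped])  # → ['1', '2']
```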


#### Get Papers in Category

In [85]:
# category_id holds numeric IDs in this table, so a label such as 'cs.LG' matches no rows.
category_id = {'category_id': 'cs.LG'}
papers = research_arcade.get_neighborhood("arxiv_paper_category", primary_key=category_id)
print("Papers in category:")
print(papers)

Papers in category:
None


#### Delete Paper-Category Link

In [86]:
relation_id = {'paper_arxiv_id': '1706.03762v7', 'category_id': 'cs.AI'}
research_arcade.delete_edge_by_id("arxiv_paper_category", primary_key=relation_id)
print("Relationship deleted!")

Relationship deleted!


#### delete by paper_arxiv_id

In [87]:
paper_id = {"paper_arxiv_id": "1706.03762v7"}
result = research_arcade.delete_edge_by_id("arxiv_paper_category", paper_id)

#### delete by category_id

In [88]:
category_id = {"category_id": 1}
result = research_arcade.delete_edge_by_id("arxiv_paper_category", category_id)

### arxiv_papers_figures

#### Insert Paper-Figure Relationships

In [89]:
paper_figures = [
    {'paper_arxiv_id': '1706.03762v7', 'figure_id': 1},
    {'paper_arxiv_id': '1706.03762v7', 'figure_id': 2}
]
for relation in paper_figures:
    research_arcade.insert_edge("arxiv_paper_figure", edge_features=relation)
    print(f"Linked figure {relation['figure_id']}")

Linked figure 1
Linked figure 2


#### Construct Table from CSV

In [90]:
config = {"csv_file": "./csv_data/arxiv_paper_figures.csv"}
research_arcade.construct_table_from_csv("arxiv_paper_figure", config)

Successfully imported 2 paper-figure relationships from ./csv_data/arxiv_paper_figures.csv


#### Construct Table from JSON

In [91]:
config = {"json_file": "./json_data/arxiv_paper_figures.json"}
research_arcade.construct_table_from_json("arxiv_paper_figure", config)

Successfully imported 10 paper-figure relationships from ./json_data/arxiv_paper_figures.json


#### construct table from api

In [92]:
config = {"arxiv_ids": ["1806.08804v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_paper_figures", config)

arxiv_id
1806.08804v4
label
\label{fig:illustration}
arxiv_id
1806.08804v4
label
\label{fig:assignment_vis}
arxiv_id
1806.08804v4
label
None


arxiv_id
1806.08804v4
label
None
arxiv_id
1806.08804v4
label
None
arxiv_id
1806.08804v4
label
None
arxiv_id
1806.08804v4
label
None
arxiv_id
1806.08804v4
label
None



#### Get Figures for Paper

In [93]:
paper_id = {'paper_arxiv_id': '1806.08804v4'}
figures = research_arcade.get_neighborhood("arxiv_paper_figure", primary_key=paper_id)
print("Figures:")
print(figures)

Figures:
  paper_arxiv_id  figure_id
0   1806.08804v4       15.0
1   1806.08804v4       16.0
2   1806.08804v4        NaN
3   1806.08804v4        NaN
4   1806.08804v4        NaN
5   1806.08804v4        NaN
6   1806.08804v4        NaN
7   1806.08804v4        NaN


#### delete by paper_arxiv_id

In [94]:
paper_id = {"paper_arxiv_id": "2507.13024"}
result = research_arcade.delete_edge_by_id("arxiv_paper_figure", paper_id)

#### delete by figure_id

In [95]:
figure_id = {"figure_id": 1}
result = research_arcade.delete_edge_by_id("arxiv_paper_figure", figure_id)

#### delete by both ids

In [96]:
paper_figure_id = {
    "paper_arxiv_id": "2410.23123v2",
    "figure_id": 476323
}
result = research_arcade.delete_edge_by_id("arxiv_paper_figure", paper_figure_id)

#### get all edges

In [97]:
all_edges = research_arcade.get_all_edge_features("arxiv_paper_figure")
print(len(all_edges))

33


### arxiv_papers_tables

#### Insert Paper-Table Relationships

In [98]:
paper_tables = [
    {'paper_arxiv_id': '1706.03762v7', 'table_id': 1},
    {'paper_arxiv_id': '1706.03762v7', 'table_id': 2}
]
for relation in paper_tables:
    research_arcade.insert_edge("arxiv_paper_table", edge_features=relation)
    print(f"Linked table {relation['table_id']}")

Linked table 1
Linked table 2


#### Construct Table from CSV

In [99]:
config = {"csv_file": "./csv_data/arxiv_paper_tables.csv"}
research_arcade.construct_table_from_csv("arxiv_paper_table", config)

No new paper-table relationships to import


#### Construct Table from JSON

In [100]:
config = {"json_file": "./json_data/arxiv_paper_tables.json"}
research_arcade.construct_table_from_json("arxiv_paper_table", config)

Successfully imported 10 paper-table relationships from ./json_data/arxiv_paper_tables.json


#### construct table from api

In [101]:
config = {"arxiv_ids": ["1706.03762v7"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_paper_table", config)

Table arxiv_paper_table does not support construction from API


#### Get Tables for Paper

In [102]:
paper_id = {'paper_arxiv_id': '1706.03762v7'}
tables = research_arcade.get_neighborhood("arxiv_paper_table", primary_key=paper_id)
print("Tables:")

Tables:


#### delete by paper_arxiv_id

In [103]:
paper_id = {"paper_arxiv_id": "1706.03762v7"}
result = research_arcade.delete_edge_by_id("arxiv_paper_table", paper_id)

#### delete by table_id

In [104]:
table_id = {"table_id": 1}
result = research_arcade.delete_edge_by_id("arxiv_paper_table", table_id)

#### delete by both ids

In [105]:
paper_table_id = {
    "paper_arxiv_id": "1706.03762v7",
    "table_id": 1
}
result = research_arcade.delete_edge_by_id("arxiv_paper_table", paper_table_id)

#### get all edges

In [106]:
all_edges = research_arcade.get_all_edge_features("arxiv_paper_table")
print(len(all_edges))

20


### arxiv_paragraphs_references

#### Insert Paragraph-Reference Relationships

In [107]:
paragraph_references = [
    {'paragraph_id': 1, 'paper_section': 'established approaches', 'paper_arxiv_id': '1706.03762v7', 'reference_label': "{something}", 'reference_type': 'figure'}
]

for relation in paragraph_references:
    research_arcade.insert_edge("arxiv_paragraph_reference", edge_features=relation)

#### Construct Table from CSV

In [108]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_paragraph_reference_example.csv"}
research_arcade.construct_table_from_csv("arxiv_paragraph_reference", config)

Error: CSV file ./examples/csv_data/csv_arxiv_paragraph_reference_example.csv does not exist.


#### Construct Table from JSON

In [109]:
config = {"json_file": "./examples/json_data/json_arxiv_paragraph_reference_example.json"}
research_arcade.construct_table_from_json("arxiv_paragraph_reference", config)

Error: JSON file ./examples/json_data/json_arxiv_paragraph_reference_example.json does not exist.


#### construct table from api

In [111]:
config = {"arxiv_ids": ["1806.08804v4", "1903.03894v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_paragraph_references", config)

#### delete by paragraph_id

In [112]:
paragraph_id = {"paragraph_id": 1}
result = research_arcade.delete_edge_by_id("arxiv_paragraph_reference", paragraph_id)

#### Get References in Paragraph

In [113]:
paragraph_id = {'paragraph_id': 1}
references = research_arcade.get_neighborhood("arxiv_paragraph_reference", primary_key=paragraph_id)
print("References:")
print(references)

References:
None


#### get all edges

In [114]:

all_edges = research_arcade.get_all_edge_features("arxiv_paragraph_reference")
print(len(all_edges))

67


### arxiv_paragraphs_citations

#### Insert paragraph citation relationship

In [115]:
# Link specific paragraphs to cited papers
paragraph_citations = [
    {
        'paragraph_id': 1,
        'paper_section': 'Introduction',
        'citing_arxiv_id': '1810.04805v2',
        'cited_arxiv_id': '1706.03762v7',
        'bib_key': 'vaswani2017attention'
    },
    {
        'paragraph_id': 4,
        'paper_section': 'Related Work',
        'citing_arxiv_id': '1810.04805v2',
        'cited_arxiv_id': '1706.03762v7',
        'bib_key': 'vaswani2017attention'
    }
]

for relation in paragraph_citations:
    research_arcade.insert_edge("arxiv_paragraph_citation", edge_features=relation)
    print(f"Paragraph {relation['paragraph_id']} cites {relation['cited_arxiv_id']}")

Paragraph 1 cites 1706.03762v7
Paragraph 4 cites 1706.03762v7
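
Each edge ties one paragraph to one cited paper, so grouping the edges by `cited_arxiv_id` shows where in the citing paper a given work is mentioned. A minimal sketch in plain Python over the two relations inserted above; `paragraphs_by_cited_paper` is a hypothetical helper, not a ResearchArcade call.

```python
from collections import defaultdict

sample_edges = [
    {"paragraph_id": 1, "paper_section": "Introduction", "cited_arxiv_id": "1706.03762v7"},
    {"paragraph_id": 4, "paper_section": "Related Work", "cited_arxiv_id": "1706.03762v7"},
]

def paragraphs_by_cited_paper(edges):
    """Map each cited paper to the list of paragraph IDs that cite it."""
    grouped = defaultdict(list)
    for edge in edges:
        grouped[edge["cited_arxiv_id"]].append(edge["paragraph_id"])
    return dict(grouped)

print(paragraphs_by_cited_paper(sample_edges))  # → {'1706.03762v7': [1, 4]}
```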


#### construct table from csv

In [116]:
config = {"csv_file": "/examples/csv_data/arxiv_paragraph_figures.csv"}
research_arcade.construct_table_from_csv("arxiv_paragraph_figure", config)

Error: CSV file /examples/csv_data/arxiv_paragraph_figures.csv does not exist.


#### construct table from json

In [117]:
config = {"json_file": "/examples/csv_data/arxiv_paragraph_figures.json"}
research_arcade.construct_table_from_json("arxiv_paragraph_figure", config)

Error: JSON file /examples/csv_data/arxiv_paragraph_figures.json does not exist.


#### construct table from api
Notice: API construction requires the paragraphs data to be preprocessed first.

In [118]:
config = {"arxiv_ids": ["1810.04805v2"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_paragraph_citation", config)

Table arxiv_paragraph_citation does not support construction from API


#### insert edge

In [119]:
paragraph_citation = {
    'paragraph_id': 0,
    'paper_section': 'Introduction',
    'citing_arxiv_id': '1810.04805v2',
    'bib_key': 'vaswani2017attention',
    'cited_arxiv_id': '1706.03762v7',  # optional
    'paragraph_global_id': 1  # optional
}
research_arcade.insert_edge("arxiv_paragraph_citation", paragraph_citation)

np.int64(3)

#### delete by paragraph_id

In [120]:
paragraph_citation = {
    'paragraph_id': 1
}
count = research_arcade.delete_edge_by_id('arxiv_paragraph_citation', paragraph_citation)
print(f"Deleted {count} paragraph citations")

Deleted 1 paragraph citations


#### get all edges

In [121]:
all_citations = research_arcade.get_all_edge_features('arxiv_paragraph_citation')
if all_citations is not None:
    print(f"Total paragraph citations: {len(all_citations)}")

Total paragraph citations: 2


#### get neighborhood by reference_id

In [122]:
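# 'reference_id' is not a supported lookup key for this table,
# so the call below only prints the list of accepted keys.
paragraph_citation = {
    'reference_id': 1
}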
result = research_arcade.get_neighborhood('arxiv_paragraph_citation', paragraph_citation)
if result is not None:
    print(result.to_dict(orient="records"))

For arxiv_paragraph_citation, provide 'paragraph_id', 'paragraph_global_id', 'citing_arxiv_id', or 'cited_arxiv_id'.
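
The message lists the keys this table accepts. A minimal sketch of checking a lookup dictionary against that list before querying, so that a typo fails loudly instead of returning nothing; the key set is copied from the message above and `check_lookup_key` is a hypothetical helper.

```python
# Keys taken from the message printed by get_neighborhood for this table.
ACCEPTED_KEYS = {"paragraph_id", "paragraph_global_id", "citing_arxiv_id", "cited_arxiv_id"}

def check_lookup_key(primary_key):
    """Return `primary_key` unchanged, or raise if it contains an unsupported key."""
    unknown = sorted(set(primary_key) - ACCEPTED_KEYS)
    if unknown:
        raise ValueError(f"Unsupported key(s) {unknown}; use one of {sorted(ACCEPTED_KEYS)}")
    return primary_key

lookup = check_lookup_key({"paragraph_id": 1})
print(lookup)  # → {'paragraph_id': 1}
```

Passing `{"reference_id": 1}` to the helper raises `ValueError`.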


### arxiv_paragraph_figure

#### construct table from api
Notice: the paragraph_references, paragraphs, figures, and sections tables need to be updated first.

In [123]:
config = {"arxiv_ids": ["1706.03762v7"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_paragraph_figures", config)

arxiv_id
1706.03762v7
Empty DataFrame
Columns: [id, paragraph_id, paper_section, paper_arxiv_id, reference_label, reference_type]
Index: []


#### construct table from csv

In [124]:
config = {"csv_file": "/path/to/arxiv_paragraph_figures.csv"}
research_arcade.construct_table_from_csv("arxiv_paragraph_figure", config)

Error: CSV file /path/to/arxiv_paragraph_figures.csv does not exist.


#### construct table from json

In [125]:
config = {"json_file": "/path/to/arxiv_paragraph_figures.json"}
research_arcade.construct_table_from_json("arxiv_paragraph_figure", config)

Error: JSON file /path/to/arxiv_paragraph_figures.json does not exist.


#### insert edge

In [126]:
paragraph_figure = {
    'paragraph_id': 1,      # global paragraph ID
    'figure_id': 1,         # figure ID
    'paper_arxiv_id': '1706.03762v7',
    'paper_section_id': 1   # section ID
}
new_id = research_arcade.insert_edge("arxiv_paragraph_figure", edge_features=paragraph_figure)
print(f"Inserted paragraph-figure relationship with id: {new_id}")


Inserted paragraph-figure relationship with id: 1


#### delete by paragraph_id

In [127]:
paragraph_id = {"paragraph_id": 1}
result = research_arcade.delete_edge_by_id("arxiv_paragraph_figure", paragraph_id)
print(f"Deleted {result} relationships")

Deleted 1 relationships


#### delete by figure_id

In [128]:
figure_id = {"figure_id": 1}
result = research_arcade.delete_edge_by_id("arxiv_paragraph_figure", figure_id)
print(f"Deleted: {result}")

Deleted: False


#### delete by both paragraph_id and figure_id

In [129]:
paragraph_figure_id = {
    "paragraph_id": 1,
    "figure_id": 1
}
result = research_arcade.delete_edge_by_id("arxiv_paragraph_figure", paragraph_figure_id)
print(f"Deleted {result} relationships")

Deleted 0 relationships
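
The three delete cells report their result differently: a count for `paragraph_id` (`1`) and for the combined key (`0`), but `False` for `figure_id`. A minimal sketch of normalising these to an integer, assuming the return value is always `None`, a boolean, a number, or a sized collection of deleted rows; `deleted_count` is a hypothetical helper.

```python
def deleted_count(result):
    """Convert a delete result of unknown shape into a row count."""
    if result is None:
        return 0
    if hasattr(result, "__len__"):
        # Sized results (lists, DataFrames) are counted by length.
        return len(result)
    # Booleans and integer-like values convert directly; False becomes 0.
    return int(result)

for value in (1, 0, False, None, [{"paragraph_id": 1}]):
    print(repr(value), "->", deleted_count(value))
```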


#### get all edges

In [130]:
all_edges = research_arcade.get_all_edge_features("arxiv_paragraph_figure")
if all_edges is not None:
    print(f"Total paragraph-figure relationships: {len(all_edges)}")

#### get figures for a paragraph

In [131]:
paragraph_id = {"paragraph_id": 1}
figures = research_arcade.get_neighborhood("arxiv_paragraph_figure", paragraph_id)
if figures is not None:
    print("Figures referenced by paragraph:")
    print(figures.to_dict(orient="records"))

#### get paragraphs for a figure

In [132]:
figure_id = {"figure_id": 1}
paragraphs = research_arcade.get_neighborhood("arxiv_paragraph_figure", figure_id)
if paragraphs is not None:
    print("Paragraphs that reference this figure:")
    print(paragraphs.to_dict(orient="records"))

### arxiv_paragraph_table

#### construct table from api
Notice: the paragraph_references, paragraphs, figures, and sections tables need to be updated first.

In [133]:
config = {"arxiv_ids": ["1706.03762v7"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_paragraph_tables", config)

#### construct table from csv

In [134]:
config = {"csv_file": "/path/to/arxiv_paragraph_tables.csv"}
research_arcade.construct_table_from_csv("arxiv_paragraph_table", config)

Error: CSV file /path/to/arxiv_paragraph_tables.csv does not exist.


#### construct table from json

In [135]:
config = {"json_file": "/path/to/arxiv_paragraph_tables.json"}
research_arcade.construct_table_from_json("arxiv_paragraph_table", config)

Error: JSON file /path/to/arxiv_paragraph_tables.json does not exist.


#### insert edge

In [136]:
paragraph_table = {
    'paragraph_id': 1,
    'table_id': 1,
    'paper_arxiv_id': '1706.03762v7',
    'paper_section_id': 1
}
new_id = research_arcade.insert_edge("arxiv_paragraph_table", edge_features=paragraph_table)
print(f"Inserted paragraph-table relationship with id: {new_id}")

Inserted paragraph-table relationship with id: 1


#### delete by paragraph_id

In [137]:
paragraph_id = {"paragraph_id": 1}
result = research_arcade.delete_edge_by_id("arxiv_paragraph_table", paragraph_id)
print(f"Deleted {result} relationships")

Deleted 1 relationships


#### delete by table_id

In [138]:
table_id = {"table_id": 1}
result = research_arcade.delete_edge_by_id("arxiv_paragraph_table", table_id)
print(f"Deleted: {result}")

Deleted: False


#### delete by both paragraph_id and table_id

In [139]:
paragraph_table_id = {
    "paragraph_id": 1,
    "table_id": 1
}
result = research_arcade.delete_edge_by_id("arxiv_paragraph_table", paragraph_table_id)
print(f"Deleted {result} relationships")

Deleted 0 relationships


#### get all edges

In [140]:
all_edges = research_arcade.get_all_edge_features("arxiv_paragraph_table")
if all_edges is not None:
    print(f"Total paragraph-table relationships: {len(all_edges)}")

#### get tables for a paragraph

In [141]:
paragraph_id = {"paragraph_id": 1}
tables = research_arcade.get_neighborhood("arxiv_paragraph_table", paragraph_id)
if tables is not None:
    print("Tables referenced by paragraph:")
    print(tables.to_dict(orient="records"))

#### get paragraphs for a table

In [142]:
table_id = {"table_id": 1}
paragraphs = research_arcade.get_neighborhood("arxiv_paragraph_table", table_id)
if paragraphs is not None:
    print("Paragraphs that reference this table:")
    print(paragraphs.to_dict(orient="records"))

## BatchProcessing

### batch_openreview_conference

In [143]:
config = {"venue": "ICLR.cc/2025/Conference"}
research_arcade.construct_tables_from_venue(config)

Crawling openreview arxiv data for venue: {'venue': 'ICLR.cc/2025/Conference'}...


0it [00:00, ?it/s]


No new openreview arxiv data to insert.
Crawling author data from OpenReview API...


0it [00:00, ?it/s]
0it [00:00, ?it/s]


No new author data to insert.
Crawling paper data from OpenReview API...


0it [00:00, ?it/s]


No new paper data to insert.
Crawling review data from OpenReview API...


0it [00:00, ?it/s]

No new review data to insert.





TypeError: CSVOpenReviewParagraphs.construct_paragraphs_table_from_api() missing 3 required positional arguments: 'pdf_dir', 'filter_list', and 'log_file'

### batch_arxiv_papers

In [145]:
# Example papers
arxiv_ids = ['1802.08773', '1806.02473', '2412.17767', '2507.10539', '2511.22036']


config = {
    'arxiv_ids': arxiv_ids,
    'dest_dir': os.getenv('PAPER_FOLDER_PATH')
}

research_arcade.construct_tables_from_arxiv_ids(config)


KeyboardInterrupt: 
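
`os.getenv('PAPER_FOLDER_PATH')` returns `None` when the variable is not set, so `dest_dir` can end up empty; how the library handles that case is not shown here. A minimal sketch of falling back to a default directory and creating it up front. `resolve_dest_dir` is a hypothetical helper, and the `./download` default is borrowed from the other cells in this tutorial.

```python
import os
import tempfile

def resolve_dest_dir(environ, env_var="PAPER_FOLDER_PATH", default="./download"):
    """Return a usable download directory, creating it if needed."""
    path = environ.get(env_var) or default
    os.makedirs(path, exist_ok=True)
    return path

# Demonstration with an empty environment mapping and a temporary fallback,
# so the sketch does not create folders in the working directory.
fallback = os.path.join(tempfile.mkdtemp(), "download")
dest_dir = resolve_dest_dir({}, default=fallback)
print(dest_dir == fallback, os.path.isdir(dest_dir))  # → True True
```

In the notebook, the call would be `resolve_dest_dir(os.environ)`, with the result used as `'dest_dir'` in `config`.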

## ContinuousCrawling

### arxiv_continuous_crawling

In [None]:
research_arcade.continuous_crawling(
    interval_days=2,
    delay_days=2,
    paper_category='All',
    dest_dir="./download",
    arxiv_id_dest="./data",
)