# ResearchArcade Complete Tutorial

This tutorial demonstrates how to work with the ResearchArcade database, covering all node types and edge relationships.

## Table of Contents
1. [Setup](#setup)
2. [OpenReview Data](#openreview)
3. [ArXiv Papers](#arxiv-papers)
4. [ArXiv Authors](#arxiv-authors)
5. [ArXiv Categories](#arxiv-categories)
6. [ArXiv Figures](#arxiv-figures)
7. [ArXiv Tables](#arxiv-tables)
8. [ArXiv Sections](#arxiv-sections)
9. [ArXiv Paragraphs](#arxiv-paragraphs)
10. [Relationships/Edges](#relationships)
11. [Advanced Queries](#advanced-queries)

## 1. Setup <a name="setup"></a>

In [1]:
import sys
from pathlib import Path
from tqdm import tqdm
import os
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..')))
from research_arcade.research_arcade import ResearchArcade
import pandas as pd
from datetime import datetime

### Choose Database Backend

#### CSV Based

In [2]:
db_type = "csv"
config = {
    "csv_dir": "../data/my_research_arcade_data/"
}

research_arcade = ResearchArcade(db_type=db_type, config=config)

## 3. ArXiv Papers <a name="arxiv-papers"></a>

### Table Schema
- `id` (SERIAL PK)
- `arxiv_id` (VARCHAR, unique) - e.g., 1802.08773v3
- `base_arxiv_id` (VARCHAR) - e.g., 1802.08773
- `version` (INT) - e.g., 3
- `title` (TEXT)
- `abstract` (TEXT)
- `submit_date` (DATE)
- `metadata` (JSONB)

### Construct Table from API

In [3]:
config = {"arxiv_ids": ["1806.08804v4", "1903.03894v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_papers", config)

#### Construct Table from CSV

In [4]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_papers_example.csv"}
research_arcade.construct_table_from_csv("arxiv_papers", config)

Successfully imported 1 papers from ./examples/csv_data/csv_arxiv_papers_example.csv


#### Construct Table from JSON

In [5]:
config = {"json_file": "./examples/json_data/json_arxiv_papers_example.json"}
research_arcade.construct_table_from_json("arxiv_papers", config)

No new papers to import (all papers already exist)


### Insert a Paper

In [6]:
# Example 1: Insert the famous "Attention is All You Need" paper
new_paper = {
    'arxiv_id': '1706.03762v7',
    'base_arxiv_id': '1706.03762',
    'version': 7,
    'title': 'Attention Is All You Need',
    'abstract': 'The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.',
    'submit_date': '2017-06-12',
    'metadata': {'venue': 'NeurIPS 2017', 'pdf_url': 'https://arxiv.org/pdf/1706.03762.pdf'}
}

research_arcade.insert_node("arxiv_papers", node_features=new_paper)
print("Paper inserted successfully!")

Paper inserted successfully!


In [7]:
# Example 2: Insert BERT paper
bert_paper = {
    'arxiv_id': '1810.04805v2',
    'base_arxiv_id': '1810.04805',
    'version': 2,
    'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding',
    'abstract': 'We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.',
    'submit_date': '2018-10-11',
    'metadata': {'venue': 'NAACL 2019', 'citations': 50000}
}

research_arcade.insert_node("arxiv_papers", node_features=bert_paper)
print("BERT paper inserted successfully!")

BERT paper inserted successfully!


### Get All Papers

In [8]:
arxiv_papers_df = research_arcade.get_all_node_features("arxiv_papers")
print(f"Total papers in database: {len(arxiv_papers_df)}")
print("\nFirst 5 papers:")
print(arxiv_papers_df.head())

Total papers in database: 7

First 5 papers:
   id      arxiv_id  base_arxiv_id  version  \
0   2  1810.04805v2     1810.04805        2   
1   3   1409.0473v7     1409.04730        7   
2   4  1512.03385v1     1512.03385        1   
3   5  2010.11929v2     2010.11929        2   
4   6  1806.08804v4     1806.08804        4   

                                               title  \
0  BERT: Pre-training of Deep Bidirectional Trans...   
1  Neural Machine Translation by Jointly Learning...   
2       Deep Residual Learning for Image Recognition   
3  An Image is Worth 16x16 Words: Transformers fo...   
4  Hierarchical Graph Representation Learning wit...   

                                            abstract  \
0  We introduce a new language representation mod...   
1  Neural machine translation is a recently propo...   
2  Deeper neural networks are more difficult to t...   
3  We show that a pure transformer applied direct...   
4  Recently, graph neural networks (GNNs) have re...   

### Get Specific Paper by ID

In [9]:
paper_id = {"arxiv_id": "1810.04805v2"}
paper_features = research_arcade.get_node_features_by_id("arxiv_papers", paper_id)
print("Paper details:")
print(paper_features.to_dict(orient="records")[0])

Paper details:
{'id': 2, 'arxiv_id': '1810.04805v2', 'base_arxiv_id': 1810.04805, 'version': 2, 'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding', 'abstract': 'We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.', 'submit_date': '2018-10-11', 'metadata': '{"venue": "NAACL 2019", "citations": 60000}'}


### Update a Paper

In [10]:
# Update metadata for a paper
updated_paper = {
    'arxiv_id': '1706.03762v7',
    'metadata': {
        'venue': 'NeurIPS 2017',
        'pdf_url': 'https://arxiv.org/pdf/1706.03762.pdf',
        'citations': 75000,
        'influential': True
    }
}

research_arcade.update_node("arxiv_papers", node_features=updated_paper)
print("Paper updated successfully!")

Paper updated successfully!


### Delete a Paper

In [11]:
# Delete a paper by ID
paper_id = {"arxiv_id": "1706.03762v7"}
deleted_paper = research_arcade.delete_node_by_id("arxiv_papers", paper_id)
print("Deleted paper:")
print(deleted_paper)

Deleted paper:
True


## 4. ArXiv Authors <a name="arxiv-authors"></a>

### Table Schema
- `id` (SERIAL PK)
- `semantic_scholar_id` (VARCHAR, unique)
- `name` (VARCHAR)
- `homepage` (VARCHAR)

### Construct Table from API

In [12]:
# config = {"arxiv_ids": ["1903.03894v4", "1806.08804v4"], "dest_dir": "./download"}
# research_arcade.construct_table_from_api("arxiv_authors", config)

#### Construct Table from CSV

In [13]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_authors_example.csv"}
research_arcade.construct_table_from_csv("arxiv_authors", config)

Successfully imported 10 authors from ./examples/csv_data/csv_arxiv_authors_example.csv


#### Construct Table from JSON

In [14]:
config = {"json_file": "./examples/json_data/json_arxiv_authors_example.json"}
research_arcade.construct_table_from_json("arxiv_authors", config)

No new authors to import (all authors already exist)


### Insert Authors

In [15]:
# Insert authors from the Transformer paper
authors = [
    {
        'semantic_scholar_id': 'ss_ashish_vaswani',
        'name': 'Ashish Vaswani',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    },
    {
        'semantic_scholar_id': 'ss_noam_shazeer',
        'name': 'Noam Shazeer',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    },
    {
        'semantic_scholar_id': 'ss_niki_parmar',
        'name': 'Niki Parmar',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    },
    {
        'semantic_scholar_id': 'ss_jakob_uszkoreit',
        'name': 'Jakob Uszkoreit',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    },
    {
        'semantic_scholar_id': 'ss_llion_jones',
        'name': 'Llion Jones',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    }
]

for author in authors:
    research_arcade.insert_node("arxiv_authors", node_features=author)
    print(f"Inserted author: {author['name']}")

Inserted author: Ashish Vaswani
Inserted author: Noam Shazeer
Inserted author: Niki Parmar
Inserted author: Jakob Uszkoreit
Inserted author: Llion Jones


### Get All Authors

In [16]:
authors_df = research_arcade.get_all_node_features("arxiv_authors")
print(f"Total authors in database: {len(authors_df)}")
print("\nAll authors:")
print(authors_df)

Total authors in database: 55

All authors:
    id semantic_scholar_id              name  \
0    1             1234567    Ashish Vaswani   
1    2             2345678      Noam Shazeer   
2    3             3456789       Niki Parmar   
3    4             4567890   Jakob Uszkoreit   
4    5             5678901       Llion Jones   
5    6             6789012    Aidan N. Gomez   
6    7             7890123     Lukasz Kaiser   
7    8             8901234  Illia Polosukhin   
8    9             9012345      Jacob Devlin   
9   10             1234098    Ming-Wei Chang   
10  11             1234567    Ashish Vaswani   
11  12             2345678      Noam Shazeer   
12  13             3456789       Niki Parmar   
13  14             4567890   Jakob Uszkoreit   
14  15             5678901       Llion Jones   
15  16             6789012    Aidan N. Gomez   
16  17             7890123     Lukasz Kaiser   
17  18             8901234  Illia Polosukhin   
18  19             9012345      Jacob Devlin

### Get Specific Author by ID

In [17]:
author_id = {"semantic_scholar_id": 8901234}
author_features = research_arcade.get_node_features_by_id("arxiv_authors", author_id)
print("Author details:")
print(author_features)

Author details:
None


### Update an Author

In [18]:
updated_author = {
    'semantic_scholar_id': 'ss_ashish_vaswani',
    'homepage': 'https://ashishvaswani.com'
}

research_arcade.update_node("arxiv_authors", node_features=updated_author)
print("Author updated successfully!")

Author updated successfully!


## 5. ArXiv Categories <a name="arxiv-categories"></a>

### Table Schema
- `id` (SERIAL PK)
- `name` (VARCHAR, unique)
- `description` (TEXT)

### Insert From API

In [19]:
config = {"arxiv_ids": ["1903.03894v4", "1806.08804v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_categories", config)

{'id': '1903.03894v4', 'title': 'GNNExplainer: Generating Explanations for Graph Neural Networks', 'abstract': "Graph Neural Networks (GNNs) are a powerful tool for machine learning on\ngraphs.GNNs combine node feature information with the graph structure by\nrecursively passing neural messages along edges of the input graph. However,\nincorporating both graph structure and feature information leads to complex\nmodels, and explaining predictions made by GNNs remains unsolved. Here we\npropose GNNExplainer, the first general, model-agnostic approach for providing\ninterpretable explanations for predictions of any GNN-based model on any\ngraph-based machine learning task. Given an instance, GNNExplainer identifies a\ncompact subgraph structure and a small subset of node features that have a\ncrucial role in GNN's prediction. Further, GNNExplainer can generate consistent\nand concise explanations for an entire class of instances. We formulate\nGNNExplainer as an optimization task that max

#### Construct Table from CSV

In [20]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_categories_example.csv"}
research_arcade.construct_table_from_csv("arxiv_categories", config)

No new categories to import (all categories already exist)


#### Construct Table from JSON

In [21]:
config = {"json_file": "./examples/json_data/json_arxiv_categories_example.json"}
research_arcade.construct_table_from_json("arxiv_categories", config)

No new categories to import (all categories already exist)


### Insert Categories

In [22]:
categories = [
    {
        'name': 'cs.CL',
        'description': 'Computation and Language (Natural Language Processing)'
    },
    {
        'name': 'cs.LG',
        'description': 'Machine Learning'
    },
    {
        'name': 'cs.AI',
        'description': 'Artificial Intelligence'
    },
    {
        'name': 'cs.CV',
        'description': 'Computer Vision and Pattern Recognition'
    },
    {
        'name': 'stat.ML',
        'description': 'Machine Learning (Statistics)'
    }
]

for category in categories:
    research_arcade.insert_node("arxiv_categories", node_features=category)
    print(f"Inserted category: {category['name']}")

Inserted category: cs.CL
Inserted category: cs.LG
Inserted category: cs.AI
Inserted category: cs.CV
Inserted category: stat.ML


### Get All Categories

In [23]:
categories_df = research_arcade.get_all_node_features("arxiv_categories")
print(f"Total categories: {len(categories_df)}")
print("\nAll categories:")
print(categories_df)

Total categories: 11

All categories:
    id     name                              description
0    1    cs.LG                                      NaN
1    2  stat.ML                                      NaN
2    3    cs.NE                                      NaN
3    4    cs.SI                                      NaN
4    5    cs.AI                  Artificial Intelligence
5    7    cs.CL                 Computation and Language
6    8    cs.CV  Computer Vision and Pattern Recognition
7   11    cs.CR                Cryptography and Security
8   12    cs.DS           Data Structures and Algorithms
9   13    cs.IT                       Information Theory
10  14  math.IT                Information Theory (Math)


## 6. ArXiv Figures <a name="arxiv-figures"></a>

### Table Schema
- `id` (SERIAL PK)
- `paper_arxiv_id` (VARCHAR FK → papers.arxiv_id)
- `path` (VARCHAR)
- `caption` (TEXT)
- `label` (TEXT)
- `name` (TEXT)

### Insert Figures

In [24]:
# Insert figures for the Transformer paper
figures = [
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/figures/transformer_architecture.png',
        'caption': 'The Transformer model architecture. The left side shows the encoder stack and the right side shows the decoder stack.',
        'label': 'fig:architecture',
        'name': 'Figure 1'
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/figures/scaled_dot_product_attention.png',
        'caption': 'Scaled Dot-Product Attention and Multi-Head Attention mechanisms.',
        'label': 'fig:attention',
        'name': 'Figure 2'
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/figures/positional_encoding.png',
        'caption': 'Positional encoding visualization showing sine and cosine functions of different frequencies.',
        'label': 'fig:positional',
        'name': 'Figure 3'
    }
]

for figure in figures:
    research_arcade.insert_node("arxiv_figures", node_features=figure)
    print(f"Inserted {figure['name']}")

Inserted Figure 1
Inserted Figure 2
Inserted Figure 3


### Get All Figures

In [25]:
figures_df = research_arcade.get_all_node_features("arxiv_figures")
print(f"Total figures: {len(figures_df)}")
print("\nAll figures:")
print(figures_df[['name', 'caption', 'label']])

Total figures: 11

All figures:
        name                                            caption  \
0   Figure 1  The Transformer model architecture. The left s...   
1   Figure 2  Scaled Dot-Product Attention and Multi-Head At...   
2   Figure 3  Positional encoding visualization showing sine...   
3    figure1                 The Transformer model architecture   
4    figure2       Multi-head attention mechanism visualization   
5    figure3         Variations on the Transformer architecture   
6    figure4     BERT model architecture and pre-training tasks   
7    figure5               Fine-tuning BERT for different tasks   
8    figure6                Residual learning: a building block   
9    figure7                  ResNet architectures for ImageNet   
10   figure8            Vision Transformer (ViT) model overview   

               label  
0   fig:architecture  
1      fig:attention  
2     fig:positional  
3   fig:architecture  
4      fig:attention  
5     fig:variations  
6 

## 7. ArXiv Tables <a name="arxiv-tables"></a>

### Table Schema
- `id` (SERIAL PK)
- `paper_arxiv_id` (VARCHAR FK → papers.arxiv_id)
- `path` (VARCHAR)
- `caption` (TEXT)
- `label` (TEXT)
- `table_text` (TEXT)

### Insert From API

In [26]:
config = {"arxiv_ids": ["1903.03894v4", "1806.08804v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_tables", config)

seed: ['1903.03894v4']
BFS_que.qsize(): 1
current paper: 1903.03894v4
Thread 13079867392 Processing 1903.03894v4


x 000abstract.tex
x 010intro.tex
x 020related.tex
x 030background.tex
x 030formulation.tex
x 030proposed.tex
x 040experiments.tex
x 050conclusion.tex
x 060supplement.tex
x acmart.bib
x acmart.cls
x acmart.dtx
x acmart.ins
x ACM-Reference-Format.bbx
x ACM-Reference-Format.bst
x ACM-Reference-Format.cbx
x ACM-Reference-Format.dbx
x figs/
x figs/explainer-introduction_v2.pdf
x figs/explainer-motivation.pdf
x figs/explainer.pdf
x figs/feature_importance_v2.pdf
x figs/fig3-graph-cls-v2.pdf
x figs/fig3-graph-cls.pdf
x figs/fig3-node-cls-v3.pdf
x figs/fig3-node-cls.pdf
x figs/fig3-v4.pdf
x figs/fig3-v5.pdf
x figs/including-node-features.pdf
x figs/local_subgraph.png
x figs/motivation-node-features.pdf
x figs/prototype.png
x figs/prototype1.png
x figs/single-instance-explanation-final.pdf
x figs/single-instance-explanation2.pdf
x figs/single-instance-explanations.pdf: truncated gzip input
tar: Error exit delayed from previous errors.


Thread 13079867392 Finished processing 1903.03894v4 (1/999999999) Time elapsed: 0.70s
'NoneType' object is not subscriptable
Thread 13079867392 Failed to process 1903.03894v4
Thread 8614781504 Finished processing 1 papers
Error: The file at path './download/output/1903.03894v4.json' was not found.


#### Construct Table from CSV

In [27]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_tables_example.csv"}
research_arcade.construct_table_from_csv("arxiv_tables", config)

Successfully imported 6 tables from ./examples/csv_data/csv_arxiv_tables_example.csv


### Insert Categories

In [28]:
categories = [
    {
        'name': 'cs.CL',
        'description': 'Computation and Language (Natural Language Processing)'
    },
    {
        'name': 'cs.LG',
        'description': 'Machine Learning'
    },
    {
        'name': 'cs.AI',
        'description': 'Artificial Intelligence'
    },
    {
        'name': 'cs.CV',
        'description': 'Computer Vision and Pattern Recognition'
    },
    {
        'name': 'stat.ML',
        'description': 'Machine Learning (Statistics)'
    }
]

for category in categories:
    research_arcade.insert_node("arxiv_categories", node_features=category)
    print(f"Inserted category: {category['name']}")

Inserted category: cs.CL
Inserted category: cs.LG
Inserted category: cs.AI
Inserted category: cs.CV
Inserted category: stat.ML


### Get All Categories

In [29]:
categories_df = research_arcade.get_all_node_features("arxiv_categories")
print(f"Total categories: {len(categories_df)}")
print("\nAll categories:")
print(categories_df)

Total categories: 11

All categories:
    id     name                              description
0    1    cs.LG                                      NaN
1    2  stat.ML                                      NaN
2    3    cs.NE                                      NaN
3    4    cs.SI                                      NaN
4    5    cs.AI                  Artificial Intelligence
5    7    cs.CL                 Computation and Language
6    8    cs.CV  Computer Vision and Pattern Recognition
7   11    cs.CR                Cryptography and Security
8   12    cs.DS           Data Structures and Algorithms
9   13    cs.IT                       Information Theory
10  14  math.IT                Information Theory (Math)


## 6. ArXiv Figures <a name="arxiv-figures"></a>

### Table Schema
- `id` (SERIAL PK)
- `paper_arxiv_id` (VARCHAR FK → papers.arxiv_id)
- `path` (VARCHAR)
- `caption` (TEXT)
- `label` (TEXT)
- `name` (TEXT)

### Insert From API

In [30]:
config = {"arxiv_ids": ["1903.03894v4", "1806.08804v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_figures", config)

seed: ['1903.03894v4']
BFS_que.qsize(): 1
current paper: 1903.03894v4
Thread 13079867392 Processing 1903.03894v4


x 000abstract.tex
x 010intro.tex
x 020related.tex
x 030background.tex
x 030formulation.tex
x 030proposed.tex
x 040experiments.tex
x 050conclusion.tex
x 060supplement.tex
x acmart.bib
x acmart.cls
x acmart.dtx
x acmart.ins
x ACM-Reference-Format.bbx
x ACM-Reference-Format.bst
x ACM-Reference-Format.cbx
x ACM-Reference-Format.dbx
x figs/
x figs/explainer-introduction_v2.pdf
x figs/explainer-motivation.pdf
x figs/explainer.pdf
x figs/feature_importance_v2.pdf
x figs/fig3-graph-cls-v2.pdf
x figs/fig3-graph-cls.pdf
x figs/fig3-node-cls-v3.pdf
x figs/fig3-node-cls.pdf
x figs/fig3-v4.pdf
x figs/fig3-v5.pdf
x figs/including-node-features.pdf
x figs/local_subgraph.png
x figs/motivation-node-features.pdf
x figs/prototype.png
x figs/prototype1.png
x figs/single-instance-explanation-final.pdf
x figs/single-instance-explanation2.pdf
x figs/single-instance-explanations.pdf: truncated gzip input
tar: Error exit delayed from previous errors.


Thread 13079867392 Finished processing 1903.03894v4 (1/999999999) Time elapsed: 0.71s
'NoneType' object is not subscriptable
Thread 13079867392 Failed to process 1903.03894v4
Thread 8614781504 Finished processing 1 papers
Error: The file with path './download/output/1903.03894v4.json' was not found.


#### Construct Table from CSV

In [31]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_figures_example.csv"}
research_arcade.construct_table_from_csv("arxiv_figures", config)

No new figures to import


#### Construct Table from JSON

In [32]:
config = {"json_file": "./examples/json_data/json_arxiv_figures_example.json"}
research_arcade.construct_table_from_json("arxiv_figures", config)

No new figures to import


### Insert Tables

In [33]:
# Insert tables for the Transformer paper
tables = [
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/tables/model_variations.tex',
        'caption': 'Variations on the Transformer architecture with different hyperparameters.',
        'label': 'tab:variations',
        'table_text': 'Model | N | d_model | d_ff | h | d_k | d_v | P_drop | train time\nbase | 6 | 512 | 2048 | 8 | 64 | 64 | 0.1 | 12 hrs'
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/tables/wmt_results.tex',
        'caption': 'Performance of the Transformer on WMT 2014 English-German and English-French translation tasks.',
        'label': 'tab:wmt',
        'table_text': 'Model | EN-DE BLEU | EN-FR BLEU\nTransformer (base) | 27.3 | 38.1\nTransformer (big) | 28.4 | 41.8'
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/tables/parsing_results.tex',
        'caption': 'English constituency parsing results on WSJ test set.',
        'label': 'tab:parsing',
        'table_text': 'Model | WSJ 23 F1\nTransformer | 91.3'
    }
]

for table in tables:
    research_arcade.insert_node("arxiv_tables", node_features=table)
    print(f"Inserted table: {table['label']}")

Inserted table: tab:variations
Inserted table: tab:wmt
Inserted table: tab:parsing


### Get All Tables

In [34]:
tables_df = research_arcade.get_all_node_features("arxiv_tables")
print(f"Total tables: {len(tables_df)}")
print("\nAll tables:")
print(tables_df[['label', 'caption']])

Total tables: 34

All tables:
                   label                                            caption
0        tab:wmt_results    Machine translation performance on WMT datasets
1         tab:variations         Variations on the Transformer architecture
2               tab:glue           BERT performance on GLUE benchmark tasks
3              tab:squad                Results on SQuAD question answering
4           tab:imagenet    Classification error on ImageNet validation set
5           tab:vit_perf          Vision Transformer performance comparison
6        tab:wmt_results    Machine translation performance on WMT datasets
7         tab:variations         Variations on the Transformer architecture
8               tab:glue           BERT performance on GLUE benchmark tasks
9              tab:squad                Results on SQuAD question answering
10          tab:imagenet    Classification error on ImageNet validation set
11          tab:vit_perf          Vision Transformer perfo

## 8. ArXiv Sections <a name="arxiv-sections"></a>

### Table Schema
- `id` (SERIAL PK)
- `content` (TEXT)
- `title` (TEXT)
- `appendix` (BOOLEAN)
- `paper_arxiv_id` (VARCHAR FK → papers.arxiv_id)
- `section_in_paper_id` (INT)

### Insert From API

In [35]:
config = {"arxiv_ids": ["1903.03894v4", "1806.08804v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_sections", config)

seed: ['1903.03894v4']
BFS_que.qsize(): 1
current paper: 1903.03894v4
Thread 13079867392 Processing 1903.03894v4


x 000abstract.tex
x 010intro.tex
x 020related.tex
x 030background.tex
x 030formulation.tex
x 030proposed.tex
x 040experiments.tex
x 050conclusion.tex
x 060supplement.tex
x acmart.bib
x acmart.cls
x acmart.dtx
x acmart.ins
x ACM-Reference-Format.bbx
x ACM-Reference-Format.bst
x ACM-Reference-Format.cbx
x ACM-Reference-Format.dbx
x figs/
x figs/explainer-introduction_v2.pdf
x figs/explainer-motivation.pdf
x figs/explainer.pdf
x figs/feature_importance_v2.pdf
x figs/fig3-graph-cls-v2.pdf
x figs/fig3-graph-cls.pdf
x figs/fig3-node-cls-v3.pdf
x figs/fig3-node-cls.pdf
x figs/fig3-v4.pdf
x figs/fig3-v5.pdf
x figs/including-node-features.pdf
x figs/local_subgraph.png
x figs/motivation-node-features.pdf
x figs/prototype.png
x figs/prototype1.png
x figs/single-instance-explanation-final.pdf
x figs/single-instance-explanation2.pdf
x figs/single-instance-explanations.pdf: truncated gzip input
tar: Error exit delayed from previous errors.


Thread 13079867392 Finished processing 1903.03894v4 (1/999999999) Time elapsed: 0.65s
'NoneType' object is not subscriptable
Thread 13079867392 Failed to process 1903.03894v4
Thread 8614781504 Finished processing 1 papers
Error: The file at path './download/output/1903.03894v4.json' was not found.


#### Construct Table from CSV

In [36]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_sections_example.csv"}
research_arcade.construct_table_from_csv("arxiv_sections", config)

Successfully imported 8 sections from ./examples/csv_data/csv_arxiv_sections_example.csv


#### Construct Table from JSON

In [37]:
config = {"json_file": "./examples/json_data/json_arxiv_sections_example.json"}
research_arcade.construct_table_from_json("arxiv_sections", config)

Successfully imported 8 sections from ./examples/json_data/json_arxiv_sections_example.json


### Insert Sections

In [38]:
# Insert sections for the Transformer paper
sections = [
    {
        'content': 'The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder...',
        'title': 'Introduction',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 1
    },
    {
        'content': 'Most competitive neural sequence transduction models have an encoder-decoder structure. Here, the encoder maps an input sequence of symbol representations...',
        'title': 'Background',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 2
    },
    {
        'content': 'Most neural sequence transduction models have an encoder-decoder structure. The Transformer follows this overall architecture using stacked self-attention and point-wise, fully connected layers...',
        'title': 'Model Architecture',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 3
    },
    {
        'content': 'In this section we describe the training regime for our models...',
        'title': 'Training',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 4
    },
    {
        'content': 'On the WMT 2014 English-to-German translation task, the big transformer model outperforms the best previously reported models...',
        'title': 'Results',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 5
    },
    {
        'content': 'In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers...',
        'title': 'Conclusion',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 6
    }
]

for section in sections:
    research_arcade.insert_node("arxiv_sections", node_features=section)
    print(f"Inserted section: {section['title']}")

Inserted section: Introduction
Inserted section: Background
Inserted section: Model Architecture
Inserted section: Training
Inserted section: Results
Inserted section: Conclusion


### Get All Sections

In [39]:
sections_df = research_arcade.get_all_node_features("arxiv_sections")
print(f"Total sections: {sections_df}")
print("\nAll sections:")
print(sections_df[['title', 'section_in_paper_id', 'appendix']])

Total sections:     id                                            content               title  \
0    1  The dominant sequence transduction models are ...        Introduction   
1    2  The goal of reducing sequential computation al...          Background   
2    3  The Transformer follows this overall architect...  Model Architecture   
3    4  We trained on the standard WMT 2014 English-Ge...            Training   
4    5  In this work we presented the Transformer, the...          Conclusion   
5    6  We introduce a new language representation mod...        Introduction   
6    7  Unlike recent language representation models, ...        Related Work   
7    8  BERT uses a multi-layer bidirectional Transfor...  Model Architecture   
8    9  The dominant sequence transduction models are ...        Introduction   
9   10  The goal of reducing sequential computation al...          Background   
10  11  The Transformer follows this overall architect...  Model Architecture   
11  12  We t

## 9. ArXiv Paragraphs <a name="arxiv-paragraphs"></a>

### Table Schema
- `id` (SERIAL PK)
- `paragraph_id` (INT)
- `content` (TEXT)
- `paper_arxiv_id` (VARCHAR FK → papers.arxiv_id)
- `paper_section` (TEXT)
- `section_id` (INT)
- `paragraph_in_paper_id` (INT)

### Insert From API

In [40]:
config = {"arxiv_ids": ["1903.03894v4", "1806.08804v4"], "dest_dir": "./download"}
research_arcade.construct_table_from_api("arxiv_paragraphs", config)

seed: ['1903.03894v4']
BFS_que.qsize(): 1
current paper: 1903.03894v4
Thread 13079867392 Processing 1903.03894v4


x 000abstract.tex
x 010intro.tex
x 020related.tex
x 030background.tex
x 030formulation.tex
x 030proposed.tex
x 040experiments.tex
x 050conclusion.tex
x 060supplement.tex
x acmart.bib
x acmart.cls
x acmart.dtx
x acmart.ins
x ACM-Reference-Format.bbx
x ACM-Reference-Format.bst
x ACM-Reference-Format.cbx
x ACM-Reference-Format.dbx
x figs/
x figs/explainer-introduction_v2.pdf
x figs/explainer-motivation.pdf
x figs/explainer.pdf
x figs/feature_importance_v2.pdf
x figs/fig3-graph-cls-v2.pdf
x figs/fig3-graph-cls.pdf
x figs/fig3-node-cls-v3.pdf
x figs/fig3-node-cls.pdf
x figs/fig3-v4.pdf
x figs/fig3-v5.pdf
x figs/including-node-features.pdf
x figs/local_subgraph.png
x figs/motivation-node-features.pdf
x figs/prototype.png
x figs/prototype1.png
x figs/single-instance-explanation-final.pdf
x figs/single-instance-explanation2.pdf
x figs/single-instance-explanations.pdf: truncated gzip input
tar: Error exit delayed from previous errors.


Thread 13079867392 Finished processing 1903.03894v4 (1/999999999) Time elapsed: 0.65s
'NoneType' object is not subscriptable
Thread 13079867392 Failed to process 1903.03894v4
Thread 8614781504 Finished processing 1 papers


100%|██████████| 2/2 [00:00<00:00, 922.94it/s]


Error loading ./download/output/1903.03894v4.json: [Errno 2] No such file or directory: './download/output/1903.03894v4.json'


100%|██████████| 2/2 [00:00<00:00, 281.94it/s]

Error loading ./download/output/1903.03894v4.json: [Errno 2] No such file or directory: './download/output/1903.03894v4.json'
1806.08804v4
Key to References: {'fig:assignment_vis': 'figures_3', 'tab:results': 'table_4', 'tab:results2': 'table_5'}
tab:results
tab:results2
Paper count:  1
Total nodes:  113
Total edges:  210
Paper nodes:  1
Figure nodes:  0
Table nodes:  2
Text nodes:  110
0





#### Construct Table from CSV

In [41]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_paragraphs_example.csv"}
research_arcade.construct_table_from_csv("arxiv_paragraphs", config)

No new paragraphs to import (all paragraphs already exist)


#### Construct Table from JSON

In [42]:
config = {"json_file": "./examples/json_data/json_arxiv_paragraphs_example.json"}
research_arcade.construct_table_from_json("arxiv_paragraphs", config)

No new paragraphs to import (all paragraphs already exist)


### Insert Paragraphs

In [43]:
# Insert paragraphs from the Introduction section
paragraphs = [
    {
        'paragraph_id': 1,
        'content': 'Recurrent neural networks, long short-term memory and gated recurrent neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 1
    },
    {
        'paragraph_id': 2,
        'content': 'Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures. Recurrent models typically factor computation along the symbol positions of the input and output sequences.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 2
    },
    {
        'paragraph_id': 3,
        'content': 'Aligning the positions to steps in computation time, they generate a sequence of hidden states h_t, as a function of the previous hidden state h_{t-1} and the input for position t. This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 3
    },
    {
        'paragraph_id': 4,
        'content': 'Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 4
    },
    {
        'paragraph_id': 5,
        'content': 'In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 5
    }
]

for paragraph in paragraphs:
    research_arcade.insert_node("arxiv_paragraphs", node_features=paragraph)
    print(f"Inserted paragraph {paragraph['paragraph_id']} from {paragraph['paper_section']}")

Inserted paragraph 1 from Introduction
Inserted paragraph 2 from Introduction
Inserted paragraph 3 from Introduction
Inserted paragraph 4 from Introduction
Inserted paragraph 5 from Introduction


### Get All Paragraphs

In [44]:
paragraphs_df = research_arcade.get_all_node_features("arxiv_paragraphs")
print(f"Total paragraphs: {len(paragraphs_df)}")
print("\nFirst 3 paragraphs:")
print(paragraphs_df[['paragraph_id', 'paper_section', 'content']].head(3))

Total paragraphs: 123

First 3 paragraphs:
   paragraph_id paper_section  \
0             0  Introduction   
1             1  Introduction   
2             2  Introduction   

                                             content  
0  \label{sec:intro}\nIn recent years there has b...  
1  However, a major limitation of current GNN arc...  
2  Here we propose \name, a differentiable graph ...  


## 10. Relationships/Edges <a name="relationships"></a>

This section demonstrates how to create and manage relationships between different entities.

### 10.2 ArXiv Citations (arxiv_citation)

#### Insert Citation

In [45]:
citation = {
    'citing_arxiv_id': '1810.04805v2',
    'cited_arxiv_id': '1706.03762v7',
    'bib_title': 'attention is all you need',
    'bib_key': 'something',
    'citing_sections': 'citing_sections',
}
research_arcade.insert_edge("arxiv_citation", edge_features=citation)
print("Citation created!")

Citation created!


#### Construct Table from CSV

In [46]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_paper_citation_example.csv"}
research_arcade.construct_table_from_csv("arxiv_paper_citation", config)

No new citations to import (all citations already exist)


#### Construct Table from JSON

In [47]:
config = {"json_file": "./examples/json_data/json_arxiv_paper_citation_example.json"}
research_arcade.construct_table_from_json("arxiv_paper_citation", config)

No new citations to import (all citations already exist)


#### Get All Citations

In [48]:
all_citations = research_arcade.get_all_edge_features("arxiv_citation")
print(f"Total citations: {len(all_citations)}")
print(all_citations.head())

Total citations: 8
   id citing_arxiv_id cited_arxiv_id  \
0   2    1706.03762v7    1409.0473v7   
1   3    1706.03762v7   1508.04025v5   
2   5    1810.04805v2   1802.05365v2   
3   6    2010.11929v2   1706.03762v7   
4   7    2010.11929v2   1810.04805v2   

                                           bib_title               bib_key  \
0  Neural Machine Translation by Jointly Learning...    bahdanau2014neural   
1  Effective Approaches to Attention-based Neural...    luong2015effective   
2           Deep contextualized word representations        peters2018deep   
3                          Attention Is All You Need  vaswani2017attention   
4  BERT: Pre-training of Deep Bidirectional Trans...        devlin2018bert   

                    citing_sections citing_paragraphs  
0  ["introduction", "related_work"]                []  
1                  ["related_work"]                []  
2                  ["related_work"]                []  
3         ["introduction", "model"]            

#### Get Cited Papers

In [49]:
citing_paper = {'citing_paper_id': '1810.04805v2'}
cited_papers = research_arcade.get_neighborhood("arxiv_citation", primary_key=citing_paper)
print("Papers cited:")
print(cited_papers)

Papers cited:
   id citing_arxiv_id cited_arxiv_id  \
2   5    1810.04805v2   1802.05365v2   
7  10    1810.04805v2   1706.03762v7   

                                  bib_title         bib_key  \
2  Deep contextualized word representations  peters2018deep   
7                 attention is all you need       something   

     citing_sections citing_paragraphs  
2   ["related_work"]                []  
7  "citing_sections"                []  


#### Get Citing Papers

In [50]:
cited_paper = {'cited_paper_id': '1706.03762v7'}
citing_papers = research_arcade.get_neighborhood("arxiv_citation", primary_key=cited_paper)
print("Papers that cite:")
print(citing_papers)

Papers that cite:
   id citing_arxiv_id cited_arxiv_id                  bib_title  \
3   6    2010.11929v2   1706.03762v7  Attention Is All You Need   
7  10    1810.04805v2   1706.03762v7  attention is all you need   

                bib_key            citing_sections citing_paragraphs  
3  vaswani2017attention  ["introduction", "model"]                []  
7             something          "citing_sections"                []  


#### Delete Citation

In [51]:
citation_id = {
    'citing_paper_id': '1810.04805v2',
    'cited_paper_id': '1706.03762v7'
}
research_arcade.delete_edge_by_id("arxiv_citation", primary_key=citation_id)
print("Citation deleted!")

Deleted citation: 1810.04805v2 -> 1706.03762v7
Citation deleted!


### 10.3 ArXiv Paper-Author (arxiv_paper_author)

#### Insert Paper-Author Relationships

In [52]:
paper_authors = [
    {'paper_arxiv_id': '1706.03762v7', 'author_id': 'ss_ashish_vaswani', 'author_sequence': 1},
    {'paper_arxiv_id': '1706.03762v7', 'author_id': 'ss_noam_shazeer', 'author_sequence': 2},
    {'paper_arxiv_id': '1706.03762v7', 'author_id': 'ss_niki_parmar', 'author_sequence': 3}
]
for relation in paper_authors:
    research_arcade.insert_edge("arxiv_paper_author", edge_features=relation)
    print(f"Linked author {relation['author_id']} (position {relation['author_sequence']})")

Linked author ss_ashish_vaswani (position 1)
Linked author ss_noam_shazeer (position 2)
Linked author ss_niki_parmar (position 3)


#### Construct Table from CSV

In [53]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_paper_author_example.csv"}
research_arcade.construct_table_from_csv("arxiv_paper_author", config)

Successfully imported 16 paper-author relationships from ./examples/csv_data/csv_arxiv_paper_author_example.csv


#### Construct Table from JSON

In [54]:
config = {"json_file": "./examples/json_data/json_arxiv_paper_author_example.json"}
research_arcade.construct_table_from_json("arxiv_paper_author", config)

No new paper-author relationships to import


#### Get All Paper-Author Relationships

In [55]:
all_relations = research_arcade.get_all_edge_features("arxiv_paper_author")
print(f"Total relationships: {len(all_relations)}")
print(all_relations.head(10))

Total relationships: 35
  paper_arxiv_id        author_id  author_sequence
0   1706.03762v7  ss_noam_shazeer                2
1   1706.03762v7   ss_niki_parmar                3
2   1706.03762v7          1234567                1
3   1706.03762v7          2345678                2
4   1706.03762v7          3456789                3
5   1706.03762v7          4567890                4
6   1706.03762v7          5678901                5
7   1706.03762v7          6789012                6
8   1706.03762v7          7890123                7
9   1706.03762v7          8901234                8


#### Get Authors for a Paper

In [56]:
paper_id = {'paper_arxiv_id': '1706.03762v7'}
authors = research_arcade.get_neighborhood("arxiv_paper_author", primary_key=paper_id)
print("Authors:")
print(authors.sort_values('author_sequence'))

Authors:
   paper_arxiv_id          author_id  author_sequence
0    1706.03762v7            1234567                1
1    1706.03762v7  ss_ashish_vaswani                1
2    1706.03762v7            1234567                1
3    1706.03762v7    ss_noam_shazeer                2
4    1706.03762v7            2345678                2
5    1706.03762v7            2345678                2
6    1706.03762v7     ss_niki_parmar                3
7    1706.03762v7            3456789                3
8    1706.03762v7            3456789                3
10   1706.03762v7            4567890                4
9    1706.03762v7            4567890                4
11   1706.03762v7            5678901                5
12   1706.03762v7            5678901                5
13   1706.03762v7            6789012                6
14   1706.03762v7            6789012                6
15   1706.03762v7            7890123                7
16   1706.03762v7            7890123                7
17   1706.03762v7  

#### Get Papers by Author

In [57]:
author_id = {'author_id': 'ss_ashish_vaswani'}
papers = research_arcade.get_neighborhood("arxiv_paper_author", primary_key=author_id)
print("Papers by author:")
print(papers)

Papers by author:
  paper_arxiv_id          author_id  author_sequence
0   1706.03762v7  ss_ashish_vaswani                1


#### Delete Paper-Author Link

In [58]:
relation_id = {'paper_arxiv_id': '1706.03762v7', 'author_id': 'ss_ashish_vaswani'}
research_arcade.delete_edge_by_id("arxiv_paper_author", primary_key=relation_id)
print("Relationship deleted!")

Relationship deleted!


### 10.4 ArXiv Paper-Category (arxiv_paper_category)

#### Insert Paper-Category Relationships

In [59]:
paper_categories = [
    {'paper_arxiv_id': '1706.03762v7', 'category_id': '1'},
    {'paper_arxiv_id': '1706.03762v7', 'category_id': '1'},
    {'paper_arxiv_id': '1706.03762v7', 'category_id': '2'}
]
for relation in paper_categories:
    research_arcade.insert_edge("arxiv_paper_category", edge_features=relation)
    print(f"Linked {relation['category_id']}")

Linked 1
Linked 1
Linked 2


#### Construct Table from CSV

In [60]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_paper_category_example.csv"}
research_arcade.construct_table_from_csv("arxiv_paper_category", config)

Successfully imported 1 paper-category relationships from ./examples/csv_data/csv_arxiv_paper_category_example.csv


#### Construct Table from JSON

In [61]:
config = {"json_file": "./examples/json_data/json_arxiv_paper_category_example.json"}
research_arcade.construct_table_from_json("arxiv_paper_category", config)

No new paper-category relationships to import


#### Get All Paper-Category Relationships

In [62]:
all_relations = research_arcade.get_all_edge_features("arxiv_paper_category")
print(f"Total relationships: {len(all_relations)}")
print(all_relations.head())

Total relationships: 17
  paper_arxiv_id category_id
0   1706.03762v7           1
1   1706.03762v7           1
2   1706.03762v7           2
3   1706.03762v7       cs.CL
4   1706.03762v7       cs.LG


#### Get Categories for Paper

In [63]:
paper_id = {'paper_arxiv_id': '1706.03762v7'}
categories = research_arcade.get_neighborhood("arxiv_paper_category", primary_key=paper_id)
print("Categories:")
print(categories)

Categories:
  paper_arxiv_id category_id
0   1706.03762v7           1
1   1706.03762v7           1
2   1706.03762v7           2
3   1706.03762v7       cs.CL
4   1706.03762v7       cs.LG
5   1706.03762v7       cs.AI


#### Get Papers in Category

In [64]:
category_id = {'category_id': 'cs.LG'}
papers = research_arcade.get_neighborhood("arxiv_paper_category", primary_key=category_id)
print("Papers in category:")
print(papers)

Papers in category:
  paper_arxiv_id category_id
0   1706.03762v7       cs.LG
1   1810.04805v2       cs.LG
2    1409.0473v7       cs.LG
3   1512.03385v1       cs.LG
4   2010.11929v2       cs.LG


#### Delete Paper-Category Link

In [65]:
relation_id = {'paper_arxiv_id': '1706.03762v7', 'category_id': 'cs.AI'}
research_arcade.delete_edge_by_id("arxiv_paper_category", primary_key=relation_id)
print("Relationship deleted!")

Relationship deleted!


### 10.5 ArXiv Paper-Figure (arxiv_paper_figure)

#### Insert Paper-Figure Relationships

In [66]:
paper_figures = [
    {'paper_arxiv_id': '1706.03762v7', 'figure_id': 1},
    {'paper_arxiv_id': '1706.03762v7', 'figure_id': 2}
]
for relation in paper_figures:
    research_arcade.insert_edge("arxiv_paper_figure", edge_features=relation)
    print(f"Linked figure {relation['figure_id']})")

Linked figure 1)
Linked figure 2)


#### Construct Table from CSV

In [67]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_paper_figure_example.csv"}
research_arcade.construct_table_from_csv("arxiv_paper_figure", config)

No new paper-figure relationships to import


#### Construct Table from JSON

In [68]:
config = {"json_file": "./examples/json_data/json_arxiv_paper_figure_example.json"}
research_arcade.construct_table_from_json("arxiv_paper_figure", config)

No new paper-figure relationships to import


#### Get Figures for Paper

In [69]:
paper_id = {'paper_arxiv_id': '1706.03762v7'}
figures = research_arcade.get_neighborhood("arxiv_paper_figure", primary_key=paper_id)
print("Figures:")
print(figures)

Figures:
  paper_arxiv_id  figure_id
0   1706.03762v7          1
1   1706.03762v7          2
2   1706.03762v7          3


### 10.6 ArXiv Paper-Table (arxiv_paper_table)

#### Insert Paper-Table Relationships

In [70]:
paper_tables = [
    {'paper_arxiv_id': '1706.03762v7', 'table_id': 1},
    {'paper_arxiv_id': '1706.03762v7', 'table_id': 2}
]
for relation in paper_tables:
    research_arcade.insert_edge("arxiv_paper_table", edge_features=relation)
    print(f"Linked table {relation['table_id']}")

Linked table 1
Linked table 2


#### Construct Table from CSV

In [71]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_paper_table_example.csv"}
research_arcade.construct_table_from_csv("arxiv_paper_table", config)

No new paper-table relationships to import


#### Construct Table from JSON

In [72]:
config = {"json_file": "./examples/json_data/json_arxiv_paper_table_example.json"}
research_arcade.construct_table_from_json("arxiv_paper_table", config)

No new paper-table relationships to import


#### Get Tables for Paper

In [73]:
paper_id = {'paper_arxiv_id': '1706.03762v7'}
tables = research_arcade.get_neighborhood("arxiv_paper_table", primary_key=paper_id)
print("Tables:")


Tables:


### 10.7 ArXiv Paragraph-Reference (arxiv_paragraph_reference)

#### Insert Paragraph-Reference Relationships

In [74]:
paragraph_references = [
    {'paragraph_id': 1, 'paper_section': 'established approaches', 'paper_arxiv_id': '1706.03762v7', 'reference_label': "{something}", 'reference_type': 'figure'}
]

for relation in paragraph_references:
    research_arcade.insert_edge("arxiv_paragraph_reference", edge_features=relation)

#### Construct Table from CSV

In [75]:
config = {"csv_file": "./examples/csv_data/csv_arxiv_paragraph_reference_example.csv"}
research_arcade.construct_table_from_csv("arxiv_paragraph_reference", config)

Successfully imported 10 paragraph-reference relationships from ./examples/csv_data/csv_arxiv_paragraph_reference_example.csv


#### Construct Table from JSON

In [76]:
config = {"json_file": "./examples/json_data/json_arxiv_paragraph_reference_example.json"}
research_arcade.construct_table_from_json("arxiv_paragraph_reference", config)

Error: JSON file ./examples/json_data/json_arxiv_paragraph_reference_example.json does not exist.


#### Get References in Paragraph

In [77]:
paragraph_id = {'paragraph_id': 1}
references = research_arcade.get_neighborhood("arxiv_paragraph_reference", primary_key=paragraph_id)
print("References:")
print(references)

References:
   id  paragraph_id           paper_section paper_arxiv_id  \
0   1             1  established approaches   1706.03762v7   
1   3             1            introduction   1706.03762v7   
2   6             1              background   1706.03762v7   
3   8             1            introduction   1810.04805v2   
4  11             1                   model   1810.04805v2   
5  12             1  established approaches   1706.03762v7   
6  14             1            introduction   1706.03762v7   
7  17             1              background   1706.03762v7   
8  19             1            introduction   1810.04805v2   
9  22             1                   model   1810.04805v2   

      reference_label reference_type  
0         {something}         figure  
1  bahdanau2014neural       citation  
2       fig:attention         figure  
3       fig:bert_arch         figure  
4      peters2018deep       citation  
5         {something}         figure  
6  bahdanau2014neural       cita

## Conclusion

This tutorial has covered:

1. Setting up the ResearchArcade database connection
2. Working with OpenReview data
3. CRUD operations for all ArXiv entity types:
   - Papers
   - Authors
   - Categories
   - Figures
   - Tables
   - Sections
   - Paragraphs
4. Creating relationships between entities:
   - Authorship
   - Citations
   - Paper-Category links
   - Paper-Figure/Table links
   - Paragraph-level references

For more information, refer to the ResearchArcade documentation.