# ResearchArcade Complete Tutorial

This tutorial demonstrates how to work with the ResearchArcade database, covering all node types and edge relationships.

## Table of Contents
1. [Setup](#setup)
2. [OpenReview Data](#openreview)
3. [ArXiv Papers](#arxiv-papers)
4. [ArXiv Authors](#arxiv-authors)
5. [ArXiv Categories](#arxiv-categories)
6. [ArXiv Figures](#arxiv-figures)
7. [ArXiv Tables](#arxiv-tables)
8. [ArXiv Sections](#arxiv-sections)
9. [ArXiv Paragraphs](#arxiv-paragraphs)
10. [Relationships/Edges](#relationships)
11. [Advanced Queries](#advanced-queries)

## 1. Setup <a name="setup"></a>

In [1]:
import os
import sys
from datetime import datetime
from pathlib import Path

import pandas as pd
from tqdm import tqdm

# Make the parent directory importable so the local research_arcade package is found
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..')))
from research_arcade.research_arcade import ResearchArcade

### Choose Database Backend

#### CSV Based

In [5]:
db_type = "csv"
config = {
    "csv_dir": "../data/my_research_arcade_data/"
}

research_arcade = ResearchArcade(db_type=db_type, config=config)

#### SQL Based (PostgreSQL)

In [3]:
db_type = "sql"
config = {
    "host": "localhost",
    "dbname": "iclr_openreview_database",
    "user": "jingjunx",
    "password": "",  # fill in; an empty password fails if the server requires password authentication
    "port": "5432"
}

research_arcade = ResearchArcade(db_type=db_type, config=config)

OperationalError: connection to server at "localhost" (::1), port 5432 failed: fe_sendauth: no password supplied
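
The `OperationalError` above arises because the `password` field is empty while the server expects one. One way to avoid hard-coding credentials in a notebook is to read them from the environment. The sketch below is a hypothetical helper (`build_sql_config` is not part of ResearchArcade); it assumes the standard libpq variables `PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER`, and `PGPASSWORD` are set, and it only builds a dictionary in the shape used above.

```python
import os


def build_sql_config(env=None):
    """Build a config dict in the shape used above from libpq-style variables.

    Hypothetical helper for illustration; not a ResearchArcade API.
    """
    env = os.environ if env is None else env
    password = env.get("PGPASSWORD", "")
    if not password:
        # Fail early instead of waiting for the server to reject the connection
        raise ValueError("PGPASSWORD is not set")
    return {
        "host": env.get("PGHOST", "localhost"),
        "dbname": env.get("PGDATABASE", "iclr_openreview_database"),
        "user": env.get("PGUSER", ""),
        "password": password,
        "port": env.get("PGPORT", "5432"),
    }


# Example with an explicit mapping instead of the real environment
example_config = build_sql_config({"PGUSER": "alice", "PGPASSWORD": "s3cret"})
print(example_config["host"], example_config["port"])
```

The resulting dictionary can then be passed as `config` to `ResearchArcade(db_type="sql", config=...)`, as in the cell above.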


## 3. ArXiv Papers <a name="arxiv-papers"></a>

### Table Schema
- `id` (SERIAL PK)
- `arxiv_id` (VARCHAR, unique) - e.g., 1802.08773v3
- `base_arxiv_id` (VARCHAR) - e.g., 1802.08773
- `version` (INT) - e.g., 3
- `title` (TEXT)
- `abstract` (TEXT)
- `submit_date` (DATE)
- `metadata` (JSONB)
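
The `arxiv_id`, `base_arxiv_id`, and `version` fields are related: in the examples given in the schema, the versioned ID is the base ID followed by `v` and the version number. The sketch below shows one way to derive the latter two from the former. `split_arxiv_id` is a hypothetical helper, not part of ResearchArcade, and the regular expression assumes only a trailing `v<digits>` suffix.

```python
import re

# Lazily match the base ID, then a final "v<digits>" suffix
_VERSION_SUFFIX = re.compile(r"^(?P<base>.+?)v(?P<version>\d+)$")


def split_arxiv_id(arxiv_id):
    """Split a versioned arXiv ID into (base_arxiv_id, version)."""
    match = _VERSION_SUFFIX.match(arxiv_id)
    if match is None:
        raise ValueError(f"no version suffix in {arxiv_id!r}")
    # Keep the base ID as a string so zeros after the dot are preserved
    return match.group("base"), int(match.group("version"))


base, version = split_arxiv_id("1802.08773v3")
print(base, version)
```

Keeping `base_arxiv_id` as a string matters because a numeric type would drop trailing zeros from the identifier.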

### Construct Table from API

In [None]:
config = {"start time": "", "end time": ""} # TO BE FILLED IN
research_arcade.construct_table_from_api("papers", config)

### Insert a Paper

In [6]:
# Example 1: Insert the famous "Attention is All You Need" paper
new_paper = {
    'arxiv_id': '1706.03762v7',
    'base_arxiv_id': '1706.03762',
    'version': 7,
    'title': 'Attention Is All You Need',
    'abstract': 'The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.',
    'submit_date': '2017-06-12',
    'metadata': {'venue': 'NeurIPS 2017', 'pdf_url': 'https://arxiv.org/pdf/1706.03762.pdf'}
}

research_arcade.insert_node("arxiv_papers", node_features=new_paper)
print("Paper inserted successfully!")

Paper inserted successfully!
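
Before calling `insert_node`, it can help to check a record against the schema listed above. The following is a minimal sketch, not ResearchArcade's own validation: `validate_paper` is a hypothetical helper, and treating every schema field except `id` and `metadata` as required is an assumption.

```python
from datetime import date

# Assumption: these fields are required for every paper record
REQUIRED_PAPER_FIELDS = (
    "arxiv_id", "base_arxiv_id", "version", "title", "abstract", "submit_date",
)


def validate_paper(paper):
    """Return a list of problems found in a paper record (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_PAPER_FIELDS if name not in paper]
    if problems:
        return problems
    if not isinstance(paper["version"], int) or paper["version"] < 1:
        problems.append("version must be a positive integer")
    # The versioned ID should be the base ID plus "v<version>"
    expected_id = f"{paper['base_arxiv_id']}v{paper['version']}"
    if paper["arxiv_id"] != expected_id:
        problems.append(f"arxiv_id should be {expected_id}")
    try:
        date.fromisoformat(paper["submit_date"])
    except (TypeError, ValueError):
        problems.append("submit_date must be an ISO date string (YYYY-MM-DD)")
    return problems


candidate = {
    "arxiv_id": "1706.03762v7",
    "base_arxiv_id": "1706.03762",
    "version": 7,
    "title": "Attention Is All You Need",
    "abstract": "The dominant sequence transduction models are based on ...",
    "submit_date": "2017-06-12",
}
print(validate_paper(candidate))
```

An empty list means the record passed these checks; any other result lists what to fix before inserting.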


In [7]:
# Example 2: Insert BERT paper
bert_paper = {
    'arxiv_id': '1810.04805v2',
    'base_arxiv_id': '1810.04805',
    'version': 2,
    'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding',
    'abstract': 'We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.',
    'submit_date': '2018-10-11',
    'metadata': {'venue': 'NAACL 2019', 'citations': 50000}
}

research_arcade.insert_node("arxiv_papers", node_features=bert_paper)
print("BERT paper inserted successfully!")

BERT paper inserted successfully!


### Get All Papers

In [8]:
arxiv_papers_df = research_arcade.get_all_node_features("arxiv_papers")
print(f"Total papers in database: {len(arxiv_papers_df)}")
print("\nFirst 5 papers:")
print(arxiv_papers_df.head())

Total papers in database: 4

First 5 papers:
   id      arxiv_id  base_arxiv_id  version  \
0   2  1810.04805v2     1810.04805        2   
1   3   1409.0473v7     1409.04730        7   
2   4  1512.03385v1     1512.03385        1   
3   5  1706.03762v7     1706.03762        7   

                                               title  \
0  BERT: Pre-training of Deep Bidirectional Trans...   
1  Neural Machine Translation by Jointly Learning...   
2       Deep Residual Learning for Image Recognition   
3                          Attention Is All You Need   

                                            abstract submit_date  \
0  We introduce a new language representation mod...  2018-10-11   
1  Neural machine translation is a recently propo...  2014-09-01   
2  Deeper neural networks are more difficult to t...  2015-12-10   
3  The dominant sequence transduction models are ...  2017-06-12   

                                            metadata  
0        {"venue": "NAACL 2019", "citation

### Get Specific Paper by ID

In [9]:
paper_id = {"arxiv_id": "1706.03762v7"}
paper_features = research_arcade.get_node_features_by_id("arxiv_papers", paper_id)
print("Paper details:")
print(paper_features.to_dict(orient="records")[0])

Paper details:
{'id': 5, 'arxiv_id': '1706.03762v7', 'base_arxiv_id': 1706.03762, 'version': 7, 'title': 'Attention Is All You Need', 'abstract': 'The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 af

### Update a Paper

In [10]:
# Update metadata for a paper
updated_paper = {
    'arxiv_id': '1706.03762v7',
    'metadata': {
        'venue': 'NeurIPS 2017',
        'pdf_url': 'https://arxiv.org/pdf/1706.03762.pdf',
        'citations': 75000,
        'influential': True
    }
}

research_arcade.update_node("arxiv_papers", node_features=updated_paper)
print("Paper updated successfully!")

Paper updated successfully!
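
The update above passes the complete `metadata` dictionary, repeating the existing `venue` and `pdf_url` keys alongside the new ones. Whether `update_node` merges or replaces the stored value is not shown in this tutorial, so a cautious approach is to merge on the client side first. `merge_metadata` below is a hypothetical helper illustrating that step.

```python
def merge_metadata(existing, changes):
    """Return a new dict with `changes` applied on top of `existing`.

    Hypothetical helper; neither input dictionary is modified.
    """
    merged = dict(existing or {})
    merged.update(changes)
    return merged


existing = {'venue': 'NeurIPS 2017', 'pdf_url': 'https://arxiv.org/pdf/1706.03762.pdf'}
changes = {'citations': 75000, 'influential': True}

# Same shape as the update record used above
updated_paper = {
    'arxiv_id': '1706.03762v7',
    'metadata': merge_metadata(existing, changes),
}
print(updated_paper['metadata'])
```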


### Delete a Paper

In [11]:
# Delete a paper by ID
paper_id = {"arxiv_id": "1706.03762v7"}
deleted_paper = research_arcade.delete_node_by_id("arxiv_papers", paper_id)
print("Deleted paper:")
print(deleted_paper)

Deleted paper:
True


## 4. ArXiv Authors <a name="arxiv-authors"></a>

### Table Schema
- `id` (SERIAL PK)
- `semantic_scholar_id` (VARCHAR, unique)
- `name` (VARCHAR)
- `homepage` (VARCHAR)

### Insert Authors

In [12]:
# Insert authors from the Transformer paper
authors = [
    {
        'semantic_scholar_id': 'ss_ashish_vaswani',
        'name': 'Ashish Vaswani',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    },
    {
        'semantic_scholar_id': 'ss_noam_shazeer',
        'name': 'Noam Shazeer',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    },
    {
        'semantic_scholar_id': 'ss_niki_parmar',
        'name': 'Niki Parmar',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    },
    {
        'semantic_scholar_id': 'ss_jakob_uszkoreit',
        'name': 'Jakob Uszkoreit',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    },
    {
        'semantic_scholar_id': 'ss_llion_jones',
        'name': 'Llion Jones',
        'homepage': 'https://scholar.google.com/citations?user=oR9sCGYAAAAJ'
    }
]

for author in authors:
    research_arcade.insert_node("arxiv_authors", node_features=author)
    print(f"Inserted author: {author['name']}")

Inserted author: Ashish Vaswani
Inserted author: Noam Shazeer
Inserted author: Niki Parmar
Inserted author: Jakob Uszkoreit
Inserted author: Llion Jones
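
Because `semantic_scholar_id` is declared unique in the schema, it may be worth checking a batch for duplicates before inserting it. The sketch below uses only the standard library; `find_duplicate_ids` is a hypothetical helper, not a ResearchArcade function, and how the backend itself reacts to a duplicate is not covered here.

```python
from collections import Counter


def find_duplicate_ids(records, key="semantic_scholar_id"):
    """Return the sorted values of `key` that occur more than once in `records`."""
    counts = Counter(record[key] for record in records)
    return sorted(value for value, count in counts.items() if count > 1)


batch = [
    {'semantic_scholar_id': 'ss_ashish_vaswani', 'name': 'Ashish Vaswani'},
    {'semantic_scholar_id': 'ss_noam_shazeer', 'name': 'Noam Shazeer'},
    {'semantic_scholar_id': 'ss_ashish_vaswani', 'name': 'A. Vaswani'},
]
print(find_duplicate_ids(batch))
```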


### Get All Authors

In [13]:
authors_df = research_arcade.get_all_node_features("arxiv_authors")
print(f"Total authors in database: {len(authors_df)}")
print("\nAll authors:")
print(authors_df)

Total authors in database: 5

All authors:
   id semantic_scholar_id             name  \
0   1   ss_ashish_vaswani   Ashish Vaswani   
1   2     ss_noam_shazeer     Noam Shazeer   
2   3      ss_niki_parmar      Niki Parmar   
3   4  ss_jakob_uszkoreit  Jakob Uszkoreit   
4   5      ss_llion_jones      Llion Jones   

                                            homepage  
0                          https://ashishvaswani.com  
1  https://scholar.google.com/citations?user=oR9s...  
2  https://scholar.google.com/citations?user=oR9s...  
3  https://scholar.google.com/citations?user=oR9s...  
4  https://scholar.google.com/citations?user=oR9s...  


### Get Specific Author by ID

In [14]:
author_id = {"semantic_scholar_id": "ss_ashish_vaswani"}
author_features = research_arcade.get_node_features_by_id("arxiv_authors", author_id)
print("Author details:")
print(author_features)

Author details:
None
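
In the run above, the lookup printed `None` rather than a row. When a lookup can come back empty, guarding the result before indexing into it avoids a follow-on exception. The sketch below is a hypothetical helper (`first_record` is not part of ResearchArcade); it assumes only that a successful lookup returns an object with a pandas-style `to_dict(orient="records")` method, as the paper lookup earlier in this tutorial does.

```python
def first_record(result):
    """Return the first row of a lookup result as a dict, or None if there is none."""
    if result is None:
        return None
    records = result.to_dict(orient="records")
    return records[0] if records else None


# A missing result is passed through instead of raising
print(first_record(None))
```

With the objects from this tutorial, `first_record(author_features)` would return `None` for the run shown above instead of raising when the result is indexed.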


### Update an Author

In [15]:
updated_author = {
    'semantic_scholar_id': 'ss_ashish_vaswani',
    'homepage': 'https://ashishvaswani.com'
}

research_arcade.update_node("arxiv_authors", node_features=updated_author)
print("Author updated successfully!")

Author updated successfully!


## 5. ArXiv Categories <a name="arxiv-categories"></a>

### Table Schema
- `id` (SERIAL PK)
- `name` (VARCHAR, unique)
- `description` (TEXT)

### Insert Categories

In [16]:
categories = [
    {
        'name': 'cs.CL',
        'description': 'Computation and Language (Natural Language Processing)'
    },
    {
        'name': 'cs.LG',
        'description': 'Machine Learning'
    },
    {
        'name': 'cs.AI',
        'description': 'Artificial Intelligence'
    },
    {
        'name': 'cs.CV',
        'description': 'Computer Vision and Pattern Recognition'
    },
    {
        'name': 'stat.ML',
        'description': 'Machine Learning (Statistics)'
    }
]

for category in categories:
    research_arcade.insert_node("arxiv_categories", node_features=category)
    print(f"Inserted category: {category['name']}")

Inserted category: cs.CL
Inserted category: cs.LG
Inserted category: cs.AI
Inserted category: cs.CV
Inserted category: stat.ML


### Get All Categories

In [17]:
categories_df = research_arcade.get_all_node_features("arxiv_categories")
print(f"Total categories: {len(categories_df)}")
print("\nAll categories:")
print(categories_df)

Total categories: 5

All categories:
   id     name                                        description
0   1    cs.CL  Computation and Language (Natural Language Pro...
1   2    cs.LG                                   Machine Learning
2   3    cs.AI                            Artificial Intelligence
3   4    cs.CV            Computer Vision and Pattern Recognition
4   5  stat.ML                      Machine Learning (Statistics)


## 6. ArXiv Figures <a name="arxiv-figures"></a>

### Table Schema
- `id` (SERIAL PK)
- `paper_arxiv_id` (VARCHAR FK → papers.arxiv_id)
- `path` (VARCHAR)
- `caption` (TEXT)
- `label` (TEXT)
- `name` (TEXT)

### Insert Figures

In [18]:
# Insert figures for the Transformer paper
figures = [
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/figures/transformer_architecture.png',
        'caption': 'The Transformer model architecture. The left side shows the encoder stack and the right side shows the decoder stack.',
        'label': 'fig:architecture',
        'name': 'Figure 1'
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/figures/scaled_dot_product_attention.png',
        'caption': 'Scaled Dot-Product Attention and Multi-Head Attention mechanisms.',
        'label': 'fig:attention',
        'name': 'Figure 2'
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/figures/positional_encoding.png',
        'caption': 'Positional encoding visualization showing sine and cosine functions of different frequencies.',
        'label': 'fig:positional',
        'name': 'Figure 3'
    }
]

for figure in figures:
    research_arcade.insert_node("arxiv_figures", node_features=figure)
    print(f"Inserted {figure['name']}")

Inserted Figure 1
Inserted Figure 2
Inserted Figure 3


### Get All Figures

In [19]:
figures_df = research_arcade.get_all_node_features("arxiv_figures")
print(f"Total figures: {len(figures_df)}")
print("\nAll figures:")
print(figures_df[['name', 'caption', 'label']])

Total figures: 3

All figures:
       name                                            caption  \
0  Figure 1  The Transformer model architecture. The left s...   
1  Figure 2  Scaled Dot-Product Attention and Multi-Head At...   
2  Figure 3  Positional encoding visualization showing sine...   

              label  
0  fig:architecture  
1     fig:attention  
2    fig:positional  


## 7. ArXiv Tables <a name="arxiv-tables"></a>

### Table Schema
- `id` (SERIAL PK)
- `paper_arxiv_id` (VARCHAR FK → papers.arxiv_id)
- `path` (VARCHAR)
- `caption` (TEXT)
- `label` (TEXT)
- `table_text` (TEXT)

### Insert Tables

In [20]:
# Insert tables for the Transformer paper
tables = [
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/tables/model_variations.tex',
        'caption': 'Variations on the Transformer architecture with different hyperparameters.',
        'label': 'tab:variations',
        'table_text': 'Model | N | d_model | d_ff | h | d_k | d_v | P_drop | train time\nbase | 6 | 512 | 2048 | 8 | 64 | 64 | 0.1 | 12 hrs'
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/tables/wmt_results.tex',
        'caption': 'Performance of the Transformer on WMT 2014 English-German and English-French translation tasks.',
        'label': 'tab:wmt',
        'table_text': 'Model | EN-DE BLEU | EN-FR BLEU\nTransformer (base) | 27.3 | 38.1\nTransformer (big) | 28.4 | 41.8'
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'path': '/tables/parsing_results.tex',
        'caption': 'English constituency parsing results on WSJ test set.',
        'label': 'tab:parsing',
        'table_text': 'Model | WSJ 23 F1\nTransformer | 91.3'
    }
]

for table in tables:
    research_arcade.insert_node("arxiv_tables", node_features=table)
    print(f"Inserted table: {table['label']}")

Inserted table: tab:variations
Inserted table: tab:wmt
Inserted table: tab:parsing


### Get All Tables

In [21]:
tables_df = research_arcade.get_all_node_features("arxiv_tables")
print(f"Total tables: {len(tables_df)}")
print("\nAll tables:")
print(tables_df[['label', 'caption']])

Total tables: 3

All tables:
            label                                            caption
0  tab:variations  Variations on the Transformer architecture wit...
1         tab:wmt  Performance of the Transformer on WMT 2014 Eng...
2     tab:parsing  English constituency parsing results on WSJ te...
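
The `table_text` values inserted above use a simple convention: rows are separated by newlines and cells by `|`, with the first row holding the column names. Assuming that convention holds (it is taken from the examples in this tutorial, not from a documented format), the text can be turned back into records as sketched below. `parse_table_text` is a hypothetical helper.

```python
def parse_table_text(table_text):
    """Parse pipe-delimited table text into a list of row dicts.

    The first non-empty line is treated as the header. All values stay strings.
    """
    lines = [line for line in table_text.splitlines() if line.strip()]
    if not lines:
        return []
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


wmt_text = (
    'Model | EN-DE BLEU | EN-FR BLEU\n'
    'Transformer (base) | 27.3 | 38.1\n'
    'Transformer (big) | 28.4 | 41.8'
)
for row in parse_table_text(wmt_text):
    print(row["Model"], row["EN-DE BLEU"])
```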


## 8. ArXiv Sections <a name="arxiv-sections"></a>

### Table Schema
- `id` (SERIAL PK)
- `content` (TEXT)
- `title` (TEXT)
- `appendix` (BOOLEAN)
- `paper_arxiv_id` (VARCHAR FK → papers.arxiv_id)
- `section_in_paper_id` (INT)

### Insert Sections

In [22]:
# Insert sections for the Transformer paper
sections = [
    {
        'content': 'The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder...',
        'title': 'Introduction',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 1
    },
    {
        'content': 'Most competitive neural sequence transduction models have an encoder-decoder structure. Here, the encoder maps an input sequence of symbol representations...',
        'title': 'Background',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 2
    },
    {
        'content': 'Most neural sequence transduction models have an encoder-decoder structure. The Transformer follows this overall architecture using stacked self-attention and point-wise, fully connected layers...',
        'title': 'Model Architecture',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 3
    },
    {
        'content': 'In this section we describe the training regime for our models...',
        'title': 'Training',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 4
    },
    {
        'content': 'On the WMT 2014 English-to-German translation task, the big transformer model outperforms the best previously reported models...',
        'title': 'Results',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 5
    },
    {
        'content': 'In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers...',
        'title': 'Conclusion',
        'appendix': False,
        'paper_arxiv_id': '1706.03762v7',
        'section_in_paper_id': 6
    }
]

for section in sections:
    research_arcade.insert_node("arxiv_sections", node_features=section)
    print(f"Inserted section: {section['title']}")

Inserted section: Introduction
Inserted section: Background
Inserted section: Model Architecture
Inserted section: Training
Inserted section: Results
Inserted section: Conclusion


### Get All Sections

In [23]:
sections_df = research_arcade.get_all_node_features("arxiv_sections")
print(f"Total sections: {len(sections_df)}")
print("\nAll sections:")
print(sections_df[['title', 'section_in_paper_id', 'appendix']])

Total sections: 6

All sections:
                title  section_in_paper_id  appendix
0        Introduction                  1.0     False
1          Background                  2.0     False
2  Model Architecture                  3.0     False
3            Training                  4.0     False
4             Results                  5.0     False
5          Conclusion                  6.0     False
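
In the output above, `section_in_paper_id` comes back as a float (`1.0`, `2.0`, ...). If integer positions are needed downstream, the values can be cast and the rows ordered explicitly. The sketch below works on plain dictionaries, such as those produced by `sections_df.to_dict(orient="records")`; `ordered_section_titles` is a hypothetical helper.

```python
def ordered_section_titles(records, paper_arxiv_id):
    """Return (position, title) pairs for one paper, ordered by position."""
    own = [r for r in records if r["paper_arxiv_id"] == paper_arxiv_id]
    # Cast to int so 1.0 and 1 are treated the same way
    own.sort(key=lambda r: int(r["section_in_paper_id"]))
    return [(int(r["section_in_paper_id"]), r["title"]) for r in own]


records = [
    {"title": "Background", "paper_arxiv_id": "1706.03762v7", "section_in_paper_id": 2.0},
    {"title": "Introduction", "paper_arxiv_id": "1706.03762v7", "section_in_paper_id": 1.0},
]
print(ordered_section_titles(records, "1706.03762v7"))
```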

## 9. ArXiv Paragraphs <a name="arxiv-paragraphs"></a>

### Table Schema
- `id` (SERIAL PK)
- `paragraph_id` (INT)
- `content` (TEXT)
- `paper_arxiv_id` (VARCHAR FK → papers.arxiv_id)
- `paper_section` (TEXT)
- `section_id` (INT)
- `paragraph_in_paper_id` (INT)

### Insert Paragraphs

In [24]:
# Insert paragraphs from the Introduction section
paragraphs = [
    {
        'paragraph_id': 1,
        'content': 'Recurrent neural networks, long short-term memory and gated recurrent neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 1
    },
    {
        'paragraph_id': 2,
        'content': 'Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures. Recurrent models typically factor computation along the symbol positions of the input and output sequences.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 2
    },
    {
        'paragraph_id': 3,
        'content': 'Aligning the positions to steps in computation time, they generate a sequence of hidden states h_t, as a function of the previous hidden state h_{t-1} and the input for position t. This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 3
    },
    {
        'paragraph_id': 4,
        'content': 'Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 4
    },
    {
        'paragraph_id': 5,
        'content': 'In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.',
        'paper_arxiv_id': '1706.03762v7',
        'paper_section': 'Introduction',
        'section_id': 1,
        'paragraph_in_paper_id': 5
    }
]

for paragraph in paragraphs:
    research_arcade.insert_node("arxiv_paragraphs", node_features=paragraph)
    print(f"Inserted paragraph {paragraph['paragraph_id']} from {paragraph['paper_section']}")

Inserted paragraph 1 from Introduction
Inserted paragraph 2 from Introduction
Inserted paragraph 3 from Introduction
Inserted paragraph 4 from Introduction
Inserted paragraph 5 from Introduction


### Get All Paragraphs

In [25]:
paragraphs_df = research_arcade.get_all_node_features("arxiv_paragraphs")
print(f"Total paragraphs: {len(paragraphs_df)}")
print("\nFirst 3 paragraphs:")
print(paragraphs_df[['paragraph_id', 'paper_section', 'content']].head(3))

Total paragraphs: 5

First 3 paragraphs:
   paragraph_id paper_section  \
0             1  Introduction   
1             2  Introduction   
2             3  Introduction   

                                             content  
0  Recurrent neural networks, long short-term mem...  
1  Numerous efforts have since continued to push ...  
2  Aligning the positions to steps in computation...  
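
Paragraph rows carry both a `section_id` and a `paragraph_in_paper_id`, which is enough to reassemble the text of a section. The following sketch groups plain dictionaries by section and joins their contents in paper order. It assumes `paragraph_in_paper_id` reflects reading order, as in the inserted examples; `section_texts` is a hypothetical helper.

```python
from collections import defaultdict


def section_texts(paragraphs):
    """Map section_id -> section text, with paragraphs joined in paper order."""
    grouped = defaultdict(list)
    for paragraph in sorted(paragraphs, key=lambda p: p["paragraph_in_paper_id"]):
        grouped[paragraph["section_id"]].append(paragraph["content"])
    # Separate paragraphs with a blank line
    return {section_id: "\n\n".join(parts) for section_id, parts in grouped.items()}


sample = [
    {"section_id": 1, "paragraph_in_paper_id": 2, "content": "Second paragraph."},
    {"section_id": 1, "paragraph_in_paper_id": 1, "content": "First paragraph."},
]
print(section_texts(sample)[1])
```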


## 10. Relationships/Edges <a name="relationships"></a>

This section demonstrates how to create relationships between different entities.

### Paper-Author Relationships (arxiv_paper_authors)

In [26]:
# Create authorship relationships for the Transformer paper
paper_authors = [
    {
        'paper_arxiv_id': '1706.03762v7',
        'author_id': 'ss_ashish_vaswani',
        'author_sequence': 1
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'author_id': 'ss_noam_shazeer',
        'author_sequence': 2
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'author_id': 'ss_niki_parmar',
        'author_sequence': 3
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'author_id': 'ss_jakob_uszkoreit',
        'author_sequence': 4
    },
    {
        'paper_arxiv_id': '1706.03762v7',
        'author_id': 'ss_llion_jones',
        'author_sequence': 5
    }
]

for relation in paper_authors:
    research_arcade.insert_edge("arxiv_paper_authors", edge_features=relation)
    # This message prints unconditionally; it does not confirm that the edge was stored
    print(f"Linked author {relation['author_id']} to paper (position {relation['author_sequence']})")

Table arxiv_paper_authors not found.
Linked author ss_ashish_vaswani to paper (position 1)
Table arxiv_paper_authors not found.
Linked author ss_noam_shazeer to paper (position 2)
Table arxiv_paper_authors not found.
Linked author ss_niki_parmar to paper (position 3)
Table arxiv_paper_authors not found.
Linked author ss_jakob_uszkoreit to paper (position 4)
Table arxiv_paper_authors not found.
Linked author ss_llion_jones to paper (position 5)
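
The authorship records above follow a regular pattern: one record per author, with `author_sequence` counting from 1. The list can therefore be generated from an ordered list of author IDs. This is a sketch of that construction only; `build_paper_author_edges` is a hypothetical helper, and it does not address the `Table arxiv_paper_authors not found.` message in the output above.

```python
def build_paper_author_edges(paper_arxiv_id, author_ids):
    """Build one authorship record per author, numbered from 1 in list order."""
    return [
        {
            "paper_arxiv_id": paper_arxiv_id,
            "author_id": author_id,
            "author_sequence": position,
        }
        for position, author_id in enumerate(author_ids, start=1)
    ]


edges = build_paper_author_edges(
    "1706.03762v7",
    ["ss_ashish_vaswani", "ss_noam_shazeer", "ss_niki_parmar"],
)
for edge in edges:
    print(edge)
```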


### Paper-Category Relationships (arxiv_paper_category)

## Conclusion

This tutorial has covered:

1. Setting up the ResearchArcade database connection
2. Working with OpenReview data
3. CRUD operations for all ArXiv entity types:
   - Papers
   - Authors
   - Categories
   - Figures
   - Tables
   - Sections
   - Paragraphs
4. Creating relationships between entities:
   - Authorship
   - Citations
   - Paper-Category links
   - Paper-Figure/Table links
   - Paragraph-level references
5. Advanced querying patterns
6. Best practices for data validation

For more information, refer to the ResearchArcade documentation.