In [1]:
import openreview
import os

In [2]:
import getpass


def get_openreview_credentials(credentials_file=".openreview_credentials"):
    """Return (username, password), read from a two-line file or prompted for."""
    # First check for a credentials file: username on line 1, password on line 2
    if os.path.exists(credentials_file):
        with open(credentials_file, "r") as file:
            lines = [line.strip() for line in file.readlines()]
        if len(lines) >= 2:
            return lines[0], lines[1]

    # Otherwise prompt the user; getpass keeps the password from being echoed
    username = input("Enter your OpenReview username: ")
    password = getpass.getpass("Enter your OpenReview password: ")
    return username, password
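The function above expects `.openreview_credentials` to hold the username on its first line and the password on its second. The sketch below isolates that file format in a hypothetical `read_credentials_file` helper (not part of the openreview package) that raises a clear `ValueError` for a short file instead of an `IndexError`:

```python
from pathlib import Path


def read_credentials_file(path):
    """Return (username, password) from a two-line credentials file.

    Hypothetical helper: assumes line 1 is the username and line 2 the
    password, matching what get_openreview_credentials() expects.
    """
    lines = [line.strip() for line in Path(path).read_text().splitlines()]
    # Ignore blank lines so a trailing newline does not count as a field
    lines = [line for line in lines if line]
    if len(lines) < 2:
        raise ValueError(f"{path} must contain a username line and a password line")
    return lines[0], lines[1]
```

Because the file stores the password in plain text, keeping it out of version control (for example via `.gitignore`) is advisable.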

In [3]:
username, password = get_openreview_credentials()

In [12]:
client = openreview.api.OpenReviewClient(
    baseurl='https://api2.openreview.net',
    username=username,
    password=password
)
venue_id = 'ICLR.cc/2024/Conference'
venue_group = client.get_group(venue_id)


In [23]:
submission_name = venue_group.content['submission_name']['value']
invitation_name = f'{venue_id}/-/{submission_name}'
# get_all_notes pages through every submission; get_notes returns only a single page,
# so fetch the replies once with get_all_notes and reuse the full list
submissions = client.get_all_notes(invitation=invitation_name, details='replies')
submissions_with_replies = submissions

Getting V2 Notes: 100%|█████████▉| 9118/9128 [00:02<00:00, 3165.30it/s]
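Invitation IDs follow a path-like pattern. Judging from the invitations on the replies printed further below (for example `ICLR.cc/2024/Conference/Submission9502/-/Decision` and `ICLR.cc/2024/Conference/-/Edit`), venue-level invitations look like `<venue_id>/-/<name>`, and per-submission ones insert `<submission_name><number>` before `/-/`. A sketch with two hypothetical helpers, assuming `submission_name` is `"Submission"` for ICLR 2024:

```python
def venue_invitation(venue_id, name):
    """Venue-level invitation ID, e.g. the submission invitation used above."""
    return f"{venue_id}/-/{name}"


def submission_invitation(venue_id, submission_name, number, name):
    """Per-submission invitation ID, following the pattern seen in reply['invitations']."""
    return f"{venue_id}/{submission_name}{number}/-/{name}"


print(submission_invitation("ICLR.cc/2024/Conference", "Submission", 9502, "Decision"))
```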


In [31]:
s = submissions_with_replies[0]
# To keep only official comments:
# replies = [r for r in s.details['replies'] if r['invitations'][0].endswith('Official_Comment')]
replies = list(s.details['replies'])
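The commented-out filter above inspects only `r['invitations'][0]`. Some replies carry more than one invitation (the decision printed later lists both a `/-/Decision` and a venue-wide `/-/Edit` invitation), so checking every invitation may be more robust. A sketch with a hypothetical `replies_with_invitation` helper; the sample IDs and invitations are trimmed copies of outputs in this notebook:

```python
def replies_with_invitation(replies, suffix):
    """Keep replies where ANY invitation ends with `suffix`."""
    return [
        r for r in replies
        if any(inv.endswith(suffix) for inv in r.get("invitations", []))
    ]


# Trimmed copies of two replies shown later in this notebook
sample_replies = [
    {"id": "6bGJyzEwju",
     "invitations": ["ICLR.cc/2024/Conference/Submission9502/-/Official_Comment"]},
    {"id": "wd7zIBoZIw",
     "invitations": ["ICLR.cc/2024/Conference/Submission9502/-/Decision",
                     "ICLR.cc/2024/Conference/-/Edit"]},
]

print([r["id"] for r in replies_with_invitation(sample_replies, "Official_Comment")])
# Matching on the second invitation only works because every entry is checked
print([r["id"] for r in replies_with_invitation(sample_replies, "/-/Edit")])
```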


In [49]:
# Pick the first submission with a long discussion thread (more than 15 replies)
for s in submissions_with_replies:
    if len(s.details['replies']) > 15:
        submission_with_replies = s
        break
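The loop above takes the first submission with more than 15 replies. To rank submissions by discussion length instead, `heapq.nlargest` can pick the top few without sorting the whole list. A sketch with a hypothetical `most_discussed` helper, exercised on `SimpleNamespace` stand-ins rather than real OpenReview notes:

```python
import heapq
from types import SimpleNamespace


def most_discussed(notes, n=5):
    """Return the n notes with the most replies, largest first.

    Expects each note to expose .details['replies']; notes fetched without
    details='replies' are treated as having no replies.
    """
    return heapq.nlargest(
        n, notes, key=lambda note: len((note.details or {}).get("replies", []))
    )


# Stand-in objects for illustration only (not real OpenReview notes)
fake_notes = [
    SimpleNamespace(id="a", details={"replies": [{}] * 3}),
    SimpleNamespace(id="b", details={"replies": [{}] * 22}),
    SimpleNamespace(id="c", details={"replies": [{}] * 9}),
]
print([note.id for note in most_discussed(fake_notes, n=2)])
```

On the fetched data this would be `most_discussed(submissions_with_replies, n=5)`.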


In [50]:
replies = submission_with_replies.details['replies']
print(f"There are {len(replies)} replies for the submission {submission_with_replies.id}")

There are 22 replies for the submission rhgIgTSSxW


In [72]:
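def print_key_value_pairs(d, indent=0):
    """Recursively print a nested dict, indenting two spaces per level."""
    for k, v in d.items():
        # To show only the payload of {'value': ...} wrappers, uncomment:
        # if isinstance(v, dict) and 'value' in v.keys():
        #     v = v['value']
        if isinstance(v, dict):
            print(f"{'  ' * indent}{k}")
            print_key_value_pairs(v, indent + 1)
        else:
            print(f"{'  ' * indent}{k}: {v}")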
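In the outputs below, each content field wraps its payload as `{'value': ...}`, which is what the commented-out lines in `print_key_value_pairs` unwrap. A standalone sketch (hypothetical `unwrap_values`) that strips those wrappers recursively; unlike the commented check, it only unwraps dicts whose sole key is `value`, so a dict with extra keys is left intact:

```python
def unwrap_values(obj):
    """Recursively replace {'value': x} wrappers with x."""
    if isinstance(obj, dict):
        if set(obj) == {"value"}:
            return unwrap_values(obj["value"])
        return {k: unwrap_values(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [unwrap_values(v) for v in obj]
    return obj


# Content of the decision reply printed later in this notebook
decision_content = {
    "title": {"value": "Paper Decision"},
    "decision": {"value": "Accept (poster)"},
    "comment": {"value": ""},
}
print(unwrap_values(decision_content))
```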

In [102]:
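def get_reply_source_type(reply):
    """Classify a reply by its first signature; returns (source_type, trailing_id)."""
    signature = reply['signatures'][0]
    # The trailing value after the last "/" identifies the signer, e.g. "Reviewer_aaV3"
    source_id = signature.split('/')[-1]
    if 'Reviewer' in signature:
        return 'reviewer', source_id
    elif 'Authors' in signature:
        return 'author', source_id
    elif 'Area_Chair' in signature:
        return 'area_chair', source_id
    elif 'Program_Chair' in signature:
        return 'program_chair', source_id
    # Fallback so callers can always unpack two values
    return 'unknown', source_id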
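The `if`/`elif` chain above can also be written as an ordered lookup table, so adding a new role is a one-line change. A sketch with hypothetical names (`SOURCE_MARKERS`, `classify_signature`); order matters because the first marker found in the signature wins:

```python
# Ordered (marker, source_type) pairs; the first marker found in the signature wins
SOURCE_MARKERS = [
    ("Reviewer", "reviewer"),
    ("Authors", "author"),
    ("Area_Chair", "area_chair"),
    ("Program_Chair", "program_chair"),
]


def classify_signature(signature):
    """Map an OpenReview signature to (source_type, trailing_id)."""
    trailing_id = signature.rsplit("/", 1)[-1]
    for marker, source_type in SOURCE_MARKERS:
        if marker in signature:
            return source_type, trailing_id
    return "unknown", trailing_id


print(classify_signature("ICLR.cc/2024/Conference/Submission9502/Reviewer_aaV3"))
```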

In [109]:
for i, reply in enumerate(replies):
    reply_type, reply_id = get_reply_source_type(reply)
    print(f"{i}: Reply from {reply_type} {reply_id}")

0: Reply from reviewer Reviewer_EnVq
1: Reply from reviewer Reviewer_Ly5o
2: Reply from reviewer Reviewer_fJsx
3: Reply from reviewer Reviewer_aaV3
4: Reply from author 
5: Reply from author 
6: Reply from author 
7: Reply from author 
8: Reply from author 
9: Reply from author 
10: Reply from author 
11: Reply from author 
12: Reply from author 
13: Reply from author 
14: Reply from author 
15: Reply from author 
16: Reply from reviewer Reviewer_aaV3
17: Reply from author 
18: Reply from author 
19: Reply from reviewer Reviewer_EnVq
20: Reply from area_chair Area_Chair_6yPy
21: Reply from program_chair Program_Chairs
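Tallying the listing above by source type gives a quick picture of the discussion. A sketch using `collections.Counter`, with the 22 source types transcribed from the printed output for submission `rhgIgTSSxW`:

```python
from collections import Counter

# Source types of replies 0-21, transcribed from the listing above
source_types = (
    ["reviewer"] * 4          # replies 0-3
    + ["author"] * 12         # replies 4-15
    + ["reviewer"]            # reply 16
    + ["author"] * 2          # replies 17-18
    + ["reviewer", "area_chair", "program_chair"]  # replies 19-21
)
counts = Counter(source_types)
print(counts.most_common())
```

On live data, `Counter(get_reply_source_type(r)[0] for r in replies)` should produce the same tally without transcription.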


In [106]:
replies[3]

{'content': {'summary': {'value': 'This paper considers the problem of making predictions on tabular data. The authors propose a retrieval-augmented approach where a predictor takes the representation not of the table being predicted but also the representation of the nearest neighbors from a training dataset. The encoding representations and the predictors are training together and use straightforward architecture architectures. The main result is that a combination of the carefully crafted techniques outperforms GBDT on an ensemble of tasks. The training time is higher than GBDT but not unreasonable, and better compared to prior deep learning methods. The prediction times are better'},
  'soundness': {'value': '3 good'},
  'presentation': {'value': '3 good'},
  'contribution': {'value': '3 good'},
  'strengths': {'value': '1. The results seem to be a significant advance over prior work in tabular data predictions. In particular, the first deep learning model to outperform GBDT on an 

In [105]:
replies[4]

{'content': {'title': {'value': 'Rebuttal (part 1/2)'},
  'comment': {'value': 'We thank the reviewer for the positive feedback!\n\n> A comparison of the inference and query complexity between the methods is lacking.\n\n**In the new PDF, this is addressed in Section A.4.2.** For convenience, here, we provide a summary.\n\nBelow, we report the inference throughput of TabR and XGBoost.\n\n(The technical setup:\n- XGBoost and TabR-S with tuned hyperparameters as in Table 4 of the main text\n- Computation is performed on NVIDIA 2080 Ti.\n- For both models, objects are passed by batches of 4096 objects.)\n\n**The key observations:**\n- On the considered datasets, the throughputs of TabR and XGBoost are mostly comparable.\n- **Important**: our implementation of TabR is naive and lacks even basic optimizations.\n\n|                                | CH    | CA   | HO   | AD   | DI   | OT   | HI  | BL   | WE   | CO   | MI   |\n|--------------------------------|------|------|------|------|------
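This rebuttal is split across several comments with titles such as `Rebuttal (part 1/2)`. The `(part i/n)` suffix is a convention chosen by these authors, not an OpenReview field, so the sketch below (hypothetical `rebuttal_part` and `join_parts` helpers) parses it from the title and reassembles the text in part order:

```python
import re

PART_RE = re.compile(r"\(part\s+(\d+)\s*/\s*(\d+)\)", re.IGNORECASE)


def rebuttal_part(title):
    """Return (part, total) from a title like 'Rebuttal (part 1/2)', else None."""
    match = PART_RE.search(title or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def join_parts(comments):
    """Concatenate multipart comments in part order.

    `comments` is a list of (title, text) pairs; comments without a
    '(part i/n)' suffix are skipped.
    """
    parts = []
    for title, text in comments:
        parsed = rebuttal_part(title)
        if parsed is not None:
            parts.append((parsed[0], text))
    parts.sort(key=lambda pair: pair[0])
    return "\n\n".join(text for _, text in parts)


print(rebuttal_part("Rebuttal (part 1/2)"))
```

In a thread with several multipart rebuttals, grouping the comments by `replyto` before joining keeps responses to different reviews apart.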

In [107]:
replies[-1]

{'content': {'title': {'value': 'Paper Decision'},
  'decision': {'value': 'Accept (poster)'},
  'comment': {'value': ''}},
 'id': 'wd7zIBoZIw',
 'forum': 'rhgIgTSSxW',
 'replyto': 'rhgIgTSSxW',
 'signatures': ['ICLR.cc/2024/Conference/Program_Chairs'],
 'nonreaders': [],
 'readers': ['everyone'],
 'writers': ['ICLR.cc/2024/Conference',
  'ICLR.cc/2024/Conference/Program_Chairs'],
 'number': 1,
 'invitations': ['ICLR.cc/2024/Conference/Submission9502/-/Decision',
  'ICLR.cc/2024/Conference/-/Edit'],
 'domain': 'ICLR.cc/2024/Conference',
 'tcdate': 1705405993702,
 'cdate': 1705405993702,
 'tmdate': 1708116196470,
 'mdate': 1708116196470,
 'license': 'CC BY 4.0',
 'version': 2}
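The `tcdate`, `cdate`, `tmdate`, and `mdate` fields are 13-digit integers, which look like milliseconds since the Unix epoch. Under that assumption, a sketch converting them to timezone-aware UTC datetimes with a hypothetical `ms_to_utc` helper:

```python
from datetime import datetime, timezone


def ms_to_utc(ms):
    """Convert a millisecond epoch timestamp (assumed format of tcdate/mdate) to UTC."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


print(ms_to_utc(1705405993702).isoformat())  # decision created (tcdate)
print(ms_to_utc(1708116196470).isoformat())  # decision last modified (mdate)
```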

In [108]:
replies[-2]

{'content': {'metareview': {'value': 'This submission contributes a neural architecture dedicated to tabular learning based on combining a feed-forward network with a nearest neighbor mechanism. The submission generated many solid discussions and was seen as an interesting addition to the tabular-learning literature. The reviewers appreciated the extensive experiments, the writing, and the reproducibility of the work. More baselines could have been added, and more attention to categorical variables.'},
  'justification_for_why_not_higher_score': {'value': 'The paper is already borderline with regards to acceptance. I do not think that we can push it further.'},
  'justification_for_why_not_lower_score': {'value': 'The work seems solid, as acknowledged by 3 of the 4 reviewers. The answers to the review by the authors were also solid. The fourth reviewer seems unfair, and is the author of one of the competing methods puts forward in the critical review. The work seems to honestly positio

In [110]:
replies[16]

{'content': {'title': {'value': 'inference time comparison'},
  'comment': {'value': 'Is it reasonable to compare XGBoost with deep learning with 4096 batch size on GPU. How do the inference times compare on CPU or with small batching as is the case with inference in reality?'}},
 'id': '6bGJyzEwju',
 'forum': 'rhgIgTSSxW',
 'replyto': 'pfSPP72xj0',
 'signatures': ['ICLR.cc/2024/Conference/Submission9502/Reviewer_aaV3'],
 'readers': ['everyone'],
 'writers': ['ICLR.cc/2024/Conference',
  'ICLR.cc/2024/Conference/Submission9502/Reviewer_aaV3'],
 'number': 13,
 'invitations': ['ICLR.cc/2024/Conference/Submission9502/-/Official_Comment'],
 'domain': 'ICLR.cc/2024/Conference',
 'tcdate': 1700517720978,
 'cdate': 1700517720978,
 'tmdate': 1700517720978,
 'mdate': 1700517720978,
 'license': 'CC BY 4.0',
 'version': 2}
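Each reply records its parent in `replyto`: the decision above replies to the forum `rhgIgTSSxW` itself, while this comment replies to `pfSPP72xj0`, another note in the thread. A sketch that rebuilds the discussion tree from those links; the two sample replies are trimmed copies of the outputs above, and `build_thread`/`print_thread` are hypothetical helpers:

```python
from collections import defaultdict


def build_thread(replies, forum_id):
    """Map each parent id to its direct replies using the 'replyto' field.

    Top-level replies point at the forum (submission) id itself.
    """
    children = defaultdict(list)
    for reply in replies:
        children[reply.get("replyto") or forum_id].append(reply)
    return children


def print_thread(children, parent_id, depth=0):
    """Print reply ids depth-first, indenting nested replies."""
    for reply in children.get(parent_id, []):
        print("  " * depth + reply["id"])
        print_thread(children, reply["id"], depth + 1)


sample = [
    {"id": "wd7zIBoZIw", "replyto": "rhgIgTSSxW"},  # decision
    {"id": "6bGJyzEwju", "replyto": "pfSPP72xj0"},  # reviewer comment
]
thread = build_thread(sample, "rhgIgTSSxW")
print_thread(thread, "rhgIgTSSxW")
```

On the full thread: `print_thread(build_thread(replies, submission_with_replies.id), submission_with_replies.id)`.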

In [121]:
def get_reply_type(reply):
    def any_part_matches(s, parts):
        return any(s in p for p in parts)
    source_type, _ = get_reply_source_type(reply)
    if any_part_matches('Official_Comment', reply['invitations']):
        return source_type + ' comment'
    elif any_part_matches('Meta_Review', reply['invitations']):
        return 'meta_review'
    elif any_part_matches('Review', reply['invitations']):
        return 'review'
    elif any_part_matches('Decision', reply['invitations']):
        return 'decision'
    else:
        return 'unknown'

for i, reply in enumerate(replies):
    reply_type = get_reply_type(reply)
    source_type, source_id = get_reply_source_type(reply)
    print(f"{i:02d}: {reply_type} from {source_type} {source_id}")

00: review from reviewer Reviewer_EnVq
01: review from reviewer Reviewer_Ly5o
02: review from reviewer Reviewer_fJsx
03: review from reviewer Reviewer_aaV3
04: author comment from author 
05: author comment from author 
06: author comment from author 
07: author comment from author 
08: author comment from author 
09: author comment from author 
10: author comment from author 
11: author comment from author 
12: author comment from author 
13: author comment from author 
14: author comment from author 
15: author comment from author 
16: reviewer comment from reviewer Reviewer_aaV3
17: author comment from author 
18: author comment from author 
19: reviewer comment from reviewer Reviewer_EnVq
20: meta_review from area_chair Area_Chair_6yPy
21: decision from program_chair Program_Chairs
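`get_reply_type` relies on substring checks, so their order matters: `'Review'` also matches `Meta_Review`, which is why the meta-review test comes first. Comparing the exact suffix after the last `/-/` avoids that dependency. A sketch with hypothetical helpers:

```python
def invitation_kind(invitation):
    """Return the part after the last '/-/', e.g. 'Official_Comment' or 'Decision'."""
    return invitation.rsplit("/-/", 1)[-1]


def reply_kinds(reply):
    """All invitation kinds attached to a reply, in order."""
    return [invitation_kind(inv) for inv in reply.get("invitations", [])]


decision_reply = {
    "invitations": ["ICLR.cc/2024/Conference/Submission9502/-/Decision",
                    "ICLR.cc/2024/Conference/-/Edit"],
}
print(reply_kinds(decision_reply))
# An exact membership test cannot confuse 'Review' with 'Meta_Review'
print("Decision" in reply_kinds(decision_reply))
```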


In [117]:
reply['invitations']

['ICLR.cc/2024/Conference/Submission9502/-/Decision',
 'ICLR.cc/2024/Conference/-/Edit']
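The last reply in this thread is the decision, identified by its `/-/Decision` invitation. A sketch of pulling the decision text out of any list of replies; `find_decision` is a hypothetical helper and the sample is a trimmed copy of the decision printed above:

```python
def find_decision(replies):
    """Return the decision string of the first '/-/Decision' reply, or None."""
    for reply in replies:
        if any(inv.endswith("/-/Decision") for inv in reply.get("invitations", [])):
            return reply.get("content", {}).get("decision", {}).get("value")
    return None


decision_sample = {
    "content": {"title": {"value": "Paper Decision"},
                "decision": {"value": "Accept (poster)"}},
    "invitations": ["ICLR.cc/2024/Conference/Submission9502/-/Decision",
                    "ICLR.cc/2024/Conference/-/Edit"],
}
print(find_decision([decision_sample]))
```

On the fetched thread, `find_decision(replies)` scans all replies, so it does not depend on the decision being last.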

In [82]:
def parse_openreview_review(review_obj):
    """
    Converts an OpenReview review dictionary into a human-readable string
    with labeled sections.
    
    :param review_obj: A dictionary representing the review object from OpenReview.
    :return: A formatted string summarizing the review.
    """
    
    # Extract top-level fields:
    review_id = review_obj.get("id", "N/A")
    forum_id = review_obj.get("forum", "N/A")
    signatures = review_obj.get("signatures", [])
    signature_str = ", ".join(signatures) if signatures else "N/A"

    # Extract content fields (the actual review text sections):
    content = review_obj.get("content", {})

    def fetch_value(field_key):
        """
        Safely fetch the text under content[field_key]['value'],
        returning a placeholder if missing or empty.
        """
        return content.get(field_key, {}).get("value") or "[No information provided]"

    # Commonly found review sections in OpenReview:
    rating         = fetch_value("rating")
    confidence     = fetch_value("confidence")
    summary        = fetch_value("summary")
    soundness      = fetch_value("soundness")
    presentation   = fetch_value("presentation")
    contribution   = fetch_value("contribution")
    strengths      = fetch_value("strengths")
    weaknesses     = fetch_value("weaknesses")
    questions      = fetch_value("questions")

    # Build a readable string with labeled sections;
    # adjust the formatting as needed (e.g., add bullet points).
    lines = [
        f"Review ID: {review_id}",
        # f"Forum (Submission) ID: {forum_id}",
        # f"Review by: {signature_str}",
        # "",
        "",
        "=== Summary ===",
        summary,
        "",
        "=== Soundness ===",
        soundness,
        "",
        "=== Presentation ===",
        presentation,
        "",
        "=== Contribution ===",
        contribution,
        "",
        "=== Strengths ===",
        strengths,
        "",
        "=== Weaknesses ===",
        weaknesses,
        "",
        "=== Questions ===",
        questions,
        f"Rating: {rating}",
        f"Confidence: {confidence}",
    ]

    return "\n".join(lines)

In [85]:
print(parse_openreview_review(replies[0]))

Review ID: yVdQ7kKCcl

=== Summary ===
The authors meticulously designed a supervised deep learning model for tabular data prediction, which operates in a retrieval-like manner. It outperformed tree-based models on middle-scale datasets, as well as other retrieval-based deep learning tabular learning models. To achieve this, they introduced a k-Nearest-Neighbors-like idea in model design.

=== Soundness ===
3 good

=== Presentation ===
2 fair

=== Contribution ===
2 fair

=== Strengths ===
- As emphasized by the authors, their method has managed to outperform tree based models like xgboost on middle-scale datasets.

- Overall, the presentation is clear, and the experiments are comprehensive. The details are clear and the model is highly reproducible.

- This model is the best-performing retrieval based model.

=== Weaknesses ===
- The motivations behind the module designs are not entirely clear. It appears that the authors made meticulous module (equation) optimization based on its per
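Scores such as soundness, presentation, and contribution arrive as strings like `3 good` or `2 fair`. To aggregate them across reviews, a sketch (hypothetical `parse_score`) that splits off the leading integer; it also tolerates an optional colon after the number, an assumption in case other fields use an `N: label` style:

```python
import re

# Leading integer, optional colon, then the free-text label
SCORE_RE = re.compile(r"\s*(\d+)\s*:?\s*(.*)", re.DOTALL)


def parse_score(text):
    """Split '3 good' into (3, 'good'); return (None, text) if no leading integer."""
    match = SCORE_RE.fullmatch(text or "")
    if match is None:
        return None, text
    return int(match.group(1)), match.group(2).strip()


print(parse_score("3 good"))   # soundness in the review above
print(parse_score("2 fair"))   # presentation and contribution
```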

In [89]:
print(replies[4]['content']['title']['value'])
print(replies[4]['content']['comment']['value'])

Rebuttal (part 1/2)
We thank the reviewer for the positive feedback!

> A comparison of the inference and query complexity between the methods is lacking.

**In the new PDF, this is addressed in Section A.4.2.** For convenience, here, we provide a summary.

Below, we report the inference throughput of TabR and XGBoost.

(The technical setup:
- XGBoost and TabR-S with tuned hyperparameters as in Table 4 of the main text
- Computation is performed on NVIDIA 2080 Ti.
- For both models, objects are passed by batches of 4096 objects.)

**The key observations:**
- On the considered datasets, the throughputs of TabR and XGBoost are mostly comparable.
- **Important**: our implementation of TabR is naive and lacks even basic optimizations.

|                                | CH    | CA   | HO   | AD   | DI   | OT   | HI  | BL   | WE   | CO   | MI   |
|--------------------------------|------|------|------|------|------|-----|------|------|------|------|------|
| `#trainingObjects`               