# ARR Tool: Track your ACs and Reviews

This is a simple notebook that lets a Senior Area Chair (SAC) track paper review status using the OpenReview API.

- check all papers & reviews under your batch
- retrieve meta-review & status
- retrieve confidential comments to editors, and review issue reports

***Copy this notebook to run in your own environment.***

Author: [Yiming Cui](https://ymcui.com/)

In [None]:
!pip install openreview-py
!pip install plotly
!pip install markdown prettytable ipywidgets  # also imported in the next cell

In [None]:
import collections
import openreview
import pandas as pd
from prettytable import PrettyTable
import ipywidgets as widgets
from IPython.display import display, HTML
from datetime import datetime
import markdown as md
import numpy as np
import matplotlib.pyplot as plt

#############################
#UPDATE THIS CELL WITH YOUR OpenReview INFO
username = "your-email" #@param {type:"string"}
password = "your-password" #@param {type:"string"}
me ="~YourProfile" #@param {type:"string"}
#############################

client = openreview.api.OpenReviewClient(baseurl='https://api2.openreview.net', username=username, password=password)

venue_id = 'aclweb.org/ACL/ARR/2025/February' #@param {type:"string"}
venue_group = client.get_group(venue_id)
submission_name = venue_group.content['submission_name']['value']
submissions = client.get_all_notes(invitation=f'{venue_id}/-/{submission_name}', details='replies')
my_sac_groups = {
    g.id
    for g in client.get_all_groups(members=me, prefix=f'{venue_id}/{submission_name}')
    if g.id.endswith('Senior_Area_Chairs')
}

## Monitor paper review status (updated with meta-review, scores)

This cell can track paper status, showing the number of completed reviews, assigned AC, meta-review status, etc.

- Paper #: the submission number
- Paper ID: the OpenReview forum ID (openreview.net/forum?id=...)
- Paper Type: Long / Short
- Area Chair: the assigned area chair
- Num Reviews: the number of completed reviews / expected reviews (may be more than 3 if emergency reviewers are assigned)
- Ready for Rebuttal: marked with "✓" once three or more reviews have been received
- Author Response: marked with "✓" if an author response is available (heuristic: at least 3 author comments)
- Reviewer Scores: show average and individual reviewer scores (confidence/soundness/excitement/overall)
- Meta Review Score: show meta-review score if it is ready

**To refresh the submission status (fetch up-to-date data from OpenReview), re-run the previous cell.**
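
As a rough illustration of how the status columns above are derived, the sketch below applies the same rules to hand-written reply records. The `replies` list, its invitation and signature strings, and the `summarize` helper are made up for this example; in the notebook the replies come from `submission.details["replies"]`.

```python
# Toy reply records (hypothetical; real ones come from the OpenReview API).
replies = [
    {"invitations": ["venue/Submission1/-/Official_Review"],
     "signatures": ["venue/Submission1/Reviewer_AbCd"],
     "content": {"overall_assessment": {"value": 3.5}}},
    {"invitations": ["venue/Submission1/-/Official_Review"],
     "signatures": ["venue/Submission1/Reviewer_EfGh"],
     "content": {"overall_assessment": {"value": 2.5}}},
    {"invitations": ["venue/Submission1/-/Official_Review"],
     "signatures": ["venue/Submission1/Reviewer_IjKl"],
     "content": {"overall_assessment": {"value": 3.0}}},
    {"invitations": ["venue/Submission1/-/Official_Comment"],
     "signatures": ["venue/Submission1/Authors"],
     "content": {"comment": {"value": "Thank you for the review."}}},
]


def summarize(replies, expected_reviews):
    """Derive the status columns for one paper from its replies."""
    reviews = [r for r in replies
               if any("/-/Official_Review" in inv for inv in r.get("invitations", []))]
    scores = [float(r["content"]["overall_assessment"]["value"]) for r in reviews]
    # Only the first signature is inspected, as in the notebook cell.
    author_comments = sum(
        1 for r in replies
        if any("/Authors" in sig for sig in r.get("signatures", [])[:1])
    )
    overall = ""
    if scores:
        avg = sum(scores) / len(scores)
        score_list = " / ".join(f"{s:.1f}" for s in scores)
        overall = f"{avg:.1f} ({score_list})"
    return {
        "Num Reviews": f"{len(reviews)} / {expected_reviews}",
        "Ready for Rebuttal": "✓" if len(reviews) >= 3 else "",
        "Author Response": "✓" if author_comments >= 3 else "",
        "Overall Assessment": overall,
    }


summary = summarize(replies, expected_reviews=3)
print(summary)
```

With a single author comment, the "Author Response" column stays empty, since the heuristic only fires at three or more author replies.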

In [None]:
# Helper function: Count a reply as an actual review only if its invitations include "/-/Official_Review"
def is_actual_review(reply):
    return any('/-/Official_Review' in invitation for invitation in reply.get('invitations', []))

# Helper function: Check if a reply is a meta-review (assuming its invitation contains "/-/Meta_Review")
def is_meta_review(reply):
    return any('/-/Meta_Review' in invitation for invitation in reply.get('invitations', []))

def is_withdrawn(submission):
    # Check if there's a non-empty withdrawal_confirmation field.
    withdrawal_conf = submission.content.get("withdrawal_confirmation", {}).get("value", "").strip()
    if withdrawal_conf:
        return True
    # Alternatively, check if the venue value contains "withdrawn" (case insensitive).
    venue_val = submission.content.get("venue", {}).get("value", "").lower()
    if "withdrawn" in venue_val:
        return True
    return False

# Helper function to format a list of scores into "avg_score (score1 / score2 / ...)" with 1 decimal precision.
def format_scores_as_list(scores):
    if scores:
        avg = sum(scores) / len(scores)
        score_list = " / ".join(f"{s:.1f}" for s in scores)
        return f"{avg:.1f} ({score_list})"
    else:
        return ""

# Collect data for each submission.
data = []
for submission in submissions:
    # Skip withdrawn or desk rejected papers.
    if is_withdrawn(submission):
        print(f"Skipping withdrawn paper: {submission.id}")
        continue
    if "venue" in submission.content and "desk rejected" in submission.content["venue"]["value"].lower():
        print(f"Skipping desk rejected paper: {submission.id}")
        continue

    prefix = f'{venue_id}/{submission_name}{submission.number}'
    # Process only submissions in your SAC batch.
    if not (set(submission.readers) & my_sac_groups):
        continue

    # Retrieve the assigned Area Chair.
    area_chairs_group = client.get_group(f'{prefix}/Area_Chairs')
    if area_chairs_group.members:
        ac = area_chairs_group.members[0]  # Assuming one AC per paper.
    else:
        continue

    # Extract Paper Type (e.g., "Long" or "Short").
    paper_type = submission.content.get("paper_type", {}).get("value", "")

    # Count the number of completed reviews (using our stricter filter).
    completed_reviews = sum(1 for reply in submission.details["replies"] if is_actual_review(reply))

    # Determine the expected number of reviews from the Reviewers group.
    expected_reviews = 0
    try:
        reviewers_group = client.get_group(f'{prefix}/Reviewers')
        expected_reviews = len(reviewers_group.members)
    except Exception:
        expected_reviews = 0

    # Merge the two into "Num Reviews" as "x / y".
    num_reviews = f"{completed_reviews} / {expected_reviews}"

    # Set review status: Checkmark if the paper has three or more completed reviews.
    status = "✓" if completed_reviews >= 3 else ""

    # Extract meta-review score from the meta-review reply, if available.
    meta_review_score = ""
    for reply in submission.details["replies"]:
        if is_meta_review(reply):
            content = reply.get("content", {})
            if "overall_assessment" in content:
                meta_review_score = content["overall_assessment"].get("value", "")
            elif "overall_rating" in content:
                meta_review_score = content["overall_rating"].get("value", "")
            elif "score" in content:
                meta_review_score = content["score"].get("value", "")
            break

    # Count author responses: count replies whose first signature indicates an author.
    # Slicing with [:1] avoids an IndexError when a reply has no signatures.
    author_response_count = sum(
        1 for reply in submission.details.get("replies", [])
        if any("/Authors" in sig for sig in reply.get("signatures", [])[:1])
    )
    # If at least 3 author responses exist, mark with a checkmark.
    author_response = "✓" if author_response_count >= 3 else ""

    # Initialize lists to collect reviewer scores.
    confidence_scores = []
    soundness_scores = []
    excitement_scores = []
    overall_assessment_scores = []

    # Aggregate reviewer scores from review replies.
    score_fields = {
        "confidence": confidence_scores,
        "soundness": soundness_scores,
        "excitement": excitement_scores,
        "overall_assessment": overall_assessment_scores,
    }
    for reply in submission.details["replies"]:
        if not is_actual_review(reply):
            continue
        content = reply.get("content", {})
        for field, scores in score_fields.items():
            try:
                val = content.get(field, {}).get("value", None)
                if val is not None:
                    scores.append(float(val))
            except (AttributeError, TypeError, ValueError):
                # Skip scores that are malformed or not numeric.
                pass

    reviewer_confidence = format_scores_as_list(confidence_scores)
    reviewer_soundness = format_scores_as_list(soundness_scores)
    reviewer_excitement = format_scores_as_list(excitement_scores)
    reviewer_overall = format_scores_as_list(overall_assessment_scores)

    data.append({
        "Paper #": submission.number,
        "Paper ID": submission.id,
        "Paper Type": paper_type,
        "Area Chair": ac,
        "Num Reviews": num_reviews,
        "Ready for Rebuttal": status,
        "Author Response": author_response,
        "Reviewer Confidence": reviewer_confidence,
        "Soundness Score": reviewer_soundness,
        "Excitement Score": reviewer_excitement,
        "Overall Assessment": reviewer_overall,
        "Meta Review Score": meta_review_score
    })

# Column order for the table; "Num Reviews" appears right after "Area Chair".
desired_order = [
    "Paper #",
    "Paper ID",
    "Paper Type",
    "Area Chair",
    "Num Reviews",
    "Ready for Rebuttal",
    "Author Response",
    "Reviewer Confidence",
    "Soundness Score",
    "Excitement Score",
    "Overall Assessment",
    "Meta Review Score"
]

# Create a DataFrame from the collected data. Passing the column list keeps
# the table well-formed even when no paper in the batch was collected.
df = pd.DataFrame(data, columns=desired_order)

# --- Interactive Papers Table ---
print(len(df))
print("All Papers in Your Batch:")
display(df)

# Create an interactive dropdown to filter by Area Chair.
ac_options = ["All"] + sorted(df["Area Chair"].unique())
ac_dropdown = widgets.Dropdown(options=ac_options, description="Area Chair:")
output = widgets.Output()

def update_table(change):
    with output:
        output.clear_output()
        if ac_dropdown.value == "All":
            display(df)
        else:
            filtered_df = df[df["Area Chair"] == ac_dropdown.value]
            display(filtered_df)

ac_dropdown.observe(update_table, names="value")

print("Filter Papers by Area Chair:")
display(ac_dropdown)
display(output)

### Visualization of Review / Meta-review Scores (optional)

In [None]:
# Helper function to extract the average Overall Assessment from the formatted string.
def parse_avg(s):
    try:
        return float(s.split()[0])
    except Exception:
        return np.nan

# Helper function to parse a meta review score into a float.
def parse_meta_review(s):
    try:
        return float(s)
    except Exception:
        return np.nan

# Compute temporary Series (without adding to df)
overall_assessment_avg = df["Overall Assessment"].apply(parse_avg)
meta_review_float = df["Meta Review Score"].apply(parse_meta_review)

# Define bins for Overall Assessment: from 1 to 5 (0.5 interval)
bins = np.arange(1, 5.5, 0.5)  # bin edges: 1, 1.5, ..., 5 (the stop value 5.5 is excluded)

# Compute histogram counts for Overall Assessment
overall_counts, overall_bin_edges = np.histogram(overall_assessment_avg.dropna(), bins=bins)
overall_bin_centers = (overall_bin_edges[:-1] + overall_bin_edges[1:]) / 2

# For Meta Review Scores, use value_counts (each paper has at most one meta review score)
meta_counts_series = meta_review_float.dropna().value_counts().sort_index()
x_meta_values = meta_counts_series.index.values   # exact score values, e.g., 1, 1.5, 2, ...
y_meta_values = meta_counts_series.values

plt.figure(figsize=(10, 6))

# Plot Overall Assessment distribution as a line chart.
plt.plot(overall_bin_centers, overall_counts, marker='o', label="Overall Assessment", color='steelblue')
# Annotate each point for Overall Assessment.
for x, y in zip(overall_bin_centers, overall_counts):
    plt.annotate(f"{y}", xy=(x, y), xytext=(0, 5), textcoords="offset points",
                 ha="center", fontsize=9, color='steelblue')

# Plot Meta Review Score distribution as a line chart.
plt.plot(x_meta_values, y_meta_values, marker='o', label="Meta Review Score", color='red')
# Annotate each point for Meta Review Score.
for x, y in zip(x_meta_values, y_meta_values):
    plt.annotate(f"{y}", xy=(x, y), xytext=(0, 5), textcoords="offset points",
                 ha="center", fontsize=9, color='red')

plt.xlabel("Score")
plt.ylabel("Number of Papers")
plt.title("Distribution of Overall Assessment and Meta Review Scores")
# Set x-axis ticks from 1 to 5 in increments of 0.5.
plt.xticks(np.arange(1, 5.5, 0.5))
plt.legend()
plt.tight_layout()
plt.show()
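
To see how the binning in the cell above behaves, here is a small sketch with made-up averages (the `sample_avgs` values are invented). `np.arange(1, 5.5, 0.5)` stops at 5.0, so there are eight bins, and because `np.histogram` closes the last bin on the right, an average of exactly 5.0 is still counted.

```python
import numpy as np

sample_avgs = [2.0, 2.3, 3.0, 3.4, 3.5, 5.0]  # invented per-paper averages
edges = np.arange(1, 5.5, 0.5)                # 1.0, 1.5, ..., 5.0 (9 edges, 8 bins)
counts, bin_edges = np.histogram(sample_avgs, bins=edges)
centers = (bin_edges[:-1] + bin_edges[1:]) / 2

# Each count is plotted at its bin center, e.g. 1.25 for the bin [1.0, 1.5).
for center, count in zip(centers, counts):
    print(f"bin centered at {center:.2f}: {count} paper(s)")
```

Note that the overall-assessment line is drawn at bin centers, while the meta-review line is drawn at exact score values, so the two curves are offset by a quarter point on the x-axis.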

### Distribution of Meta Reviews vs Overall Assessments (optional)

*Thanks to [Desmond Elliott](http://elliottd.github.io/)'s code contribution!*

Each AC gets its own color and marker to make the plot easier to read.
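
The jitter used in the next cell only moves points whose average score is shared with another paper, so unique points stay at their true position. A minimal sketch of that idea with invented scores (the fixed seed is only there to make the example repeatable):

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the example is repeatable
jitter_strength = 0.2
scores = [3.0, 3.0, 2.5, 4.0, 3.0]  # invented per-paper averages

# Count how often each score occurs; only repeated scores are jittered.
counts = Counter(scores)
jittered = [
    s + random.uniform(-jitter_strength, jitter_strength) if counts[s] > 1 else s
    for s in scores
]

for original, moved in zip(scores, jittered):
    print(f"{original:.1f} -> {moved:.3f}")
```

The hover text in the plot still shows the unjittered value, so the displacement is purely visual.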

In [None]:
from collections import Counter
import plotly.express as px

# Requires parse_avg and parse_meta_review from the previous visualization cell.
y_data = df["Overall Assessment"].apply(parse_avg)
# Jitter amount to help distinguish overlapping points.
jitter_strength = 0.2 # @param {type:"slider", min:0, max:1, step:0.05}
dupes = Counter(y_data)
y_data2 = [xi + np.random.uniform(-jitter_strength, jitter_strength) if dupes[xi] > 1 else xi for xi in y_data]

df2 = pd.DataFrame({
    'Meta Review Score': df["Meta Review Score"].apply(parse_meta_review),
    'Jittered Assessment': y_data2,
    'Overall Assessment': y_data,
    'Paper #': [f'Paper {i}' for i in df['Paper #']],
    'Area Chair': df["Area Chair"]  # Used for both color and marker symbol.
})

# Define a custom symbol sequence with enough unique symbols.
symbol_seq = ["circle", "square", "diamond", "cross", "x", "triangle-up", "triangle-down",
              "triangle-left", "triangle-right", "pentagon", "hexagon", "star"]

fig = px.scatter(
    df2,
    x="Jittered Assessment",
    y="Meta Review Score",
    color="Area Chair",           # Color by Area Chair.
    symbol="Area Chair",          # Marker symbol based on Area Chair.
    symbol_sequence=symbol_seq,   # Provide a sequence of symbols.
    hover_name="Paper #",
    hover_data=["Overall Assessment", "Area Chair"],
    title="Paired Distribution: Meta Review Score vs Overall Assessment"
)
fig.show()

### Correlation Between Scores (optional)

In [None]:
def parse_avg(s):
    try:
        return float(s.split()[0])
    except Exception:
        return np.nan

# Build a temporary DataFrame with numeric values from the aggregated string columns.
corr_data = pd.DataFrame({
    "Overall_Assessment_Avg": df["Overall Assessment"].apply(parse_avg),
    "Reviewer_Confidence_Avg": df["Reviewer Confidence"].apply(parse_avg),
    "Soundness_Score_Avg": df["Soundness Score"].apply(parse_avg),
    "Excitement_Score_Avg": df["Excitement Score"].apply(parse_avg),
    # Meta Review Score: coerce to float; empty or non-numeric values become NaN.
    "Meta_Review_Score": pd.to_numeric(df["Meta Review Score"], errors="coerce")
})

# Compute the correlation matrix.
corr_table = corr_data.corr()

# Create a mask for the lower triangle of the correlation matrix.
mask = np.tril(np.ones(corr_table.shape, dtype=bool))
corr_table_upper = corr_table.mask(mask)

print("Upper Triangle Correlation Table (lower triangle hidden):")
display(corr_table_upper)
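
The masking step can be tried on a small example. The sketch below uses an invented three-column table (the column names and values are placeholders) and hides the diagonal and lower triangle, leaving each pairwise correlation exactly once.

```python
import numpy as np
import pandas as pd

# Invented scores for four papers.
toy = pd.DataFrame({
    "overall": [2.0, 3.0, 3.5, 4.0],
    "soundness": [2.5, 3.0, 3.5, 4.5],
    "confidence": [4.0, 3.0, 4.0, 3.0],
})
corr = toy.corr()

# True on and below the diagonal; DataFrame.mask replaces those cells with NaN.
lower = np.tril(np.ones(corr.shape, dtype=bool))
upper_only = corr.mask(lower)
print(upper_only.round(2))
```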

## AC Dashboard

*WARNING: run the "Monitor paper review status" cell before this one; it builds the `df` table used here.*

- Area Chair: the area chair
- Num Reviews: completed reviews / expected reviews, summed over the AC's papers (a paper may expect more than 3 reviews if emergency reviewers are assigned)
- Papers_Ready: the number of papers that have received three or more reviews
- Num_Papers: the total number of papers assigned to the area chair
- All Reviews Ready: whether all of the AC's papers have received three or more reviews
- Meta_Reviews_Done: the number of completed meta-reviews out of the AC's papers ("x of y")
- All Meta-reviews Ready: whether all meta-reviews are completed
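
The aggregation behind this dashboard is a `groupby` over the per-paper table. A minimal sketch on invented rows (the AC names, paper numbers, and the `Completed`/`Expected` column names are made up for illustration):

```python
import pandas as pd

# Invented per-paper rows in the same shape as the table built earlier.
papers = pd.DataFrame({
    "Area Chair": ["~AC_One1", "~AC_One1", "~AC_Two1"],
    "Paper #": [11, 12, 13],
    "Num Reviews": ["3 / 3", "2 / 4", "3 / 3"],
    "Ready for Rebuttal": ["✓", "", "✓"],
    "Meta Review Score": [3.5, "", 4.0],
})

# Split "x / y" into numeric completed and expected counts.
counts = papers["Num Reviews"].str.extract(r"(\d+)\s*/\s*(\d+)").astype(int)
papers["Completed"] = counts[0]
papers["Expected"] = counts[1]

# One row per area chair, using named aggregation.
dashboard = papers.groupby("Area Chair").agg(
    Completed=("Completed", "sum"),
    Expected=("Expected", "sum"),
    Papers_Ready=("Ready for Rebuttal", lambda x: (x == "✓").sum()),
    Num_Papers=("Paper #", "count"),
    Meta_Reviews=("Meta Review Score", lambda x: (x != "").sum()),
).reset_index()
dashboard["All Reviews Ready"] = dashboard["Papers_Ready"] == dashboard["Num_Papers"]
print(dashboard)
```

Here the first AC has one paper still short of three reviews, so "All Reviews Ready" is false for that row only.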

In [None]:
# Create numeric columns from "Num Reviews" by splitting the string.
# This assumes every value is in the format "x / y"
df[['CompRev', 'ExpRev']] = df["Num Reviews"].str.split('/', expand=True)
df['CompRev'] = df['CompRev'].astype(float)
df['ExpRev'] = df['ExpRev'].astype(float)

# --- Meta Table Aggregated by Area Chair ---
meta_df = df.groupby("Area Chair").agg(
    Total_Completed_Reviews=("CompRev", "sum"),
    Total_Expected_Reviews=("ExpRev", "sum"),
    Papers_Ready=("Ready for Rebuttal", lambda x: (x == "✓").sum()),
    Num_Papers=("Paper #", "count"),
    Meta_Reviews_Num=("Meta Review Score", lambda x: (x != "").sum())
).reset_index()

# Create a merged "Num Reviews" column in the format "x / y"
meta_df["Num Reviews"] = meta_df.apply(
    lambda row: f"{int(row['Total_Completed_Reviews'])} / {int(row['Total_Expected_Reviews'])}",
    axis=1
)

# Add a column indicating if all papers are review-ready.
meta_df["All Reviews Ready"] = meta_df.apply(
    lambda row: "✓" if row["Papers_Ready"] == row["Num_Papers"] else "", axis=1
)

# Format meta-review count as "x of y".
meta_df["Meta_Reviews_Done"] = meta_df.apply(
    lambda row: f"{row['Meta_Reviews_Num']} of {row['Num_Papers']}", axis=1
)

# Add a column for meta-review readiness.
meta_df["All Meta-reviews Ready"] = meta_df.apply(
    lambda row: "✓" if row["Meta_Reviews_Num"] == row["Num_Papers"] else "", axis=1
)

# Optionally drop the temporary numeric columns.
meta_df.drop(columns=["Total_Completed_Reviews", "Total_Expected_Reviews", "Meta_Reviews_Num"], inplace=True)

# Reorder columns so that "Num Reviews" comes immediately after "Area Chair",
# and "Meta_Reviews_Done" appears right before "All Meta-reviews Ready".
desired_order = [
    "Area Chair",
    "Num Reviews",
    "Papers_Ready",
    "Num_Papers",
    "All Reviews Ready",
    "Meta_Reviews_Done",
    "All Meta-reviews Ready"
]
meta_df = meta_df[desired_order]

print("\nMeta Table by Area Chair:")
display(meta_df)

## Check Confidential Comment & Review Issue Reports

This helps you keep an eye on comments that require special attention.

**Update (Apr 3): Comments are now shown in a forum-like, threaded layout so they are easier to read.**
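
The threaded view groups each paper's comments by their reply-to links: a comment whose parent is not itself a collected comment starts a thread, and replies are indented beneath their parent. A minimal sketch with invented comments (the ids and texts are made up, and plain text stands in for the HTML that the real cell renders):

```python
# Invented comments: (note id, replyto id, text). A replyto that is not
# another comment in the list (here the paper's forum id) marks a thread root.
comments = [
    ("c1", "paper", "Review issue: reviewer 2 misread Table 3."),
    ("c2", "c1", "AC: I will ask the reviewer to respond."),
    ("c3", "c2", "Reviewer: updated my review."),
    ("c4", "paper", "Confidential note to editors."),
]

by_id = {note_id: text for note_id, _, text in comments}
children = {}
roots = []
for note_id, replyto, _ in comments:
    if replyto in by_id:
        children.setdefault(replyto, []).append(note_id)
    else:
        roots.append(note_id)


def render(note_id, level=0):
    """Return the thread rooted at note_id as indented text lines."""
    lines = ["    " * level + by_id[note_id]]
    for child in children.get(note_id, []):
        lines.extend(render(child, level + 1))
    return lines


thread_lines = [line for root in roots for line in render(root)]
print("\n".join(thread_lines))
```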

In [None]:
def is_relevant_comment(reply):
    invitations = reply.get("invitations", [])
    return any(
        part in inv
        for inv in invitations
        for part in ["/-/Author-Editor_Confidential_Comment", "/-/Comment", "/-/Review_Issue_Report"]
    )

def classify_comment_type(reply):
    invitations = reply.get("invitations", [])
    if any("/-/Review_Issue_Report" in inv for inv in invitations):
        return "Review Issue"
    elif any("/-/Author-Editor_Confidential_Comment" in inv for inv in invitations):
        return "Author-Editor Confidential"
    elif any("/-/Comment" in inv for inv in invitations):
        return "Confidential Comment"
    else:
        return "Other"

def extract_comment_text(reply):
    content = reply.get("content", {})
    # Try common keys in order of likely importance
    for key in ["comment", "justification", "text", "response", "value"]:
        if key in content:
            val = content[key]
            return val.get("value") if isinstance(val, dict) else val
    # If no known key, flatten all text fields into a fallback string
    fallback = []
    for k, v in content.items():
        if isinstance(v, dict) and "value" in v:
            fallback.append(f"{k}: {v['value']}")
    return "\n".join(fallback) if fallback else "(No comment text found)"


def render_threaded_forum_grouped_by_paper(comments_df):
    # Step 1: Build note_id → row mapping
    note_map = {row["NoteId"]: row for _, row in comments_df.iterrows()}

    # Step 2: Group comments by forum (paper thread)
    paper_groups = comments_df.groupby("Paper #")

    html_blocks = []

    for paper_number, group_df in paper_groups:
        # Build children and root maps per paper
        children_map = {}
        roots = []
        paper_note_map = {row["NoteId"]: row for _, row in group_df.iterrows()}

        for _, row in group_df.iterrows():
            note_id = row["NoteId"]
            replyto = row["ReplyTo"]
            if replyto and replyto in paper_note_map:
                children_map.setdefault(replyto, []).append(note_id)
            else:
                roots.append(note_id)

        def render_comment(note_id, level=0):
            row = paper_note_map[note_id]
            indent = 20 * level
            block = f"""
            <div style="border:1px solid #ccc; border-radius:8px; padding:5px; margin:5px 0 5px {indent}px; width: 60%">
                <div style="font-weight:bold; color:#1a73e8;">
                    {row['Type']} <span style="color:gray; font-weight:normal;">({row['Role']}, {row['Date']})</span>
                </div>
                <div style="margin:5px 0 10px;">
                    <a href="{row['Link']}" target="_blank">View on OpenReview ↗</a>
                </div>
                <div style="white-space:pre-wrap; line-height:1.2;">
                    {md.markdown(str(row['Content'] or '').strip())}
                </div>
            </div>
            """
            child_html = ''.join(render_comment(child_id, level + 1) for child_id in children_map.get(note_id, []))
            return block + child_html

        thread_html = ''.join(render_comment(root_id) for root_id in roots)

        paper_header = f"""
        <h3 style="border-bottom: 2px solid #999; padding-bottom: 4px;">📝 Paper #{paper_number}</h3>
        """
        html_blocks.append(paper_header + thread_html)

    display(HTML("".join(html_blocks)))

def infer_role_from_signature(signatures):
    if not signatures:
        return "Unknown"
    sig = signatures[0]  # Use first signature
    if "/Authors" in sig:
        return "Author"
    elif "/Reviewer" in sig:
        return "Reviewer"
    elif "/Area_Chair" in sig:
        return "Area Chair"
    elif "/Senior_Area_Chairs" in sig:
        return "Senior Area Chair"
    elif "/Program_Chairs" in sig:
        return "Program Chair"
    elif sig.startswith("~"):
        return "User"
    else:
        return "Other"

def format_timestamp(ms_since_epoch):
    if not ms_since_epoch:
        return ""
    dt = datetime.fromtimestamp(ms_since_epoch / 1000)
    return dt.strftime("%Y-%m-%d")


# --- Collect relevant comments ---
base_url = "https://openreview.net/forum"

relevant_comments = []

for submission in submissions:
    for reply in submission.details.get("replies", []):
        if not is_relevant_comment(reply):
            continue
        forum_id = reply.get("forum", "")
        note_id = reply.get("id", "")
        relevant_comments.append({
            "Paper #": submission.number,
            "Paper ID": submission.id,
            "Type": classify_comment_type(reply),
            "Role": infer_role_from_signature(reply.get("signatures", [])),
            "Date": format_timestamp(reply.get("tcdate", None)),
            "Content": extract_comment_text(reply),
            "Link": f"{base_url}?id={forum_id}&noteId={note_id}",
            "ReplyTo": reply.get("replyto", None),
            "NoteId": note_id,
        })

# Fix the column list so the frame keeps its columns even when no comment matched.
comment_columns = ["Paper #", "Paper ID", "Type", "Role", "Date",
                   "Content", "Link", "ReplyTo", "NoteId"]
comments_df = pd.DataFrame(relevant_comments, columns=comment_columns)
if comments_df.empty:
    print("No confidential comments or review issue reports found.")
else:
    render_threaded_forum_grouped_by_paper(comments_df)

### Comments in Table Form (optional)

If you prefer to view these comments in an interactive table, run the following code cell.
It is useful if you want to filter for a specific type of comment (e.g., review issue reports).
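
For instance, filtering by type is a one-line boolean selection. The sketch below uses an invented stand-in for `comments_df` (the rows are made up; the "Type" labels match those produced by `classify_comment_type`):

```python
import pandas as pd

# Invented rows with the same "Type" labels the notebook assigns.
toy_comments = pd.DataFrame({
    "Paper #": [11, 11, 12],
    "Type": ["Review Issue", "Confidential Comment", "Review Issue"],
    "Role": ["Author", "Area Chair", "Author"],
})

# Keep only review issue reports.
review_issues = toy_comments[toy_comments["Type"] == "Review Issue"]
print(review_issues)

# Count comments per type for a quick overview.
print(toy_comments["Type"].value_counts())
```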

In [None]:
# Display as DataFrame (original one, optional)
print(f"Total relevant comments found: {len(comments_df)}")
display(comments_df.sort_values(by="Type"))