## Prerequisites
* Ensure this repository's virtual environment is active (or run via `uv run`).
* The `oai_pmh_client` package must be importable from `src/`.
* Network access is required to reach the public arXiv OAI-PMH endpoint.

## Installing `oai_pmh_client` from this repo
Run one of the following commands from the repository root so the notebook can import `oai_pmh_client`:

```bash
uv pip install -e .
```

or, if you're already inside the project virtual environment:

```bash
pip install -e .
```

After installing once, restart the notebook kernel so the import path is refreshed.
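
To confirm the install worked before running the cells below, a quick check along these lines may help. It is a minimal sketch that uses only the standard library; `is_importable` is a hypothetical helper name, not part of `oai_pmh_client`.

```python
from importlib.util import find_spec


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current import path."""
    # find_spec returns None when no finder can locate the module.
    return find_spec(module_name) is not None


if is_importable("oai_pmh_client"):
    print("oai_pmh_client is importable.")
else:
    print("oai_pmh_client not found: install it with `pip install -e .` and restart the kernel.")
```

`find_spec` only locates the module without executing it, so the check is cheap and has no side effects on the kernel.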

In [1]:
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from itertools import islice

from oai_pmh_client import OAIClient

In [4]:
BASE_URL = "https://export.arxiv.org/oai2"
METADATA_PREFIX = "oai_dc"  # arXiv also supports other prefixes, e.g. arXiv and arXivRaw
MAX_RESULTS = 100  # safety cap so we don't page through everything at once
LOOKBACK_HOURS = 24

client = OAIClient(base_url=BASE_URL, datestamp_granularity="YYYY-MM-DD")
print(f"Configured OAI client for {BASE_URL}")

Configured OAI client for https://export.arxiv.org/oai2


In [5]:
window_end = datetime.now(timezone.utc)
window_start = window_end - timedelta(hours=LOOKBACK_HOURS)
print("Harvest window:", window_start.isoformat(), "->", window_end.isoformat())

records = []
iterator = client.list_records(
    metadata_prefix=METADATA_PREFIX,
    from_date=window_start,
    until_date=window_end,
    set_spec=None,
)

for record in islice(iterator, MAX_RESULTS):
    header = record.header
    identifier = header.identifier if header else "(no identifier)"
    datestamp = header.datestamp.isoformat() if header and header.datestamp else None
    deleted = header.is_deleted if header else False
    records.append(
        {
            "identifier": identifier,
            "datestamp": datestamp,
            "deleted": deleted,
        }
    )

print(f"Fetched {len(records)} record(s) (capped at {MAX_RESULTS}).")

Harvest window: 2025-11-17T07:56:27.932925+00:00 -> 2025-11-18T07:56:27.932925+00:00
Fetched 100 record(s) (capped at 100).
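
The harvest window above is printed with sub-second precision, while the client was configured with `datestamp_granularity="YYYY-MM-DD"`. The sketch below illustrates what a day-granularity `ListRecords` request could look like on the wire. The query parameter names (`verb`, `metadataPrefix`, `from`, `until`) and the two datestamp formats come from the OAI-PMH 2.0 specification; the helpers `format_datestamp` and `build_list_records_url` are hypothetical, and whether `oai_pmh_client` serializes its arguments in exactly this way is an assumption.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def format_datestamp(moment: datetime, granularity: str = "YYYY-MM-DD") -> str:
    """Render a datetime as a UTC OAI-PMH datestamp (hypothetical helper)."""
    utc = moment.astimezone(timezone.utc)
    if granularity == "YYYY-MM-DD":
        # Day granularity drops the time of day entirely.
        return utc.strftime("%Y-%m-%d")
    # Seconds granularity, as defined by OAI-PMH 2.0.
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_list_records_url(
    base_url: str,
    metadata_prefix: str,
    from_date: datetime,
    until_date: datetime,
    granularity: str = "YYYY-MM-DD",
) -> str:
    """Build a ListRecords request URL (illustrative; not the client's own code)."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": format_datestamp(from_date, granularity),
        "until": format_datestamp(until_date, granularity),
    }
    return f"{base_url}?{urlencode(params)}"


start = datetime(2025, 11, 17, 7, 56, 27, tzinfo=timezone.utc)
end = datetime(2025, 11, 18, 7, 56, 27, tzinfo=timezone.utc)
url = build_list_records_url("https://export.arxiv.org/oai2", "oai_dc", start, end)
print(url)
```

Under this assumption a 24-hour window that straddles midnight covers two calendar days in the request, which may explain why the `MAX_RESULTS` cap is reached so quickly.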


In [6]:
if not records:
    print("No changed records found in the chosen window.")
else:
    for entry in records:
        state = "DELETED" if entry["deleted"] else "ACTIVE"
        print(f"{entry['datestamp']} | {state} | {entry['identifier']}")

    print("---")
    print(
        f"Displayed {len(records)} changed record(s) from the last {LOOKBACK_HOURS} hours."
    )

2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:1304.0833
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:1411.7677
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:1709.01928
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:1906.08243
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:1910.07653
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:2002.03303
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:2012.05212
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:2012.15281
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:2101.08802
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:2102.02115
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:2105.11419
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:2107.01950
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:2107.09348
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:2201.08443
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:2201.12577
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:2203.06438
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:2203.16435
2025-11-17T00:00:00 | ACTIVE | oai:arXiv.org:2205.