# CNM_CycNucMed — Clean Colab/Jupyter Runner (run_pipeline)

**CNM_CycNucMed** is a lightweight runner for executing the IAEA cyclotron and nuclear medicine data pipeline from Google Colab or Jupyter environments.
It provides a clean, reproducible workflow for setting up the environment, validating the installation, and generating structured PDF reports through the run_pipeline entrypoint, with no dependence on CLI tools or interactive argument parsing.

The notebook is designed for research, analysis, and reproducible reporting, allowing users to move from raw data to publication-ready outputs with minimal setup friction.

**Cloning or Updating the Repository**

This step ensures that the project source code is available locally and up to date.

**What it does**

*   Clones the GitHub repository if it is not yet present
*   Pulls the latest changes if the repository already exists
*   Moves the working directory into the project folder

**Why this is needed**

You must have the current source code locally before installing dependencies or running the pipeline.
Pulling on every run keeps your local copy in sync with the remote repository, provided the pull succeeds.

**Notes**
*   If the repository is private, cloning requires a GitHub Personal Access Token (PAT).
*   The script includes commented instructions for securely entering a token if authentication fails.
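
The sh helper in the clone cell prints every command it runs, so a token embedded in a clone URL would end up in the notebook output. As a minimal sketch, assuming the token is passed in the https://TOKEN@host form used in the commented instructions, the command can be redacted before it is printed. The name redact_credentials is hypothetical and not part of the project.

```python
import re

# Matches the credential part of an HTTP(S) URL: "https://<secret>@host/..."
_CREDENTIAL_RE = re.compile(r"(https?://)[^/@\s]+@")


def redact_credentials(cmd: str) -> str:
    """Return cmd with any URL-embedded credential replaced by '***'."""
    return _CREDENTIAL_RE.sub(r"\1***@", cmd)


# Placeholder token, for illustration only.
example = "git clone https://TOKEN123@github.com/tommasocarzaniga/CNM_CycNucMed.git CNM_CycNucMed"
print(">>", redact_credentials(example))
```

Printing the redacted string instead of the raw command keeps the log readable while hiding the secret. URLs without credentials pass through unchanged.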

In [None]:
# =========================
# Clone / pull + cd
# =========================
import os, subprocess, pathlib

REPO_URL = "https://github.com/tommasocarzaniga/CNM_CycNucMed.git"
REPO_DIR = "CNM_CycNucMed"

def sh(cmd, cwd=None):
    print(">>", cmd)
    subprocess.check_call(cmd, shell=True, cwd=cwd)

# If you get an auth error, your repo is private.
# Clone using a GitHub Personal Access Token (PAT):
#   from getpass import getpass
#   token = getpass("GitHub PAT (will not be shown): ").strip()
#   # Call git directly here: sh() prints the command and would echo the token.
#   subprocess.check_call(["git", "clone", f"https://{token}@github.com/tommasocarzaniga/CNM_CycNucMed.git", REPO_DIR])

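# If this cell is re-run after the %cd below, step back out of the repo first
# so the existence check does not trigger a nested clone.
if pathlib.Path.cwd().name == REPO_DIR and os.path.isdir(".git"):
    os.chdir("..")

if not os.path.exists(REPO_DIR):
    sh(f"git clone {REPO_URL} {REPO_DIR}")
else:
    sh("git pull", cwd=REPO_DIR)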

%cd {REPO_DIR}
print("Now in:", pathlib.Path().resolve())


**Installing Dependencies and Package (Editable Mode)**

This step installs all required Python dependencies and installs the project itself in editable mode, enabling active development without repeated reinstalls.

**What it does**

*   Upgrades core packaging tools (pip, setuptools, wheel)
*   Installs all dependencies listed in requirements.txt
*   Installs the project in editable mode (pip install -e .), linking the local source code directly to the environment

**Why this is needed**

Editable installation allows you to:
*   Modify source files and immediately test changes
*   Avoid reinstalling the package after every code update
*   Ensure imports always reflect the current working version of the project

**When to run it**

Run this step once during initial setup and again if you:
*   Change dependencies
*   Modify pyproject.toml or packaging configuration
*   Set up a new environment (machine, venv, container)
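
To confirm that the editable install took effect, one option is to inspect the direct_url.json metadata that recent pip versions record for an installed distribution (PEP 610). The sketch below is illustrative only: the helper names are hypothetical, and the distribution name passed in the last line is an assumption, since the name declared in the project's packaging configuration may differ from the import name.

```python
import json
from importlib import metadata
from typing import Optional


def editable_flag(direct_url_text: Optional[str]) -> bool:
    """Interpret the contents of a distribution's direct_url.json (PEP 610)."""
    if not direct_url_text:
        return False
    info = json.loads(direct_url_text)
    return bool(info.get("dir_info", {}).get("editable", False))


def is_editable(dist_name: str) -> Optional[bool]:
    """True/False for an installed distribution, None if it is not installed."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None when the file is absent (e.g. a regular PyPI install)
    return editable_flag(dist.read_text("direct_url.json"))


# "iaea_project" is an assumed distribution name; adjust to the real one.
print(is_editable("iaea_project"))
```

A result of None means no distribution with that name was found, which usually points to a wrong name or a skipped install cell.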

In [None]:
import sys, subprocess, pathlib

def pip(cmd: str):
    print(">> pip", cmd)
    subprocess.check_call([sys.executable, "-m", "pip"] + cmd.split())

pip("install -U pip setuptools wheel")

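# Use a relative path: pip() splits its argument on whitespace, so an
# absolute path containing spaces would be broken into several arguments.
if pathlib.Path("requirements.txt").exists():
    pip("install -r requirements.txt")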

# Editable install (pyproject.toml / src layout supported)
pip("install -e .")


**Verifying the Pipeline Entrypoint**

This step validates that the project is correctly installed and that the main pipeline function (run_pipeline) is available.

It performs three checks:
*   Confirms the imported module path (to ensure the correct environment/package is being used)
*   Reloads the pipeline module to pick up recent code changes without restarting the session (importlib.reload re-executes only that module, not the modules it imports)
*   Asserts that run_pipeline exists, failing early if the installation or package structure is incorrect

**Why this is needed**

This verification prevents subtle development errors such as:
*   Importing the wrong package version
*   Changes in pipeline.py not being reflected in the runtime
*   Missing or misnamed entrypoint functions

**When to use it**

Use this step during development, debugging, or notebook-based workflows to ensure the pipeline is correctly wired before running long jobs.
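
Because importlib.reload re-executes a single module, edits to other modules of the package are not picked up by reloading pipeline alone. A minimal sketch of a package-wide reload follows. The helper name reload_package is hypothetical, and the ordering is a simplification: modules at the same depth are reloaded in no guaranteed order, so names bound with "from x import y" between sibling modules can stay stale. Restarting the runtime remains the reliable fallback.

```python
import importlib
import sys
import types


def reload_package(package_name: str) -> list:
    """Reload every already-imported module of a package, deepest modules first."""
    prefix = package_name + "."
    names = [
        name
        for name, mod in list(sys.modules.items())
        if (name == package_name or name.startswith(prefix))
        and isinstance(mod, types.ModuleType)
    ]
    # More dots means deeper in the package, so the top-level package comes last.
    names.sort(key=lambda n: n.count("."), reverse=True)
    for name in names:
        importlib.reload(sys.modules[name])
    return names


# In this notebook the call would be:
# reload_package("iaea_project")
```

The function returns the names it reloaded, which makes it easy to see whether the expected modules were actually imported beforehand.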

In [None]:
import importlib
import iaea_project.pipeline as pipeline

# Force reload in case you edited files after the first import
pipeline = importlib.reload(pipeline)

print("pipeline module:", pipeline.__file__)
print("Has run_pipeline:", hasattr(pipeline, "run_pipeline"))
assert hasattr(pipeline, "run_pipeline"), (
    "run_pipeline() not found. Make sure src/iaea_project/pipeline.py defines run_pipeline and that you re-ran the install cell."
)


**Installing Browser Dependencies (Playwright)**

This command installs Chromium and all required system dependencies for Playwright.
It is required for features that rely on browser automation, such as:
*   Scraping dynamic websites (JavaScript-rendered pages)
*   Automated page rendering for data extraction
*   Generating screenshots or PDFs from HTML content

**Why this step is needed**

Playwright controls a real browser. Without installing Chromium and its dependencies:
*   The pipeline may fail when accessing dynamic web content
*   Rendering-based extraction and automation will not work
*   Errors like "browser executable not found" can occur

**When to run it**

Run this once per environment, after the playwright Python package is installed (e.g. after cloning the repository, on a new machine or container, or in a fresh Colab runtime).
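
The playwright install command is provided by the playwright Python package, so it only works once that package is present in the environment (assumed here to come from the project's dependencies). A small pre-check, sketched below with a hypothetical helper name, reports whether the package and its command-line tool are visible. It does not verify that Chromium itself has been downloaded.

```python
import importlib.util
import shutil


def playwright_status() -> dict:
    """Report whether the playwright package and its CLI are visible from this environment."""
    return {
        "python_package": importlib.util.find_spec("playwright") is not None,
        "cli_on_path": shutil.which("playwright") is not None,
    }


print(playwright_status())
```

If either value is False, re-running the dependency installation cell before the browser install is the likely fix.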

In [None]:
# =========================
# Dependencies (Colab)
# =========================

# Install Chromium + system deps in one go (recommended)
!playwright install --with-deps chromium



**Running the Pipeline via Python API**

This cell executes the IAEA cyclotron reporting pipeline programmatically from Python.
The run_pipeline function runs the full data processing workflow and returns the file path of the generated PDF report.

**What it does**
*   Executes the data pipeline (data loading, processing, summarization, and report generation)
*   Supports running for all countries or a selected subset
*   Returns the absolute path to the generated PDF file
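
Since run_pipeline returns the path of the generated report, a quick sanity check after the call can confirm that the file exists and begins with the PDF signature bytes. This is a sketch with a hypothetical helper name, looks_like_pdf. It checks only the file header, not whether the report content is complete.

```python
import tempfile
from pathlib import Path


def looks_like_pdf(path) -> bool:
    """Return True if path is an existing file that starts with the PDF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(5) == b"%PDF-"


# Self-contained demo on a throwaway file (not a real report).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.pdf"
    sample.write_bytes(b"%PDF-1.7\n")
    demo_ok = looks_like_pdf(sample)
    demo_missing = looks_like_pdf(Path(tmp) / "missing.pdf")

print(demo_ok, demo_missing)
```

In the notebook, the check would be called as looks_like_pdf(pdf_path) right after the run_pipeline call.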

In [None]:
from iaea_project.pipeline import run_pipeline

# Run all countries:
# pdf_path = run_pipeline()

# Run selected countries:
pdf_path = run_pipeline(["Switzerland"], enable_llm=True, skip_scrape=True)  # edit as needed

print("Generated PDF at:", pdf_path)


**cyclotron_run — CLI Wrapper**

*cyclotron_run* is a lightweight command-line interface (CLI) wrapper around the cyclotron data pipeline and analysis workflows.
It gives the pipeline a single, consistent entry point with explicit commands and arguments, which makes runs easier to automate and reproduce.

Instead of manually editing scripts or remembering long commands, users can interact with the system through intuitive commands such as:

*   cyclotron_run country --name "Switzerland"
*   cyclotron_run full

The goal of cyclotron_run is to improve:
*   Reproducibility of analyses
*   Usability for non-developers
*   Maintainability of the codebase
*   Scalability as new features and modules are added
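
To illustrate how a thin wrapper of this kind can forward arguments to run_pipeline, here is a hypothetical argparse sketch. It is not the contents of scripts/cyclotron_run.py. It follows the --countries form used in the commented cell, and the function names build_parser and main are assumptions made for the example. The two run_pipeline call forms mirror the ones shown earlier in this notebook.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts zero or more country names."""
    parser = argparse.ArgumentParser(
        prog="cyclotron_run",
        description="Run the cyclotron reporting pipeline (illustrative sketch).",
    )
    parser.add_argument(
        "--countries",
        nargs="*",
        default=[],
        metavar="NAME",
        help="Countries to include; omit to run all countries.",
    )
    return parser


def main(argv=None):
    """Parse arguments and forward them to run_pipeline."""
    args = build_parser().parse_args(argv)
    # Imported lazily so the parser can be used without the package installed.
    from iaea_project.pipeline import run_pipeline

    if args.countries:
        return run_pipeline(args.countries)
    return run_pipeline()


# Parsing only; the pipeline itself is not executed here.
args = build_parser().parse_args(["--countries", "Switzerland", "Germany"])
print(args.countries)
```

Keeping the parsing in build_parser and the pipeline call in main lets the argument handling be tested without triggering a full pipeline run.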

In [None]:
# Optional: run the CLI wrapper instead of the Python API (uncomment the next line)
#!python scripts/cyclotron_run.py --countries Switzerland Germany
