# Notebook 3: Batch Delete

## 🎯 Objective

Delete datasets and collections from NAKALA for cleanup.

## 📋 What This Notebook Does

1. Reads `delete_data_items.csv` and `delete_collections.csv`
2. Deletes specified resources from NAKALA
3. Saves results to output CSVs
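
The cells in this notebook read only two columns from each input file: an ID column (`dataset_id` or `collection_id`) and `confirm_delete`. As a minimal sketch of that layout, the snippet below parses an in-memory sample. The identifiers are made-up placeholders, not real NAKALA IDs, and `read_rows` is a hypothetical helper used only for this illustration.

```python
import csv
import io

# Hypothetical sample content for delete_data_items.csv.
# The identifiers below are placeholders, not real NAKALA IDs.
SAMPLE_DATASETS_CSV = """dataset_id,confirm_delete
10.0000/example.aaaa1111,YES
10.0000/example.bbbb2222,no
"""


def read_rows(text):
    """Parse CSV text into a list of row dictionaries keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


sample_rows = read_rows(SAMPLE_DATASETS_CSV)
for row in sample_rows:
    print(row["dataset_id"], "->", row["confirm_delete"])
```

A collections file would presumably follow the same pattern, with `collection_id` in place of `dataset_id`.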

## ⚠️ WARNING

**Deletion is PERMANENT for pending datasets!**
- Only "pending" datasets can be deleted via API
- Published datasets require manual deletion by Huma-Num staff
- Collection deletion does NOT delete contained datasets (only unlinks)
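
The first rule above can be expressed as a small guard. This is only a sketch: `can_delete_via_api` is a hypothetical helper, and it assumes the status strings are the lowercase values `pending` and `published` that the deletion cell in this notebook compares against. It is stricter than that cell, which blocks only `published`.

```python
def can_delete_via_api(status):
    """Return True only for the 'pending' status (assumed spelling)."""
    return (status or "").strip().lower() == "pending"


for status in ("pending", "published", "unknown"):
    verdict = "deletable" if can_delete_via_api(status) else "not deletable via API"
    print(f"{status}: {verdict}")
```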

## 🔄 Workflow

```
Load Deletion CSVs → Confirm → Delete from NAKALA → Verify Cleanup
```

---

## Step 1: Setup and Imports

In [None]:
import sys
import os
from pathlib import Path

# Add parent directory to path
sys.path.insert(0, str(Path.cwd().parent))

import csv
import time

# Import from nakala package
from nakala.config import API_URL, API_KEY
from nakala.api_client import get_dataset_info, delete_dataset, delete_collection

print("✓ Imports successful")
print(f"✓ API URL: {API_URL}")
print("✓ Using test API key")

## Step 2: Configure Paths

In [None]:
# Set up paths
BASE_PATH = Path.cwd().parent
DATA_PATH = BASE_PATH / 'data'

# Input CSVs
DATASETS_CSV = DATA_PATH / 'delete_data_items.csv'
COLLECTIONS_CSV = DATA_PATH / 'delete_collections.csv'

# Output CSVs
OUTPUT_DATASETS_CSV = DATA_PATH / 'output_deletions_datasets.csv'
OUTPUT_COLLECTIONS_CSV = DATA_PATH / 'output_deletions_collections.csv'

print(f"✓ Data path: {DATA_PATH}")
print("\n✓ Input CSVs:")
print(f"  - {DATASETS_CSV.name}: {'✓ exists' if DATASETS_CSV.exists() else '✗ missing'}")
print(f"  - {COLLECTIONS_CSV.name}: {'✓ exists' if COLLECTIONS_CSV.exists() else '✗ missing'}")

## Step 3: Preview Deletion CSVs

Let's see what we're about to delete:

In [None]:
# Preview datasets to delete
if DATASETS_CSV.exists():
    print("=" * 80)
    print("DATASETS TO DELETE (delete_data_items.csv)")
    print("=" * 80)

    with open(DATASETS_CSV, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader, 1):
            print(f"\nDataset {i}:")
            print(f"  ID: {row.get('dataset_id', 'N/A')}")
            print(f"  Confirmed: {row.get('confirm_delete', 'N/A')}")
else:
    print("⚠ delete_data_items.csv not found")

In [None]:
# Preview collections to delete
if COLLECTIONS_CSV.exists():
    print("\n" + "=" * 80)
    print("COLLECTIONS TO DELETE (delete_collections.csv)")
    print("=" * 80)

    with open(COLLECTIONS_CSV, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader, 1):
            print(f"\nCollection {i}:")
            print(f"  ID: {row.get('collection_id', 'N/A')}")
            print(f"  Confirmed: {row.get('confirm_delete', 'N/A')}")
else:
    print("⚠ delete_collections.csv not found")
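
Before running the deletion cells, it can help to see which rows will actually be acted on. The sketch below applies the same rule the deletion cells use (a row counts as confirmed only when `confirm_delete`, trimmed and upper-cased, equals `YES`) and makes no API calls. `split_by_confirmation` is a hypothetical helper and the demo rows are placeholders.

```python
import csv
import io


def split_by_confirmation(rows, id_column):
    """Split rows into (confirmed_ids, skipped_ids) using the 'YES' rule."""
    confirmed, skipped = [], []
    for row in rows:
        resource_id = (row.get(id_column) or "").strip()
        if not resource_id:
            continue  # rows without an ID are ignored, as in the deletion cells
        flag = (row.get("confirm_delete") or "").strip().upper()
        if flag == "YES":
            confirmed.append(resource_id)
        else:
            skipped.append(resource_id)
    return confirmed, skipped


# Placeholder rows; in the notebook these would come from csv.DictReader(f).
demo_rows = csv.DictReader(io.StringIO(
    "dataset_id,confirm_delete\nid-1,YES\nid-2,\nid-3, yes \n"
))
confirmed, skipped = split_by_confirmation(demo_rows, "dataset_id")
print(f"Will delete {len(confirmed)}: {confirmed}")
print(f"Will skip {len(skipped)}: {skipped}")
```

Passing `"collection_id"` as `id_column` gives the same dry run for the collections file.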

## Step 4: Delete Datasets

⚠️ **This will permanently delete datasets from NAKALA!**

In [None]:
def delete_datasets(csv_path: Path, api_key: str):
    """
    Delete datasets from CSV file
    """
    # Prepare output CSV
    output = open(OUTPUT_DATASETS_CSV, 'w', encoding='utf-8', newline='')  # newline='' avoids blank rows on Windows
    output_writer = csv.writer(output)
    output_writer.writerow(['dataset_id', 'status_before', 'result', 'response'])

    print("=" * 80)
    print("STARTING DATASET DELETIONS")
    print("=" * 80)

    with open(csv_path, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)

        for row_num, row in enumerate(reader, 1):
            try:
                dataset_id = row.get('dataset_id', '').strip()
                confirm = row.get('confirm_delete', '').strip().upper()

                if not dataset_id:
                    print(f"Row {row_num}: Missing dataset_id")
                    continue

                if confirm != 'YES':
                    print(f"Row {row_num}: Deletion not confirmed (confirm_delete != 'YES')")
                    output_writer.writerow([dataset_id, '', 'SKIPPED', 'Not confirmed'])
                    continue

                print(f"\n{'=' * 80}")
                print(f"ROW {row_num}: Deleting {dataset_id}")
                print(f"{'=' * 80}")

                # Check dataset status
                dataset_info = get_dataset_info(dataset_id, api_key)

                if not dataset_info:
                    print(f"  ✗ Dataset not found: {dataset_id}")
                    output_writer.writerow([dataset_id, '', 'ERROR', 'Dataset not found'])
                    continue

                status = dataset_info.get('status', 'unknown')
                print(f"  Dataset status: {status}")

                if status == 'published':
                    print(f"  ❌ Cannot delete published dataset via API")
                    print(f"     Contact Huma-Num staff for manual deletion")
                    output_writer.writerow([dataset_id, status, 'ERROR', 'Published datasets require manual deletion'])
                    continue

                # Delete dataset
                print(f"  Deleting dataset from server...")
                response = delete_dataset(dataset_id, api_key)

                if response.status_code == 204:
                    print(f"  ✓ Dataset deleted successfully: {dataset_id}")
                    output_writer.writerow([dataset_id, status, 'DELETED', 'Success'])
                else:
                    print(f"  ✗ Dataset deletion failed: {response.status_code}")
                    print(f"  Response: {response.text}")
                    output_writer.writerow([dataset_id, status, 'ERROR', response.text])

                time.sleep(1)  # Rate limiting

            except Exception as e:
                print(f"  ✗ Error processing row {row_num}: {str(e)}")
                # Read the ID from the row itself: dataset_id may be unset or stale if the error occurred early
                output_writer.writerow([row.get('dataset_id', ''), '', 'ERROR', str(e)])

    output.close()
    print(f"\n✓ Dataset deletions complete. Results saved to: {OUTPUT_DATASETS_CSV.name}")

# Execute deletions
if DATASETS_CSV.exists():
    delete_datasets(DATASETS_CSV, API_KEY)
else:
    print("⚠ delete_data_items.csv not found, skipping dataset deletions")

## Step 5: Delete Collections

⚠️ **This will permanently delete collections from NAKALA!**

Note: Datasets inside collections will be **unlinked**, not deleted.

In [None]:
def delete_collections(csv_path: Path, api_key: str):
    """
    Delete collections from CSV file
    """
    # Prepare output CSV
    output = open(OUTPUT_COLLECTIONS_CSV, 'w', encoding='utf-8', newline='')  # newline='' avoids blank rows on Windows
    output_writer = csv.writer(output)
    output_writer.writerow(['collection_id', 'result', 'response'])

    print("\n" + "=" * 80)
    print("STARTING COLLECTION DELETIONS")
    print("=" * 80)

    with open(csv_path, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)

        for row_num, row in enumerate(reader, 1):
            try:
                collection_id = row.get('collection_id', '').strip()
                confirm = row.get('confirm_delete', '').strip().upper()

                if not collection_id:
                    print(f"Row {row_num}: Missing collection_id")
                    continue

                if confirm != 'YES':
                    print(f"Row {row_num}: Deletion not confirmed (confirm_delete != 'YES')")
                    output_writer.writerow([collection_id, 'SKIPPED', 'Not confirmed'])
                    continue

                print(f"\n{'=' * 80}")
                print(f"COLLECTION {row_num}: Deleting {collection_id}")
                print(f"{'=' * 80}")

                # Delete collection
                print(f"  Deleting collection from server...")
                print(f"  ⚠️  Note: Datasets inside will be UNLINKED, not deleted")
                response = delete_collection(collection_id, api_key)

                if response.status_code == 204:
                    print(f"  ✓ Collection deleted successfully: {collection_id}")
                    output_writer.writerow([collection_id, 'DELETED', 'Success'])
                else:
                    print(f"  ✗ Collection deletion failed: {response.status_code}")
                    print(f"  Response: {response.text}")
                    output_writer.writerow([collection_id, 'ERROR', response.text])

                time.sleep(1)  # Rate limiting

            except Exception as e:
                print(f"  ✗ Error processing collection {row_num}: {str(e)}")
                # Read the ID from the row itself: collection_id may be unset or stale if the error occurred early
                output_writer.writerow([row.get('collection_id', ''), 'ERROR', str(e)])

    output.close()
    print(f"\n✓ Collection deletions complete. Results saved to: {OUTPUT_COLLECTIONS_CSV.name}")

# Execute deletions
if COLLECTIONS_CSV.exists():
    delete_collections(COLLECTIONS_CSV, API_KEY)
else:
    print("⚠ delete_collections.csv not found, skipping collection deletions")

## ✅ Summary

### What Was Deleted

Check the `data/` directory for:
- `output_deletions_datasets.csv` - Deletion results for datasets
- `output_deletions_collections.csv` - Deletion results for collections
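
To review the outcome without opening the files by hand, the `result` column of each output CSV can be tallied. This is a rough sketch: `tally_results` is a hypothetical helper, and the path assumes the notebook's own layout, with the `data/` directory one level above the working directory.

```python
import csv
from collections import Counter
from pathlib import Path


def tally_results(csv_path):
    """Count rows per value of the 'result' column in an output CSV."""
    with open(csv_path, "r", encoding="utf-8", newline="") as f:
        return Counter(row.get("result", "") for row in csv.DictReader(f))


# Assumes the notebook's layout: output CSVs live in ../data.
data_path = Path.cwd().parent / "data"
for name in ("output_deletions_datasets.csv", "output_deletions_collections.csv"):
    path = data_path / name
    if path.exists():
        print(f"{name}: {dict(tally_results(path))}")
    else:
        print(f"{name}: not found")
```

Any `ERROR` rows are worth inspecting individually, since their `response` column holds the message returned for that resource.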

### Workshop Complete! 🎉

You've successfully completed the NAKALA batch operations workflow:
1. ✅ **Created** datasets and collections from CSV
2. ✅ **Modified** metadata using CSV files
3. ✅ **Deleted** resources for cleanup

### Next Steps

- **Run again**: You can repeat this workflow with your own CSV files!

---

**Thank you for participating in this workshop!**