# Notebook 2: Batch Modify

## 🎯 Objective

Modify existing datasets and collections on NAKALA using CSV files.

## 📋 What This Notebook Does

1. Reads `modification_data_items.csv` and `modification_collections.csv`
2. Applies metadata changes to existing NAKALA resources
3. Saves results to output CSVs

## 🔄 Workflow

```
Edit CSVs → Load Modifications → Apply to NAKALA → Verify Changes
```

## ⚠️ Important

**Before running this notebook**:
1. Review `modification_data_items.csv` and `modification_collections.csv`
2. Edit them to change metadata as desired
3. The "v2" and "(updated)" markers are only examples; replace them with your own values.
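
As a rough illustration of the kind of file this notebook expects, the sketch below builds a one-row dataset modification CSV in memory and reads it back. The column names `dataset_id`, `title`, `description`, and `status` are the ones the preview and modification cells read; the identifier and text values are made-up placeholders, and the real files may contain more columns.

```python
import csv
import io

# Columns read by this notebook's dataset cells; real files may have more.
FIELDNAMES = ["dataset_id", "title", "description", "status"]

# Placeholder values for illustration only (not a real NAKALA identifier).
sample_row = {
    "dataset_id": "10.0000/example.id",
    "title": "My photograph (updated)",
    "description": "Revised description v2",
    "status": "pending",
}

# Write the sample to an in-memory buffer instead of touching data/.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerow(sample_row)

# Read it back the same way the notebook does, with csv.DictReader.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0]["dataset_id"], "->", rows[0]["title"])
```

Working on an in-memory copy like this is a cheap way to check column names before editing the real files in `data/`.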

---

## Step 1: Setup and Imports

In [None]:
import sys
import os
from pathlib import Path

# Add parent directory to path
sys.path.insert(0, str(Path.cwd().parent))

import csv
import json
import time

# Import from nakala package
from nakala import CsvConverter
from nakala.config import API_URL
from nakala.api_client import get_dataset, modify_dataset, get_collection, modify_collection

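# API key: taken from the NAKALA_API_KEY environment variable, with a test key as fallback
API_KEY = os.getenv('NAKALA_API_KEY', 'aae99aba-476e-4ff2-2886-0aaf1bfa6fd2')

print("✓ Imports successful")
print(f"✓ API URL: {API_URL}")
if os.getenv('NAKALA_API_KEY'):
    print("✓ Using API key from NAKALA_API_KEY")
else:
    print("✓ Using test API key")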
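
Because the key is read from the environment with a fallback, it can help to report where it came from without printing it in full. The helper below is a hypothetical addition (`describe_api_key` is not part of the `nakala` package) and only inspects the environment variable named in the cell above.

```python
import os


def describe_api_key(env_var: str = "NAKALA_API_KEY") -> str:
    """Return a log-safe description of where the API key comes from."""
    key = os.getenv(env_var)
    if not key:
        return f"{env_var} is not set; the notebook's fallback key will be used"
    # Show only the last four characters so the full key never reaches the output.
    return f"{env_var} is set (ends with ...{key[-4:]})"


print(describe_api_key())
```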

## Step 2: Configure Paths

In [None]:
# Set up paths
BASE_PATH = Path.cwd().parent
DATA_PATH = BASE_PATH / 'data'

# Input CSVs
DATASETS_CSV = DATA_PATH / 'modification_data_items.csv'
COLLECTIONS_CSV = DATA_PATH / 'modification_collections.csv'

# Output CSVs
OUTPUT_DATASETS_CSV = DATA_PATH / 'output_modifications_datasets.csv'
OUTPUT_COLLECTIONS_CSV = DATA_PATH / 'output_modifications_collections.csv'

print(f"✓ Data path: {DATA_PATH}")
print("\n✓ Input CSVs:")
print(f"  - {DATASETS_CSV.name}: {'✓ exists' if DATASETS_CSV.exists() else '✗ missing'}")
print(f"  - {COLLECTIONS_CSV.name}: {'✓ exists' if COLLECTIONS_CSV.exists() else '✗ missing'}")
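
The existence checks above can also be collected into a small helper that returns the missing files, which makes it easy to stop early. This is a minimal sketch; `missing_inputs` is a hypothetical name, and the paths assume a `data/` folder relative to the current directory.

```python
from pathlib import Path
from typing import Iterable, List


def missing_inputs(paths: Iterable[Path]) -> List[Path]:
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not p.is_file()]


# Hypothetical usage with the notebook's file names, relative to a data/ folder.
data_path = Path("data")
required = [
    data_path / "modification_data_items.csv",
    data_path / "modification_collections.csv",
]
missing = missing_inputs(required)
if missing:
    print("Missing input files:", ", ".join(p.name for p in missing))
else:
    print("All input files found")
```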

## Step 3: Preview Modification CSVs

Let's see what modifications we're about to apply:

In [None]:
# Preview datasets modifications
if DATASETS_CSV.exists():
    print("=" * 80)
    print("DATASET MODIFICATIONS (modification_data_items.csv)")
    print("=" * 80)

    with open(DATASETS_CSV, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader, 1):
            print(f"\nDataset {i}:")
            print(f"  ID: {row.get('dataset_id', 'N/A')}")
            print(f"  New Title: {row.get('title', 'N/A')[:60]}...")
            print(f"  New Description: {row.get('description', 'N/A')[:60]}...")
            print(f"  Status: {row.get('status', 'N/A')}")
else:
    print("⚠ modification_data_items.csv not found")

In [None]:
# Preview collections modifications
if COLLECTIONS_CSV.exists():
    print("\n" + "=" * 80)
    print("COLLECTION MODIFICATIONS (modification_collections.csv)")
    print("=" * 80)

    with open(COLLECTIONS_CSV, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader, 1):
            print(f"\nCollection {i}:")
            print(f"  ID: {row.get('collection_id', 'N/A')}")
            print(f"  New Title: {row.get('title', 'N/A')[:60]}...")
            print(f"  New Description: {row.get('description', 'N/A')[:60]}...")
            print(f"  Status: {row.get('status', 'N/A')}")
else:
    print("⚠ modification_collections.csv not found")
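
The preview cells cut every value at 60 characters and always append `...`, even when nothing was cut. They would also fail on a row with fewer fields than the header, because `csv.DictReader` fills missing fields with `None`. A small helper can cover both cases; `preview` is a hypothetical name used here for illustration.

```python
def preview(text, width: int = 60) -> str:
    """Shorten text for display, marking truncation only when it happens."""
    if text is None:
        return "N/A"
    text = str(text)
    if len(text) <= width:
        return text
    return text[:width] + "..."


print(preview("Short title"))
print(preview("x" * 80))
```

In the preview loops, a call such as `preview(row.get('title'))` could then replace the slice-and-append expression.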

## Step 4: Modify Datasets

Apply metadata changes to datasets:

In [None]:
def modify_datasets(csv_path: Path, base_path: Path, api_key: str):
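    """
    Modify datasets listed in a CSV file.

    Each row must provide a `dataset_id`. The remaining columns are converted
    to NAKALA metadata and sent with the dataset's existing files, which are
    preserved. `base_path` is accepted but not used here.
    """
    converter = CsvConverter()

    # Prepare output CSV (newline='' lets the csv module control line endings)
    output = open(OUTPUT_DATASETS_CSV, 'w', encoding='utf-8', newline='')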
    output_writer = csv.writer(output)
    output_writer.writerow(['dataset_id', 'title', 'status', 'result', 'response'])

    print("=" * 80)
    print("STARTING DATASET MODIFICATIONS")
    print("=" * 80)

    with open(csv_path, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)

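        for row_num, row in enumerate(reader, 1):
            # Start each row with a fresh result record so the except block
            # below never reuses the previous row's values or hits a NameError.
            output_data = [row.get('dataset_id') or '', row.get('title') or '', '', '', '']
            try:
                dataset_id = row.get('dataset_id', '').strip()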

                if not dataset_id:
                    print(f"Row {row_num}: Missing dataset_id")
                    continue

                print(f"\n{'=' * 80}")
                print(f"ROW {row_num}: Modifying {dataset_id}")
                print(f"{'=' * 80}")

                output_data = [dataset_id, row.get('title', ''), '', '', '']

                # Get existing dataset
                existing = get_dataset(dataset_id, api_key)

                if not existing:
                    print(f"  ✗ Dataset not found: {dataset_id}")
                    output_data[2] = 'ERROR'
                    output_data[3] = 'Dataset not found'
                    output_writer.writerow(output_data)
                    continue

                print("  ✓ Found existing dataset")

                # Build new metadata from CSV
                new_metas = converter.csv_row_to_nakala_metas(row)

                # Get existing files (preserve)
                existing_files = existing.get('files', [])

                # Build modified dataset
                modified_dataset = {
                    'status': row.get('status', 'pending').strip(),
                    'files': existing_files,
                    'metas': new_metas
                }

                print(f"  ✓ Modified dataset JSON prepared ({len(new_metas)} metadata objects)")

                # Modify dataset on NAKALA
                print("  Modifying dataset on NAKALA...")
                response = modify_dataset(dataset_id, modified_dataset, api_key)

                if response.status_code == 204:
                    print(f"  ✓ Dataset modified successfully: {dataset_id}")
                    output_data[2] = 'OK'
                    output_data[3] = 'Modified'
                    output_data[4] = 'Success'
                else:
                    print(f"  ✗ Dataset modification failed: {response.status_code}")
                    print(f"  Response: {response.text}")
                    output_data[2] = 'ERROR'
                    output_data[3] = f'Failed: {response.status_code}'
                    output_data[4] = response.text

                output_writer.writerow(output_data)
                time.sleep(1)  # Rate limiting

            except Exception as e:
                print(f"  ✗ Error processing row {row_num}: {str(e)}")
                output_data[2] = 'ERROR'
                output_data[3] = str(e)
                output_writer.writerow(output_data)

    output.close()
    print(f"\n✓ Dataset modifications complete. Results saved to: {OUTPUT_DATASETS_CSV.name}")

# Execute modifications
if DATASETS_CSV.exists():
    modify_datasets(DATASETS_CSV, BASE_PATH, API_KEY)
else:
    print("⚠ modification_data_items.csv not found, skipping dataset modifications")
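
The central step in Step 4 is assembling the payload: new metadata from the CSV, combined with the files of the existing record so they are not lost. Pulling that step into a pure function makes it testable without calling NAKALA. The sketch below uses a hypothetical helper name (`build_dataset_payload`), and the sample file entry and property URI are placeholders, not real NAKALA values.

```python
from typing import Any, Dict, List


def build_dataset_payload(existing: Dict[str, Any],
                          new_metas: List[Dict[str, Any]],
                          status: str = "pending") -> Dict[str, Any]:
    """Combine new metadata with the files of the existing record."""
    return {
        "status": (status or "pending").strip(),
        "files": existing.get("files", []),  # keep files exactly as they are
        "metas": new_metas,
    }


# Made-up record and metadata, shaped like the dictionaries used in Step 4.
existing = {"files": [{"name": "photo.jpg", "sha1": "0000"}], "metas": []}
new_metas = [{"propertyUri": "http://example.org/terms#title",
              "value": "My photograph (updated)", "lang": "en"}]

payload = build_dataset_payload(existing, new_metas, status=" pending ")
print(payload["status"], len(payload["files"]), len(payload["metas"]))
```

Printing or inspecting the payload before calling `modify_dataset` gives a simple dry run for a single row.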

## Step 5: Modify Collections

Apply metadata changes to collections:

In [None]:
def modify_collections(csv_path: Path, api_key: str):
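    """
    Modify collections listed in a CSV file.

    Each row must provide a `collection_id`. The `title`, `description`, and
    `keywords` columns are converted to NAKALA metadata and sent together
    with the row's `status`.
    """
    converter = CsvConverter()

    # Prepare output CSV (newline='' lets the csv module control line endings)
    output = open(OUTPUT_COLLECTIONS_CSV, 'w', encoding='utf-8', newline='')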
    output_writer = csv.writer(output)
    output_writer.writerow(['collection_id', 'title', 'status', 'result', 'response'])

    print("\n" + "=" * 80)
    print("STARTING COLLECTION MODIFICATIONS")
    print("=" * 80)

    with open(csv_path, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)

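        for row_num, row in enumerate(reader, 1):
            # Start each row with a fresh result record so the except block
            # below never reuses the previous row's values or hits a NameError.
            output_data = [row.get('collection_id') or '', row.get('title') or '', '', '', '']
            try:
                collection_id = row.get('collection_id', '').strip()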

                if not collection_id:
                    print(f"Row {row_num}: Missing collection_id")
                    continue

                print(f"\n{'=' * 80}")
                print(f"COLLECTION {row_num}: Modifying {collection_id}")
                print(f"{'=' * 80}")

                output_data = [collection_id, row.get('title', ''), '', '', '']

                # Build metadata
                metas = []

                # Title (multilingual)
                if row.get('title'):
                    lang_parts = converter.parse_multilingual_field(row['title'])
                    for part in lang_parts:
                        meta = {
                            "propertyUri": converter.property_uris['title'],
                            "value": part['value']
                        }
                        if part['lang']:
                            meta['lang'] = part['lang']
                        metas.append(meta)

                # Description (multilingual)
                if row.get('description'):
                    lang_parts = converter.parse_multilingual_field(row['description'])
                    for part in lang_parts:
                        meta = {
                            "propertyUri": converter.property_uris['description'],
                            "value": part['value'],
                            "typeUri": "http://www.w3.org/2001/XMLSchema#string"
                        }
                        if part['lang']:
                            meta['lang'] = part['lang']
                        metas.append(meta)

                # Keywords
                if row.get('keywords'):
                    lang_parts = converter.parse_multilingual_field(row['keywords'])
                    for part in lang_parts:
                        keywords = converter.parse_multiple_values(part['value'])
                        for keyword in keywords:
                            meta = {
                                "propertyUri": converter.property_uris['subject'],
                                "value": keyword,
                                "typeUri": "http://www.w3.org/2001/XMLSchema#string"
                            }
                            if part['lang']:
                                meta['lang'] = part['lang']
                            metas.append(meta)

                # Build collection JSON
                modified_collection = {
                    'status': row.get('status', 'private').strip(),
                    'metas': metas
                }

                print(f"  ✓ Modified collection JSON prepared ({len(metas)} metadata objects)")

                # Modify collection on NAKALA
                print("  Modifying collection on NAKALA...")
                response = modify_collection(collection_id, modified_collection, api_key)

                if response.status_code == 204:
                    print(f"  ✓ Collection modified successfully: {collection_id}")
                    output_data[2] = 'OK'
                    output_data[3] = 'Modified'
                    output_data[4] = 'Success'
                else:
                    print(f"  ✗ Collection modification failed: {response.status_code}")
                    print(f"  Response: {response.text}")
                    output_data[2] = 'ERROR'
                    output_data[3] = f'Failed: {response.status_code}'
                    output_data[4] = response.text

                output_writer.writerow(output_data)
                time.sleep(1)  # Rate limiting

            except Exception as e:
                print(f"  ✗ Error processing collection {row_num}: {str(e)}")
                output_data[2] = 'ERROR'
                output_data[3] = str(e)
                output_writer.writerow(output_data)

    output.close()
    print(f"\n✓ Collection modifications complete. Results saved to: {OUTPUT_COLLECTIONS_CSV.name}")

# Execute modifications
if COLLECTIONS_CSV.exists():
    modify_collections(COLLECTIONS_CSV, API_KEY)
else:
    print("⚠ modification_collections.csv not found, skipping collection modifications")
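
The title, description, and keyword blocks in Step 5 all build the same kind of dictionary, differing only in the optional `typeUri` and `lang` keys. A small factory function could remove that repetition. This is a sketch: `make_meta` is a hypothetical helper, and the property URI below is a placeholder, since the notebook takes the real ones from `converter.property_uris`.

```python
from typing import Dict, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def make_meta(property_uri: str, value: str,
              lang: Optional[str] = None,
              type_uri: Optional[str] = None) -> Dict[str, str]:
    """Build one metadata entry, adding optional keys only when provided."""
    meta = {"propertyUri": property_uri, "value": value}
    if type_uri:
        meta["typeUri"] = type_uri
    if lang:
        meta["lang"] = lang
    return meta


# Placeholder URI for illustration only.
DESCRIPTION_URI = "http://example.org/terms#description"

print(make_meta(DESCRIPTION_URI, "Revised description", lang="en", type_uri=XSD_STRING))
print(make_meta(DESCRIPTION_URI, "No language given"))
```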

## ✅ Summary

### What Was Modified

Check the `data/` directory for:
- `output_modifications_datasets.csv` - Modification results for datasets
- `output_modifications_collections.csv` - Modification results for collections
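
For a quick overview of a run, the `status` column of either output file can be tallied. The sketch below works on an in-memory stand-in that uses the header this notebook writes; the rows are invented, and `summarize_results` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def summarize_results(csv_file) -> Counter:
    """Count rows per value of the 'status' column ('OK' or 'ERROR')."""
    return Counter(row["status"] for row in csv.DictReader(csv_file))


# In-memory stand-in for an output CSV, using the header written by this notebook.
sample = io.StringIO(
    "dataset_id,title,status,result,response\n"
    "id-1,First title,OK,Modified,Success\n"
    "id-2,Second title,ERROR,Failed: 404,Not found\n"
    "id-3,Third title,OK,Modified,Success\n"
)
counts = summarize_results(sample)
print(f"OK: {counts['OK']}, ERROR: {counts['ERROR']}")
```

With a real file, pass a handle opened with `open(path, newline='', encoding='utf-8')` instead of the `StringIO` object.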

### Next Steps

1. **Verify** your changes on the NAKALA test site
2. **Run** `3_batch_delete.ipynb` to clean up resources
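
Verification can also be partly scripted by fetching a record again and checking that every metadata entry you sent is present. The sketch below only shows the comparison step, on made-up data. It assumes the fetched record exposes a `metas` list whose entries carry `propertyUri`, `value`, and optionally `lang`, which should be confirmed against an actual response; `meta_keys` and `missing_metas` are hypothetical helpers.

```python
from typing import Any, Dict, Iterable, Set, Tuple

MetaKey = Tuple[str, str, str]


def meta_keys(metas: Iterable[Dict[str, Any]]) -> Set[MetaKey]:
    """Reduce metadata entries to comparable (propertyUri, value, lang) tuples."""
    return {
        (m.get("propertyUri", ""), str(m.get("value", "")), m.get("lang") or "")
        for m in metas
    }


def missing_metas(expected: Iterable[Dict[str, Any]],
                  actual: Iterable[Dict[str, Any]]) -> Set[MetaKey]:
    """Return the expected entries that are absent from the fetched record."""
    return meta_keys(expected) - meta_keys(actual)


# Made-up data: what was sent versus what a later fetch might return.
TITLE_URI = "http://example.org/terms#title"  # placeholder URI
sent = [{"propertyUri": TITLE_URI, "value": "New title", "lang": "en"}]
fetched = [{"propertyUri": TITLE_URI, "value": "Old title", "lang": "en"}]

print(missing_metas(sent, fetched))
```

An empty set means every sent entry was found; anything else lists the entries still to check by hand.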

---

**Tip**: You can run this notebook multiple times with different modifications!