# Step 4: Annotation + QA

Now we annotate the selected batch. This step covers:
1. Setting up a consistent annotation schema
2. **Actually labeling in the App** (not simulated)
3. Running QA checks before training

> **Time commitment:** Plan 1-2 minutes per sample for careful annotation. Start with 10-20 samples to get a feel for the process, then either continue labeling or use the fast-forward option below.

In [None]:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.load_dataset("annotation_tutorial")
batch_v0 = dataset.load_saved_view("batch_v0")

print(f"Batch v0: {len(batch_v0)} samples to annotate")
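
To turn the time guidance above into a rough budget for a batch, the arithmetic can be sketched as follows. The helper name `estimate_annotation_minutes` is hypothetical, and the 1-2 minutes per sample is the tutorial's rule of thumb, not a measured rate.

```python
def estimate_annotation_minutes(num_samples, min_per_sample=1.0, max_per_sample=2.0):
    """Return a (low, high) estimate in minutes for labeling num_samples."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    return num_samples * min_per_sample, num_samples * max_per_sample


# Example: a first pass of 20 samples
low, high = estimate_annotation_minutes(20)
print(f"20 samples: roughly {low:.0f}-{high:.0f} minutes")
```

In the notebook, passing `len(batch_v0)` instead of `20` would give the estimate for the full batch.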

## Define Your Schema

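Before labeling, define the rules. A fixed class list and a fixed field name prevent class drift (the same kind of object receiving different labels over time) and keep annotations consistent across sessions.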

In [None]:
# Define annotation schema
LABEL_FIELD = "human_labels"  # Use this exact name in the App

SCHEMA = {
    "classes": [
        "person", "car", "truck", "bus", "motorcycle", "bicycle",
        "dog", "cat", "bird", "horse",
        "chair", "couch", "dining table", "tv",
        "bottle", "cup", "bowl",
        "other"  # catch-all for edge cases
    ],
    "field_name": LABEL_FIELD
}

SCHEMA_CLASSES = set(SCHEMA["classes"])

# Store in dataset for reference
dataset.info["annotation_schema"] = SCHEMA
dataset.save()

print(f"Schema defined: {len(SCHEMA['classes'])} classes")
print(f"Target field: {LABEL_FIELD}")
print(f"\nWhen you create a field in the App, name it exactly: {LABEL_FIELD}")
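
Since the class list is typed by hand, it may be worth checking it for accidental duplicates or stray whitespace before anyone starts labeling. The sketch below is one way to do that in plain Python; `validate_classes` is a hypothetical helper, not part of FiftyOne, and the example list is made up to show the failure cases.

```python
def validate_classes(classes):
    """Return a list of human-readable problems found in a class list."""
    problems = []
    seen = {}  # normalized name -> first spelling encountered
    for name in classes:
        if name != name.strip():
            problems.append(f"leading/trailing whitespace: {name!r}")
        key = name.strip().lower()
        if key in seen:
            problems.append(f"duplicate or case collision: {name!r} vs {seen[key]!r}")
        else:
            seen[key] = name
    return problems


# Made-up list with two deliberate mistakes ("Car" and "dog ")
example_classes = ["person", "car", "Car", "dog ", "other"]
for problem in validate_classes(example_classes):
    print(problem)
```

Calling `validate_classes(SCHEMA["classes"])` on the schema above should return an empty list.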

## Annotate in the App

**This is the real labeling step.** Open the App and annotate your selected samples.

### Setup (one time)
1. Launch the App with your batch
2. Click a sample to open the modal
3. Click the **Annotate** tab (pencil icon)
4. Click **Schema** -> **Add Field** -> name it exactly `human_labels`
5. Set type to **Detections** and add your classes

### For each sample
1. Review the image
2. Click the **Detection** button (square icon)
3. Draw boxes around all objects of interest
4. Assign the correct class to each box
5. Move to the next sample

### Tips
- **Be consistent:** Same object = same class, every time
- **Tight boxes:** As close as possible without cutting off the object
- **Don't skip:** If an object is ambiguous, label it `other` rather than leaving it unlabeled

In [None]:
# Launch App with your batch
session = fo.launch_app(batch_v0)

### Stop here and annotate samples

Take 15-30 minutes to actually label some samples. This is the core skill.

When you're done (or want to fast-forward), continue below.

---

## Fast-Forward Option

If you want to proceed without labeling everything manually, set `FAST_FORWARD = True` below. This copies `ground_truth` labels to `human_labels` to simulate completed annotation.

> **Note:** In real projects, there's no shortcut. Label quality determines model quality.

In [None]:
# Set to True ONLY if you want to skip manual annotation
# Default is False - you should label samples yourself
FAST_FORWARD = False

if FAST_FORWARD:
    print("Fast-forwarding: copying ground_truth to human_labels...")
    print(f"Filtering to schema classes: {len(SCHEMA_CLASSES)} classes")
    
    copied = 0
    skipped = 0
    
    # autosave=True batches the writes instead of calling sample.save() per sample
    for sample in batch_v0.iter_samples(autosave=True, progress=True):
        human_dets = []
        if sample.ground_truth:
            # Only copy detections that match our schema
            for det in sample.ground_truth.detections:
                if det.label in SCHEMA_CLASSES:
                    human_dets.append(fo.Detection(
                        label=det.label,
                        bounding_box=list(det.bounding_box),
                    ))
                    copied += 1
                else:
                    skipped += 1
        sample[LABEL_FIELD] = fo.Detections(detections=human_dets)
    
    print(f"Copied {copied} detections, skipped {skipped} (not in schema)")
else:
    print("Using your manual annotations.")
    print(f"Make sure you created the '{LABEL_FIELD}' field and labeled samples in the App!")

## Mark Annotated Samples

**Important:** A sample is marked as annotated only if it has at least one detection in `human_labels`. This keeps unlabeled samples out of the training set.

In [None]:
# Reload to see changes
dataset.reload()

# Find samples that actually have labels
batch_samples = dataset.match_tags("batch:v0")

if LABEL_FIELD in dataset.get_field_schema():
    has_labels = batch_samples.match(F(f"{LABEL_FIELD}.detections").length() > 0)
    no_labels = batch_samples.match(
        (F(LABEL_FIELD) == None) | (F(f"{LABEL_FIELD}.detections").length() == 0)
    )
    
    print("Batch v0 status:")
    print(f"  With labels: {len(has_labels)}")
    print(f"  Without labels: {len(no_labels)}")
    
    if len(has_labels) == 0:
        print(f"\n>>> No samples have labels in '{LABEL_FIELD}'.")
        print(">>> Either label some samples in the App, or set FAST_FORWARD = True above.")
    else:
        # Tag ONLY samples that have labels as annotated
        has_labels.untag_samples("to_annotate")
        has_labels.tag_samples("annotated:v0")
        has_labels.set_values("annotation_status", ["annotated"] * len(has_labels))
        
        # Mark unlabeled samples as still needing annotation
        if len(no_labels) > 0:
            no_labels.set_values("annotation_status", ["pending"] * len(no_labels))
        
        print(f"\nTagged {len(has_labels)} samples as 'annotated:v0'")
        if len(no_labels) > 0:
            print(f"{len(no_labels)} samples still need annotation.")
else:
    print(f"Field '{LABEL_FIELD}' not found. Create it in the App first.")

## QA Checks

Before training, verify label quality.

In [None]:
# Check 1: Label coverage (annotated samples out of the whole batch)
annotated = dataset.match_tags("annotated:v0")
batch_total = len(dataset.match_tags("batch:v0"))

if len(annotated) == 0:
    print("No annotated samples yet. Complete the annotation step above first.")
else:
    print("QA Check 1: Label coverage")
    print(f"  Annotated samples: {len(annotated)} of {batch_total} in the batch")
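
Coverage only counts samples. A complementary check, which is not part of the original workflow, is to look at box geometry. FiftyOne stores a detection's `bounding_box` as `[x, y, width, height]` in relative coordinates between 0 and 1. The sketch below flags boxes that are degenerate, very small, or outside the image; `box_problems` is a hypothetical helper and the `min_side` threshold is an arbitrary assumption to tune for your data.

```python
def box_problems(bounding_box, min_side=0.005, tol=1e-6):
    """Return a list of problems for one [x, y, width, height] box in relative coordinates."""
    x, y, w, h = bounding_box
    problems = []
    if w <= 0 or h <= 0:
        problems.append("non-positive width or height")
    elif w < min_side or h < min_side:
        problems.append("suspiciously small box")
    # A small tolerance avoids flagging boxes that touch the image border
    if x < -tol or y < -tol or x + w > 1 + tol or y + h > 1 + tol:
        problems.append("extends outside the image")
    return problems


print(box_problems([0.1, 0.2, 0.3, 0.4]))    # a well-formed box
print(box_problems([0.9, 0.5, 0.2, 0.001]))  # thin and past the right edge
```

In the notebook, this could be applied to `det.bounding_box` for every detection in `sample[LABEL_FIELD].detections`, with flagged samples reviewed in the App.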

In [None]:
# Check 2: Class distribution
from collections import Counter

if len(annotated) > 0:
    all_labels = []
    for sample in annotated:
        if sample[LABEL_FIELD]:
            all_labels.extend([d.label for d in sample[LABEL_FIELD].detections])

    print(f"QA Check 2: Class distribution ({len(all_labels)} total detections)")
    for label, count in Counter(all_labels).most_common(10):
        print(f"  {label}: {count}")
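
Beyond the raw counts, two quick signals can be read off the distribution: schema classes that never appear, and how often the `other` catch-all is used. The sketch below works on plain lists; `distribution_report` is a hypothetical helper and the sample labels are made up. Reading a large `other` share as a hint that the schema is missing a class is an assumption, not a rule.

```python
from collections import Counter


def distribution_report(labels, schema_classes, catch_all="other"):
    """Summarize unused schema classes and the share of catch-all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    unused = sorted(set(schema_classes) - set(counts))
    catch_all_share = counts[catch_all] / total if total else 0.0
    return unused, catch_all_share


# Made-up labels for illustration
labels = ["person", "person", "car", "other", "dog"]
schema = ["person", "car", "dog", "cat", "other"]
unused, share = distribution_report(labels, schema)
print(f"Unused classes: {unused}")
print(f"Share labeled 'other': {share:.0%}")
```

In the notebook, the equivalent call would be `distribution_report(all_labels, SCHEMA_CLASSES)`.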

In [None]:
# Check 3: Unexpected classes
if len(annotated) > 0 and len(all_labels) > 0:
    actual = set(all_labels)
    unexpected = actual - SCHEMA_CLASSES

    if unexpected:
        print(f"QA Check 3: Unexpected classes found: {unexpected}")
        print("   These don't match your schema. Review before training.")
    else:
        print("QA Check 3: All classes match schema")
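
When unexpected classes do show up, they are often near-misses of schema classes (different capitalization or a typo). As a rough aid for the review, the sketch below uses the standard library's `difflib.get_close_matches` to propose the closest schema class for each one. `suggest_fixes` is a hypothetical helper, the `cutoff` of 0.6 is `difflib`'s default rather than a tuned value, and any suggestion should be confirmed by a person before relabeling.

```python
import difflib


def suggest_fixes(unexpected, schema_classes, cutoff=0.6):
    """Map each unexpected label to its closest schema class, or None if nothing is close."""
    candidates = sorted(schema_classes)
    suggestions = {}
    for label in sorted(unexpected):
        matches = difflib.get_close_matches(label.lower(), candidates, n=1, cutoff=cutoff)
        suggestions[label] = matches[0] if matches else None
    return suggestions


# Made-up example: a capitalization slip, a typo, and a class outside the schema
schema = {"person", "car", "dining table", "tv"}
print(suggest_fixes({"Person", "dinning table", "skateboard"}, schema))
```

Labels that map to `None` have no close schema match; they are candidates for `other` or for a schema change.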

## Summary

You annotated Batch v0:
- Defined a schema for consistency
- Labeled samples in the App (or fast-forwarded)
- **Only samples with actual labels** were marked as annotated
- Ran QA checks: coverage, class distribution, schema compliance

**Artifacts:**
- `human_labels` field with your annotations
- `annotated:v0` tag on samples that have labels

**Next:** Step 5 - Train + Evaluate