# Data Quality and Uncertainty

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ucid-foundation/ucid/blob/main/notebooks/19_data_quality_uncertainty.ipynb)

---

## Overview

Assess UCID data quality and quantify uncertainty:

1. Data completeness metrics
2. Confidence scoring
3. Uncertainty propagation
4. Quality flags

---

In [None]:
%pip install -q ucid

In [None]:
import ucid

print(f"UCID version: {ucid.__version__}")

---

## 1. Data Quality Metrics

In [None]:
# Quality dimensions
quality_metrics = {
    "completeness": "% of required data present",
    "accuracy": "Correctness of values",
    "timeliness": "Data freshness",
    "consistency": "Agreement across sources",
    "validity": "Conformance to constraints",
}

print("Data Quality Dimensions:")
for metric, desc in quality_metrics.items():
    print(f"  {metric}: {desc}")
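
Of these dimensions, completeness is the easiest to measure directly. The sketch below computes it as the share of required fields that are present and non-empty. The helper `completeness` and the record and field names are hypothetical and used only for illustration; they are not part of the `ucid` package.

```python
def completeness(record, required_fields):
    """Return the fraction of required fields that are present and non-empty."""
    if not required_fields:
        return 1.0  # nothing is required, so nothing can be missing
    present = sum(
        1 for field in required_fields
        if record.get(field) not in (None, "")
    )
    return present / len(required_fields)


# Hypothetical record and field names, for illustration only
record = {"population": 12000, "transit_stops": 4, "green_area_m2": None}
required = ["population", "transit_stops", "green_area_m2", "updated_at"]

print(f"Completeness: {completeness(record, required):.0%}")
```

Two of the four required fields carry a value here, so the result is a fraction in the 0-1 range that can be passed on as a completeness input.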

---

## 2. Confidence Scoring

In [None]:
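def calculate_confidence(completeness, source_count, age_days):
    """Return a 0-100 confidence score from completeness (0-1), source count, and data age in days."""
    base = completeness * 100  # completeness fraction scaled to 0-100
    source_bonus = min(source_count * 5, 20)  # +5 per source, capped at +20
    age_penalty = min(age_days * 0.5, 30)  # -0.5 per day of age, capped at -30
    return max(0, min(100, base + source_bonus - age_penalty))  # clamp to [0, 100]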


# Examples
examples = [
    {"completeness": 0.95, "sources": 3, "age": 7},
    {"completeness": 0.80, "sources": 1, "age": 30},
    {"completeness": 0.60, "sources": 1, "age": 90},
]

for ex in examples:
    conf = calculate_confidence(ex["completeness"], ex["sources"], ex["age"])
    print(
        f"Completeness: {ex['completeness']:.0%}, Sources: {ex['sources']}, Age: {ex['age']}d -> Confidence: {conf:.0f}%"
    )
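
The overview lists quality flags, and one simple way to derive them is to bucket the confidence score. The sketch below is a minimal illustration: the function `quality_flag`, the flag names, and the thresholds (80 and 50) are assumptions made for this example, not values defined by UCID.

```python
def quality_flag(confidence):
    """Map a 0-100 confidence score to a coarse quality flag.

    The thresholds (80 and 50) are illustrative assumptions, not UCID defaults.
    """
    if confidence >= 80:
        return "HIGH"
    if confidence >= 50:
        return "MEDIUM"
    return "LOW"


# Confidence values that calculate_confidence yields for the three examples above
for conf in (100.0, 70.0, 35.0):
    print(f"Confidence: {conf:.0f}% -> Flag: {quality_flag(conf)}")
```

Coarse flags like these are easier to filter on than raw scores, at the cost of hiding differences within a bucket.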

---

## 3. Uncertainty Ranges

In [None]:
# Score with uncertainty
score = 72
confidence = 85

# Half-width of the range: half of the confidence shortfall (100 - confidence)
uncertainty = (100 - confidence) * 0.5
lower = max(0, score - uncertainty)
upper = min(100, score + uncertainty)

print(f"Score: {score} ± {uncertainty:.1f}")
print(f"Range: [{lower:.1f}, {upper:.1f}]")
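
When several scores are combined, their uncertainties need to be propagated as well. The sketch below shows one common approach for a weighted mean, assuming the individual errors are independent so that they add in quadrature. The function `propagate_weighted` and the second score are hypothetical; the notebook does not state how UCID itself combines uncertainties.

```python
import math


def propagate_weighted(scores, uncertainties, weights):
    """Combine scores by weighted mean and propagate their uncertainties.

    Assumes independent errors, so weighted uncertainties add in quadrature.
    """
    total = sum(weights)
    norm = [w / total for w in weights]  # normalize weights to sum to 1
    combined = sum(w * s for w, s in zip(norm, scores))
    combined_unc = math.sqrt(
        sum((w * u) ** 2 for w, u in zip(norm, uncertainties))
    )
    return combined, combined_unc


# First entry: the score above (72 with half-width 7.5).
# Second entry: a hypothetical score of 60 at confidence 80, giving half-width 10.
combined, combined_unc = propagate_weighted([72, 60], [7.5, 10.0], [1, 1])
print(f"Combined score: {combined:.1f} ± {combined_unc:.2f}")
```

Under the independence assumption the combined uncertainty is smaller than either input; correlated errors would call for a more conservative rule, such as adding the weighted uncertainties linearly.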

---

## Summary

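Key concepts:
- Data quality has several dimensions: completeness, accuracy, timeliness, consistency, and validity
- Confidence combines completeness, source count, and data age into a 0-100 score
- Lower confidence widens the uncertainty range around a score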

---

*Copyright 2026 UCID Foundation. Licensed under EUPL-1.2.*