
Add support for comprehensive time precision handling and multiple observations per person #210

Merged
eroell merged 11 commits into main from fix/omop-io on Jan 22, 2026

eroell commented Jan 21, 2026

Add support for comprehensive time precision handling and multiple observations per person

TL;DR

  • `ehrdata.io.omop.setup_variables` and `ehrdata.io.omop.setup_interval_variables` now accept a new argument, `time_precision`, which can be `"date"` or `"datetime"`
  • Multiple observations per person are now supported, e.g. multiple `visit_occurrence` records per person
  • Smarter warnings and bug fixes

Summary
This PR extends the OMOP I/O functionality to handle multiple observations per person (e.g., multiple visits or observation periods) and adds time precision validation with helpful warnings. Together, these changes make temporal data extraction from OMOP CDM databases more flexible and more accurate.

Key Features

  1. Time Precision Support
    New Parameter: time_precision can be "date" or "datetime" to control temporal granularity.
    Example:
```python
import ehrdata as ed

# Assumes `con` is an open database connection with the OMOP tables loaded
# and `edata` was created with ed.io.omop.setup_obs

# For sub-day precision with 6-hour intervals
edata = ed.io.omop.setup_variables(
    edata,
    backend_handle=con,
    layer="vitals",
    data_tables=["measurement"],
    data_field_to_keep=["value_as_number"],
    interval_length_number=6,
    interval_length_unit="hour",  # 6-hour intervals
    time_precision="datetime",  # ✓ Use datetime for sub-day intervals
    num_intervals=48,  # 48 intervals × 6 hours = 12 days
)

# For daily aggregation
edata = ed.io.omop.setup_variables(
    edata,
    backend_handle=con,
    layer="labs",
    data_tables=["measurement"],
    data_field_to_keep=["value_as_number"],
    interval_length_number=1,
    interval_length_unit="day",
    time_precision="date",  # Date precision is fine for daily intervals
    num_intervals=30,
)
```
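To make the effect of `time_precision` concrete, here is a minimal sketch of how a timestamp could be assigned to an interval under each precision. It is not ehrdata's implementation: the helper `interval_index` is hypothetical, and the assumption is that `"date"` precision behaves like truncating every timestamp to midnight before binning.

```python
from datetime import datetime, timedelta


def interval_index(event, start, interval, precision):
    """Return the 0-based interval that `event` falls into, counted from `start`.

    Hypothetical helper: with precision="date", both timestamps are truncated
    to midnight first, which is all a date-only column can provide.
    """
    if precision == "date":
        event = datetime.combine(event.date(), datetime.min.time())
        start = datetime.combine(start.date(), datetime.min.time())
    elif precision != "datetime":
        raise ValueError(f"unknown precision: {precision!r}")
    # timedelta // timedelta yields the number of whole intervals elapsed
    return (event - start) // interval


start = datetime(2024, 1, 1, 8, 0)
event = datetime(2024, 1, 1, 20, 30)
six_hours = timedelta(hours=6)

print(interval_index(event, start, six_hours, "datetime"))  # 2
print(interval_index(event, start, six_hours, "date"))  # 0
```

Under date precision, every measurement from the start day collapses into the first 6-hour interval, which is why `"datetime"` is the sensible choice for sub-day intervals.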
  2. Multiple Observations Per Person Support
    Previously: A person with more than one observation unit (e.g., several visits) caused an error, so each person could occupy only one row in the resulting EHRData object.
    Now: Rows represent individual observation units (visits, observation periods, cohorts, or persons), so a single person can have multiple rows.

Example:

```python
import ehrdata as ed

# Patient with multiple hospital visits
edata = ed.io.omop.setup_obs(
    backend_handle=con,
    observation_table="person_visit_occurrence",
)
# If patient 1 has 3 visits, edata has 3 rows for that patient;
# each row has a unique visit_occurrence_id

edata = ed.io.omop.setup_variables(
    edata,
    backend_handle=con,
    layer="measurements",
    data_tables=["measurement"],
    data_field_to_keep=["value_as_number"],
    interval_length_number=1,
    interval_length_unit="day",
    num_intervals=7,  # 7 days from each visit start
)

# Select the row for the visit with visit_occurrence_id 2
visit_2_data = edata[edata.obs["visit_occurrence_id"] == 2]
```
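The row-per-observation-unit model can be pictured with a small standard-library sketch. It assumes that measurements are matched to visits by `person_id`, binned relative to each visit's own start, and aggregated by keeping the chronologically last value; the record types and the `build_rows` helper are hypothetical and only illustrate why one person can now produce several rows.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical record types standing in for OMOP rows
Visit = namedtuple("Visit", ["person_id", "visit_occurrence_id", "visit_start"])
Measurement = namedtuple("Measurement", ["person_id", "time", "value"])


def build_rows(visits, measurements, interval, num_intervals):
    """Return {visit_occurrence_id: [value per interval]}, one row per visit.

    Measurements are matched to a visit by person_id and binned relative to
    that visit's own start; within an interval, the chronologically last value wins.
    """
    ordered = sorted(measurements, key=lambda m: m.time)
    rows = {}
    for visit in visits:
        row = [None] * num_intervals
        for m in ordered:
            if m.person_id != visit.person_id or m.time < visit.visit_start:
                continue
            idx = (m.time - visit.visit_start) // interval
            if idx < num_intervals:
                row[idx] = m.value  # later values overwrite earlier ones
        rows[visit.visit_occurrence_id] = row
    return rows


# Person 1 has two visits, so they contribute two rows
visits = [
    Visit(person_id=1, visit_occurrence_id=1, visit_start=datetime(2024, 1, 1)),
    Visit(person_id=1, visit_occurrence_id=2, visit_start=datetime(2024, 1, 10)),
]
measurements = [
    Measurement(1, datetime(2024, 1, 1, 10), 5.0),
    Measurement(1, datetime(2024, 1, 2, 9), 6.0),
    Measurement(1, datetime(2024, 1, 10, 12), 7.0),
]

rows = build_rows(visits, measurements, timedelta(days=1), num_intervals=3)
print(rows)  # {1: [5.0, 6.0, None], 2: [7.0, None, None]}
```

The same measurement table yields two independent 3-day windows for person 1, one anchored at each visit start.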
  3. Person Table Support with Validation
    Now supported: `observation_table="person"` enables lifetime analysis, provided every person has `birth_datetime` populated.
    Example:
```python
# Lifetime analysis starting from birth
edata = ed.io.omop.setup_obs(
    backend_handle=con,
    observation_table="person",  # One row per person
)

# Requires all persons to have birth_datetime populated
edata = ed.io.omop.setup_variables(
    edata,
    backend_handle=con,
    layer="lifetime_measurements",
    data_tables=["measurement"],
    data_field_to_keep=["value_as_number"],
    interval_length_number=1,
    interval_length_unit="year",
    time_precision="date",
    num_intervals=100,  # Up to 100 years from birth
)
# ValueError raised if any person lacks birth_datetime
```
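The validation described above amounts to a completeness check before binning. A minimal sketch, assuming person records are available as a mapping from `person_id` to `birth_datetime` (or `None`); the helper `check_birth_datetime` and its error message are hypothetical, not ehrdata's wording.

```python
from datetime import datetime


def check_birth_datetime(persons):
    """Raise ValueError if any person lacks birth_datetime.

    Hypothetical helper; `persons` maps person_id to birth_datetime or None.
    """
    missing = sorted(pid for pid, birth in persons.items() if birth is None)
    if missing:
        raise ValueError(
            "observation_table='person' requires birth_datetime for all persons; "
            f"missing for person_id(s): {missing}"
        )


# Passes silently: every person has a birth_datetime
check_birth_datetime({1: datetime(1980, 5, 17), 2: datetime(1975, 1, 3)})

# Fails: person 2 has no birth_datetime
try:
    check_birth_datetime({1: datetime(1980, 5, 17), 2: None})
except ValueError as err:
    print(err)
```

Listing the offending IDs in the error makes it easy to locate incomplete rows in the person table.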

Smarter Warnings
Warning 1: Time Precision Mismatch
Warns when sub-day intervals (e.g., hours) are combined with date-only precision:

```python
edata = ed.io.omop.setup_variables(
    edata,
    backend_handle=con,
    layer="hourly_vitals",
    data_tables=["measurement"],
    data_field_to_keep=["value_as_number"],
    interval_length_number=1,
    interval_length_unit="hour",  # Fine-grained
    time_precision="date",  # Mismatch!
    num_intervals=24,
)
# Warning: Using interval_length_unit='hour' with time_precision='date'
# may lead to unexpected results. Consider using time_precision='datetime'.
```
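A check like Warning 1 can be expressed with the standard `warnings` module. This is a sketch, not ehrdata's code: the helper `check_time_precision` is hypothetical, and the set of units treated as sub-day is an assumption.

```python
import warnings

# Assumption: units shorter than a day; the exact set ehrdata checks may differ
SUB_DAY_UNITS = {"second", "minute", "hour"}


def check_time_precision(interval_length_unit, time_precision):
    """Warn when sub-day intervals are combined with date-only precision (hypothetical helper)."""
    if time_precision == "date" and interval_length_unit in SUB_DAY_UNITS:
        warnings.warn(
            f"Using interval_length_unit={interval_length_unit!r} with "
            f"time_precision={time_precision!r} may lead to unexpected results. "
            "Consider using time_precision='datetime'.",
            UserWarning,
            stacklevel=2,
        )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_time_precision("hour", "date")  # warns
    check_time_precision("day", "date")  # silent
print(len(caught))  # 1
```

Warning rather than raising keeps the call usable when date-level binning of hourly intervals is intentional.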

Warning 2: Datetime Precision Fallback
Logs a warning when `time_precision="datetime"` is requested but the data table only has date columns:

```python
edata = ed.io.omop.setup_interval_variables(
    edata,
    backend_handle=con,
    layer="drug_eras",
    data_tables=["drug_era"],  # Only has date columns
    data_field_to_keep=["is_present"],
    interval_length_number=1,
    interval_length_unit="day",
    time_precision="datetime",  # Not available for drug_era
    num_intervals=30,
)
# Warning: Time precision datetime not available for data table drug_era.
# Using 'date' and midnight (00:00:00) as 'datetime' instead.
```
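The fallback in Warning 2 can be sketched as two steps: downgrade the precision for tables without datetime columns, then promote dates to midnight timestamps. The helpers `resolve_time_precision` and `as_datetime` are hypothetical, and `DATE_ONLY_TABLES` is an assumed subset of OMOP CDM tables whose event times are stored as dates only.

```python
import logging
from datetime import date, datetime, time

logger = logging.getLogger(__name__)

# Assumed subset of OMOP CDM tables that store event times as dates only
DATE_ONLY_TABLES = {"drug_era", "condition_era", "dose_era", "observation_period"}


def resolve_time_precision(table, time_precision):
    """Return the precision actually usable for `table` (hypothetical helper)."""
    if time_precision == "datetime" and table in DATE_ONLY_TABLES:
        logger.warning(
            "Time precision datetime not available for data table %s. "
            "Using 'date' and midnight (00:00:00) as 'datetime' instead.",
            table,
        )
        return "date"
    return time_precision


def as_datetime(d):
    """Promote a date to a datetime at midnight (00:00:00)."""
    return datetime.combine(d, time.min)


print(resolve_time_precision("drug_era", "datetime"))  # date
print(resolve_time_precision("measurement", "datetime"))  # datetime
print(as_datetime(date(2024, 1, 1)))  # 2024-01-01 00:00:00
```

Logging instead of raising lets a single `setup_interval_variables` call mix tables with and without datetime columns.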

Bug Fixes
Fix 1: Interval Boundary Handling
Problem: Measurements exactly at an interval boundary were counted in both adjacent intervals.
Before:

```
Measurement at 2024-01-01 12:00:00
Interval 1: [2024-01-01 00:00:00, 2024-01-01 12:00:00] → ✓ Included
Interval 2: [2024-01-01 12:00:00, 2024-01-02 00:00:00] → ✓ Included (DUPLICATE!)
```

After:

```
Measurement at 2024-01-01 12:00:00
Interval 1: [2024-01-01 00:00:00, 2024-01-01 12:00:00) → ✗ Not included (end is exclusive)
Interval 2: [2024-01-01 12:00:00, 2024-01-02 00:00:00) → ✓ Included (CORRECT!)
```

Interval membership changed from the closed `BETWEEN start AND end` to the half-open `>= start AND < end`, so a boundary timestamp now falls into exactly one of the two adjacent intervals.
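The difference between the two predicates can be checked in plain Python by counting how many intervals claim the boundary measurement. The comparison chains mirror the SQL conditions; the helper name is hypothetical.

```python
from datetime import datetime, timedelta


def count_matching_intervals(event, interval_starts, length, half_open):
    """Count how many intervals contain `event` under closed or half-open bounds."""
    count = 0
    for start in interval_starts:
        end = start + length
        if half_open:
            inside = start <= event < end  # >= start AND < end
        else:
            inside = start <= event <= end  # BETWEEN start AND end
        count += inside  # bool adds as 0 or 1
    return count


starts = [datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 12)]
boundary = datetime(2024, 1, 1, 12)
twelve_hours = timedelta(hours=12)

print(count_matching_intervals(boundary, starts, twelve_hours, half_open=False))  # 2
print(count_matching_intervals(boundary, starts, twelve_hours, half_open=True))  # 1
```

With half-open intervals, consecutive windows tile the timeline without overlap, so no measurement is double-counted.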

Fix 2: Chronological LAST/FIRST Aggregation
Problem: `LAST()` without an ordering could return an arbitrary value rather than the chronologically last one.
Before:

```sql
-- Three measurements on the same day, stored out of chronological order:
--   2024-01-01 08:00 → 20
--   2024-01-01 14:00 → 25  (chronologically last)
--   2024-01-01 10:00 → 21
SELECT LAST(value_as_number) FROM measurement;  -- could return 20, 25, or 21
```

After:

```sql
SELECT LAST(value_as_number ORDER BY measurement_datetime) FROM measurement;  -- always returns 25
```
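A plain-Python analogue shows why the ordering matters: taking the last row in storage order returns whatever happened to be scanned last, whereas taking the value at the latest timestamp is deterministic. This illustrates the semantics only; it is not how the SQL engine evaluates the aggregate.

```python
from datetime import datetime

# Rows as stored: not in chronological order
measurements = [
    (datetime(2024, 1, 1, 8, 0), 20),
    (datetime(2024, 1, 1, 14, 0), 25),  # chronologically last
    (datetime(2024, 1, 1, 10, 0), 21),
]

# Unordered "last": depends on storage/scan order, here the final list element
unordered_last = measurements[-1][1]

# Ordered "last": the value at the latest timestamp,
# analogous to LAST(value_as_number ORDER BY measurement_datetime)
ordered_last = max(measurements, key=lambda row: row[0])[1]

print(unordered_last)  # 21
print(ordered_last)  # 25
```

If two measurements share the exact same timestamp, `max` keeps the first one it encounters, so ties still depend on input order unless a secondary sort key is added.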

eroell changed the title from "Add support for multiple observations per person and comprehensive time precision handling" to "Add support for comprehensive time precision handling and multiple observations per person" on Jan 22, 2026

eroell marked this pull request as ready for review on January 22, 2026 at 11:08
eroell merged commit 72704e8 into main on Jan 22, 2026
12 checks passed
eroell deleted the fix/omop-io branch on January 22, 2026 at 17:36