1. Ensure the data are structured longitudinally, i.e. for each person, have the FEV1 and O2 data as time series.
2. Then run the current ‘point-in-time model’ (FEV1 only) for each of a person's FEV1 measurements.
3. Produce a visualisation of the model output that shows all time points at once, e.g. with time on the x-axis and some display of the uncertain distributions of the latent variables on the y-axis. These could be sideways bar charts, box plots, or whatever looks best.
4. Then build a longitudinal model that uses all of an individual's data points at once. The simplest way to do this is to separate out the lung damage and airway blockage variables (as we always intended) and have **one** lung damage variable shared across all time points. This represents the assumption that lung damage is constant on this time scale. Then have a separate blockage variable for each time point. Try to get this model running if you can, and update your visualisation to include the extra lung damage latent variable.
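Step 4 can be written as a factorisation of the joint distribution. This is a sketch that assumes the dependency structure suggested by the variable names in the code below: healthy FEV1 $H$ and lung damage $\mathrm{LD}$ determine an unblocked FEV1 $U$, and each measured $\mathrm{FEV1}_t$ depends on $U$ and that day's small airway blockage $\mathrm{SAB}_t$, for $T$ measurements:

$$
P(H, \mathrm{LD}, U, \mathrm{SAB}_{1:T}, \mathrm{FEV1}_{1:T}) = P(H)\,P(\mathrm{LD})\,P(U \mid H, \mathrm{LD}) \prod_{t=1}^{T} P(\mathrm{SAB}_t)\,P(\mathrm{FEV1}_t \mid U, \mathrm{SAB}_t)
$$

In the point-in-time model of step 2, each query instead uses a fresh copy of $H$, $\mathrm{LD}$ and $U$ with a single measurement, i.e. $T = 1$.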

In [1]:
import sys

sys.path.append("../../")
sys.path.append("../data/")

import O2_FEV1_df
import model_lung_health
import biology as bio

import pandas as pd
import numpy as np

from matplotlib.gridspec import GridSpec
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx

plotsdir = "../../../../PlotsSmartcare/"


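def add_heatmap_to_fig(df, ax, colors, title=""):
    """Draw a (bins x columns) probability table as an annotated heatmap on `ax`.

    The row order is reversed so that the highest bin is drawn at the top.
    `colors` is a matplotlib colormap name, e.g. "Blues".
    """
    df = df.reindex(index=df.index[::-1])
    sns.heatmap(
        df, cmap=colors, annot=True, fmt=".2f", linewidths=0.5, ax=ax, cbar=False
    )
    ax.set_title(title)
    return ax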
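Task 3 mentions box plots as an alternative to heatmaps. A box plot needs summary statistics, which can be read off each discrete posterior column (for example one column of `df_hfev1_posterior`). The following is a minimal sketch, assuming each bin label in `bins_str` can be mapped to a numeric midpoint. `summarise_discrete_posterior` is a hypothetical helper, not part of `model_lung_health`.

```python
from bisect import bisect_left
from itertools import accumulate


def summarise_discrete_posterior(bin_midpoints, probs, quantiles=(0.05, 0.5, 0.95)):
    """Summarise a binned distribution by its mean and selected quantiles.

    Assumes bin_midpoints are sorted ascending. Each quantile is reported as
    the midpoint of the first bin whose cumulative probability reaches it.
    """
    if len(bin_midpoints) != len(probs):
        raise ValueError("bin_midpoints and probs must have the same length")
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalise against rounding error
    mean = sum(m * p for m, p in zip(bin_midpoints, probs))
    cdf = list(accumulate(probs))
    summary = {"mean": mean}
    for q in quantiles:
        # Clamp in case float round-off leaves the last cdf value just below q
        idx = min(bisect_left(cdf, q), len(bin_midpoints) - 1)
        summary[q] = bin_midpoints[idx]
    return summary


# Hypothetical posterior over four FEV1 bins (litres)
print(summarise_discrete_posterior([2.0, 2.5, 3.0, 3.5], [0.1, 0.2, 0.4, 0.3]))
```

Computing this per day gives the whiskers (5% and 95%) and the median for a sideways box or interval plot over time. Because the values are snapped to bin midpoints, the resolution is limited by the binning.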

In [2]:
O2_FEV1 = O2_FEV1_df.create()



** Loading measurements data **

* Dropping unnecessary columns from measurements data *
Columns filtered ['User ID', 'UserName', 'Recording Type', 'Date/Time recorded', 'FEV 1', 'Weight in Kg', 'O2 Saturation', 'Pulse (BPM)', 'Rating', 'Temp (deg C)']
Dropping columns {'Calories', 'FEV 1 %', 'Activity - Steps', 'Sputum sample taken?', 'Activity - Points', 'FEV 10', 'Predicted FEV'}

* Renaming columns *
Renamed columns {'Date/Time recorded': 'Date recorded', 'FEV 1': 'FEV1', 'Weight in Kg': 'Weight (kg)'}





* Applying data sanity checks *

FEV1
Dropping 1 entries with FEV1 = 3.45 for user Kings004

Weight (kg)
Dropping 2 entries with Weight (kg) = 6.0 for user Papworth033
Dropping 1 entries with Weight (kg) = 0.55 for user Kings013
Dropping 1 entries with Weight (kg) = 8.262500000000001 for user Papworth017
Dropping 1 entries with Weight (kg) = 1056.0 for user leeds01730
Dropping 1 entries with Weight (kg) = 20.0 for user Papworth019

Pulse (BPM)
Dropping 14 entries with Pulse (BPM) == 511)
       Pulse (BPM)      UserName
60638        511.0   Papworth002
60989        511.0   Papworth001
61026        511.0    leeds01050
61374        511.0    leeds01320
63126        511.0      Kings005
63525        511.0    leeds01222
65022        511.0      Kings001
65563        511.0    leeds01253
65805        511.0      Kings012
67006        511.0   Papworth032
68674        511.0   Papworth028
69260        511.0   Papworth028
71159        511.0        FPH007
72361        511.0  BRISTOLSC021
Dropping 1 

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.Height.loc[df.ID == "60"] = tmp * 100



* Dropping unnecessary columns from patient data *
Columns filtered: ['ID', 'Study Date', 'DOB', 'Age', 'Sex', 'Height', 'Weight', 'Predicted FEV1', 'FEV1 Set As']
Columns dropped: {'GP Letter Sent', 'Remote Monitoring App User ID', 'Freezer Required', 'Pulmonary Exacerbation', 'Unable Sputum Samples', 'Sputum Samples', 'Study Email', 'Telemetric Measures', 'Age 18 Years', 'CFQR Quest Comp', 'Study Number', 'Date Last PE Start', 'Comments', 'Date Last PE Stop', 'Informed Consent', 'Date Consent Obtained', 'Genetic Testing', 'Unable Informed Consent', 'Hospital', 'Transplant Recipients', 'Inconvenience Payment', 'Less Exacerbation'}

* Correcting patient data *
ID 60: Corrected height 60 from 1.63 to 163.0
ID 66: Corrected height for ID 66 from 1.62 to 162.0
Replace Age by calculate age
Drop FEV1 Set As and Predicted FEV1
Compute Calculated Predicted FEV1

* Applying data sanity checks *
Loaded patient data with 147 entries (147 initially)

** Loading antibiotics data **

* Dropping un
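Step 1 asks for the data in longitudinal form. `O2_FEV1` appears to be a long-format table with one row per measurement, which the cells below slice per patient with `O2_FEV1[O2_FEV1.ID == id]`. In pandas, `O2_FEV1.sort_values("Date recorded").groupby("ID")` would yield the same per-patient series. The sketch below shows this grouping in plain Python on hypothetical records of the form (patient ID, date, FEV1, O2 saturation). The record layout and `to_time_series` are illustrative assumptions, not part of `O2_FEV1_df`.

```python
from collections import defaultdict
from datetime import date


def to_time_series(records):
    """Group flat (patient_id, day, fev1, o2_sat) records into per-patient
    time series sorted by date. Missing values are kept as None."""
    series = defaultdict(list)
    for patient_id, day, fev1, o2_sat in records:
        series[patient_id].append((day, fev1, o2_sat))
    # Sort each patient's measurements chronologically (by the date field only)
    return {pid: sorted(rows, key=lambda row: row[0]) for pid, rows in series.items()}


# Hypothetical records, deliberately out of order
records = [
    ("131", date(2020, 1, 3), 2.1, 97.0),
    ("131", date(2020, 1, 1), 2.3, None),
    ("100", date(2020, 2, 1), 3.0, 98.0),
]
ts = to_time_series(records)
print(ts["131"])
```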

# Point-in-time inference across time for HFEV1, AB, FEV1

In [None]:
for id in ["131"]:  # O2_FEV1.ID.unique()[0:50]:
    df_for_ID = O2_FEV1[O2_FEV1.ID == id]

    # Create model tailored to patient
    height = df_for_ID.Height.values[0]
    age = df_for_ID.Age.values[0]
    sex = df_for_ID.Sex.values[0]
    HFEV1_prior = model_lung_health.set_HFEV1_prior("gaussian", height, age, sex)
    (
        inference,
        FEV1,
        HFEV1,
        prior_HFEV1,
        AB,
        prior_AB,
    ) = model_lung_health.build_HFEV1_AB_FEV1(healthy_FEV1_prior=HFEV1_prior)

    # Prepare inference
    ## Set data structures for priors
    df_hfev1_prior = pd.DataFrame(
        index=HFEV1.bins_str, columns=["prior"], data=prior_HFEV1.values
    )
    df_ab_prior = pd.DataFrame(
        index=AB.bins_str, columns=["prior"], data=prior_AB.values
    )
    ## Set data structures for posteriors
    fev1 = df_for_ID.FEV1.values
    days = df_for_ID["Date recorded"].astype(str).values
    ## Create empty dataframes with the variable's bins as index and the measurement days as columns
    df_hfev1_posterior = pd.DataFrame(index=HFEV1.bins_str, columns=days)
    df_ab_posterior = pd.DataFrame(index=AB.bins_str, columns=days)

    # Run inference queries
    for i in range(len(fev1)):
        res_inf = model_lung_health.infer(
            inference, [HFEV1, AB], [[FEV1, fev1[i]]], joint=False
        )
        df_hfev1_posterior[days[i]] = res_inf[HFEV1.name].values
        df_ab_posterior[days[i]] = res_inf[AB.name].values

    # Plot priors and per-day posteriors as heatmaps (bins on the y-axis, days on the x-axis)
    title = f"Point-in-time inference of HFEV1 and AB for ID {id} ({height}cm, {age}yr, {sex})"

    fig, axs = plt.subplots(
        3,
        2,
        figsize=(len(fev1) * 0.5 + 1, 30),
        gridspec_kw={"height_ratios": [1, 3, 2], "width_ratios": [2, len(fev1)]},
    )

    fig.suptitle(title, fontsize=16, y=1.005)
    sns.scatterplot(x=days, y=fev1, ax=axs[0, 1])
    axs[0, 0].axis("off")  # no prior panel for the raw FEV1 readings

    # Add heatmaps of priors and posteriors
    add_heatmap_to_fig(df_hfev1_prior, axs[1, 0], "Blues", "HFEV1 prior")
    add_heatmap_to_fig(df_hfev1_posterior, axs[1, 1], "Blues", "HFEV1 posteriors")
    add_heatmap_to_fig(df_ab_prior, axs[2, 0], "Greens", "AB prior")
    add_heatmap_to_fig(df_ab_posterior, axs[2, 1], "Greens", "AB posteriors")

    plt.tight_layout()
    plt.savefig(f"{plotsdir}point_in_time_inference/{title}.png")
    # plt.close()
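Each iteration of the loop above is an independent single-measurement query. To make explicit what such a query computes, here is a minimal sketch of exact inference by enumeration on a toy two-parent model. The bins, the priors and the Gaussian link FEV1 ~ N(HFEV1 × (1 − AB), σ) are assumptions for illustration, not the CPTs built by `model_lung_health.build_HFEV1_AB_FEV1`.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def point_in_time_posteriors(fev1_obs, hfev1_bins, hfev1_prior, ab_bins, ab_prior, noise_sd=0.1):
    """Posterior marginals of HFEV1 and AB given a single FEV1 reading.

    Toy assumption: FEV1 ~ N(HFEV1 * (1 - AB), noise_sd), AB as a fraction.
    """
    # Unnormalised joint posterior over every (HFEV1, AB) bin pair
    joint = [
        [hp * ap * gaussian_pdf(fev1_obs, h * (1 - ab), noise_sd) for ab, ap in zip(ab_bins, ab_prior)]
        for h, hp in zip(hfev1_bins, hfev1_prior)
    ]
    z = sum(sum(row) for row in joint)
    # Marginalise out the other variable and normalise
    post_hfev1 = [sum(row) / z for row in joint]
    post_ab = [sum(row[j] for row in joint) / z for j in range(len(ab_bins))]
    return post_hfev1, post_ab


hfev1_bins = [3.0, 3.5, 4.0, 4.5]
hfev1_prior = [0.2, 0.3, 0.3, 0.2]
ab_bins = [0.0, 0.2, 0.4, 0.6]
ab_prior = [0.25] * 4

# One independent query per measurement, mirroring the loop over fev1 above
for reading in [3.6, 2.4]:
    post_h, post_ab = point_in_time_posteriors(reading, hfev1_bins, hfev1_prior, ab_bins, ab_prior)
    print(reading, [round(p, 2) for p in post_ab])
```

A lower reading shifts the AB posterior towards more blockage. Because every query starts again from the same priors, nothing learnt from one day carries over to the next, which is what the longitudinal model addresses.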

# Point-in-time inference across time for HFEV1, LD, UFEV1, SAB, FEV1

In [16]:
for id in ["131"]:  # O2_FEV1.ID.unique()[0:50]:
    print("ID: ", id)
    df_for_ID = O2_FEV1[O2_FEV1.ID == id]

    # Create model tailored to patient
    height = df_for_ID.Height.values[0]
    age = df_for_ID.Age.values[0]
    sex = df_for_ID.Sex.values[0]
    HFEV1_prior = model_lung_health.set_HFEV1_prior("gaussian", height, age, sex)
    LD_param = model_lung_health.set_LD_prior(
        df_for_ID.FEV1.values, HFEV1_prior["mu"], HFEV1_prior["sigma"]
    )
    (
        inference,
        HFEV1,
        prior_HFEV1,
        LD,
        prior_LD,
        UFEV1,
        SAB,
        prior_SAB,
        FEV1,
    ) = model_lung_health.build_full_FEV1_side(
        HFEV1_prior=HFEV1_prior, LD_prior=LD_param
    )

    # Prepare inference
    ## Set data structures for priors
    df_ld_prior = pd.DataFrame(
        index=LD.bins_str, columns=["prior"], data=prior_LD.values
    )
    df_hfev1_prior = pd.DataFrame(
        index=HFEV1.bins_str, columns=["prior"], data=prior_HFEV1.values
    )
    df_sab_prior = pd.DataFrame(
        index=SAB.bins_str, columns=["prior"], data=prior_SAB.values
    )

    ## Set data structures for posteriors
    fev1 = df_for_ID.FEV1.values
    days = df_for_ID["Date recorded"].astype(str).values
    ## Create empty dataframes with the variable's bins as index and the measurement days as columns
    df_ld_posterior = pd.DataFrame(index=LD.bins_str, columns=days)
    df_hfev1_posterior = pd.DataFrame(index=HFEV1.bins_str, columns=days)
    df_sab_posterior = pd.DataFrame(index=SAB.bins_str, columns=days)
    df_ufev1_posterior = pd.DataFrame(index=UFEV1.bins_str, columns=days)

    # Run inference queries
    for i in range(len(fev1)):
        res = model_lung_health.infer(
            inference, [HFEV1, LD, UFEV1, SAB], [[FEV1, fev1[i]]], joint=False
        )
        df_sab_posterior[days[i]] = res[SAB.name].values
        df_ufev1_posterior[days[i]] = res[UFEV1.name].values
        df_ld_posterior[days[i]] = res[LD.name].values
        df_hfev1_posterior[days[i]] = res[HFEV1.name].values

    # Plot priors and per-day posteriors as heatmaps (bins on the y-axis, days on the x-axis)
    title = f"Point-in-time inference of HFEV1, LD, UFEV1, SAB, FEV1 for ID {id} ({height}cm, {age}yr, {sex})"

    fig, axs = plt.subplots(
        1 + 4,
        2,
        figsize=(len(fev1) * 0.5 + 1, 40),
        gridspec_kw={
            "height_ratios": [1, 3, 10, 3, 10],
            "width_ratios": [2, len(fev1)],
        },
    )

    fig.suptitle(title, fontsize=16, y=1.005)
    sns.scatterplot(x=days, y=fev1, ax=axs[0, 1])
    axs[0, 0].axis("off")

    # Add heatmaps of priors and posteriors
    # (add_heatmap_to_fig flips the bin order so the highest bin is on top)

    add_heatmap_to_fig(df_sab_prior, axs[1, 0], "Greens")
    add_heatmap_to_fig(df_sab_posterior, axs[1, 1], "Greens", "Small Airway Blockage")

    axs[2, 0].axis("off")  # no prior panel is drawn for UFEV1
    add_heatmap_to_fig(df_ufev1_posterior, axs[2, 1], "Blues", "Unblocked FEV1")

    add_heatmap_to_fig(df_ld_prior, axs[3, 0], "Reds")
    add_heatmap_to_fig(df_ld_posterior, axs[3, 1], "Reds", "Lung Damage")

    add_heatmap_to_fig(df_hfev1_prior, axs[4, 0], "Blues")
    add_heatmap_to_fig(df_hfev1_posterior, axs[4, 1], "Blues", "Healthy FEV1")

    plt.tight_layout()
    plt.savefig(f"{plotsdir}point_in_time_inference/{title}.png")
    # plt.close()


ID:  131
Defining gaussian prior with mu = 3.90, sigma = 0.4
Defining gaussian prior with mu = 0.35, sigma = 0.4
Defining uniform prior until 0.35 L, then gaussian tail up to 0.8 L


KeyboardInterrupt: 

# Longitudinal model with LD shared across all times

In [None]:
# TODO: this takes 1 min 16 s to run for one patient (with 16 days of data)
for id in ['100', '101', '124', '128', '129', '152', '130']:
    print("ID: ", id)
    df_for_ID = O2_FEV1[O2_FEV1.ID == id]

    # Create model tailored to patient
    height = df_for_ID.Height.values[0]
    age = df_for_ID.Age.values[0]
    sex = df_for_ID.Sex.values[0]
    HFEV1_prior = model_lung_health.set_HFEV1_prior("gaussian", height, age, sex)
    (
        model,
        inference,
        HFEV1,
        prior_HFEV1,
        LD,
        prior_LD,
        UFEV1,
        SAB_list,
        prior_SAB_i,
        FEV1_list,
    ) = model_lung_health.build_longitudinal_FEV1_side(
        df_for_ID.shape[0], HFEV1_prior=HFEV1_prior
    )

    # Prepare inference
    ## Set data structures for priors
    df_ld_prior = pd.DataFrame(
        index=LD.bins_str, columns=["prior"], data=prior_LD.values
    )
    df_hfev1_prior = pd.DataFrame(
        index=HFEV1.bins_str, columns=["prior"], data=prior_HFEV1.values
    )
    df_sab_i_prior = pd.DataFrame(
        index=SAB_list[0].bins_str, columns=["prior"], data=prior_SAB_i.values
    )

    ## Set data structures for posteriors
    fev1 = df_for_ID.FEV1.values
    days = df_for_ID["Date recorded"].astype(str).values

    ## Create empty dataframes indexed by bins: a single "posterior" column for the
    ## shared variables, and one column per measurement day for the SABs
    df_ld_posterior = pd.DataFrame(index=LD.bins_str, columns=["posterior"])
    df_hfev1_posterior = pd.DataFrame(index=HFEV1.bins_str, columns=["posterior"])
    df_ufev1_posterior = pd.DataFrame(index=UFEV1.bins_str, columns=["posterior"])
    df_sab_posteriors = pd.DataFrame(index=SAB_list[0].bins_str, columns=days)

    # Element-wise join of FEV1 var and fev1 evidence
    evidences = list(zip(FEV1_list, fev1))

    # Run inference queries
    print("Run inference queries")
    res = model_lung_health.infer(inference, [HFEV1, LD, UFEV1], evidences, joint=False)
    df_hfev1_posterior["posterior"] = res[HFEV1.name].values
    df_ld_posterior["posterior"] = res[LD.name].values
    df_ufev1_posterior["posterior"] = res[UFEV1.name].values

    # Run SAB inference queries
    print("Run SAB inference queries")
    ## Each SAB query usually takes about 35 s, although it sometimes takes only 0.4 s.
    ## The SABs are queried one at a time because querying too many variables at once
    ## crashes, possibly because pgmpy computes the joint distribution of all of them,
    ## which exceeds numpy's limit of 32 array dimensions.
    for i in range(len(SAB_list)):
        print(f"SAB: {i}")
        res = model_lung_health.infer(inference, [SAB_list[i]], evidences, joint=False)
        df_sab_posteriors[days[i]] = res[SAB_list[i].name].values

    # Plot priors and posteriors as heatmaps (bins on the y-axis, days on the x-axis)
    title = f"Longitudinal inference with shared LD (ID {id}, {height}cm, {age}yr, {sex})"

    fig_width = 1 + round(len(fev1) / 2)
    fig = plt.figure(figsize=(round(fig_width * 1.3 + 2), 20))

    # Each GridSpec column holds either one single-column heatmap (with its bin
    # labels) or two day columns of the SAB heatmap. The fixed column indices
    # below (up to 6) need fig_width >= 7, i.e. at least 11 FEV1 measurements.
    gs = GridSpec(nrows=5, ncols=fig_width)

    fig.suptitle(title, fontsize=16, y=1.005)

    ax_scatter = fig.add_subplot(gs[4, 1:fig_width])
    sns.scatterplot(x=days, y=fev1, ax=ax_scatter)

    # Add heatmaps of priors and posteriors
    ax = fig.add_subplot(gs[0:3, 0])
    add_heatmap_to_fig(df_hfev1_prior, ax, "Blues")

    ax = fig.add_subplot(gs[0:3, 1])
    add_heatmap_to_fig(df_hfev1_posterior, ax, "Blues", "Healthy FEV1")

    ax = fig.add_subplot(gs[1, 3])
    add_heatmap_to_fig(df_ld_prior, ax, "Reds")
    ax = fig.add_subplot(gs[1, 4])
    add_heatmap_to_fig(df_ld_posterior, ax, "Reds", "Lung Damage")

    ax = fig.add_subplot(gs[0:3, 6])
    add_heatmap_to_fig(df_ufev1_posterior, ax, "Blues", "Unblocked FEV1")

    ax = fig.add_subplot(gs[3, 0])
    add_heatmap_to_fig(df_sab_i_prior, ax, "Greens")

    ax = fig.add_subplot(gs[3, 1:fig_width])
    add_heatmap_to_fig(df_sab_posteriors, ax, "Greens", "Small Airway Blockage")

    plt.tight_layout()
    plt.savefig(f"{plotsdir}longitudinal_inference_shared_ld/{title}.png")
    plt.close()

ID:  100
*** Building the longitudinal model with LD as shared variable across time ***
Defining gaussian prior with mu = 3.61, sigma = 0.35
Run inference queries
Preprocess query
Run query
Query var to list
Find nodes with query vars
Conversion to set of tuple
Compute clique potentials
Variable elimination marginalize


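The comments in the longitudinal cell above suspect that querying many SAB variables at once fails because a joint over all of them is built. In the shared model, the per-day blockage variables are conditionally independent given the shared variables, so the likelihood factorises over days and every SAB_t marginal can be obtained without a joint over all SABs. The following is a minimal sketch of that idea on a toy model. It assumes a single shared latent U standing in for UFEV1, FEV1_t ~ N(U × (1 − SAB_t), σ), and hypothetical bins. HFEV1 and LD are also shared in the real model; if, as assumed here, each FEV1_t depends on them only through UFEV1, conditioning on UFEV1 is enough to separate the days. This is not the `model_lung_health` or pgmpy implementation.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def shared_latent_posteriors(fev1_series, u_bins, u_prior, sab_bins, sab_prior, noise_sd=0.1):
    """Posterior marginals of a shared latent U and of each per-day SAB_t.

    Toy assumption: FEV1_t ~ N(U * (1 - SAB_t), noise_sd), SAB_t i.i.d. a priori.
    Plain products are used for clarity; long series would need log space.
    """
    n_u, n_s = len(u_bins), len(sab_bins)
    # lik[t][k][s] = P(FEV1_t = fev1_series[t] | U = u_bins[k], SAB_t = sab_bins[s])
    lik = [
        [[gaussian_pdf(y, u * (1 - s), noise_sd) for s in sab_bins] for u in u_bins]
        for y in fev1_series
    ]
    # msg[t][k] = P(FEV1_t | U = u_bins[k]) with SAB_t summed out
    msg = [[sum(sab_prior[s] * lik_t[k][s] for s in range(n_s)) for k in range(n_u)] for lik_t in lik]

    def belief_over_u(skip=None):
        """Prior of U times the messages of every day except `skip` (unnormalised)."""
        belief = list(u_prior)
        for t, msg_t in enumerate(msg):
            if t != skip:
                belief = [b * m for b, m in zip(belief, msg_t)]
        return belief

    unnorm_u = belief_over_u()
    z = sum(unnorm_u)
    post_u = [b / z for b in unnorm_u]

    # P(SAB_t | all days) ∝ P(SAB_t) * sum_u [belief from other days](u) * P(FEV1_t | u, SAB_t)
    post_sab = []
    for t, lik_t in enumerate(lik):
        others = belief_over_u(skip=t)
        unnorm_s = [sab_prior[s] * sum(others[k] * lik_t[k][s] for k in range(n_u)) for s in range(n_s)]
        z_s = sum(unnorm_s)
        post_sab.append([v / z_s for v in unnorm_s])
    return post_u, post_sab


post_u, post_sab = shared_latent_posteriors(
    fev1_series=[3.5, 2.8, 3.5],
    u_bins=[3.0, 3.5, 4.0],
    u_prior=[1 / 3] * 3,
    sab_bins=[0.0, 0.2, 0.4],
    sab_prior=[1 / 3] * 3,
)
print([round(p, 3) for p in post_u])
for t, post in enumerate(post_sab):
    print(t, [round(p, 3) for p in post])
```

All three days inform the shared U, which concentrates on 3.5 L. The dip on day 1 is then explained by blockage (SAB_1 ≈ 0.2) rather than by a lower U. The cost grows with the number of days times the bin counts, not exponentially with the number of SAB variables.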

In [None]:
# model.get_cpds()

# nx.draw(
#     model,
#     with_labels=True,
#     node_size=2000,
#     node_color="skyblue",
#     node_shape="o",
#     alpha=0.7,
#     linewidths=5,
# )
# plt.show()
# model.nodes()


In [6]:
fig = plt.figure(figsize=(10, 20))
plt.savefig(f"{plotsdir}longitudinal_inference_shared_ld/hey.png")

<Figure size 720x1440 with 0 Axes>