Draft (new feature): Model to estimate when an intervention had effect #480
Conversation
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #480      +/-   ##
==========================================
+ Coverage   94.40%   94.42%   +0.01%
==========================================
  Files          29       29
  Lines        2075     2241     +166
==========================================
+ Hits         1959     2116     +157
- Misses        116      125       +9
```

View full report in Codecov by Sentry.
…over intervention's distributions (force-pushed from b7b91de to ee701f2)
Note: In my last PR, I added functionality allowing users to specify priors for the effect of the intervention. This provides more flexibility for Bayesian modeling and allows users to incorporate domain knowledge directly into the inference process.
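For context, here is a hypothetical usage sketch. The keyword names are taken from later comments in this thread, but exactly how the priors are supplied is an assumption rather than confirmed API:

```python
# Hypothetical sketch: the treatment_type_effect dict presumably lets users attach
# prior parameters for each effect component; an empty list falls back to defaults.
from causalpy.pymc_models import InterventionTimeEstimator

model = InterventionTimeEstimator(
    time_variable_name="t",
    treatment_type_effect={"level": []},  # custom prior parameters could presumably go in this list
    sample_kwargs={"random_seed": 42},
)
```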
Hi @JeanVanDyk. I'm very excited about the new addition, thanks for putting it together. Bear with me if I'm sometimes slow to reply - busy dad life!
At this point, could I request that you put together an ipynb file that showcases the functionality? I'm assuming this PR will evolve and go through a couple of iterations, so this notebook will end up being a new docs page to help users understand the new functionality. If you edit the `./docs/source/notebooks/index.md` file, you can add the notebook name under the interrupted time series section. That way the notebook will render if you build the docs locally, and it should also render in the remote docs preview build.
Did you still want feedback on the GitHub Discussion at this point?
Simplified

@JeanVanDyk I just did a quick update. Hoping to start taking a decent look at this now. Will try to provide some comments even though I know this is not 100% finished.
- Can you run `make uml`? This will generate updated UML plots (e.g. `classes.png`).
- Looking at that generated class diagram (and not yet having looked at the `InterventionTimeEstimator` code), the hope would be to avoid overriding the `calculate_impact`, `predict`, `score` and `fit` methods. These are hopefully generic enough to be dealt with in the `PyMCModel` base class. Like I say, I've not yet taken a look at the code itself, but this would be the hope.
- I cannot currently run the new notebook. When the first model fit is attempted I get the following traceback (a possible workaround is sketched after this list):
```
TypingError                               Traceback (most recent call last)
Cell In[4], line 10
      2 from causalpy.pymc_models import InterventionTimeEstimator as ITE
      4 model = ITE(
      5     time_variable_name="t",
      6     treatment_type_effect={"level": []},
      7     sample_kwargs={"sample_seed": seed},
      8 )
---> 10 result = ITS(
     11     data=df,
     12     treatment_time=None,
     13     formula="y ~ 1 + t",
     14     model=model,
     15 )

File ~/git/CausalPy/causalpy/experiments/interrupted_time_series.py:284, in InterruptedTimeSeries.__init__(self, data, treatment_time, formula, model, **kwargs)
    281     raise ValueError("Model type not recognized")
    283 # score the goodness of fit to the pre-intervention data
--> 284 self.score = self.model.score(X=self.pre_X, y=self.pre_y)
    286 # Postprocessing with handler
    287 self.datapre, self.datapost, self.pre_y, self.pre_X, self.treatment_time = (
    288     self.handler.data_postprocessing(
    289         self.model, data, idata, treatment_time, self.pre_y, self.pre_X
    290     )
    291 )

File ~/git/CausalPy/causalpy/pymc_models.py:773, in InterventionTimeEstimator.score(self, X, y)
    771 mu_ts = az.extract(mu_ts, group="posterior_predictive", var_names="mu_ts").T
    772 # Note: First argument must be a 1D array
--> 773 return r2_score(y.data, mu_ts)

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/stats/stats.py:1167, in r2_score(y_true, y_pred)
   1134 def r2_score(y_true, y_pred):
   1135     """R² for Bayesian regression models. Only valid for linear models.
   1136
   1137     Parameters
   (...) 1165
   1166     """
-> 1167 r_squared = r2_samples(y_true=y_true, y_pred=y_pred)
   1168 return pd.Series([np.mean(r_squared), np.std(r_squared)], index=["r2", "r2_std"])

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/stats/stats.py:1127, in r2_samples(y_true, y_pred)
   1125     var_e = _numba_var(svar, np.var, (y_true - y_pred))
   1126 else:
-> 1127     var_y_est = _numba_var(svar, np.var, y_pred, axis=1)
   1128     var_e = _numba_var(svar, np.var, (y_true - y_pred), axis=1)
   1129 r_squared = var_y_est / (var_y_est + var_e)

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/utils.py:372, in _numba_var(numba_function, standard_numpy_func, data, axis, ddof)
    350 """Replace the numpy methods used to calculate variance.
    351
    352 Parameters
    (...) 369
    370 """
    371 if Numba.numba_flag:
--> 372     return numba_function(data, axis=axis, ddof=ddof)
    373 else:
    374     return standard_numpy_func(data, axis=axis, ddof=ddof)

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/stats/stats_utils.py:535, in stats_variance_2d(data, ddof, axis)
    533     var = np.zeros(a_a)
    534     for i in range(a_a):
--> 535         var[i] = stats_variance_1d(data[i], ddof=ddof)
    536 else:
    537     var = np.zeros(b_b)

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/utils.py:205, in maybe_numba_fn.__call__(self, *args, **kwargs)
    203 """Call the jitted function or normal, depending on flag."""
    204 if Numba.numba_flag:
--> 205     return self.numba_fn(*args, **kwargs)
    206 else:
    207     return self.function(*args, **kwargs)

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/numba/core/dispatcher.py:424, in _DispatcherBase._compile_for_args(self, *args, **kws)
    420 msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
    421        f"by the following argument(s):\n{args_str}\n")
    422 e.patch_message(msg)
--> 424 error_rewrite(e, 'typing')
    425 except errors.UnsupportedError as e:
    426     # Something unsupported is present in the user code, add help info
    427     error_rewrite(e, 'unsupported_error')

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/numba/core/dispatcher.py:365, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
    363     raise e
    364 else:
--> 365     raise e.with_traceback(None)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at /Users/benjamv/mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/stats/stats_utils.py (517)

File "../../../../../mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/stats/stats_utils.py", line 517:
def copy(self, deep=True):  # pylint:disable=overridden-final-method
    @conditional_jit(nopython=True)
    ^

During: Pass nopython_type_inference

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'xarray.core.dataarray.DataArray'>
```
- I think there should be no changes to `its_skl.ipynb`, right? Can we revert those changes back to the state on main, so there's no change in that file in this PR.
- Same for `its_covid.ipynb`?
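As referenced in the third bullet above, one possible workaround for the `score` failure would be to hand plain numpy arrays to `az.r2_score` rather than `xarray.DataArray` objects, which arviz's Numba-jitted variance helpers cannot type. This is only a sketch; `bayesian_r2` is a hypothetical helper, not CausalPy code:

```python
# Hypothetical workaround sketch: convert xarray objects to plain numpy arrays
# before calling az.r2_score, so the Numba-jitted path receives types it can compile.
import arviz as az
import numpy as np


def bayesian_r2(y_obs, mu_samples):
    """Compute the Bayesian R² on numpy arrays rather than xarray.DataArray."""
    y_true = np.asarray(y_obs)        # shape (n_obs,)
    y_pred = np.asarray(mu_samples)   # shape (n_samples, n_obs), posterior predictive draws
    return az.r2_score(y_true, y_pred)
```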
```diff
@@ -40,6 +40,7 @@ did_pymc_banks.ipynb
 its_skl.ipynb
 its_pymc.ipynb
 its_covid.ipynb
+its_no_treatment_time.ipynb
```
This is good. But in the notebook, can you ensure you only have ONE top-level markdown header? Otherwise the extra headers get added as additional entries on the How To index page.
Okay, sure. I'll fix that!
As for the two notebooks you mentioned, I don't recall making any changes to them. I only ran them to check whether the modifications contained any small mistakes. I'll try reverting those changes.
Here is the graph @drbenvincent: [generated UML class diagram image]
Hi @drbenvincent, I've made some changes and will summarize them here, addressing the points you raised:
Handlers

- Can we rename these to make them more clear? E.g. `UnknownTreatmentTimeHandler` and `KnownTreatmentTimeHandler`.
- We discussed a range of options to do this, and I still think the handler pattern is quite a good approach. If we did conditional logic in the main `InterruptedTimeSeries` class then it would be disconnected and unclear. Though I think we should formalise the interface with an abstract base class, along the lines of the sketch below. Any major objections?
```python
from abc import ABC, abstractmethod


class TreatmentTimeHandler(ABC):
    @abstractmethod
    def data_preprocessing(self, data, treatment_time, model): ...

    @abstractmethod
    def data_postprocessing(self, *args, **kwargs): ...  # exact signature to be pinned down

    @abstractmethod
    def plot_intervention_line(self, ax, *args, **kwargs): ...

    @abstractmethod
    def plot_impact_cumulative(self, ax, *args, **kwargs): ...

    def plot_treated_counterfactual(self, ax, *args, **kwargs):
        """Optional: override if needed."""
        pass
```
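Continuing that sketch, here is a rough illustration of how the proposed rename would slot into the interface. The method bodies are assumptions for illustration only; the real splitting and plotting logic lives in the existing handler code:

```python
# Rough illustration only: a concrete handler for the case where the treatment
# time is known, implementing the abstract interface sketched above.
class KnownTreatmentTimeHandler(TreatmentTimeHandler):
    def data_preprocessing(self, data, treatment_time, model):
        # split the data into pre- and post-intervention periods
        pre = data[data.index < treatment_time]
        post = data[data.index >= treatment_time]
        return pre, post

    def data_postprocessing(self, *args, **kwargs):
        # nothing extra to infer when the treatment time is supplied by the user
        return args

    def plot_intervention_line(self, ax, treatment_time, **kwargs):
        ax.axvline(x=treatment_time, linestyle="--", color="k")

    def plot_impact_cumulative(self, ax, *args, **kwargs):
        # placeholder: cumulative impact plotting as done in the current experiment code
        pass
```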
`treatment_type_effect` input

Minor gripes about this. The data structure is potentially a bit complex and allows the user to add multiple keys (i.e. effects). What do you think about changing the API to something like this:
```python
model = ITE(
    time_variable_name="t",
    treatment_effect_type="level",   # Required: one of "level", "trend", "impulse"
    treatment_effect_params=None,    # Optional: custom parameters
    sample_kwargs={"sample_seed": seed},
)
```
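One side benefit of the flat string parameter is that validation becomes trivial. A hypothetical sketch, using the names from the proposal above (nothing here is existing CausalPy code):

```python
# Hypothetical validation helper for the proposed flat-string API.
VALID_EFFECT_TYPES = {"level", "trend", "impulse"}


def _validate_effect_type(treatment_effect_type: str) -> str:
    if treatment_effect_type not in VALID_EFFECT_TYPES:
        raise ValueError(
            f"treatment_effect_type must be one of {sorted(VALID_EFFECT_TYPES)}, "
            f"got {treatment_effect_type!r}"
        )
    return treatment_effect_type
```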
Notebook

- Rather than using the abbreviation `ITE`, can you just import it as `InterventionTimeEstimator` (see the one-line example below)? We want to maximise clarity for the reader and not require that they remember acronyms :)
- Sorry to drag it out, but I will have to leave a proper review of the notebook for another day. But we are definitely getting there. Maybe a couple more small iterations.
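For example, the notebook import would simply read:

```python
# Import the full class name rather than aliasing it to ITE
from causalpy.pymc_models import InterventionTimeEstimator
```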
Architecture / API thoughts

At the moment there is a change from the normal API, in that for the first time we would be directly telling the model about the data when we create it, by giving it `time_variable_name="t"`, like this:
```python
model = ITE(
    time_variable_name="t",
    treatment_type_effect={"impulse": []},
    sample_kwargs={"random_seed": seed, "target_accept": 0.95},
)
```
All the other models just consume `sample_kwargs` and don't override the `PyMCModel.__init__` method:

CausalPy/causalpy/pymc_models.py (lines 71 to 78 in 8b26d42):

```python
def __init__(self, sample_kwargs: Optional[Dict[str, Any]] = None):
    """
    :param sample_kwargs: A dictionary of kwargs that get unpacked and passed to the
        :func:`pymc.sample` function. Defaults to an empty dictionary.
    """
    super().__init__()
    self.idata = None
    self.sample_kwargs = sample_kwargs if sample_kwargs is not None else {}
```
Your approach is to call `PyMCModel.__init__` via `super().__init__(sample_kwargs)` and then add the other arguments to the model's `self`.

I don't yet know if this is deeply problematic, but it is a change in terms of information flow, in that it's the first time we are giving the models information about the data while they are created. My temptation would be to get rid of this and stick with the current pattern.

An important point to think about here is that the `build_model` method is not actually called until we call `PyMCModel.fit`, so perhaps we can instead provide the `time_variable_name` and `treatment_type_effect` related stuff to the experiment and pass it to the model via `build_model`.
I'm tempted to try to revert to the current architecture. Something along the lines of:

```python
class InterruptedTimeSeries:
    def fit(self):
        time_var = self._extract_time_variable_from_formula(self.formula)
        if isinstance(self.model, ITE):
            idata = self.model.fit(X, y, coords, time_variable=time_var)
        else:
            idata = self.model.fit(X, y, coords)
```
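To make that information flow concrete, here is a minimal, self-contained toy (assumed names throughout, not CausalPy's actual classes): the sampler kwargs stay in `__init__`, while data-dependent details such as the time variable only reach the model at fit time, through `build_model`. The same path could carry the `treatment_type_effect` settings:

```python
# Toy illustration of the suggested pattern (not CausalPy code): __init__ only
# stores sampler kwargs; build_model receives data-related arguments at fit time.
import numpy as np
import pymc as pm


class ToySwitchpointModel:
    def __init__(self, sample_kwargs=None):
        self.sample_kwargs = sample_kwargs or {}
        self.idata = None
        self.model = None

    def build_model(self, t, y, **kwargs):
        # the time variable arrives here, not in __init__; extra effect-related
        # settings could be accepted via **kwargs
        with pm.Model() as self.model:
            switchpoint = pm.DiscreteUniform("switchpoint", lower=int(t.min()), upper=int(t.max()))
            level = pm.Normal("level", mu=0.0, sigma=1.0)
            mu = pm.math.switch(switchpoint <= t, level, 0.0)
            pm.Normal("y_obs", mu=mu, sigma=0.2, observed=y)

    def fit(self, t, y, **build_kwargs):
        self.build_model(t, y, **build_kwargs)
        with self.model:
            self.idata = pm.sample(**self.sample_kwargs)
        return self.idata


# Example usage on synthetic data with a level shift at t = 30
t = np.arange(50)
y = np.where(t >= 30, 1.0, 0.0) + np.random.default_rng(0).normal(0, 0.2, size=50)
toy = ToySwitchpointModel(sample_kwargs={"draws": 500, "tune": 500, "random_seed": 0})
toy.fit(t, y)
```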
New Feature: `InterventionTimeEstimator` for Unknown Treatment Timing

This PR introduces a new model, `InterventionTimeEstimator`, designed to estimate when an intervention has an effect in a time series, especially in cases where the exact treatment time is unknown.

Use Case
This addition gives users a flexible, Bayesian approach to model treatment timing uncertainty directly within the CausalPy framework.
Notes / Open Questions

Where should this model fit into the CausalPy workflow?
I'm unsure whether `InterventionTimeEstimator` should be integrated within the `InterruptedTimeSeries` (ITS) feature, or used as a standalone tool. This affects how a user-defined model could be supported. Depending on the intended usage, I can propose a solution to allow users to inject their own custom models.

Custom model usage: base vs. intervention
Should users be able to:
Covariates
I considered adding time-varying covariates to improve the fit. Would that be useful or out of scope?
Multivariate Time Series
It's relatively easy to extend the model for multivariate input. Let me know if this is something you'd like to see.
Model Summary

Inputs:

- `t`: 1D array of time points
- `y`: 1D array of observed values
- `span`: restricts the window for switchpoint detection
- `coords`: can include `seasons` for modeling periodic effects
- `effect`: list of components, e.g. `"trend"`, `"level"`, `"impulse"`
- `grain_season`: number of time steps per season

Model Components:

- intercept + trend + seasonal
- `effect` components
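Reading the component list above, the mean function presumably has roughly the following shape. This is a loose reconstruction for orientation only, not the exact parameterisation in the code; here $\tau$ is the estimated intervention time, $G$ is `grain_season`, and only the components listed in `effect` are switched on:

$$
\mu_t = \underbrace{\alpha + \beta t + s_{t \bmod G}}_{\text{intercept + trend + seasonal}}
      + \mathbb{1}[t \ge \tau]\left(\delta_{\text{level}} + \delta_{\text{trend}} (t - \tau)\right)
      + \delta_{\text{impulse}}\,\mathbb{1}[t = \tau]
$$

In particular, the exact shape of the impulse term and the way `span` constrains $\tau$ should be taken from the implementation rather than from this sketch.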
Feel free to share any feedback or suggestions! I'm happy to refine the model or explore extensions based on your input.
📚 Documentation preview 📚: https://causalpy--480.org.readthedocs.build/en/480/