make tfd_irregoperations more tolerant #10

Open
fabian-s opened this issue Jul 25, 2018 · 8 comments
Labels
bug Something isn't working methods

@fabian-s
Contributor

e.g. stuff like
dti$rcst - mean(dti$rcst) should work, or at least dti$rcst - mean(dti$rcst, na.rm = TRUE)

@fabian-s fabian-s added bug Something isn't working methods labels Jul 25, 2018
@fabian-s
Contributor Author

will need modifying / replacing fun_op to avoid failure on arg-comparison -- for each obs, the op is only defined on the intersection of args... :(
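To make the "intersection of args" point concrete, here is a minimal base-R sketch. It assumes each curve is a plain list with numeric `arg` and `value` vectors (this is not how tf stores tfd_irreg objects), and `irreg_op_intersect()` is a hypothetical helper, not part of tf:

```r
# Hypothetical helper, not tf's API: each curve is a list(arg = <numeric>, value = <numeric>).
# Applies a binary op only at args observed in BOTH curves.
irreg_op_intersect <- function(x, y, op = `-`) {
  # shared args, matched exactly (floating-point args must be identical)
  common <- sort(intersect(x$arg, y$arg))
  list(
    arg = common,
    value = op(x$value[match(common, x$arg)], y$value[match(common, y$arg)])
  )
}

x <- list(arg = c(0, 0.25, 0.5, 0.75, 1), value = c(1, 2, 3, 4, 5))
y <- list(arg = c(0, 0.5, 1), value = c(1, 1, 1))
res <- irreg_op_intersect(x, y)
print(res)
```

Here the result only lives on args 0, 0.5 and 1; the observations of `x` at 0.25 and 0.75 are silently dropped, which is exactly the kind of data loss that makes "just do it" tricky.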

@jeff-goldsmith
Contributor

Not sure it's related, but

dti_df$cca %>% tf_smooth

works, while

dti_df$cca %>% mean(na.rm = TRUE) %>% tf_smooth

doesn't ...

fabian-s referenced this issue in tidyfun/tidyfun Jun 3, 2019
now deals with 1-element lists, irregular inputs with NAs.
relates to #48 (thx jeff)
@fabian-s
Contributor Author

fabian-s commented Jun 3, 2019

ouch, thx. fixed in tidyfun/tidyfun@3e1b8a6

@fabian-s fabian-s transferred this issue from tidyfun/tidyfun May 10, 2022
@fabian-s
Contributor Author

see #5, warn about too much irregularity but do it

@fabian-s
Contributor Author

fabian-s commented Jul 14, 2022

"warn about too much irregularity":

  • yields a WARNING
  • enough curves have enough data points in common
  • no warning if >50% of curves have >50% of grid points in common (average pointwise coverage)
  • warning if any grid points have <10% coverage (minimal pointwise coverage)
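One possible reading of these two criteria as code, sketched in base R. `check_irregularity()` and its default thresholds are hypothetical; "average pointwise coverage" is taken here (as an assumption) to mean the mean, over the union of all observed args, of the fraction of curves observed at each arg:

```r
# Hypothetical sketch of the proposed irregularity warning, not tf code.
# args_list: list of numeric arg vectors, one per curve.
check_irregularity <- function(args_list, min_mean_coverage = 0.5,
                               min_point_coverage = 0.1) {
  # union of all observed args across curves
  grid <- sort(unique(unlist(args_list)))
  # fraction of curves observed at each grid point (exact matching of args)
  coverage <- vapply(grid, function(a) {
    mean(vapply(args_list, function(arg) a %in% arg, logical(1)))
  }, numeric(1))
  # criterion 1: average pointwise coverage too low
  if (mean(coverage) <= min_mean_coverage) {
    warning("average pointwise coverage is only ", round(mean(coverage), 2))
  }
  # criterion 2: some grid point is barely covered
  if (min(coverage) < min_point_coverage) {
    warning("some args are observed in fewer than ",
            100 * min_point_coverage, "% of curves")
  }
  invisible(coverage)
}

# mostly shared args: no warning
check_irregularity(list(c(0, 0.5, 1), c(0, 0.5, 1), c(0, 0.5)))
```

For three curves with completely offset args, e.g. `list(c(0, 1), c(0.2, 1.2), c(0.4, 1.4))`, every arg has coverage 1/3, so the first criterion fires.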

@fabian-s fabian-s modified the milestone: put it on CRAN Jan 8, 2024
@fabian-s
Contributor Author

fabian-s commented Feb 14, 2024

on further thought:

"warn about too much irregularity and do it" would be messy to code and probably still unreliable.
it would also do a lot of opaque interpolation behind the scenes. Better to make users decide where and how they want to inter-/extrapolate by having them explicitly convert irregular data to regular data on a common grid etc. I no longer think that stuff like dti_df$rcst - mean(dti_df$rcst, na.rm = TRUE) should "just work": making this work would require way too much magic behind the scenes. Better to give a fairly clear error (in this case we get "tf_arg(x) and tf_arg(y) are not equal", which could be better but seems informative enough...).

the current implementation (branch 5-NAhandling @ c7e351cf) of e.g. mean(<tfd_irreg>) will do what's expected for somewhat irregular data, IMO:

only return a mean function value for args that are present in all functions:

> tf_rgp(3) |> tf_sparsify() |> mean()
tfd[1] on (0,1) based on 7 to 7 (mean: 7) evaluations each  # input data had 51 !
inter-/extrapolation by tf_approx_linear 
[1]: (0.04, 0.065);(0.26,-0.752);(0.34,-0.676); ...

return an empty "tfd_irreg" for completely irregular data without any grid points in common:

> tf_rgp(3) |> tf_jiggle() |> mean()
empty or missing input `data`; returning prototype of length 0
tfd[1] on (0,0) based on 0 to 0 (mean: 0) evaluations each
inter-/extrapolation by tf_approx_linear 
[1]: (NULL,NULL)

or take the mean of all available data at each arg with na.rm = TRUE:

> tf_rgp(3) |> tf_sparsify() |> mean(na.rm = TRUE)
tfd[1] on (0,1) based on 44 to 44 (mean: 44) evaluations each
inter-/extrapolation by tf_approx_linear 
[1]: (0.00,0.38);(0.02,1.19);(0.04,0.54); .
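The three behaviours above can be mimicked with a small base-R sketch. `irreg_mean()` is hypothetical and uses the same assumed plain-list curve representation (this is not tf's implementation): without `na.rm`, only args observed in every curve survive; with `na.rm = TRUE`, each arg in the union gets the mean of whichever curves observe it; fully offset curves give an empty result.

```r
# Hypothetical sketch, not tf's mean(<tfd_irreg>) method.
# curves: list of list(arg = <numeric>, value = <numeric>)
irreg_mean <- function(curves, na.rm = FALSE) {
  # union of all observed args
  grid <- sort(unique(unlist(lapply(curves, `[[`, "arg"))))
  # one row per curve, one column per grid arg; NA where a curve is unobserved
  vals <- do.call(rbind, lapply(curves, function(cv) cv$value[match(grid, cv$arg)]))
  m <- colMeans(vals, na.rm = na.rm)
  # drop args where no mean could be computed
  keep <- !is.na(m)
  list(arg = grid[keep], value = unname(m[keep]))
}

curves <- list(
  list(arg = c(0, 0.5, 1), value = c(1, 2, 3)),
  list(arg = c(0, 1),      value = c(3, 5)),
  list(arg = c(0, 0.5, 1), value = c(2, 2, 4))
)
offset <- list(
  list(arg = c(0.1, 0.3), value = c(1, 2)),
  list(arg = c(0.2, 0.4), value = c(3, 4))
)

irreg_mean(curves)                # only args 0 and 1 (present in all curves)
irreg_mean(curves, na.rm = TRUE)  # args 0, 0.5 and 1
irreg_mean(offset)                # no common args: empty result
```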

@jeff-goldsmith have you come across other issues in this vein? I'm having a hard time coming up with test cases for this.

@jeff-goldsmith
Contributor

Not sure this is exactly the kind of test cases you have in mind, but here are some settings where we'd have varying degrees of overlap in args across functional observations:

  • accelerometers that record at the minute level when worn, and people put on / take off at different times
  • ambulatory blood pressure cuffs which take one observation every ~30 minutes after the start time; some subjects have overlapping data but most are offset from each other (some people start at 8:32, others 9:17, etc)
  • basically any dataset after registration would have effectively no overlap across subjects

One wrinkle on whether dti_df$rcst - mean(dti_df$rcst, na.rm = TRUE) should "just work": chf_df$activity - mean(chf_df$activity) works because this is a tfd_reg. Users might not immediately get why one works and one doesn't; we may also want to suggest a workflow for "center irregular functional data".
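One possible "center irregular functional data" workflow, in the spirit of explicitly converting to a common grid: interpolate every curve linearly onto a user-chosen grid, compute the pointwise mean there, and subtract. This is a base-R sketch using `approx()`; `center_on_grid()` is hypothetical, and both the grid and the choice of linear interpolation without extrapolation (`rule = 1`, giving NA outside each curve's observed range) are assumptions the user should make deliberately:

```r
# Hypothetical sketch, not a tf function.
# curves: list of list(arg = <numeric>, value = <numeric>); grid: common args.
center_on_grid <- function(curves, grid) {
  # linear interpolation onto the shared grid; NA outside a curve's range
  vals <- do.call(rbind, lapply(curves, function(cv) {
    approx(cv$arg, cv$value, xout = grid, rule = 1)$y
  }))
  # pointwise mean over curves that cover each grid point
  mu <- colMeans(vals, na.rm = TRUE)
  # subtract the mean from every row (curve)
  centered <- sweep(vals, 2, mu)
  list(grid = grid, mean = mu, centered = centered)
}

curves <- list(
  list(arg = c(0, 1),      value = c(0, 2)),
  list(arg = c(0, 0.4, 1), value = c(2, 2, 2))
)
res <- center_on_grid(curves, grid = c(0, 0.5, 1))
res$centered
```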

@fabian-s
Contributor Author

fabian-s commented Feb 18, 2024

> users might not immediately get why one works and one doesn't;

true -- we need to add more doc / warnings for this

> we may also want to suggest a workflow for "center irregular functional data".

I now think operations like this might actually become easier once tf_rebase is done: Ops methods can then cast a tfd_reg to a tfd_irreg on the same args and perform the operation there.
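A rough base-R illustration of that idea, assuming the mean is available on a regular grid: evaluate it at each irregular curve's own args (here by linear interpolation with `approx()`) so that the subtraction is defined pointwise. `center_at_own_args()` is hypothetical and says nothing about how tf_rebase will actually be implemented:

```r
# Hypothetical sketch, not tf code.
# curves: list of list(arg = <numeric>, value = <numeric>)
# mean_arg / mean_value: the mean function on a regular grid
center_at_own_args <- function(curves, mean_arg, mean_value) {
  lapply(curves, function(cv) {
    # mean evaluated at this curve's own args (NA outside the mean's range)
    mu <- approx(mean_arg, mean_value, xout = cv$arg, rule = 1)$y
    list(arg = cv$arg, value = cv$value - mu)
  })
}

res <- center_at_own_args(
  list(list(arg = c(0.25, 1), value = c(2, 2))),
  mean_arg = c(0, 0.5, 1), mean_value = c(1, 1.5, 2)
)
res[[1]]
```

Unlike centering on a common grid, each curve keeps its original args; the only interpolation happens on the (regular) mean.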
