Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make _assign_metrics public #199

Closed
dcherian opened this issue Apr 16, 2020 · 16 comments
Closed

Make _assign_metrics public #199

dcherian opened this issue Apr 16, 2020 · 16 comments
Labels

Comments

@dcherian
Copy link
Contributor

#108 (comment) suggests that _assign_metrics should be public.

That way you can use grid.interp and grid.diff to construct metrics when necessary and then just assign them to an existing object.

@jbusecke
Copy link
Contributor

I think this is resonable. @rabernat, do you have thoughts?

@jbusecke
Copy link
Contributor

I think if we do this, we could also implement a .get_metrics_dict. I have been messing around with subsets of data and it would be very helpful to get a complete dict like {'X':['dx1', 'dx2', 'dx3',...], 'Y':[...]} to pass it to a new grid object (we currently only store the actual dataarrays, not the coordinate names). This would probably only require small effort to refactor.

@dcherian
Copy link
Contributor Author

How about .metrics with property setters and getters?

@jbusecke
Copy link
Contributor

I would be in favor of that!

@rabernat
Copy link
Contributor

Sure, sounds good to me.

@jbusecke
Copy link
Contributor

Something we can integrate in this feature is a way to return the original metrics_dict so it could be passed to another grid object. This might be helpful for subsets of data (#193 )

@jbusecke
Copy link
Contributor

jbusecke commented Oct 14, 2020

Just started to tinker with a PR for #103 and I think it would be helpful to address this here first. A few additional questions:

I would create a subclass metrics on the grid object? which would store a dictionary like this:

metric_dict = {axis_combo:[(metric_name1, metric_da1), (metric_name2, metric_da2), ...]}

this would enable us to store the name in combination with the dataarray. This is helpful for naming derived metrics (e.g. for #222 and #103) and prepare the internals to be able to generate for instance a subset with all needed kwargs to create a new grid (useful for #193 #200 ).

should the set method accept only str (names of variables in the grid._ds dataset? Or should we expand this functionality to accept an actual dataarray?

Should we be able to update a single metric once the grid object is created? Or require the user to completely 'reset' the metric_dict?

What checks should we perform in the set method?

  1. Make sure that each grid position is unique (e.g. if two x distances are passed and are located on the same position, we should error out)
  2. ... anything else?

@dcherian
Copy link
Contributor Author

I would create a subclass metrics on the grid object? which would store a dictionary like this:

How about something like

class Metrics(dict):
    def __setitem__(self, key, value):
        # do validation / alignment checks here
        super().__setitem__(key, value)

    def validate(self):
        # could be useful
        pass

    def infer_remaining(self):
        # maybe?
        pass
    
class Grid:
    
    metrics: Metrics = Metrics()
    
inst = Grid()
print(inst.metrics)  # {}
inst.metrics["a"] = 10
inst.metrics["b"] = 20
print(inst.metrics)  # {a: 10, b: 20}

store the name in combination with the dataarray

What are these names? Are they the same as metric_da1.name?

should the set method accept only str (names of variables in the grid._ds dataset? Or should we expand this functionality to accept an actual dataarray?

To help with #108 (comment) (grid operations used to construct metrics), I think we should allow setting DataArrays.

Should we be able to update a single metric once the grid object is created? Or require the user to completely 'reset' the metric_dict?

Hmm.. maybe not until this is actually needed? #108 (comment) should only require setting metrics that have not been assigned

@jbusecke
Copy link
Contributor

I like your suggestion a lot.

Couple of thoughts (forgive me if these are stupid, I am not that versed with classes yet):

  1. We need to keep track of what axes the metrics represent (currently a tuple of one or more axis names, lets call that axis_combo. Think distance axis_combo=('X') vs area axis_combo=('X','Y'), how can we modify the above to keep track of that? Use a tuple like (value, axis_combo), so that the above is modified to inst.metrics['distance_a']=(10,('X'))?
    We could also use an attr of the dataarray? That is if we require that as

  2. I really like the infer logic here, but I think we should be able to track internally, which ones are 'native' versus inferred. Any proposal how to do this most elegantly? Again we could use the attrs?

What are these names? Are they the same as metric_da1.name?

Yes, if we require a dataarray as value, we could just check for a name on the input, and allow to pass either a str (value gets picked from grid._ds), or a dataarray (name gets parsed from there or we raise a warning if da.name is None).

Should we be able to update a single metric once the grid object is created? Or require the user to completely 'reset' the metric_dict?
Hmm.. maybe not until this is actually needed? #108 (comment) should only require setting metrics that have not been assigned

I think I wasnt exactly clear here. I think your example clarified, that you can set each metric by itself, and dont need to pass them as a list of values or anything like that.

@dcherian
Copy link
Contributor Author

I thought Grid.metrics would have axis_combo as keys ('X',), ('X', 'Y'), ('X', 'Y', 'Z'); What is distance_a and why do we need it :D?

Why should we track "native" vs "inferred" metrics? We could add a private attribute to the Metrics class _inferred = list(str).

@jbusecke
Copy link
Contributor

I thought Grid.metrics would have axis_combo as keys ('X',), ('X', 'Y'), ('X', 'Y', 'Z'); What is distance_a and why do we need it :D?

For future features (#103 and extension of the grid.transform() functionality to metrics and datasets), I would like to be able to attach 'altered' metrics with the original name onto the output array/datasets. For example, lets say you have some dataarray data, which has the coordinate area_t. I would like to be able to coarsen this and get another dataarray which has a coarsened coordinate area_t or optional area_t_coarsened. Either way for things to stay consistent, I need the original name. Does that make sense?

Why should we track "native" vs "inferred" metrics? We could add a private attribute to the Metrics class _inferred = list(str).

Let me clarify what I envision under inferred. We are currently able to multiple lower order metrics to higher order metrics (e.g. multiply distances to get area). I think it would also be good to have an option in the future to interpolate metrics to positions without metrics (like in #196). I think both of these should be clearly marked and the user needs to be aware of the fact that these might be less accurate than native metrics.

I am also working on a draft of a new vanity feature that Ill describe in an issue soon, which would need both of these informations, hehe.

@dcherian
Copy link
Contributor Author

Either way for things to stay consistent, I need the original name. Does that make sense?

Yes but the name should be metric_da.name no? I think we can raise errors if metric_da.name is None when assigning to Metrics

I think coarsen syntax could be

coarsegrid = grid.coarsen(X=5)
newds = coarsegrid.mean(ds)

This would preserve the names of metrics variables area_t and you have a new consistent Grid object for the new grid.
I prefer this over

coarsegrid, newds = grid.coarsen(X=5).mean(ds)

I think it would also be good to have an option in the future to interpolate metrics to positions without metrics

Yes, but this could be opt-in and we can add a verbose flag to print out what was inferred. Alternatively, we could store inferred names under a private attribute and mark those appropriately in a repr for Metrics

@jbusecke
Copy link
Contributor

Yes but the name should be metric_da.name no? I think we can raise errors if metric_da.name is None when assigning to Metrics

Yes, I think we are thinking the same thing here.

I think coarsen syntax could be

coarsegrid = grid.coarsen(X=5)
newds = coarsegrid.mean(ds)
This would preserve the names of metrics variables area_t and you have a new consistent Grid object for the new grid.
I prefer this over

coarsegrid, newds = grid.coarsen(X=5).mean(ds)

Interesting. I am not quite sure which version I favor at the moment. But could you copy+paste this suggestion to #103 , so folks who are not watching this issue could discuss?

I think this is sufficiently concrete that I can take a swing at implementing this maybe tomorrow or next week, and then we can discuss details over in the PR? Thanks for all the input.

This was referenced Jun 8, 2021
@github-actions
Copy link

This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 30 days. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jun 11, 2021
@github-actions
Copy link

This issue has been closed due to inactivity. If you feel this is in error, please reopen the issue or file a new issue with the relevant details.

@jbusecke
Copy link
Contributor

Just for posteriority: This was closed via #336

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants