Support custom mergeable metrics in whylogs #241

Closed
andyndang opened this issue Jun 24, 2021 · 7 comments

@andyndang
Contributor

andyndang commented Jun 24, 2021

There are three kinds of metrics that whylogs users track:

  1. Derived metrics from customers. These are typically numerical data. You can use the approach above to track them, because they will show up as a “whylogs” column.
  2. Custom metrics that are mergeable: metrics that can be “summed” or “aggregated” across different profiles. This is a feature request that we are tracking from other customers as well.
  3. One-off metrics: sometimes users have one-off metrics that they want to piggyback on top of whylogs. These metrics are not aggregatable, but users want to store them in the whylogs object.
@andyndang
Contributor Author

For 1, we're already doing this.

For 2, we'll need to support:

  • Storing metrics (probably in binary form)
  • A class that can handle the metric in Python
  • A class that can handle the metric in Java
  • These classes will implement methods for serde and for merging metrics, so we get something like obj.to_bytes(), obj.parse_bytes(), obj.merge(another). I believe these should be sufficient for most use cases
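
As a rough illustration of the to_bytes() / parse_bytes() / merge() trio on the Python side, here is a minimal sketch. `CountMetric`, its fields, and the fixed 16-byte `struct` layout are all hypothetical and not part of whylogs; `parse_bytes` is written as a classmethod here, which is one possible reading of `obj.parse_bytes()`.

```python
import struct


class CountMetric:
    """Hypothetical mergeable metric: a running count and a running sum."""

    _FORMAT = "<qd"  # little-endian: int64 count, float64 total (16 bytes)

    def __init__(self, count: int = 0, total: float = 0.0):
        self.count = count
        self.total = total

    def track(self, value: float) -> None:
        # Fold one observation into the running state.
        self.count += 1
        self.total += value

    def to_bytes(self) -> bytes:
        # Binary form that could be stored alongside a profile.
        return struct.pack(self._FORMAT, self.count, self.total)

    @classmethod
    def parse_bytes(cls, data: bytes) -> "CountMetric":
        # Inverse of to_bytes().
        count, total = struct.unpack(cls._FORMAT, data)
        return cls(count, total)

    def merge(self, other: "CountMetric") -> "CountMetric":
        # Counts and sums are additive, so merging is field-wise addition.
        return CountMetric(self.count + other.count, self.total + other.total)
```

A Java class would need to read and write the same byte layout for profiles to be portable between the two runtimes.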

For 3, we can start with supporting numbers. But what's the behavior when merging two objects with "unmergeable" metrics? Should we throw an exception? Emit warnings? Drop the fields?
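
To make the three options concrete, here is a small sketch of a merge that takes the behavior as a policy. `UnmergeablePolicy` and `merge_one_off_metrics` are hypothetical names, and two points are assumptions rather than decisions from this thread: one-off metrics are modeled as a plain dict of numbers, and a metric present on only one side is simply carried over.

```python
import warnings
from enum import Enum


class UnmergeablePolicy(Enum):
    RAISE = "raise"  # throw an exception
    WARN = "warn"    # warn, then drop the field
    DROP = "drop"    # drop the field silently


def merge_one_off_metrics(left: dict, right: dict, policy: UnmergeablePolicy) -> dict:
    """Merge two dicts of one-off metrics under the given policy."""
    merged = {}
    for key in sorted(left.keys() | right.keys()):
        if key in left and key in right:
            # Both sides carry a value and there is no way to combine them.
            if policy is UnmergeablePolicy.RAISE:
                raise ValueError(f"metric {key!r} cannot be merged")
            if policy is UnmergeablePolicy.WARN:
                warnings.warn(f"dropping unmergeable metric {key!r}")
            continue  # WARN and DROP both discard the field
        # Assumption: a metric present on only one side is kept as-is.
        merged[key] = left[key] if key in left else right[key]
    return merged
```

Making the policy an explicit argument would let the default be decided later without changing the merge code.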

@lalmei
Contributor

lalmei commented Jun 24, 2021

I would focus on a simple class-based custom metric API, like:

class CustomMetric:

    def track(self, inputs): ...
    def merge(self, right_metric): ...

If there is no merge method then we can't merge, unless it inherits from something like NumberTracker.

@andyndang
Contributor Author

That might work. We still need to:

  • Decide whether to throw an error or drop the metrics
  • Decide how to store this (so you'll need to convert it to and from bytes)
  • Store information about the class and the metrics for later parsing
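
One way to keep the class information next to the bytes is a small envelope: a length-prefixed header naming the class, followed by the metric's own payload. This is only a sketch; `pack_metric`, `unpack_metric`, and the JSON header layout are hypothetical and not an existing whylogs format.

```python
import json
import struct


def pack_metric(class_name: str, payload: bytes) -> bytes:
    """Prefix the payload with a length-delimited JSON header naming the class."""
    header = json.dumps({"class": class_name}).encode("utf-8")
    # 4-byte little-endian header length, then the header, then the raw payload.
    return struct.pack("<I", len(header)) + header + payload


def unpack_metric(blob: bytes):
    """Return (class_name, payload) from a blob produced by pack_metric()."""
    (header_len,) = struct.unpack_from("<I", blob, 0)
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    return header["class"], blob[4 + header_len:]
```

The reader can then look up the named class and hand it the payload, or apply whatever error/drop behavior is chosen when the class is unknown.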

@ramannanda9

A base class should be able to handle serialization pretty easily.

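import abc
import dataclasses
from dataclasses import dataclass

@dataclass
class CustomMetric(abc.ABC):
    @abc.abstractmethod
    def track(self, inputs):
        pass

    @abc.abstractmethod
    def merge(self, right_metric: 'CustomMetric'):
        pass

    @abc.abstractmethod
    def name(self) -> str:
        pass

    @staticmethod
    def deserialize(name: str) -> 'CustomMetric':
        # Implementation goes here: traverse the subclasses of CustomMetric
        # and call the constructor of the one matching `name`.
        raise NotImplementedError

    def serialize(self):
        # name() is a method, so it has to be called here.
        return {"name": self.name(), "params": dataclasses.asdict(self)}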
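
The subclass traversal mentioned for deserialize could look roughly like the sketch below. It is a self-contained variant and differs from the outline in a few labeled ways: `deserialize` takes the whole serialized dict rather than just the name, the name comes from a hypothetical `metric_name()` classmethod, and `MaxMetric` is an invented example metric.

```python
import abc
import dataclasses
from dataclasses import dataclass


@dataclass
class CustomMetric(abc.ABC):
    @abc.abstractmethod
    def merge(self, right_metric: "CustomMetric") -> "CustomMetric":
        ...

    @classmethod
    def metric_name(cls) -> str:
        # Assumption: the class name doubles as the metric name.
        return cls.__name__

    def serialize(self) -> dict:
        return {"name": self.metric_name(), "params": dataclasses.asdict(self)}

    @staticmethod
    def deserialize(data: dict) -> "CustomMetric":
        # Walk the subclass tree until a class with the matching name is found.
        pending = list(CustomMetric.__subclasses__())
        while pending:
            cls = pending.pop()
            if cls.metric_name() == data["name"]:
                return cls(**data["params"])
            pending.extend(cls.__subclasses__())
        raise ValueError(f"unknown metric: {data['name']}")


@dataclass
class MaxMetric(CustomMetric):  # hypothetical example metric
    value: float = float("-inf")

    def track(self, x: float) -> None:
        self.value = max(self.value, x)

    def merge(self, right_metric: "MaxMetric") -> "MaxMetric":
        return MaxMetric(max(self.value, right_metric.value))
```

Note that `__subclasses__()` only sees classes that have already been imported, so the module defining a custom metric has to be loaded before a stored profile is parsed.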

@lalmei
Contributor

lalmei commented Jul 8, 2021

except we need at least the protobuf de/serialization

@jamie256
Contributor

jamie256 commented Sep 2, 2021

Protobuf message packing followed by serialization is how our DatasetProfile serializes all the metrics associated with a dataset when the logger writes, but I took your suggestion of using a dataclass and doing most of that work in a base class. Here is a draft PR if you have comments: #300

@jamie256 jamie256 self-assigned this May 3, 2022
@jamie256 jamie256 added the v1 label May 3, 2022