Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elegant way to filter fields? #750

Closed
mholt opened this issue Oct 24, 2019 · 16 comments
Closed

Elegant way to filter fields? #750

mholt opened this issue Oct 24, 2019 · 16 comments

Comments

@mholt
Copy link

mholt commented Oct 24, 2019

Hi, I've been working with zap and it's great, but I have a question that I haven't been able to solve in a few days' time.

How can I filter out certain fields? Preferably by name (and possibly type). For example, I want to mask IP addresses in fields called "remote_addr" or strip fields called "password".

I've tried making an Encoder and wrapped EncodeEntry() and this works, but not for complex types like objects and arrays. In other words, if a log adds a field with zap.Object("http_request", req), EncodeEntry() has no way to peer inside the object and find offending fields and filter them ... does it?

I've also tried implementing AddString() and AddObject() on my Encoder type but then the object loses its structure, and all its fields are flatted into the top level of the context.

Any ideas how this can be done?

It is important for regulatory/compliance reasons, apparently.

@jcorbin
Copy link
Contributor

jcorbin commented Oct 24, 2019

So it'll probably be a bit repetitive, but you should be able to achieve this with a custom implementation of zapcore.ObjectEncoder, I'd suggest:

  • some type strippingEncoder struct
  • with a func (strippingEncoder) keyAllowed(key string) bool
  • then every one of its func (strippingEncoder) AddT(key string, value T) implementations first does a if !enc.keyAllowed(key) { return }; you could of course do type specific value replacement, rather than returning, but that'd be a bit more of a maskingEncoder I suppose...
  • from there, you should be able to see your way to do doing type-specific detection also
  • if you need path sensitivity, your func (strippingEncoder) AddObject implementation can track the descendant key path (e.g. by using a path []string field which it temporarily appends to while calling the given ObjectMarshaller

@mholt
Copy link
Author

mholt commented Oct 24, 2019

@jcorbin Thank you for the quick reply! :) That was basically my approach so far (tested with just AddString() and AddObject()) and the problem seems to be AddObject(), since the question is... what do I call when the field is allowed? I want the object to be encoded, but I need to inspect its nested fields, so I still need my own AddString() called when the underlying encoder calls AddString(). But doing this flattens the structure completely.

I don't really understand:

if you need path sensitivity, your func (strippingEncoder) AddObject implementation can track the descendant key path (e.g. by using a path []string field which it temporarily appends to while calling the given ObjectMarshaller

I don't think this helps me achieve marshaling a JSON object: caddyserver/caddy#1109 (comment)

Another seeming problem is that, for example *zapcore.jsonEncoder is both a zapcore.ObjectEncoder and zapcore.ArrayEncoder, and encoding an object with that encoder calls AppendT() methods which is not part of the zapcore.Encoder interface... so, I think it breaks an abstraction? Making it difficult to do this effectively.

@jcorbin
Copy link
Contributor

jcorbin commented Oct 24, 2019

Have another look at the core json encoder:

@mholt
Copy link
Author

mholt commented Oct 24, 2019

Thanks, I'm looking now.

What do you mean by "key tracking"? I don't see anything in addKey that deals with "path state". I also don't know what a "leaf key string" is. Keys are just strings, there is no structure to keys?

@jcorbin
Copy link
Contributor

jcorbin commented Oct 24, 2019

Right, for the json encoder, the only state is "the built buffer".

But for anything that wants to do structural things (which filtering or re-writing of fields is fundamentally one such), you may need to track the full key path.

The structure will only exist within your custom encoder, let me try to be more specific here with some pseudo code that I'm writing live here in this github comment (buyer beware):

type encoder struct {
  path []string // stack of keys leading to the current value being encoded
  // TODO whatever other encoding state you need
}

func (enc encoder) AddObject(key string, obj ObjectMarshaller) error {
  // TODO opportunity to apply key filter before descending
  enc.path = append(enc.path, key) // push key
  // TODO encode the key somehow
  err := enc.AppendObject(obj) // recurses with collect path recursively
  enc.path = enc.path[:len(enc.path)-1]  // pop key
  return err
}

// for all T in ObjectEncoder's required method set:
func (enc encoder) AddT(key string, value T) {
  // NOTE at this point, enc.path has all parent keys containing the about to be added key, in case context matters for deciding if you should allow or not
}

Sorry for the abortive first comment, took me a while to realize that there are challenges to "Simply Composing" an encoder-wrapper type; read all of this with the hazard that "for our purposes, we consider zap 'done', and I haven't actively touched the code in quite some time" ( I'm reactivating all sorts of dormant mental pathways just to try to answer this for you; buyer beware ;-) )

@mholt
Copy link
Author

mholt commented Oct 24, 2019

Thank you for taking the effort to demonstrate what you mean! I appreciate it a lot. I am aware that "zap is done" so I'm just trying to get familiar with its architecture.

I suppose I wasn't clear on this before, but my wrapper encoder has an underlying zapcore.Encoder that it defers actual encoding to. In other words, for filling a buffer with JSON or console encodings (or whatever else), I defer to the underlying implementation and it shouldn't matter what it is. My encoder simply decides which fields to allow (or modifies them before letting them be encoded).

Thus, I don't implement this method:

 err := enc.AppendObject(obj)

And I can't call that for an underlying encoder because AppendObject() is not part of the zapcore.Encoder interface. In other words, there's no way to know that an encoder can actually append anything.

So I'm limited to AddT() methods.

@mholt
Copy link
Author

mholt commented Oct 24, 2019

I'm gonna try a different approach, where I wrap the *zap.Logger type so that Info(), Error(), methods (etc.) can iterate and filter the zapcore.Field values that are passed in before being dispatched to the real logging methods. I'll report back.

@mholt
Copy link
Author

mholt commented Oct 24, 2019

Nevermind, that still makes zap.Object() too opaque to filter, same issue as before... need to have control of the object's individual fields too, recursively... dang.

@jcorbin
Copy link
Contributor

jcorbin commented Oct 24, 2019

The difficulties in writing "filtering middlewares" has come up sporadically in other (now closed) issues before, e.g. #453 fwiw, there are probably others...

I believe the status is "possible, but more effort that you'd think, because performance".

Maybe sometime after generics, we can consider doing a 2.0 that would better support this use case ( once the state of "because performance" shifts for a library like this... )

@mholt
Copy link
Author

mholt commented Oct 24, 2019

I see... thanks for pointing me to those issues. (For some reason I hadn't found them.)

Although, I have had a slight breakthrough by iterating on my initial attempt. By combining my own AddObject() method with wrapping the marshaler zapcore.ObjectMarshaler, I was able to recursively filter out fields. I have a lot of cleanup to do, and to do some more testing to make sure I'm not fooling myself (which I've done a few times already, heh), and then I'll report back.

Edit: Okay, yeah, it works -- the trick is to sneak my encoder in between the call to AddObject (on the underlying encoder) and MarshalLogObject (which the underlying encoder invokes when its AddObject is called). Since MarshalLogObject takes an encoder, I just make sure it receives my filtering encoder. I believe there are no extra allocations (but it does slightly deepen the call stack).

Will post code today or tomorrow.

@mholt
Copy link
Author

mholt commented Oct 25, 2019

Thank you again for your help! I figured it out, I believe.

Here is the implementation: https://github.com/caddyserver/caddy/pull/2831/files#diff-600bb63c5e5c442aa4ed554d687f2120

@mholt mholt closed this as completed Oct 25, 2019
@mholt
Copy link
Author

mholt commented Oct 25, 2019

Actually, that solution seems to be appending each log entry with the previous one. An initial log, for example, would be:

{"method": "POST", "uri": "/fooosdf", "remote": "127.0.0.1:62472"}

But then repeating the same request shows:

{"method": "POST", "uri": "/fooosdf", "remote": "127.0.0.1:62472", "method": "POST", "uri": "/fooosdf", "remote": "127.0.0.1:62474"}

And then it just keeps multiplying.

Do you think that would be something to do with zap's pooling behavior? (I'm in the process of debugging and just wanted to check if this is something you've seen before.) (I am using the console encoder; these snippets are from the end of each line -- but it also happens with the JSON encoder.)

@mholt
Copy link
Author

mholt commented Oct 25, 2019

Ah... so the zap default encoders pool themselves (

func getJSONEncoder() *jsonEncoder {
) when EncodeEntry is called.

I don't fully understand what is happening, but: because the encoders pool themselves, encoders get reused?

Changing my wrapper's EncodeEntry method from this:

for _, field := range fields {
	field.AddTo(fe)
}
return fe.wrapped.EncodeEntry(ent, nil)

to this:

fe.wrapped = fe.wrapped.Clone()
for _, field := range fields {
	field.AddTo(fe)
}
return fe.wrapped.EncodeEntry(ent, nil)

eliminates the multiplying of fields and it works again. (Notably, fe does not have a pointer receiver, so it operates on shallow copies. However, using wrappedCopy := fe.wrapped.Clone() instead of fe.wrapped = ... causes the problem to emerge again.)

@mfridman
Copy link

@mholt did you end up finding an elegant way to retrieve fields from the underlying zap core? I'd imagine this is a common thing to want to do, but haven't found good example..

any other insight you may have since opening the issue?

@mholt
Copy link
Author

mholt commented Dec 14, 2019

@mfridman Yeah, I got this to work at an acceptable level of elegance. Current implementation is here: https://sourcegraph.com/github.com/caddyserver/caddy@v2/-/blob/modules/logging/filterencoder.go

Note the calls to filtered() which can access the underlying fields.

It's elegant enough since it only has to be done once.

Caddy 2's logging features are actually really neat, since all the heavy lifting like this is done -- you can do things with logs that you can't do in other web servers, thanks to zap.

@prashantv
Copy link
Collaborator

I don't fully understand what is happening, but: because the encoders pool themselves, encoders get reused?

Changing my wrapper's EncodeEntry method from this:

for _, field := range fields {
	field.AddTo(fe)
}
return fe.wrapped.EncodeEntry(ent, nil)

to this:

fe.wrapped = fe.wrapped.Clone()
for _, field := range fields {
	field.AddTo(fe)
}
return fe.wrapped.EncodeEntry(ent, nil)

eliminates the multiplying of fields and it works again. (Notably, fe does not have a pointer receiver, so it operates on shallow copies. However, using wrappedCopy := fe.wrapped.Clone() instead of fe.wrapped = ... causes the problem to emerge again.)

Sorry for the delay, just wanted to make sure this was explained: The EncodeEntry method is passed in fields which are only for that log message, not part of all future logs. The API for the Encoder is:

  • When any of the ObjectEncoder methods like AddString or any Add* are called, the encoder should be modified such that all future calls and clones include that field
  • When EncodeEntry is called, it's not supposed to add any fields to the encoder, it's just fields that are part of a single message, not for all future messages.

A simpler interface would have been to drop the []Field as part of EncodeEntry, and just have the caller (zap's core) call Clone() and then use the Add* fields, then call EncodeEntry without fields, and then drop the encoder. However, we thought it'd be possible to optimize the EncodeEntry method if it took fields, since you wouldn't need to do a full clone.

Thanks for sharing the filter encoder, and glad zap is working well for Caddy 2!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants