Elegant way to filter fields? #750

mholt · 2019-10-24T19:38:13Z

Hi, I've been working with zap and it's great, but I have a question that I haven't been able to solve in a few days' time.

How can I filter out certain fields? Preferably by name (and possibly type). For example, I want to mask IP addresses in fields called "remote_addr" or strip fields called "password".

I've tried making an Encoder and wrapped EncodeEntry() and this works, but not for complex types like objects and arrays. In other words, if a log adds a field with zap.Object("http_request", req), EncodeEntry() has no way to peer inside the object and find offending fields and filter them ... does it?

I've also tried implementing AddString() and AddObject() on my Encoder type but then the object loses its structure, and all its fields are flatted into the top level of the context.

Any ideas how this can be done?

It is important for regulatory/compliance reasons, apparently.

The text was updated successfully, but these errors were encountered:

jcorbin · 2019-10-24T19:54:50Z

So it'll probably be a bit repetitive, but you should be able to achieve this with a custom implementation of zapcore.ObjectEncoder, I'd suggest:

some type strippingEncoder struct
with a func (strippingEncoder) keyAllowed(key string) bool
then every one of its func (strippingEncoder) AddT(key string, value T) implementations first does a if !enc.keyAllowed(key) { return }; you could of course do type specific value replacement, rather than returning, but that'd be a bit more of a maskingEncoder I suppose...
from there, you should be able to see your way to do doing type-specific detection also
if you need path sensitivity, your func (strippingEncoder) AddObject implementation can track the descendant key path (e.g. by using a path []string field which it temporarily appends to while calling the given ObjectMarshaller

mholt · 2019-10-24T20:02:53Z

@jcorbin Thank you for the quick reply! :) That was basically my approach so far (tested with just AddString() and AddObject()) and the problem seems to be AddObject(), since the question is... what do I call when the field is allowed? I want the object to be encoded, but I need to inspect its nested fields, so I still need my own AddString() called when the underlying encoder calls AddString(). But doing this flattens the structure completely.

I don't really understand:

if you need path sensitivity, your func (strippingEncoder) AddObject implementation can track the descendant key path (e.g. by using a path []string field which it temporarily appends to while calling the given ObjectMarshaller

I don't think this helps me achieve marshaling a JSON object: caddyserver/caddy#1109 (comment)

Another seeming problem is that, for example *zapcore.jsonEncoder is both a zapcore.ObjectEncoder and zapcore.ArrayEncoder, and encoding an object with that encoder calls AppendT() methods which is not part of the zapcore.Encoder interface... so, I think it breaks an abstraction? Making it difficult to do this effectively.

jcorbin · 2019-10-24T20:16:03Z

Have another look at the core json encoder:

AddObject https://github.com/uber-go/zap/blob/master/zapcore/json_encoder.go#L97 can do key tracking; for json, it just writes to the buffer, but your could accumulate path state here
AppendObject https://github.com/uber-go/zap/blob/master/zapcore/json_encoder.go#L189 then basically re-enters the whole encoder, every encoder has to be a full ObjectEncoder, and works by recursively calling itself through these two methods
this will let you match on more than just the leaf key string if needed, since you can match on (basically a stack) of collected key strings in every AddT

mholt · 2019-10-24T20:18:40Z

Thanks, I'm looking now.

What do you mean by "key tracking"? I don't see anything in addKey that deals with "path state". I also don't know what a "leaf key string" is. Keys are just strings, there is no structure to keys?

jcorbin · 2019-10-24T20:30:21Z

Right, for the json encoder, the only state is "the built buffer".

But for anything that wants to do structural things (which filtering or re-writing of fields is fundamentally one such), you may need to track the full key path.

The structure will only exist within your custom encoder, let me try to be more specific here with some pseudo code that I'm writing live here in this github comment (buyer beware):

type encoder struct {
  path []string // stack of keys leading to the current value being encoded
  // TODO whatever other encoding state you need
}

func (enc encoder) AddObject(key string, obj ObjectMarshaller) error {
  // TODO opportunity to apply key filter before descending
  enc.path = append(enc.path, key) // push key
  // TODO encode the key somehow
  err := enc.AppendObject(obj) // recurses with collect path recursively
  enc.path = enc.path[:len(enc.path)-1]  // pop key
  return err
}

// for all T in ObjectEncoder's required method set:
func (enc encoder) AddT(key string, value T) {
  // NOTE at this point, enc.path has all parent keys containing the about to be added key, in case context matters for deciding if you should allow or not
}

Sorry for the abortive first comment, took me a while to realize that there are challenges to "Simply Composing" an encoder-wrapper type; read all of this with the hazard that "for our purposes, we consider zap 'done', and I haven't actively touched the code in quite some time" ( I'm reactivating all sorts of dormant mental pathways just to try to answer this for you; buyer beware ;-) )

mholt · 2019-10-24T20:35:57Z

Thank you for taking the effort to demonstrate what you mean! I appreciate it a lot. I am aware that "zap is done" so I'm just trying to get familiar with its architecture.

I suppose I wasn't clear on this before, but my wrapper encoder has an underlying zapcore.Encoder that it defers actual encoding to. In other words, for filling a buffer with JSON or console encodings (or whatever else), I defer to the underlying implementation and it shouldn't matter what it is. My encoder simply decides which fields to allow (or modifies them before letting them be encoded).

Thus, I don't implement this method:

 err := enc.AppendObject(obj)

And I can't call that for an underlying encoder because AppendObject() is not part of the zapcore.Encoder interface. In other words, there's no way to know that an encoder can actually append anything.

So I'm limited to AddT() methods.

mholt · 2019-10-24T20:51:45Z

I'm gonna try a different approach, where I wrap the *zap.Logger type so that Info(), Error(), methods (etc.) can iterate and filter the zapcore.Field values that are passed in before being dispatched to the real logging methods. I'll report back.

mholt · 2019-10-24T21:00:05Z

Nevermind, that still makes zap.Object() too opaque to filter, same issue as before... need to have control of the object's individual fields too, recursively... dang.

jcorbin · 2019-10-24T21:12:01Z

The difficulties in writing "filtering middlewares" has come up sporadically in other (now closed) issues before, e.g. #453 fwiw, there are probably others...

I believe the status is "possible, but more effort that you'd think, because performance".

Maybe sometime after generics, we can consider doing a 2.0 that would better support this use case ( once the state of "because performance" shifts for a library like this... )

mholt · 2019-10-24T21:40:12Z

I see... thanks for pointing me to those issues. (For some reason I hadn't found them.)

Although, I have had a slight breakthrough by iterating on my initial attempt. By combining my own AddObject() method with wrapping the marshaler zapcore.ObjectMarshaler, I was able to recursively filter out fields. I have a lot of cleanup to do, and to do some more testing to make sure I'm not fooling myself (which I've done a few times already, heh), and then I'll report back.

Edit: Okay, yeah, it works -- the trick is to sneak my encoder in between the call to AddObject (on the underlying encoder) and MarshalLogObject (which the underlying encoder invokes when its AddObject is called). Since MarshalLogObject takes an encoder, I just make sure it receives my filtering encoder. I believe there are no extra allocations (but it does slightly deepen the call stack).

Will post code today or tomorrow.

mholt · 2019-10-25T00:10:28Z

Thank you again for your help! I figured it out, I believe.

Here is the implementation: https://github.com/caddyserver/caddy/pull/2831/files#diff-600bb63c5e5c442aa4ed554d687f2120

mholt · 2019-10-25T17:58:45Z

Actually, that solution seems to be appending each log entry with the previous one. An initial log, for example, would be:

{"method": "POST", "uri": "/fooosdf", "remote": "127.0.0.1:62472"}

But then repeating the same request shows:

{"method": "POST", "uri": "/fooosdf", "remote": "127.0.0.1:62472", "method": "POST", "uri": "/fooosdf", "remote": "127.0.0.1:62474"}

And then it just keeps multiplying.

Do you think that would be something to do with zap's pooling behavior? (I'm in the process of debugging and just wanted to check if this is something you've seen before.) (I am using the console encoder; these snippets are from the end of each line -- but it also happens with the JSON encoder.)

mholt · 2019-10-25T18:40:24Z

Ah... so the zap default encoders pool themselves (

zap/zapcore/json_encoder.go

Line 42 in 0065243

func getJSONEncoder() *jsonEncoder {

) when EncodeEntry is called.

I don't fully understand what is happening, but: because the encoders pool themselves, encoders get reused?

Changing my wrapper's EncodeEntry method from this:

for _, field := range fields {
	field.AddTo(fe)
}
return fe.wrapped.EncodeEntry(ent, nil)

to this:

fe.wrapped = fe.wrapped.Clone()
for _, field := range fields {
	field.AddTo(fe)
}
return fe.wrapped.EncodeEntry(ent, nil)

eliminates the multiplying of fields and it works again. (Notably, fe does not have a pointer receiver, so it operates on shallow copies. However, using wrappedCopy := fe.wrapped.Clone() instead of fe.wrapped = ... causes the problem to emerge again.)

mfridman · 2019-12-14T19:09:22Z

@mholt did you end up finding an elegant way to retrieve fields from the underlying zap core? I'd imagine this is a common thing to want to do, but haven't found good example..

any other insight you may have since opening the issue?

mholt · 2019-12-14T23:24:23Z

@mfridman Yeah, I got this to work at an acceptable level of elegance. Current implementation is here: https://sourcegraph.com/github.com/caddyserver/caddy@v2/-/blob/modules/logging/filterencoder.go

Note the calls to filtered() which can access the underlying fields.

It's elegant enough since it only has to be done once.

Caddy 2's logging features are actually really neat, since all the heavy lifting like this is done -- you can do things with logs that you can't do in other web servers, thanks to zap.

prashantv · 2019-12-27T22:14:34Z

I don't fully understand what is happening, but: because the encoders pool themselves, encoders get reused?

Changing my wrapper's EncodeEntry method from this:
for _, field := range fields {
	field.AddTo(fe)
}
return fe.wrapped.EncodeEntry(ent, nil)
to this:
fe.wrapped = fe.wrapped.Clone()
for _, field := range fields {
	field.AddTo(fe)
}
return fe.wrapped.EncodeEntry(ent, nil)
eliminates the multiplying of fields and it works again. (Notably, fe does not have a pointer receiver, so it operates on shallow copies. However, using wrappedCopy := fe.wrapped.Clone() instead of fe.wrapped = ... causes the problem to emerge again.)

Sorry for the delay, just wanted to make sure this was explained: The EncodeEntry method is passed in fields which are only for that log message, not part of all future logs. The API for the Encoder is:

When any of the ObjectEncoder methods like AddString or any Add* are called, the encoder should be modified such that all future calls and clones include that field
When EncodeEntry is called, it's not supposed to add any fields to the encoder, it's just fields that are part of a single message, not for all future messages.

A simpler interface would have been to drop the []Field as part of EncodeEntry, and just have the caller (zap's core) call Clone() and then use the Add* fields, then call EncodeEntry without fields, and then drop the encoder. However, we thought it'd be possible to optimize the EncodeEntry method if it took fields, since you wouldn't need to do a full clone.

Thanks for sharing the filter encoder, and glad zap is working well for Caddy 2!

mholt closed this as completed Oct 25, 2019

bobheadxi mentioned this issue Apr 5, 2022

log: sensitive data redaction sourcegraph/sourcegraph-public-snapshot#33473

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Elegant way to filter fields? #750

Elegant way to filter fields? #750

mholt commented Oct 24, 2019

jcorbin commented Oct 24, 2019

mholt commented Oct 24, 2019 •

edited

Loading

jcorbin commented Oct 24, 2019

mholt commented Oct 24, 2019

jcorbin commented Oct 24, 2019 •

edited

Loading

mholt commented Oct 24, 2019

mholt commented Oct 24, 2019

mholt commented Oct 24, 2019

jcorbin commented Oct 24, 2019

mholt commented Oct 24, 2019 •

edited

Loading

mholt commented Oct 25, 2019

mholt commented Oct 25, 2019 •

edited

Loading

mholt commented Oct 25, 2019 •

edited

Loading

mfridman commented Dec 14, 2019

mholt commented Dec 14, 2019

prashantv commented Dec 27, 2019

Elegant way to filter fields? #750

Elegant way to filter fields? #750

Comments

mholt commented Oct 24, 2019

jcorbin commented Oct 24, 2019

mholt commented Oct 24, 2019 • edited Loading

jcorbin commented Oct 24, 2019

mholt commented Oct 24, 2019

jcorbin commented Oct 24, 2019 • edited Loading

mholt commented Oct 24, 2019

mholt commented Oct 24, 2019

mholt commented Oct 24, 2019

jcorbin commented Oct 24, 2019

mholt commented Oct 24, 2019 • edited Loading

mholt commented Oct 25, 2019

mholt commented Oct 25, 2019 • edited Loading

mholt commented Oct 25, 2019 • edited Loading

mfridman commented Dec 14, 2019

mholt commented Dec 14, 2019

prashantv commented Dec 27, 2019

mholt commented Oct 24, 2019 •

edited

Loading

jcorbin commented Oct 24, 2019 •

edited

Loading

mholt commented Oct 24, 2019 •

edited

Loading

mholt commented Oct 25, 2019 •

edited

Loading

mholt commented Oct 25, 2019 •

edited

Loading