fix(file source): log a single warning when ignoring small files #863
Closes #842 Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
That makes sense. I'm curious if this could be solved similarly to how we rate limit? Ex: warn!(message = "Ignoring file smaller than fingerprint_bytes", once = true); If that makes sense? |
|
It's something we could enhance our logging infra to know about, yeah. We'd need to introduce some fancier stuff like |
|
Sure. Point, counter point. What's wrong with setting a high rate limit 😄 ? Ex: every 5 minutes. |
|
Just setting a high rate limit won't take into account which file you're talking about, so it'd only ever tell you about the first small file it comes across. |
|
Really? I thought rate-limiting used the message to create window buckets? @LucioFranco am I wrong on that? |
|
@binarylogic no, I found that to be too inefficient, so we use the callsite id, which is statically generated. This type of solution is fine since only these log statements eat the cost. |
|
Ah, I didn't even know call site id was a thing. I'm curious if it would make sense to provide an optional second ID to create a composite ID with the call site ID. Ex: Just trying to help. This solution is fine if nothing I'm proposing is better. |
|
@binarylogic pretty much the issue is that, from the rate limiter's perspective, it has to check every log statement against that, and that is too inefficient. We could bake in specific things for this log statement, but I don't think it's worth it in this case. |
|
Agreed, I'd want to wait and see if this is something we want to use in more places before spending the non-trivial amount of time and effort needed to support this in the rate limiter. Just not worth it right now. |
|
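To make the trade-off in this thread concrete, here is a minimal sketch of a rate limiter that can be keyed either by callsite alone or by a composite of callsite and file name. It is not Vector's rate limiter: the `RateLimiter` type, the `u32` callsite id, and the `allow` method are all hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical rate limiter: allows one event per key per window.
/// The key type decides what counts as "the same" log statement.
pub struct RateLimiter<K> {
    window: Duration,
    last_emitted: HashMap<K, Instant>,
}

impl<K: Hash + Eq> RateLimiter<K> {
    pub fn new(window: Duration) -> Self {
        Self {
            window,
            last_emitted: HashMap::new(),
        }
    }

    /// Returns true if an event with this key may be logged at `now`.
    /// `now` is passed in explicitly so the behavior is easy to test.
    pub fn allow(&mut self, key: K, now: Instant) -> bool {
        let window = self.window;
        // Suppress the event if this key was emitted less than one window ago.
        let limited = self
            .last_emitted
            .get(&key)
            .map_or(false, |&last| now.duration_since(last) < window);
        if limited {
            return false;
        }
        self.last_emitted.insert(key, now);
        true
    }
}
```

With a key of just the callsite id (assumed here to be a `u32`), a warning about a second small file inside the window is suppressed, which matches the objection above. With a key of `(u32, String)`, each file gets its own bucket, at the cost of building and hashing a composite key for the statements that opt in.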
Makes sense. You do you. |
|
@lukesteensen thanks for adding this warning about small files... My test input file was tiny and nothing was happening... Now that this message has been added I got insight into why it was not working. |
|
Any chance to not log this for 0 byte files? |
|
Hey @karlseguin, we certainly could. Do you mind expanding on your use case a little more? We could consider an Alternatively, you could switch the fingerprinting strategy to
[sources.file]
type = "file"
fingerprinting.strategy = "device_and_inode"
You can read more about that here: https://docs.vector.dev/v/master/usage/configuration/sources/file#file-identification Let me know if that helps. |
|
I'm asking specifically about the warning. It just adds noise to our error logs when empty files get created. I realize this is a WARN, so I could just up our log level to ERROR... or I could, you know, use Vector to filter it out, but... The initial issue was:
While it might be technically correct to say that an empty file is being "ignored" (for lack of the required fingerprint bytes), since there's nothing to log, I think a warning about it being ignored is counter-productive. I think it's safe to silently ignore empty files (or drop the log level to INFO or DEBUG). Reproduce with this vector.toml: Then run vector: And run: What do I expect to see? Nothing |
Closes #842
This ended up being kind of annoying. The naive solution results in spamming log messages every time we glob for new files. Rate limiting that log doesn't seem like a great solution because I don't think there's a point in reminding the user every so often that a small file is still small. This message is really targeted at people manually testing out Vector by echoing to a file and then waiting for something to happen. Since that really only needs to happen once, we keep a set of files we've already warned about being too small and use that to log exactly one time when we see a small file. 🤷‍♂️
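The "set of files we've already warned about" approach can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `SmallFileWarnings`, `should_warn`, and `check_file` are hypothetical names, and the real change may track files differently.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical tracker for paths that already triggered the small-file warning.
#[derive(Default)]
pub struct SmallFileWarnings {
    warned: HashSet<PathBuf>,
}

impl SmallFileWarnings {
    /// Returns true only the first time a given path is reported.
    pub fn should_warn(&mut self, path: &Path) -> bool {
        // HashSet::insert returns true only if the value was not already present.
        self.warned.insert(path.to_path_buf())
    }
}

/// Meant to be called on every glob pass. Returns the message to log, if any.
/// `len` is the current file size and `fingerprint_bytes` the configured minimum.
pub fn check_file(
    warnings: &mut SmallFileWarnings,
    path: &Path,
    len: u64,
    fingerprint_bytes: u64,
) -> Option<String> {
    if len < fingerprint_bytes && warnings.should_warn(path) {
        return Some(format!(
            "Ignoring file smaller than fingerprint_bytes: {}",
            path.display()
        ));
    }
    // Either the file is large enough, or we already warned about it.
    None
}
```

Because the set is keyed by path, each small file produces one message no matter how many glob passes see it, while a different small file still gets its own warning.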