Skip to content

fix(file source): log a single warning when ignoring small files#863

Merged
lukesteensen merged 1 commit into
masterfrom
warn-about-smalls
Sep 13, 2019
Merged

fix(file source): log a single warning when ignoring small files#863
lukesteensen merged 1 commit into
masterfrom
warn-about-smalls

Conversation

@lukesteensen
Copy link
Copy Markdown
Member

Closes #842

This ended up being kind of annoying. The naive solution results in spamming log messages every time we glob for new files. Rate limiting that log doesn't seem like a great solution because I don't think there's a point in reminding the user every so often that a small file is still small. This message is really targeted at people manually testing out Vector by echoing to a file and then waiting for something to happen. Since that really only needs to happen once, we keep a set of files we've already warned about being too small and use that to log exactly one time when we see a small file. 🤷‍♂️

Closes #842

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
@binarylogic
Copy link
Copy Markdown
Contributor

we keep a set of files we've already warned about being too small and use that to log exactly one time when we see a small file.

That makes sense. I'm curious if this could be solved similarly to how we rate limit? Ex:

warn!(message = "Ignoring file smaller than fingerprint_bytes", once = true);

If that makes sense?

@lukesteensen
Copy link
Copy Markdown
Member Author

It's something we could enhance our logging infra to know about, yeah. We'd need to introduce some fancier stuff like once = true and keying on the file param. I figured it wasn't worth the time right now.

@binarylogic
Copy link
Copy Markdown
Contributor

Sure. Point, counter point. What's wrong with setting a high rate limit 😄 ? Ex: every 5 minutes.

@lukesteensen
Copy link
Copy Markdown
Member Author

Just setting a high rate limit won't take into account which file you're talking about, so it'd only ever tell you about the first small file it comes across.

@binarylogic
Copy link
Copy Markdown
Contributor

Really? I thought rate-limiting used the message to create window buckets? @LucioFranco am I wrong on that?

@LucioFranco
Copy link
Copy Markdown
Contributor

@binarylogic no that is too inefficient I found, we use the callsite id that is statically generated. This type of solution is fine since only these log statements eat the cost.

@binarylogic
Copy link
Copy Markdown
Contributor

Ah, I didn't even know call site id was a thing. I'm curious if it would make sense to provide an optional second ID to create a composite ID with the call site ID. Ex: [callsite_id, file_path]. Still too inefficient?

Just trying to help. This solution is fine if nothing I'm proposing is better.

@LucioFranco
Copy link
Copy Markdown
Contributor

@binarylogic pretty much the issue is that from the rate limiter perspective it has to check every log statement against that and that is too inefficient. We could bake in specific things for this log statement but I dont think its worth it in this case.

@lukesteensen
Copy link
Copy Markdown
Member Author

Agreed, I'd want to wait and see if this is something we want to use more places before spending the non-trivial amount of time and effort needed to support this in the rate limiter. Just not worth it right now.

@binarylogic
Copy link
Copy Markdown
Contributor

Makes sense. You do you.

@lukesteensen lukesteensen merged commit b9a7812 into master Sep 13, 2019
@lukesteensen lukesteensen deleted the warn-about-smalls branch September 13, 2019 14:50
@stormasm
Copy link
Copy Markdown

@lukesteensen thanks for adding this warning about small files...
I could not figure out why my source file was not working...

My test input file was tiny and nothing was happening...

Now that this message has been added I got insight into why it was not working.
I made my file bigger than 256 bytes and everything started working...
Thank you !

@karlseguin
Copy link
Copy Markdown

Any chance to not log this for 0 byte files?

@binarylogic
Copy link
Copy Markdown
Contributor

Hey @karlseguin, we certainly could. Do you mind expanding on your use case a little more? We could consider an ignore_empty option that would ignore empty files.

Alternatively, you could switch the fingerprinting strategy to device_and_inode:

[sources.file]
  type = "file"
  fingerprinting.strategy = "device_and_inode"

You can read more about that here:

https://docs.vector.dev/v/master/usage/configuration/sources/file#file-identification

Let me know if that helps.

@karlseguin
Copy link
Copy Markdown

I'm asking specifically about the warning. It just adds noise to our error logs when empty files get created. I realize this is a WARN, so I could just up our log level to ERROR...or I could, you know, use Vector to filter it out, but...The initial issue was:

We need to communicate to users when a file is ignored for whatever reason.

While it might be technically correct to say that an empty file is being "ignored" (for lack of the required fingerprint bytes), since there's nothing to log, I think an warning about it being ignored is counter-productive. I think it's safe to silently ignore empty files (or drop the log level to INFO or DEBUG).

Reproduce with this vector.toml:

data_dir = "/tmp"

[sources.test_in]
  type = "file"
  ignore_older = 172800
  include = ["test.log"]

[sinks.text_out]
  inputs = ["test_in"]
  type = "console"

Then run vector:

./vector --config vector.toml

And run:

touch test.log

What do I expect to see? Nothing
What do I actually see? "WARN source{name="test_in"}:file_server: file_source::file_server: Ignoring file smaller than fingerprint_bytes file="test.log""

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add log message if file is ignored due to too little content

5 participants