journald source fails to retrieve checkpoint #1081
Comments
Thanks @phyber, we'll investigate and see what's going on. |
I'm puzzled about what might be happening here. There are actually two places where that error could appear, and neither makes sense. In the first, it indicates a failure to read the cursor record previously saved to a file. The error is In the second, it indicates a failure to retrieve the cursor from the journald library. From my reading of the systemd code and online comments, this can only happen if the journal has not been opened and read from, but this code only runs after we try to read from the journal. Is it possible that the journal on your system is completely empty, or that your user lacks permission to read from the journal? |
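To make the first failure point concrete, here is a minimal sketch of reading a saved cursor back from a checkpoint file while keeping "no checkpoint yet" separate from a real I/O error. The `Checkpoint` type and `load_checkpoint` function are hypothetical names for illustration and are not taken from Vector's source.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Hypothetical result of loading a checkpoint (not Vector's actual type).
#[derive(Debug, PartialEq)]
enum Checkpoint {
    /// No checkpoint file exists yet, e.g. on the first run.
    Missing,
    /// The file exists but contains only whitespace.
    Empty,
    /// A previously saved journal cursor.
    Cursor(String),
}

/// Read the cursor saved at `path`.
/// A missing file is reported as `Checkpoint::Missing`, not as an error,
/// so that only genuine I/O failures (permissions, bad encoding) surface as `Err`.
fn load_checkpoint(path: &Path) -> io::Result<Checkpoint> {
    match fs::read_to_string(path) {
        Ok(text) => {
            let cursor = text.trim();
            if cursor.is_empty() {
                Ok(Checkpoint::Empty)
            } else {
                Ok(Checkpoint::Cursor(cursor.to_string()))
            }
        }
        Err(err) if err.kind() == ErrorKind::NotFound => Ok(Checkpoint::Missing),
        Err(err) => Err(err),
    }
}
```

Separating the cases this way would let a log message say which of the two places actually failed, which is the ambiguity discussed in this thread.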
I was also getting this error. I managed to narrow it down, originally thinking it was related to reading/writing to the checkpoint.txt. It was erroring in the 'second' place (which you've since changed). Your latest changeset stops the error; however, no output is recorded from the sink.
I believe it is an issue around: This is happening on Amazon Linux 2 (systemd-219, vector-0.5) and Ubuntu 18.04 (systemd-237, vector built from source). |
It does look like that at first glance, but it's definitely the second location. One error string ends with a The journal on my system is definitely not empty, there are plenty of events to be seen in I'll be looking at this more today, so I'll have a play around with various permissions and see if that changes anything. Thanks. |
Sorry @phyber, I should have explained better. I agree with you and I am experiencing the exact same issue as you. My comments were for @bruceg to help debug the problem. I've determined permissions are not the problem either, as I get the same result (no data from the source when there are clearly logs coming via |
Thanks @dbhowell. It was not my intention to mask the problem, but this does shed a lot more light on what might be going on. If The initialization for journald takes a flags parameter, which includes a bit named Try setting |
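For readers unfamiliar with the flags parameter, here is a minimal sketch of how such a bitmask is composed. The constant values follow systemd's `sd-journal.h`; the `open_flags` helper and its parameter names are hypothetical and do not reflect Vector's actual code or option names.

```rust
use std::os::raw::c_int;

// Bit values as defined in systemd's sd-journal.h.
const SD_JOURNAL_LOCAL_ONLY: c_int = 1 << 0;
const SD_JOURNAL_RUNTIME_ONLY: c_int = 1 << 1;
#[allow(dead_code)]
const SD_JOURNAL_SYSTEM: c_int = 1 << 2;
#[allow(dead_code)]
const SD_JOURNAL_CURRENT_USER: c_int = 1 << 3;

/// Hypothetical helper: build the flags argument for opening the journal.
/// Each boolean switches on exactly one bit; with both false the result is 0.
fn open_flags(local_only: bool, runtime_only: bool) -> c_int {
    let mut flags = 0;
    if local_only {
        flags |= SD_JOURNAL_LOCAL_ONLY;
    }
    if runtime_only {
        flags |= SD_JOURNAL_RUNTIME_ONLY;
    }
    flags
}
```

Per the `sd_journal_open` documentation, `SD_JOURNAL_RUNTIME_ONLY` restricts reading to volatile journal files, so on a host whose journal is stored persistently a reader opened with that bit set may see no entries at all.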
@bruceg I can confirm that setting I assume you will need to use Your new issue #1097 looks like you're well ahead of me. Thanks for persisting with this. Unfortunately, this is well outside my expertise. |
PR #1105 should resolve this issue. Note that it replaces the |
This almost resolves the issue. I believe you had to add a filter match as well, such as:
and most likely a |
I don't see why a filter match is necessary. If we are seeking to the last boot (which is currently running), what does the filter accomplish? I implemented and tested it, and it made no difference on my system. |
See here: systemd/systemd#1752 Basically the seek works to the nearest first record of the boot, but does not guarantee the next records are from the same boot. |
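The point about seeking versus filtering can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Record` type, `seek_to_boot`, and `read_current_boot` are hypothetical stand-ins for the journal API, and the in-memory slice stands in for a journal whose entries from different boots are interleaved.

```rust
/// Hypothetical journal record carrying only the fields needed here.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    /// Value of the record's `_BOOT_ID` field.
    boot_id: String,
    message: String,
}

/// Simulated seek: index of the first record belonging to `boot_id`,
/// or the end of the slice if there is none.
fn seek_to_boot(records: &[Record], boot_id: &str) -> usize {
    records
        .iter()
        .position(|r| r.boot_id == boot_id)
        .unwrap_or(records.len())
}

/// Seek to the boot, then additionally keep only records whose boot ID matches.
/// Without the filter, every record after the seek position would be returned,
/// including records from other boots.
fn read_current_boot<'a>(records: &'a [Record], boot_id: &str) -> Vec<&'a Record> {
    let start = seek_to_boot(records, boot_id);
    records[start..]
        .iter()
        .filter(|r| r.boot_id == boot_id)
        .collect()
}
```

In this simulation the seek alone only fixes the starting position; the match on the boot ID is what guarantees that everything read afterwards belongs to the same boot.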
Got it, see PR #1122. |
This is done then, right? |
I'm experiencing exactly the same issue running the latest code compiled on NixOS:
|
The two read states (saw-record vs at-end) are actually independent and not mutually exclusive, so representing them as an enum leads to flow control errors. This led to breaking a previous fix for issue #1081 (journald source fails to retrieve checkpoint). Signed-off-by: Bruce Guenter <bruce@untroubled.org>
…name (#1202)
* Fix systemd dynamic library name
The correct name for the systemd library is `libsystemd.so.0` and not `libsystemd.so`.
Signed-off-by: Bruce Guenter <bruce@untroubled.org>
* Rework journald read state logic
The two read states (saw-record vs at-end) are actually independent and not mutually exclusive, so representing them as an enum leads to flow control errors. This led to breaking a previous fix for issue #1081 (journald source fails to retrieve checkpoint).
Signed-off-by: Bruce Guenter <bruce@untroubled.org>
* Break up the overly-long `JournaldServer::run` function
Signed-off-by: Bruce Guenter <bruce@untroubled.org>
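The read-state point in the commit message above can be sketched as follows. This is an illustration only, not the code from the commit: `ReadState`, `read_batch`, `should_checkpoint`, and `should_wait` are hypothetical names, and a plain iterator stands in for the journal.

```rust
/// Two independent flags, so a single batch can report both
/// "saw at least one record" and "reached the end of the journal".
/// An enum with one variant per state could not express that combination.
#[derive(Debug, Default, PartialEq)]
struct ReadState {
    saw_record: bool,
    at_end: bool,
}

/// Hypothetical batch reader: pull up to `limit` records from `source`.
fn read_batch<I: Iterator<Item = String>>(
    source: &mut I,
    limit: usize,
) -> (Vec<String>, ReadState) {
    let mut batch = Vec::new();
    let mut state = ReadState::default();
    while batch.len() < limit {
        match source.next() {
            Some(record) => {
                state.saw_record = true;
                batch.push(record);
            }
            None => {
                state.at_end = true;
                break;
            }
        }
    }
    (batch, state)
}

/// Only save a checkpoint if something was actually read in this batch.
fn should_checkpoint(state: &ReadState) -> bool {
    state.saw_record
}

/// Only wait for new entries once the end of the journal was reached.
fn should_wait(state: &ReadState) -> bool {
    state.at_end
}
```

With two booleans, a batch that reads the last few records and then hits the end can both checkpoint and wait, while an empty batch waits without attempting to fetch a cursor that does not exist yet.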
@knl could you confirm the fix just committed to master resolves this issue for you? |
@bruceg the failure is not there anymore. However, I see nothing being shipped to elasticsearch either. But that is probably a different issue. |
If you add a console sink such as the following to the config, does it show data coming from journald?
|
@bruceg I used the following config:
I run it as:
Nothing is displayed on the console. The system is NixOS. I'll investigate a bit during the weekend. |
That certainly looks like the logs are not reaching vector. Try running vector with the |
Sorry, I forgot to mention that, I did also run with |
I think the problem with the |
@phyber @knl thanks for your patience on this. And apologies for the rocky experience with this source. This could have been prevented with full integration tests (which we're working on). Journald integration is surprisingly more complicated than it appears. The woes of 0.x development. 😄 To update you, we've identified the issue and you can track progress on #1473. Things we're doing that will prevent regression:
We're hoping to have this completed in the next week, which depends on the solution we choose in #1473. |
This should be resolved by the resolution of #1473. If this is still a problem, please reopen this issue. |
Software Versions
Note that I'm running vector from the 0.5.0 RPM package, but the version is incorrectly reported.
I'm running Vector on the latest (at time of filing issue) Amazon Linux 2 in EC2. The instance is pretty much vanilla except for Puppet and Vector.
Vector Config
Issue
Hi,
There appears to be an issue with the `journald` source, but I'm hoping it's just me missing something.
I'm expecting the above configuration to read from journald, filter out events from the puppet service, and write them to `/var/log/vector-puppet-%Y-%m-%d`.
However, what actually happens is Vector throws an error regarding the journald checkpointing.
I believe this error originates here: https://github.com/timberio/vector/blob/c112c4ac7f45e69fea312e7691566a3f9e8e3066/src/sources/journald.rs#L221-L234
However, this is as far as I've got. Trying to follow the `cursor` code quickly disappears into the `journald` crate, which calls out to some C code.
Might I be missing some required systemd configuration? I've scanned the documentation and read the `vector.spec.toml`, and I believe the above config should "just work".
Thanks.