Parse nginx style syslog messages #1454

Closed
binarylogic opened this issue Dec 28, 2019 · 8 comments · Fixed by #1757
Labels
meta: good first issue Anything that is good for new contributors. source: syslog Anything `syslog` source related type: enhancement A value-adding code change that enhances its existing functionality.

Comments

@binarylogic
Contributor

The syslog source conforms to the Syslog 5424 spec, but not all tools follow it strictly. We should make a best effort to parse common Syslog formats, even if they deviate from the spec. An example of this is Nginx logs:

<190>Dec 28 16:49:07 plertrood-thinkpad-x220 nginx: 127.0.0.1 - - [28/Dec/2019:16:49:07 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:71.0) Gecko/20100101 Firefox/71.0"

The syslog source should parse this.
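For reference, the `<190>` prefix in the example is the PRI field, which both RFC 3164 and RFC 5424 encode as facility * 8 + severity. A minimal sketch of decoding it follows; the function names `split_pri` and `decode_pri` are made up for illustration and are not part of Vector or any syslog crate.

```rust
/// Split a PRI value into (facility, severity).
/// Both RFC 3164 and RFC 5424 define PRI as facility * 8 + severity.
fn decode_pri(pri: u8) -> (u8, u8) {
    (pri / 8, pri % 8)
}

/// Hypothetical helper: read the leading "<NNN>" of a syslog line and return
/// the PRI value together with the remainder of the line.
/// Returns None if the prefix is missing, non-numeric, or out of range.
fn split_pri(line: &str) -> Option<(u8, &str)> {
    let rest = line.strip_prefix('<')?;
    let end = rest.find('>')?;
    let digits = &rest[..end];
    if !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let pri: u8 = digits.parse().ok()?;
    // The largest valid PRI is 23 * 8 + 7 = 191.
    if pri > 191 {
        return None;
    }
    Some((pri, &rest[end + 1..]))
}
```

For the Nginx line above, `split_pri` yields 190, which `decode_pri` splits into facility 23 (local7) and severity 6 (informational).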

@binarylogic binarylogic added source: syslog Anything `syslog` source related type: enhancement A value-adding code change that enhances its existing functionality. meta: good first issue Anything that is good for new contributors. labels Dec 28, 2019
@StephenWakely
Contributor

It looks like Nginx sends syslog messages as per RFC 3164.

There is a crate that parses 3164, including that nginx message. https://crates.io/crates/nom-syslog

A couple of issues with that crate are:

  • it uses quite an old version of nom (3.2).
  • if no year is specified, it defaults to 2017. Really it should take the current year.

The owners may be happy to take some pull requests to fix this.

I imagine another option could be added to the syslog source to specify how the message should be parsed: "5424" or "3164". If the option isn't specified, it could try 5424 first and, if that fails, try 3164. If both fail, it could either drop or retain the message according to the `drop_invalid` option.
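The fallback order described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Mode`, `Outcome`, `try_5424`, and `try_3164` are hypothetical names, and the two `try_*` functions are toy detectors standing in for real parsers (they only check whether the text after the PRI looks like a 5424 version field or a 3164 month abbreviation).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Rfc5424,
    Rfc3164,
    Auto, // option not specified: try 5424 first, then 3164
}

#[derive(Debug, PartialEq)]
enum Parsed {
    Rfc5424(String),
    Rfc3164(String),
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Parsed(Parsed),
    Unparsed(String), // retained as-is because drop_invalid is false
    Dropped,
}

/// Return the text following the leading "<PRI>" prefix, if present.
fn after_pri(line: &str) -> Option<&str> {
    let rest = line.strip_prefix('<')?;
    let end = rest.find('>')?;
    Some(&rest[end + 1..])
}

/// Toy stand-in for a 5424 parser: the header has VERSION "1" right after PRI.
fn try_5424(line: &str) -> Option<Parsed> {
    after_pri(line)?
        .strip_prefix("1 ")
        .map(|body| Parsed::Rfc5424(body.to_string()))
}

/// Toy stand-in for a 3164 parser: the header starts with a month abbreviation.
fn try_3164(line: &str) -> Option<Parsed> {
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];
    let rest = after_pri(line)?;
    let month = rest.get(..3)?;
    if MONTHS.contains(&month) {
        Some(Parsed::Rfc3164(rest.to_string()))
    } else {
        None
    }
}

/// Apply the configured mode, then fall back to drop or retain.
fn handle(line: &str, mode: Mode, drop_invalid: bool) -> Outcome {
    let parsed = match mode {
        Mode::Rfc5424 => try_5424(line),
        Mode::Rfc3164 => try_3164(line),
        Mode::Auto => try_5424(line).or_else(|| try_3164(line)),
    };
    match parsed {
        Some(p) => Outcome::Parsed(p),
        None if drop_invalid => Outcome::Dropped,
        None => Outcome::Unparsed(line.to_string()),
    }
}
```

Trying 5424 first keeps strictly compliant senders on the stricter path, and the looser 3164 path only runs when that fails.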

Let me know if you are happy for it to work like this and I'll be happy to have a go.

@binarylogic
Contributor Author

binarylogic commented Dec 29, 2019

I'm personally in favor of supporting some version of 3164. 3164 is tough because it's not an actual specification for a format, just a review of common formats. Either way, I don't care too much about getting into those details, I care more about user expectations and providing a good user experience, which I think this contributes towards.

Regarding the library and its dependencies, I'll let @LucioFranco chime in on that.

@StephenWakely
Contributor

RFC 3164 does specify at least some fields (timestamp and hostname), so parsing those should be largely attainable.
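As a rough illustration of pulling out those two fields, here is a minimal sketch that splits the text after the PRI into timestamp parts and hostname. The `Header` struct and `parse_3164_header` are hypothetical names, not taken from any existing crate, and the sketch assumes the common "Mmm dd hh:mm:ss host ..." layout with single-digit days padded by an extra space.

```rust
/// Hypothetical holder for the fields found at the start of a 3164-style line.
#[derive(Debug, PartialEq)]
struct Header<'a> {
    month: &'a str,
    day: u32,
    time: &'a str,
    hostname: &'a str,
    rest: &'a str, // everything after the hostname (tag and message)
}

/// Split "Mmm dd hh:mm:ss host rest..." (the text after "<PRI>").
/// Returns None if any of the expected pieces is missing.
fn parse_3164_header(input: &str) -> Option<Header<'_>> {
    let (month, rest) = input.split_once(' ')?;
    // Single-digit days are padded with a space ("Jan  1"), so skip it.
    let rest = rest.trim_start();
    let (day, rest) = rest.split_once(' ')?;
    let day: u32 = day.parse().ok()?;
    let (time, rest) = rest.split_once(' ')?;
    let (hostname, rest) = rest.split_once(' ')?;
    Some(Header { month, day, time, hostname, rest })
}
```

On the Nginx example this gives month `Dec`, day 28, time `16:49:07`, and hostname `plertrood-thinkpad-x220`, with `nginx: ...` left over as the remainder. Note that no year is available at this stage.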

Annoyingly, it specifies a timestamp format that excludes the year. Who thought that would be a good idea? It has occurred to me that just taking the current year would not work for messages around the new year. Really, some logic is needed along the lines of:

If it is 1st Jan and the date is 31st Dec, take the previous year.
Otherwise, take the current year.
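That rule could be sketched as below. This is an assumption-laden simplification: `resolve_year` is a hypothetical name, and the check is widened from "1st Jan vs 31st Dec" to "receiver is in January, message is from December" so that a late-December message arriving a few days into the new year is also handled.

```rust
/// Guess the year for a timestamp that carries a month but no year.
/// `now_year` and `now_month` come from the receiver's clock;
/// `msg_month` comes from the message. Months are numbered 1..=12.
fn resolve_year(now_year: i32, now_month: u32, msg_month: u32) -> i32 {
    if now_month == 1 && msg_month == 12 {
        // A December message seen in January belongs to the previous year.
        now_year - 1
    } else {
        now_year
    }
}
```

For example, a message stamped December that arrives in January 2020 resolves to 2019, while one stamped January resolves to 2020.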

I've knocked up a quick parser using Nom that attempts to parse both formats and provides a function that can resolve the year: https://github.com/FungusHumungus/syslog-loose

It's incomplete, but seems fairly straightforward. If you would be interested in using this, I would be happy to carry on working on it. Using a parser written specifically for this project would allow more flexibility to stray from the spec and cater for more non-compliant clients.

@LucioFranco
Contributor

@FungusHumungus oh wow, what you wrote looks great already! I think a lot of the current Rust syslog crates are kinda lacking. If you're willing to flesh your crate out a bit more and port tests from some of the older libraries, I think we'd be very interested in using it. From what I can tell, our current syslog crate is a bit unmaintained but has seemed to work pretty well overall.

@StephenWakely
Contributor

Awesome, I'll stay on the case!

@lukesteensen
Member

I'd be in favor of a best-effort 3164 parser. Like Ben said, there's no real standard there but there is enough commonality that we can make a decent guess.

I took a quick look at syslog-loose and it's looking great so far, thanks @FungusHumungus! It'd be awesome to have one simple crate we could rely on to parse everything syslog.

@StephenWakely
Contributor

Hi, so just an update. I think the parser is almost ready: https://github.com/FungusHumungus/syslog-loose/ if you want to take a look.

I want to get some property tests in place. Then the code needs a bit of tidying up and of course more documentation. Then it should be good to go.

How would you like to proceed?

Would it be best for me to push a crate up to crates.io and link Vector to it? I am happy to continue maintaining it for the foreseeable future and perhaps add someone from your organisation as a maintainer should I drop out.

Or would you prefer me to incorporate all the code into Vector?

@LucioFranco
Contributor

@FungusHumungus I think it's fine to keep it external and to go through crates.io; if we need to do something we can come to you for that. Once you have it published, I think it'd be good to open a PR here to integrate it and ensure it all works.

4 participants