parse_nginx_log fails on empty referer #643

Closed
mtekel opened this issue Jan 16, 2024 · 1 comment · Fixed by #746

mtekel commented Jan 16, 2024

Hello,

It seems the pattern definition for the nginx common log used by the parse_nginx_log function expects a non-empty referer: https://github.com/vectordotdev/vrl/blob/2b39353b3236e0aac26314ad47153238c52aa2ff/src/stdlib/log_util.rs#L136C1-L136C106

In practice, however, the referer can be empty (see https://stackoverflow.com/questions/6880659/in-what-cases-will-http-referer-be-empty), for example when the end user:

- entered the site URL in the browser address bar directly.
- visited the site via a browser-maintained bookmark.
- visited the site as the first page in a new window/tab/session, in some browsers.
- clicked a link on a page having a restrictive `<meta name="referrer">` tag.
- clicked a link on a page having a restrictive Referrer-Policy header.
- clicked a link having rel="noreferrer".
- clicked a link in an external application (i.e. not a web browser, e.g. Flash).
- switched from an https URL to an http URL.
- has security software installed (antivirus/firewall/etc.) which strips the referrer from all requests.
- is behind a proxy which strips the referrer from all requests.
- visited the site programmatically (e.g. with curl) without setting the referrer header (bots!).

This means that any time we get a client request with an empty referer, Vector fails to parse the nginx log line. We see thousands of these failures each day.

Example code for the VRL playground (VRL 0.9.1, Vector cebe6284):

Working:
https://playground.vrl.dev/?state=eyJwcm9ncmFtIjoic3RydWN0dXJlZCA9IHBhcnNlX25naW54X2xvZyEoLm1lc3NhZ2UsXCJpbmdyZXNzX3Vwc3RyZWFtaW5mb1wiKVxuLiA9IG1lcmdlKC4sIHN0cnVjdHVyZWQpXG4iLCJldmVudCI6eyJtZXNzYWdlIjoiLSAtIC0gWzAzL09jdC8yMDIzOjE0OjIxOjM2ICswMDAwXSBcIlBPU1QgLyBIVFRQLzEuMVwiIDQ5OSAwIFwiLVwiIFwiLVwiIDExMjggMC4wMDMgW3NvbWUuYWRkcmVzcy5jb21dIFstXSBodHRwcyAwIDAuMDA0IDAwNSAxMC41My4xMzQuNDcifSwiaXNfanNvbmwiOmZhbHNlLCJlcnJvciI6bnVsbH0%3D

Broken:
https://playground.vrl.dev/?state=eyJwcm9ncmFtIjoic3RydWN0dXJlZCA9IHBhcnNlX25naW54X2xvZyEoLm1lc3NhZ2UsXCJpbmdyZXNzX3Vwc3RyZWFtaW5mb1wiKVxuLiA9IG1lcmdlKC4sIHN0cnVjdHVyZWQpXG4iLCJldmVudCI6eyJtZXNzYWdlIjoiLSAtIC0gWzAzL09jdC8yMDIzOjE0OjIxOjM2ICswMDAwXSBcIlBPU1QgLyBIVFRQLzEuMVwiIDQ5OSAwIFwiXCIgXCItXCIgMTEyOCAwLjAwMyBbc29tZS5hZGRyZXNzLmNvbV0gWy1dIGh0dHBzIDAgMC4wMDQgMDA1IDEwLjUzLjEzNC40NyJ9LCJpc19qc29ubCI6ZmFsc2UsImVycm9yIjpudWxsfQ%3D%3D
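
For readers who would rather not open the playground: the `state` parameter in these links appears to be URL-encoded base64 wrapping a JSON object. Below is a rough Python sketch for inspecting the broken example. The helper name `decode_playground_state` is made up for illustration, and the `program` and `event` keys are simply what this link decodes to, not a documented format.

```python
import base64
import json
from urllib.parse import unquote, urlsplit

# The "Broken" playground link from this issue, split only for readability.
BROKEN_URL = (
    "https://playground.vrl.dev/?state="
    "eyJwcm9ncmFtIjoic3RydWN0dXJlZCA9IHBhcnNlX25naW54X2xvZyEoLm1lc3NhZ2Us"
    "XCJpbmdyZXNzX3Vwc3RyZWFtaW5mb1wiKVxuLiA9IG1lcmdlKC4sIHN0cnVjdHVyZWQpXG4i"
    "LCJldmVudCI6eyJtZXNzYWdlIjoiLSAtIC0gWzAzL09jdC8yMDIzOjE0OjIxOjM2ICswMDAwXSBc"
    "IlBPU1QgLyBIVFRQLzEuMVwiIDQ5OSAwIFwiXCIgXCItXCIgMTEyOCAwLjAwMyBb"
    "c29tZS5hZGRyZXNzLmNvbV0gWy1dIGh0dHBzIDAgMC4wMDQgMDA1IDEwLjUzLjEzNC40NyJ9"
    "LCJpc19qc29ubCI6ZmFsc2UsImVycm9yIjpudWxsfQ%3D%3D"
)


def decode_playground_state(url: str) -> dict:
    """Hypothetical helper: decode the `state` query parameter of a playground link."""
    query = urlsplit(url).query
    # Undo the percent-encoding of the base64 padding (%3D -> "=").
    encoded = unquote(query.split("state=", 1)[1])
    return json.loads(base64.b64decode(encoded))


state = decode_playground_state(BROKEN_URL)
print(state["program"])           # the VRL program run in the playground
print(state["event"]["message"])  # the nginx log line fed to it
```

The printed log line carries `""` in the referer position, where the working example has `"-"`.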

@drmason13
Contributor

Thanks for the report.
I can take a look at this one. At first glance, it's simply a matter of swapping a + for a * in a regex.
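
To illustrate the kind of change suggested here, the following is a minimal Python sketch, not the actual VRL pattern. Both regexes and the shortened sample strings are hypothetical; they only show how a `+` quantifier rejects an empty quoted field while `*` accepts it.

```python
import re

# Hypothetical, heavily simplified patterns (NOT the regex used by VRL).
# STRICT requires at least one character inside the referer and agent quotes.
STRICT = re.compile(
    r'"(?P<request>[^"]+)" (?P<status>\d+) (?P<size>\d+) '
    r'"(?P<referer>[^"]+)" "(?P<agent>[^"]+)"'
)
# LENIENT allows the referer and agent fields to be empty.
LENIENT = re.compile(
    r'"(?P<request>[^"]+)" (?P<status>\d+) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Shortened sample lines: referer is "-" in the first and empty in the second.
with_dash = '"POST / HTTP/1.1" 499 0 "-" "-"'
with_empty = '"POST / HTTP/1.1" 499 0 "" "-"'

strict_dash = STRICT.fullmatch(with_dash)      # matches
strict_empty = STRICT.fullmatch(with_empty)    # no match: "+" needs one character
lenient_empty = LENIENT.fullmatch(with_empty)  # matches with an empty referer

print(strict_dash is not None, strict_empty is not None, lenient_empty is not None)
```

Whether the empty referer should then be surfaced as an empty string or normalized to something else is a separate design decision that this sketch does not address.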
