
Support Splunk "cooked" data #3848

Open
binarylogic opened this issue Sep 13, 2020 · 3 comments
Labels
have: nice - This feature is nice to have. It is low priority.
needs: approval - Needs review & approval before work can begin.
needs: more demand - Needs more demand before work can begin; +1 or comment to support.
needs: requirements - Needs a list of requirements before work can begin.
provider: splunk - Anything `splunk` service provider related.
type: feature - A value-adding code addition that introduces new functionality.

Comments

@binarylogic (Contributor)

I've seen users request the ability to accept Splunk "cooked" data in various issue trackers and forums. This style of data is sent by upstream Splunk forwarders. Decoding it would make Vector uniquely useful for cases where this data must be decoded and forwarded to non-Splunk destinations. It looks like this:

"--splunk-cooked-mode-v3--\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000e4a2da812b43\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000
\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u00008089\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000@\u0000\u0000\u0000\u0001\u0000\u0000\u0000\u0013__s2s_capabilities\u0000\u0000\u0000\u0000\u0014ack=0;compression=0\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0005_raw\u0000"
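The framing visible in the sample above suggests an ASCII signature followed by null-padded, fixed-width fields. As a rough illustration only (the protocol is proprietary and undocumented, so these helpers are inferred from the sample, not from any spec), detecting the cooked-mode signature and trimming field padding might look like:

```python
# Hypothetical sketch, inferred solely from the sample payload above.
# The real S2S framing is proprietary and may differ substantially.

COOKED_V3_SIGNATURE = b"--splunk-cooked-mode-v3--"

def is_cooked_v3(payload: bytes) -> bool:
    """Return True if a TCP payload begins with the S2S v3 signature."""
    return payload.startswith(COOKED_V3_SIGNATURE)

def strip_padding(field: bytes) -> str:
    """Trim the trailing null padding from a fixed-width field."""
    return field.rstrip(b"\x00").decode("utf-8", errors="replace")

# Example mimicking the sample: signature, null padding, then a hostname-like field.
sample = COOKED_V3_SIGNATURE + b"\x00" * 103 + b"e4a2da812b43" + b"\x00" * 20
print(is_cooked_v3(sample))                              # True
print(strip_padding(b"e4a2da812b43" + b"\x00" * 20))     # "e4a2da812b43"
```

A real decoder would also need to handle the length-prefixed key/value section hinted at by `__s2s_capabilities` and `_raw` in the sample, which this sketch does not attempt.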

There are a few things we need to do before spending time on this:

  1. This type of data is proprietary and undocumented. As such, we should verify that we are allowed to decode it.
  2. We should try to understand the use case better. Splunk offers the ability to forward "uncooked" data, so I'd like to know more about situations where a user can't opt out of cooked data.
@binarylogic added labels: needs: approval, needs: requirements, provider: splunk, have: nice, type: feature, needs: more demand (Sep 13, 2020)
@binarylogic changed the title from "Support splunk "cooked" data" to "Support Splunk "cooked" data" (Oct 8, 2020)
@MadsRC commented Oct 10, 2021

In the world of Splunk, the question of cooked vs. uncooked comes down to choosing between Splunk's proprietary protocol (S2S, a.k.a. cooked) and sending raw data over a plain TCP connection (uncooked).

The benefit of using cooked data is that you get to include metadata. Suppose I have a Universal Forwarder installed on a Linux machine, configured to collect two types of logs: journal and Apache access logs.
In my Splunk config, I'd set up two sources, point them at the files to monitor, and give each source a source type. For journal logs it would be something like "linux", and for Apache access logs something like "apache-weblog" (I'm sorry, I don't have a copy of a real-world config with me).

If I ship those two sources as uncooked data, the receiver simply gets the contents of the files, and it's up to the receiver to guess whether a given piece of data came from the journal or from an Apache access log. To get around this, we could have the forwarder ship the journal contents over one connection and the Apache access logs over another, differentiated by destination port; however, I hope you agree that this wouldn't scale well.

Now, if we ship the data as cooked data, we'll ship the source type (and some other data) alongside the raw data to the receiver.
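Conceptually, a cooked event pairs the raw line with its metadata, which is what lets a single connection carry multiple source types. An illustrative (not protocol-accurate) representation, with hypothetical field names:

```python
from dataclasses import dataclass

@dataclass
class CookedEvent:
    # Metadata that S2S carries alongside the payload. Field names here are
    # illustrative; the real field set is proprietary and undocumented.
    host: str
    source: str
    sourcetype: str
    raw: str  # the original log line, as read from the file

journal = CookedEvent(host="web01", source="/run/log/journal",
                      sourcetype="linux", raw="kernel: eth0 link up")
apache = CookedEvent(host="web01", source="/var/log/apache2/access.log",
                     sourcetype="apache-weblog",
                     raw='1.2.3.4 - - "GET / HTTP/1.1" 200')

# A receiver can route on sourcetype over a single connection,
# instead of dedicating one TCP port per source as with uncooked data.
for event in (journal, apache):
    print(event.sourcetype, "->", event.raw)
```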

In general, a Splunk user is never forced by Splunk to use uncooked data and is generally advised to use S2S/cooked data.

One use case for uncooked data that I've encountered was when building a new log collection pipeline. We had Splunk as the data repository and Splunk agents installed on all endpoints, but for Reasons(TM) we wanted the log pipeline itself to not be Splunk software. Since Vector didn't support S2S (#2537), we ended up with tons of Vector instances receiving raw TCP from Splunk (as uncooked data). It wasn't pretty and didn't scale well... but it worked.

@jszwedko (Member)

It was discovered in #11292 that the Splunk UF's httpout protocol is just a wrapper around the same cooked protocol used for tcpout, so we'll need this in order to support Splunk's httpout output as well.

@mapennell

👍
