CSV Reader escaping special characters when parsing Web API Calls #12

Closed
ClintonCao opened this issue Oct 28, 2022 · 4 comments · Fixed by #14
Labels
enhancement New feature or request

Comments

@ClintonCao

I noticed that the current CSV reader escapes special characters when parsing the row values I want to use as event symbols. Though I understand that, security-wise, it is good to escape these characters, maybe we can first transform them to a string and then process it further? I think not escaping some of the special characters would be helpful for the use case where we want to learn a behavioural model from web API calls.

I'll provide an example of a model that is learned from HTTP events collected from a Kubernetes cluster.

Say we have the following data:

| _source_source_ip | _source_destination_ip | _source_destination_port | _source_query | source_host_service | destination_host_service |
|---|---|---|---|---|---|
| 192.168.84.159 | 192.168.84.160 | 8761.0 | GET /eureka/apps/delta | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080 | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080 | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | GET /eureka/apps/delta | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080 | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | GET /eureka/apps/delta | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080 | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | GET /eureka/apps/delta | catalog | eureka |

And say I want to use the following columns to create the event symbol for FlexFringe: _source_destination_port, _source_query, _source_host_service, _destination_host_service. Then I will expect the following model to come out of FlexFringe:

image

But the actual model that comes out of FlexFringe is the following:
image

It looks like the CSV reader ignores spaces and "/" when parsing the data.
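
To make the expected symbols concrete, here is a minimal Python sketch of how the selected columns could be combined into one event symbol per row. It is not FlexFringe code: the tab-separated rendering of the data, the `event_symbols` helper and the `|` separator are all assumptions made for illustration, and the host service column names follow the header of the sample data.

```python
import csv
import io

# Hypothetical tab-separated rendering of two rows from the data above
# (only the columns used for the symbol are included).
RAW = (
    "_source_destination_port\t_source_query\t"
    "source_host_service\tdestination_host_service\n"
    "8761.0\tGET /eureka/apps/delta\tcatalog\teureka\n"
    "8761.0\tPUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080"
    "\tcatalog\teureka\n"
)

SYMBOL_COLUMNS = [
    "_source_destination_port",
    "_source_query",
    "source_host_service",
    "destination_host_service",
]


def event_symbols(text, columns, sep="|"):
    """Join the chosen columns of every row into a single symbol string.

    The separator is an arbitrary choice for this sketch; spaces and "/"
    inside a column value are kept exactly as they appear in the input.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [sep.join(row[col] for col in columns) for row in reader]


symbols = event_symbols(RAW, SYMBOL_COLUMNS)
for symbol in symbols:
    print(symbol)
```

With symbols built this way, the two distinct requests in the data stay distinct, and the query keeps its space and its slashes. That is the behaviour I would expect the reader to preserve.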

@ClintonCao ClintonCao added the enhancement New feature or request label Oct 28, 2022
@TCatshoek TCatshoek self-assigned this Oct 28, 2022
@TCatshoek
Contributor

Working on this in the refactor-inputdata branch. Will update once it's ready to go!

@laxris
Contributor

laxris commented Oct 28, 2022

Is it csv or abbadingo? If I remember correctly, / is also a special character to separate data from symbols, and the remaining part just gets stored in a node...

@ClintonCao
Author

ClintonCao commented Oct 28, 2022

It could be the abbadingo reader then. IIRC when I had a look with Tom last time, the CSV reader transforms the parsed data into abbadingo format and then passes it on to the abbadingo reader 🤔

@TCatshoek
Contributor

Yeah, the csv reader internally translates things to abbadingo format and then parses it as abbadingo. If any of the abbadingo delimiter characters are in the input data, things can break. One of the goals of the ongoing refactor is to separate the csv parsing from abbadingo parsing completely so that this is no longer an issue.
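
The failure mode described above can be reproduced with a toy round trip in Python. This is only a sketch and not the FlexFringe implementation: it assumes an abbadingo-style trace line of the form `label length sym1 sym2 ...` with space-separated symbols, and it assumes (per the recollection earlier in this thread) that `/` splits a symbol from attached data. The function names and the integer-id workaround at the end are hypothetical; the refactor may solve this differently.

```python
def to_abbadingo_line(symbols, label=1):
    """Serialise one trace as 'label length sym1 sym2 ...' (assumed layout)."""
    return f"{label} {len(symbols)} " + " ".join(symbols)


def parse_abbadingo_line(line):
    """Parse a trace line, splitting on whitespace and then on '/'.

    Assumption: everything after the first '/' in a token is treated as
    attached data rather than as part of the symbol.
    """
    label, length, *tokens = line.split()
    symbols = [token.split("/", 1)[0] for token in tokens]
    return int(label), int(length), symbols


def encode(symbols):
    """Illustrative workaround: replace each symbol by a delimiter-free id."""
    table = {}
    ids = [str(table.setdefault(sym, len(table))) for sym in symbols]
    id_to_symbol = {idx: sym for sym, idx in table.items()}
    return ids, id_to_symbol


trace = ["GET /eureka/apps/delta", "PUT /eureka/apps/CATALOG"]

# Naive round trip: the space and the '/' inside each symbol are
# reinterpreted as delimiters, so the symbols fall apart.
label, declared, parsed = parse_abbadingo_line(to_abbadingo_line(trace))
print(declared, parsed)

# Round trip through ids: the intermediate line contains no delimiter
# characters from the input, so the original symbols can be restored.
ids, id_to_symbol = encode(trace)
_, _, parsed_ids = parse_abbadingo_line(to_abbadingo_line(ids))
decoded = [id_to_symbol[int(i)] for i in parsed_ids]
print(decoded)
```

In the naive round trip the line declares two symbols but four tokens come back, with the path parts reduced to empty strings, which matches the symptom of spaces and "/" seeming to be ignored. Keeping the raw strings out of the intermediate format, or skipping that format entirely as the refactor aims to do, avoids the problem.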

@TCatshoek TCatshoek linked a pull request Jan 2, 2023 that will close this issue