
Improve usability of CSV format #2336

Merged
merged 7 commits into from Jun 10, 2022

dominiklohmann
Member

@dominiklohmann commented Jun 9, 2022

Three changes:

  • Add a vast import csv --separator=... option that defaults to a comma (,), with tests for space and tab separators.
  • Render enums in their canonical string representation instead of their internal numerical representation.
  • Import additional columns in the csv import as strings instead of crashing.
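The three changes can be illustrated with a small, self-contained Python sketch. It is not VAST's actual reader or writer. The Action enum, the SCHEMA mapping, and the functions validate_separator, read_rows, and render are hypothetical stand-ins chosen for illustration. The sketch models a configurable single-character separator, columns missing from the schema kept as strings, and enums rendered by name rather than by number.

```python
import csv
import enum
import io


# Hypothetical enum field type, standing in for an enum in a VAST schema.
class Action(enum.Enum):
    ACCEPT = 0
    REJECT = 1


# Hypothetical schema: column name -> type used to parse that column.
SCHEMA = {"srcport": int, "action": Action}


def validate_separator(separator: str) -> str:
    """Reject separators that are not exactly one character long."""
    if len(separator) != 1:
        raise ValueError(f"invalid separator {separator!r}; must be a single character")
    return separator


def read_rows(text: str, separator: str = ","):
    """Yield one dict per data row; the first row is treated as the header."""
    reader = csv.reader(io.StringIO(text), delimiter=validate_separator(separator))
    header = next(reader)
    for row in reader:
        event = {}
        for name, value in zip(header, row):
            field_type = SCHEMA.get(name)
            if field_type is None:
                # Column not in the schema: keep it as a string instead of failing.
                event[name] = value
            elif issubclass(field_type, enum.Enum):
                # Enum columns are parsed from their names, e.g. "ACCEPT".
                event[name] = field_type[value]
            else:
                event[name] = field_type(value)
        yield event


def render(value) -> str:
    """Render enums by their name instead of their numeric value."""
    if isinstance(value, enum.Enum):
        return value.name
    return str(value)


events = list(read_rows("srcport action extra\n443 ACCEPT hello\n", separator=" "))
print(events[0]["extra"], render(events[0]["action"]))  # hello ACCEPT
```

Passing separator="\t" reads tab-separated input the same way, and an empty separator is rejected up front rather than silently falling back to the default.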

📝 Checklist

  • All user-facing changes have changelog entries.
  • The changes are reflected on docs.tenzir.com/vast, if necessary.
  • The PR description contains instructions for the reviewer, if necessary.

🎯 Review Instructions

@rdettai can you give this a try since you requested these?

For the code reviewer: please review this PR commit by commit.

@dominiklohmann added the enhancement and bug (Incorrect behavior) labels Jun 9, 2022
@dominiklohmann requested a review from a team June 9, 2022 10:47
Contributor

@dispanser left a comment


Looks good.

I tested import (tsv, ssv) and export.

Thanks in particular for the extensive integration test coverage!

Contributor

@rdettai left a comment


I get the following error when I use vast import --type=aws.flowlogs csv --separator=" ":

  • vast.format.csv.reader encountered invalid vast.import.csv.separator ''; must be a single character

But the import seems to have worked fine.
Any idea what might be happening?

@dominiklohmann
Member Author

vast import --type=aws.flowlogs csv --separator=" "

That's an issue with how CAF parses the command line. If you quote the whole option instead, it works:

vast import --type=aws.flowlogs csv '--separator=" "'
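To see why the extra quoting matters, it helps to look at the arguments the shell actually passes to vast. The Python sketch below uses shlex.split, which follows POSIX shell quoting rules, to show that --separator=" " reaches the program as --separator= followed by a single space, while '--separator=" "' reaches it with the double quotes intact. The parse_option_string function is a hypothetical model of the option parser, assuming it trims surrounding whitespace from unquoted values and strips the quotes from double-quoted ones. That assumption is consistent with the empty '' in the error message above, but it is not taken from CAF's source code.

```python
import shlex


def separator_arg(command: str) -> str:
    """Return the raw --separator value the program receives after shell quote removal."""
    prefix = "--separator="
    argv = shlex.split(command)  # POSIX-style word splitting and quote removal
    return next(arg[len(prefix):] for arg in argv if arg.startswith(prefix))


def parse_option_string(raw: str) -> str:
    """HYPOTHETICAL model of the option parser, not CAF's actual code:
    surrounding whitespace is trimmed, and a double-quoted value is unquoted."""
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1]
    return raw


unquoted = separator_arg('vast import --type=aws.flowlogs csv --separator=" "')
quoted = separator_arg("vast import --type=aws.flowlogs csv '--separator=\" \"'")
print(repr(unquoted), repr(parse_option_string(unquoted)))  # ' ' ''
print(repr(quoted), repr(parse_option_string(quoted)))      # '" "' ' '
```

Under this model, the unquoted form ends up as an empty separator, which fails the single-character check, while the single-quoted form survives the shell with its double quotes and is parsed to one space.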

@dominiklohmann
Member Author

@rdettai and I went over this in a call and discovered another issue with the CSV parser that is independent of the issues solved by this PR. I plan to take a further look tomorrow.

@rdettai
Contributor

rdettai commented Jun 9, 2022

Yes, just to track the full context here: without the quotes around --separator=" ", it actually ignored the option altogether and inserted everything into a single column 😄. More concretely:

  • my source file looks like this:
> cat source.csv
version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
2 462855639117 eni-0a0c4004d2115a27f - - - - - - - 1654735756 1654735786 - NODATA
  • the schema I have matches these columns
  • when I import this as CSV (with "," as separator),
    • the importer sees this file as one column (that is expected)
    • the importer accepts the event in this file and puts its content into a single column. That shouldn't happen, because the name of that mega column (the concatenation version account-id interface-id ... action log-status) does not match any field in the schema
> vast export csv
aws.flowlogs,"2 462855639117 eni-0a0c4004d2115a27f - - - - - - - 1654735756 1654735786 - NODATA"
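The behavior expected here can be phrased as a check on the header: if splitting the header line with the chosen separator yields no column name that the schema knows, the import should fail instead of accepting a single catch-all column. The Python sketch below is a hypothetical illustration of that check, not how VAST handles it. The names matching_columns and check_header are made up, and the field set is simply the header of source.csv, which the comment above says matches the schema.

```python
import csv

# Column names from the header of source.csv above; the schema is said to match them.
SCHEMA_FIELDS = {
    "version", "account-id", "interface-id", "srcaddr", "dstaddr", "srcport",
    "dstport", "protocol", "packets", "bytes", "start", "end", "action", "log-status",
}

HEADER = ("version account-id interface-id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log-status")


def matching_columns(header_line: str, separator: str, fields: set) -> list:
    """Split the header with the given separator and keep the columns the schema knows."""
    columns = next(csv.reader([header_line], delimiter=separator))
    return [column for column in columns if column in fields]


def check_header(header_line: str, separator: str, fields: set) -> None:
    """HYPOTHETICAL check: refuse input whose header matches no schema field."""
    if not matching_columns(header_line, separator, fields):
        raise ValueError(f"header split on {separator!r} matches no field in the schema")


print(len(matching_columns(HEADER, " ", SCHEMA_FIELDS)))  # 14
print(len(matching_columns(HEADER, ",", SCHEMA_FIELDS)))  # 0
```

With a comma separator the whole header becomes one column name that matches nothing, so check_header raises instead of importing the row into a single column.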

@dominiklohmann dominiklohmann merged commit ac3c8c2 into master Jun 10, 2022
@dominiklohmann dominiklohmann deleted the topic/csv-fixes branch June 10, 2022 14:31
Labels: bug (Incorrect behavior)
4 participants