VAST 2020.01.31

@dominiklohmann dominiklohmann released this 04 Feb 09:35
879c41a

TL;DR

  • Apache Arrow export (+ experimental Python shim)
  • Optimized index implementation with 3x space reduction
  • Switch to CalVer and monthly release schedule

Notes

Dear community, we are pleased to announce the release of VAST 2020.01.31! As the version number shows, we switched to calendar versioning (CalVer). Moving forward, we plan to cut a release at the end of every month. In the last week before each release, we will focus on testing.

On the feature side, we have added support for exporting data in the Apache Arrow format. This connects VAST with data science tooling built around in-memory columnar analytics. We also wrote a small Python shim to demonstrate this; see the /examples directory for a notebook.

Additionally, we implemented a new optimized index type based on hashing that yields a 3x space reduction compared to normal string indexes. This index type can be selected by adding #index=hash to any type in the schema. The index supports (in)equality comparison only and does not work with substring queries. Internally, it computes a list of fingerprints over the data. A small satellite data structure resolves false positives and allows for building the complement.
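To make the idea of fingerprints plus a satellite structure more concrete, here is a minimal Python sketch. It is not VAST's implementation: the class name HashIndex, the bits parameter, and the choice of BLAKE2b are all assumptions made for illustration. For simplicity the sketch keeps one representative value per fingerprint in order to detect collisions, which a space-efficient implementation would avoid.

```python
import hashlib


class HashIndex:
    """Toy equality index: one truncated hash (fingerprint) per row.

    Hypothetical illustration only; names and layout are assumptions.
    """

    def __init__(self, bits=40):
        self.bits = bits
        self.fingerprints = []  # fingerprint of every row, in row order
        self.owner = {}         # fingerprint -> first value that produced it
        self.exceptions = {}    # satellite data: row -> value for colliding rows

    def _fingerprint(self, value):
        digest = hashlib.blake2b(value.encode(), digest_size=8).digest()
        # Keep only the top `bits` bits of the 64-bit digest.
        return int.from_bytes(digest, "big") >> (64 - self.bits)

    def append(self, value):
        fp = self._fingerprint(value)
        row = len(self.fingerprints)
        # A different value with the same fingerprint is a collision;
        # remember the true value so lookups can rule out false positives.
        if self.owner.setdefault(fp, value) != value:
            self.exceptions[row] = value
        self.fingerprints.append(fp)

    def equals(self, value):
        """Return the set of row ids whose value equals `value`."""
        fp = self._fingerprint(value)
        owner = self.owner.get(fp)
        hits = set()
        for row, candidate in enumerate(self.fingerprints):
            if candidate != fp:
                continue
            # Resolve false positives through the satellite structure.
            if self.exceptions.get(row, owner) == value:
                hits.add(row)
        return hits

    def not_equals(self, value):
        """Inequality is the complement of the equality result."""
        return set(range(len(self.fingerprints))) - self.equals(value)
```

Because only fixed-width fingerprints are stored per row, a lookup can answer whether a row equals a given value but cannot see inside the value, which is why substring queries are out of scope for this kind of index.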

Changelog

The following items reflect notable changes in reverse-chronological order:

  • 🔄 VAST is switching to a calendar-based versioning scheme starting with this release. #739

  • 🎁 When a record field has the #index=hash attribute, VAST will choose an optimized index implementation. This new index type only supports (in)equality queries and is therefore intended to be used with opaque types, such as unique identifiers or random strings. #632, #726

  • 🎁 An experimental new Python module enables querying VAST and processing results as pyarrow tables. #685

  • 🐞 A bug in the quoted string parser caused a parsing failure if an escape character occurred in the last position. #685

  • 🔄 Record field names can now be entered as quoted strings in the schema and expression languages. This lifts a restriction where JSON fields containing whitespace or special characters could not be ingested. #685

  • 🔄 Two minor modifications were made to the parsing framework: (i) the parsers for enums and records now allow trailing separators, and (ii) the dash (-) was removed from the allowed characters of schema type names. #706

  • 🐞 The example configuration file contained an invalid section vast. This has been changed to the correct name system. #705

  • 🐞 A race condition in the index logic could lead to incomplete or empty result sets for vast export. #703

  • 🔄 Build configuration defaults have been adapted for a better user experience. Installations are now relocatable by default, which can be reverted by configuring with --without-relocatable. Additionally, new sets of defaults named --release and --debug (renamed from --dev-mode) have been added. #695

  • 🎁 On FreeBSD, a VAST installation now includes an rc.d script that simplifies spinning up a VAST node. CMake installs the script at PREFIX/etc/rc.d/vast. #693

  • 🎁 The long option --config, which sets an explicit path to the VAST configuration file, now also has the short option -c. #689

  • 🎁 Added Apache Arrow as new export format. This allows users to export query results as Apache Arrow record batches for processing the results downstream, e.g., in Python or Spark. #633

  • 🐞 The import process did not print statistics when importing events over UDP. Additionally, warnings about dropped UDP packets are no longer shown per packet, but rather periodically reported in a readable format. #662

  • 🐞 Importing events over UDP with vast import <format> --listen :<port>/udp failed to register the accountant component. This caused an unexpected message warning to be printed on startup and resulted in losing import statistics. VAST now correctly registers the accountant. #655

  • 🐞 PCAP ingestion failed for traces containing VLAN tags. VAST now strips IEEE 802.1Q headers instead of skipping VLAN-tagged packets. #650

  • 🐞 In some cases, a source could connect to a node before the node was fully initialized, resulting in a hanging vast import process. #647

  • 🎁 The import pcap command now takes an optional snapshot length via --snaplen. If the snapshot length is set to snaplen, and snaplen is less than the size of a packet that is captured, only the first snaplen bytes of that packet will be captured and provided as packet data. #642

  • 🔄 The import pcap command no longer takes interface names via --read,-r, but instead from a separate option named --interface,-i. This change has been made for consistency with other tools. #641
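
Two of the PCAP-related items above, stripping IEEE 802.1Q headers and truncating packets to a snapshot length, can be illustrated with a short Python sketch. This is not VAST's code: the function names strip_vlan_tags and apply_snaplen are hypothetical, and the sketch assumes plain Ethernet II frames in which an 802.1Q tag is announced by the EtherType value 0x8100.

```python
import struct

ETH_ADDRS_LEN = 12   # destination MAC (6 bytes) + source MAC (6 bytes)
VLAN_TAG_LEN = 4     # tag protocol identifier (2) + tag control info (2)
TPID_8021Q = 0x8100  # EtherType value that announces an 802.1Q tag


def strip_vlan_tags(frame: bytes) -> bytes:
    """Remove all leading 802.1Q tags from an Ethernet frame (hypothetical helper)."""
    # A tagged frame needs room for both addresses, the tag, and the inner EtherType.
    while len(frame) >= ETH_ADDRS_LEN + VLAN_TAG_LEN + 2:
        (ether_type,) = struct.unpack_from("!H", frame, ETH_ADDRS_LEN)
        if ether_type != TPID_8021Q:
            break
        # Cut the 4-byte tag out; the inner EtherType moves to offset 12.
        frame = frame[:ETH_ADDRS_LEN] + frame[ETH_ADDRS_LEN + VLAN_TAG_LEN:]
    return frame


def apply_snaplen(packet: bytes, snaplen: int) -> bytes:
    """Keep only the first `snaplen` bytes of a captured packet (hypothetical helper)."""
    return packet[:snaplen]
```

Stripping the tag rather than skipping the packet keeps the payload available to later parsing stages, which then see the frame as if it had never been tagged.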