VAST 2020.09.30

@dominiklohmann dominiklohmann released this 30 Sep 13:17
17fc510

We’re happy to announce the monthly release 2020.09.30 of VAST.

YAML Config

The VAST configuration file received a makeover: it now uses YAML syntax, the ops-friendly industry standard. We ensured that the configuration file and the command line behave exactly the same by aligning the CLI hierarchy with the config file structure. VAST now looks for a vast.yaml configuration file instead of vast.conf. Every installation of VAST ships with a vast.yaml.example file that illustrates the new layout and serves as reference documentation for the available options.

During startup, VAST looks for configuration files in the following places and merges their contents, with more specific files taking higher precedence. The list is ordered from lowest to highest precedence:

  • <sysconfdir>/vast/vast.yaml for system-wide configuration, where <sysconfdir> is the platform-specific directory for configuration files, e.g., /etc/vast.
  • ~/.config/vast/vast.yaml for user-specific configuration. VAST respects the XDG base directory specification and its environment variables.
  • The command line option --config=path/to/vast.yaml.
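
The lookup order and merge behavior described above can be sketched in Python. This is a minimal illustration, not VAST's implementation: the function names candidate_config_files, merge, and merge_all are hypothetical, /etc stands in for <sysconfdir>, and merging nested sections key by key is an assumption about how the contents are combined.

```python
from pathlib import Path


def candidate_config_files(env, sysconfdir="/etc", cli_config=None):
    """Return config file paths ordered from lowest to highest precedence."""
    home = env.get("HOME", "")
    # Mirrors ${XDG_CONFIG_HOME:-${HOME}/.config}: unset or empty falls back.
    xdg_config_home = env.get("XDG_CONFIG_HOME") or str(Path(home) / ".config")
    paths = [
        Path(sysconfdir) / "vast" / "vast.yaml",       # system-wide
        Path(xdg_config_home) / "vast" / "vast.yaml",  # user-specific
    ]
    if cli_config is not None:
        paths.append(Path(cli_config))                 # --config=...
    return paths


def merge(base, override):
    """Recursively merge two mappings; values in `override` win."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def merge_all(configs):
    """Fold configs given in order of increasing precedence."""
    merged = {}
    for config in configs:
        merged = merge(merged, config)
    return merged
```

Passing the parsed contents of each existing file to merge_all in the order returned by candidate_config_files yields one effective configuration in which the file given via --config overrides the user-specific file, which in turn overrides the system-wide file.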

The top-level configuration file section vast bundles all options affecting VAST. Similarly, the top-level section caf contains all options that affect the underlying actor system framework CAF directly, allowing for more complex and sophisticated configurations.
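
As an illustration of the top-level sections, the following Python sketch expands a dotted option name into the nested mapping that a YAML file would express, and looks a value up again. The helpers nest and lookup are hypothetical, the full key vast.import.batch-size is an assumption derived from the option names in this release's changelog, and the value is made up.

```python
def nest(dotted_key, value):
    """Expand a dotted option name into nested dictionaries."""
    result = value
    for part in reversed(dotted_key.split(".")):
        result = {part: result}
    return result


def lookup(config, dotted_key, default=None):
    """Walk a nested mapping along a dotted option name."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# Hypothetical value; the key layout is assumed, not taken from VAST's docs.
config = nest("vast.import.batch-size", 1000)
```

Here config equals {"vast": {"import": {"batch-size": 1000}}}, which corresponds to a vast section containing an import subsection in the YAML file.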

Adding YAML support resulted in a new dependency for VAST: yaml-cpp (≥0.6.2). This robust library provides a YAML 1.2 spec-compliant parser and printer, and it is widely available across platforms and package managers.

Index Optimizations

The layout of the on-disk data structures used for the index has changed. VAST divides the index state into horizontal partitions (a.k.a. shards). Instead of creating one file per record field per partition, the index now creates only a single file per partition and dynamically maps the required parts into memory. Additionally, VAST no longer relies on the binary serialization protocol of CAF. Instead, a new FlatBuffers framing with better state versioning enables a reliable upgrade path when the on-disk format changes.
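
The idea of keeping one file per partition and mapping only the required parts into memory can be illustrated with a toy layout in Python. This is a sketch under loud assumptions: the format below (a length-prefixed JSON directory followed by raw bytes) is invented for illustration and is not VAST's FlatBuffers-based format, and the names write_partition and read_field are hypothetical.

```python
import json
import mmap
import struct


def write_partition(path, fields):
    """Store every field's bytes in one file, preceded by a small directory."""
    directory, blobs, offset = {}, [], 0
    for name, payload in fields.items():
        directory[name] = [offset, len(payload)]
        blobs.append(payload)
        offset += len(payload)
    header = json.dumps(directory).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(header)))  # 4-byte header length
        f.write(header)                          # field name -> [offset, size]
        for blob in blobs:
            f.write(blob)                        # concatenated field data


def read_field(path, name):
    """Map the file read-only and copy out only one field's bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            (header_len,) = struct.unpack("<I", m[:4])
            directory = json.loads(m[4:4 + header_len].decode("utf-8"))
            offset, size = directory[name]
            start = 4 + header_len + offset
            return bytes(m[start:start + size])
```

Compared to one file per field, a reader of this toy layout opens a single file and touches only the byte range of the field it needs.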

Moreover, VAST used to periodically re-write the whole state of the meta index to disk into a separate file. The rationale was that the contents of the meta index are much smaller than the contents of the index. However, for large databases even the much smaller meta index can grow to a size where this can disrupt disk I/O and slow down the indexing process. To prevent that, we’ve split up the information contained in the meta index and distributed it over all partitions, so every write is now limited to the incremental state since the previous partition.

Because I/O is such a delicate topic in data-intensive applications that must keep up with high-volume data sources, we also added a new asynchronous I/O abstraction to avoid blocking threads when they don’t have to. We’ve added a new filesystem actor that centralizes I/O operations, such as reads and writes. A nice side effect is that it makes it dead simple to support new filesystems in the future, e.g., HDFS or S3, by merely adding a new actor implementation that adheres to the same type-safe messaging API.
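
The pattern of a single component that serializes I/O requests behind a fixed message interface can be sketched in Python. This is only an analogy to the actor design described above, not VAST's or CAF's API: the classes FilesystemActor and InMemoryBackend are hypothetical, a thread with a queue stands in for an actor with a mailbox, and the in-memory backend stands in for a real filesystem.

```python
import queue
import threading
from concurrent.futures import Future


class InMemoryBackend:
    """Stand-in storage backend; a local-disk or remote one would look alike."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = bytes(data)

    def read(self, path):
        return self.files[path]


class FilesystemActor:
    """Processes I/O requests one at a time on its own thread."""

    def __init__(self, backend):
        self._backend = backend
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:  # stop signal
                break
            operation, args, reply = message
            try:
                reply.set_result(getattr(self._backend, operation)(*args))
            except Exception as error:
                reply.set_exception(error)

    def _send(self, operation, *args):
        reply = Future()
        self._mailbox.put((operation, args, reply))
        return reply  # callers are not blocked until they ask for the result

    def write(self, path, data):
        return self._send("write", path, data)

    def read(self, path):
        return self._send("read", path)

    def stop(self):
        self._mailbox.put(None)
        self._thread.join()
```

Callers only see write and read requests that return futures, so swapping InMemoryBackend for another backend with the same two methods requires no change on the caller's side.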

Better Introspection

We re-designed the output of the vast status command in a push for a better user experience. vast status now shows information about the system, grouped by its major components. By adding more flags, the command shows more details: vast status --detailed offers slightly more context, and --debug exposes a lot of internal state that is well-suited for developers.

Smaller Things

  • The new vast get <id> [ids...] command enables direct queries to the archive.
  • The JSON export format now renders values of the VAST types duration and port as strings instead of numbers.
  • A new utility lsvast now ships with every VAST installation. It allows for inspecting the contents of the VAST database without running VAST.

Changelog Highlights

As always, you can find the full technical scoop of what changed in our changelog.

🎁 Features

  • The output of the status command was restructured with a strong focus on usability. The new flags --detailed and --debug add more detail to the output. #995
  • VAST now merges the contents of all used configuration files instead of using only the most user-specific file. The file specified using --config takes the highest precedence, followed by the user-specific path ${XDG_CONFIG_HOME:-${HOME}/.config}/vast/vast.yaml, and the compile-time path <sysconfdir>/vast/vast.yaml. #1040
  • VAST now ships with a new tool lsvast to display information about the contents of a VAST database directory. See lsvast --help for usage instructions. #863
  • VAST now supports the XDG base directory specification: The vast.yaml is now found at ${XDG_CONFIG_HOME:-${HOME}/.config}/vast/vast.yaml, and schema files at ${XDG_DATA_HOME:-${HOME}/.local/share}/vast/schema/. The user-specific configuration file takes precedence over the global configuration file in <sysconfdir>/vast/vast.yaml. #1036

🧬 Experimental Features

  • The vast get command has been added. It retrieves events from the database directly by their IDs. #938

⚠️ Changes

  • All configuration options are now grouped into vast and caf sections, depending on whether they affect VAST itself or are handed through to the underlying actor framework CAF directly. Take a look at the bundled vast.yaml.example file for an explanation of the new layout. #1073
  • Data exported in the Apache Arrow format now contains the name of the payload record type in the metadata section of the schema. #1072
  • The JSON export format now renders duration and port fields using strings as opposed to numbers. This avoids a possible loss of information and enables users to re-use the output in follow-up queries directly. #1034
  • The delay between the periodic log messages for reporting the current event rates has been increased to 10 seconds. #1035
  • The global VAST configuration now always resides in <sysconfdir>/vast/vast.yaml, and bundled schemas always in <datadir>/vast/schema/. VAST no longer supports reading a configuration file in the current working directory. #1036
  • The options that affect batches in the import command received new, more user-facing names: import.table-slice-type, import.table-slice-size, and import.read-timeout are now called import.batch-encoding, import.batch-size, and import.batch-timeout respectively. #1058
  • The persistent storage format of the index now uses FlatBuffers. #863
  • The proprietary VAST configuration file format has changed to the more ops-friendly industry standard YAML. This change also introduced a new dependency: yaml-cpp version 0.6.2 or greater. The top-level vast.yaml.example illustrates what the new YAML config looks like. Please rename existing configuration files from vast.conf to vast.yaml. VAST still reads vast.conf but will soon only look for vast.yaml or vast.yml files in available configuration file paths. #1045 #1055 #1059 #1062
  • We refactored the index architecture to improve stability and responsiveness. This includes fixes for several shutdown issues. #863
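
For scripts or tooling that still use the old import option names, the renames listed above can be applied mechanically. The following Python sketch assumes options are held in a flat mapping keyed by option name. The helper migrate_option_names and the example values are hypothetical; only the old and new names come from the list above.

```python
# Old name -> new name, as stated in the changes above (#1058).
RENAMED_IMPORT_OPTIONS = {
    "import.table-slice-type": "import.batch-encoding",
    "import.table-slice-size": "import.batch-size",
    "import.read-timeout": "import.batch-timeout",
}


def migrate_option_names(options):
    """Return a copy of a flat option mapping with old names replaced."""
    return {
        RENAMED_IMPORT_OPTIONS.get(name, name): value
        for name, value in options.items()
    }


# Hypothetical values for illustration only.
old = {"import.table-slice-size": 1000, "import.read-timeout": "10s"}
new = migrate_option_names(old)
```

Names that are not part of the rename table pass through unchanged, so the helper can be run over a complete set of options.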

🐞 Bug Fixes

  • Stalled sources that were unable to generate new events no longer stop import processes from shutting down under rare circumstances. #1058