VAST 2020.08.28

@dominiklohmann dominiklohmann released this 28 Aug 09:07
b605059

We’re happy to announce the monthly release 2020.08.28 of our stack.

Robustness and State Recovery

We found several bugs in the shutdown process of a VAST server, which could leave the process unresponsive and potentially lose state. VAST now terminates itself in multiple stages: it first attempts to shut down all components cleanly; if that fails, it falls back to a hard kill; and if that also fails after another timeout, the process calls abort(3).
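The escalation described above can be sketched in a few lines. This is an illustration only, not VAST's code: VAST is written in C++ and shuts down its own internal components, whereas this sketch uses a child process as a stand-in for a component. The function name `stop_process` and the timeout value are assumptions made for the example.

```python
import os
import subprocess
import sys


def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> str:
    """Stop a child process in escalating stages and report which stage worked."""
    # Stage 1: request a clean shutdown (SIGTERM on POSIX).
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
        return "clean"
    except subprocess.TimeoutExpired:
        pass
    # Stage 2: the clean attempt timed out, so fall back to a hard kill.
    proc.kill()
    try:
        proc.wait(timeout=timeout)
        return "killed"
    except subprocess.TimeoutExpired:
        # Stage 3: even the hard kill timed out; abort the supervising process.
        os.abort()


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_process(child))
```

Each stage gets its own timeout so that a hang in one stage cannot block the next one, which is the property the release describes.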

In stress testing, we identified and fixed issues with operating VAST under high load:

  • For large database directories, a partial read during startup corrupted the index state. We fixed both the reading behavior that led to partial reads and the possible corruption.
  • An overflow in CAF's stream slot identifiers could deadlock the system. We deployed a workaround and have proposed a proper fix upstream.
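To make the partial-read problem concrete: a single read call may return fewer bytes than requested, so a reader must loop until it has everything and treat an early end of file as an error rather than as success. The sketch below shows that general pattern in Python. It is not VAST's implementation; the name `read_exact` and the `MAX_CHUNK` cap are hypothetical, and real per-call limits depend on the operating system.

```python
import io
from typing import BinaryIO

# Hypothetical upper bound for a single read call; actual limits are OS-specific.
MAX_CHUNK = 1 << 30


def read_exact(stream: BinaryIO, size: int, max_chunk: int = MAX_CHUNK) -> bytes:
    """Read exactly `size` bytes, looping over short reads; raise on early EOF."""
    buf = bytearray()
    while len(buf) < size:
        # Never ask for more than max_chunk bytes in one call.
        chunk = stream.read(min(max_chunk, size - len(buf)))
        if not chunk:
            # The stream ended early: report it instead of returning partial data.
            raise EOFError(f"expected {size} bytes, got {len(buf)}")
        buf += chunk
    return bytes(buf)


if __name__ == "__main__":
    print(read_exact(io.BytesIO(b"hello world"), 5))
```

Raising on a short result is the important design choice here: a caller that never sees partial data cannot persist it by mistake.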

To avoid multiple VAST processes accessing the same database directory, VAST now atomically creates a PID lock file in the database directory on startup. This ensures that at most one VAST server process can operate the persistent state.
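One common way to create such a lock file atomically is to open it with exclusive-create semantics, so that exactly one process can succeed. The following is a minimal sketch of that technique, not VAST's actual code. The file name `pid.lock` is taken from the changelog below; the function name `acquire_pid_lock` is an assumption, and handling of stale locks left behind by a crashed process is deliberately omitted.

```python
import os
import tempfile


def acquire_pid_lock(db_dir: str, name: str = "pid.lock") -> str:
    """Create the lock file atomically; raise FileExistsError if it already exists."""
    path = os.path.join(db_dir, name)
    # O_CREAT | O_EXCL makes creation fail if the file exists, so only one
    # process can ever win the race to create it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(f"{os.getpid()}\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as db_dir:
        print(acquire_pid_lock(db_dir))
        try:
            acquire_pid_lock(db_dir)
        except FileExistsError:
            print("database directory is already locked")
```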

Straightening the Data Model

The vector type has been renamed to list. In an effort to streamline the type system vocabulary, we favor list over vector because it’s closer to terminology in the ecosystem (e.g., Apache Arrow). This change requires updating existing schemas by changing vector<T> to list<T>.

Additionally, the set type has been removed. Experience with the data model showed that there is no strong use case to separate sets from lists in the VAST core. While a set data type proves useful in programming languages, VAST deals with immutable data where set constraints have been enforced upon generating the data. This change requires updating existing schemas by changing set<T> to list<T>. In the query language, the symbol for the empty map changed from {-} to {}, as it now unambiguously identifies map instances.
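Both schema updates are mechanical renames, so they can be scripted. The sketch below is a hypothetical helper, not a tool shipped with VAST. It rewrites the `vector` and `set` keywords to `list` wherever they are directly followed by `<`, and the schema snippet in the usage example is made up for illustration. Because it is a plain textual substitution, it would also rewrite matches inside comments, so review the result before using it.

```python
import re

# Match the keywords `vector` or `set` only when a `<` follows, so that
# field names such as `offset` or `settings` are left untouched.
_OLD_CONTAINER = re.compile(r"\b(?:vector|set)(?=\s*<)")


def migrate_schema(text: str) -> str:
    """Rewrite vector<T> and set<T> to list<T>, including nested occurrences."""
    return _OLD_CONTAINER.sub("list", text)


if __name__ == "__main__":
    old = "type example = record{tags: set<string>, hops: vector<addr>}"
    print(migrate_schema(old))
```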

Changelog Highlights

As always, you can find the full technical scoop of what changed in our changelog.

🎁 Features

  • The default schema for Suricata has been updated to support the suricata.ftp and suricata.ftp_data event types. #1009
  • VAST now writes a PID lock file on startup to prevent multiple server processes from accessing the same persistent state. The pid.lock file resides in the vast.db directory. #1001

⚠️ Changes

  • The vector type has been renamed to list. In an effort to streamline the type system vocabulary, we favor list over vector because it's closer to existing terminology (e.g., Apache Arrow). This change requires updating existing schemas by changing vector<T> to list<T>. #1016
  • The set type has been removed. Experience with the data model showed that there is no strong use case to separate sets from lists in the core. While a set type may be useful in programming languages, VAST deals with immutable data where set constraints have been enforced upstream. This change requires updating existing schemas by changing set<T> to list<T>. In the query language, the symbol for the empty map changed from {-} to {}, as it now unambiguously identifies map instances. #1010

🐞 Bug Fixes

  • VAST did not terminate when a critical component failed during startup. VAST now binds the lifetime of the node to all critical components. #1028
  • VAST would overwrite existing on-disk state data when encountering a partial read during startup. This state-corrupting behavior no longer exists. #1026
  • Incomplete reads were not handled properly, which manifested for files larger than 2GB. On macOS, writing files larger than 2GB may have failed previously. VAST now respects OS-specific constraints on the maximum block size. #1025
  • The shutdown process of the server process could potentially hang forever. VAST now uses a multi-stage procedure that first attempts to terminate all components cleanly. If that fails, it will attempt a hard kill afterwards, and if that fails after another timeout, the process will call abort(3). #1005
  • When running VAST under heavy load, CAF stream slot ids could wrap around after a few days and deadlock the system. As a workaround, we extended the slot id bit width to make the time until this happens unrealistically large. #1020
  • Some file descriptors remained open when they weren't needed any more. This descriptor leak has been fixed. #1018
  • Importing JSON no longer fails for JSON fields containing null when the corresponding VAST type in the schema is a non-trivial type like list<string>. #1009
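The slot id workaround in the list above rests on simple arithmetic: every additional bit doubles the time until a counter wraps around. The sketch below works through that relationship. The bit widths and the allocation rate in it are hypothetical values chosen for illustration, since the release notes do not state the actual ones.

```python
def seconds_until_wrap(bits: int, ids_per_second: float) -> float:
    """Time until a counter with `bits` bits wraps at a steady allocation rate."""
    return (2 ** bits) / ids_per_second


if __name__ == "__main__":
    rate = 1.0  # hypothetical: one new slot id per second
    for bits in (16, 32, 64):  # hypothetical widths, not taken from CAF or VAST
        days = seconds_until_wrap(bits, rate) / 86400
        print(f"{bits}-bit ids wrap after about {days:.3g} days")
```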