VAST 2020.08.28
We’re happy to announce the monthly release 2020.08.28 of our stack.
Robustness and State Recovery
We found several bugs in the shutdown process of a VAST server, which could have caused an unresponsive process and potential loss of state. VAST now uses a multi-stage procedure to terminate itself: it first attempts to shut down all components cleanly, then falls back to a hard kill, and if that also fails after another timeout, the process calls `abort(3)`.
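The escalation described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not VAST's actual implementation: the `stop_cleanly` and `kill_hard` callables, the timeout values, and the injectable `abort` parameter are all assumptions made for the example.

```python
import os
import threading


def shutdown(stop_cleanly, kill_hard, clean_timeout=1.0, kill_timeout=1.0,
             abort=os.abort):
    """Escalate from a clean stop to a hard kill to abort().

    stop_cleanly and kill_hard are hypothetical callables. Each receives a
    threading.Event and sets it once all components have terminated.
    Returns the name of the stage that ended the escalation.
    """
    done = threading.Event()

    # Stage 1: ask all components to terminate cleanly.
    threading.Thread(target=stop_cleanly, args=(done,), daemon=True).start()
    if done.wait(clean_timeout):
        return "clean"

    # Stage 2: the clean stop timed out, so force termination.
    threading.Thread(target=kill_hard, args=(done,), daemon=True).start()
    if done.wait(kill_timeout):
        return "hard kill"

    # Stage 3: neither stage finished within its deadline; give up.
    abort()
    return "abort"
```

Making `abort` a parameter keeps the last stage testable; in a real process the default `os.abort` would terminate immediately and the final `return` would never run.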
In stress testing, we identified and fixed two issues with operating VAST under high load. First, for large database directories, a partial read during startup corrupted the index state. We fixed both the reading behavior that led to partial reads and the possible corruption. Second, an overflow in CAF's stream slot identifiers could deadlock the system. We deployed a workaround and have proposed a proper fix upstream.
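To make the partial-read problem concrete, here is a minimal sketch of a read loop that tolerates short reads and caps the size of each request. It is not VAST's code: the `read_exact` helper and the `MAX_BLOCK` value are assumptions for illustration, and real per-call limits differ between operating systems.

```python
import io

# Assumed cap on a single read request; real limits are OS-specific.
MAX_BLOCK = 1 << 30


def read_exact(stream, size, max_block=MAX_BLOCK):
    """Read exactly `size` bytes, looping over short reads."""
    chunks = []
    remaining = size
    while remaining > 0:
        # Never request more than max_block bytes in one call.
        chunk = stream.read(min(remaining, max_block))
        if not chunk:
            # End of file before `size` bytes arrived: report it instead
            # of silently returning partial data.
            raise EOFError(f"expected {size} bytes, got {size - remaining}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # A tiny block size forces several iterations of the loop.
    print(read_exact(io.BytesIO(b"abcdef"), 6, max_block=4))
```

The important design point is that a short read is treated as "keep going" and a premature end of file as an error, so the caller never mistakes partial data for the complete state.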
To avoid multiple VAST processes accessing the same database directory, VAST now atomically creates a PID lock file in the database directory on startup. This ensures that at most one VAST server process can operate the persistent state.
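One common way to create such a lock file atomically is to open it with the `O_CREAT | O_EXCL` flags, which makes the existence check and the creation a single step. The sketch below shows that technique in Python; whether VAST uses exactly this mechanism is an assumption, and the `acquire_pid_lock` helper is hypothetical.

```python
import os
import tempfile


def acquire_pid_lock(db_dir, name="pid.lock"):
    """Create the lock file atomically; fail if it already exists."""
    path = os.path.join(db_dir, name)
    # O_CREAT | O_EXCL makes "check and create" one atomic operation:
    # os.open raises FileExistsError if the file is already there.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as db_dir:
        acquire_pid_lock(db_dir)
        try:
            acquire_pid_lock(db_dir)  # a second server would end up here
        except FileExistsError:
            print("database directory is already locked")
```

This sketch does not handle stale lock files left behind by a crashed process; a production implementation would also need to check whether the recorded PID is still alive.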
Straightening the Data Model
The `vector` type has been renamed to `list`. In an effort to streamline the type system vocabulary, we favor `list` over `vector` because it’s closer to terminology in the ecosystem (e.g., Apache Arrow). This change requires updating existing schemas by changing `vector<T>` to `list<T>`.
Additionally, the `set` type has been removed. Experience with the data model showed that there is no strong use case to separate sets from lists in the VAST core. While a set data type proves useful in programming languages, VAST deals with immutable data where set constraints have been enforced upon generating the data. This change requires updating existing schemas by changing `set<T>` to `list<T>`. In the query language, the symbol for the empty `map` changed from `{-}` to `{}`, as it now unambiguously identifies `map` instances.
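Both schema changes are mechanical, so existing schema files can be updated with a simple text rewrite. The following Python sketch is one way to do it; the `migrate_schema` helper and the sample schema line are hypothetical, and the result should be reviewed by hand because a plain regular expression does not understand comments or string literals.

```python
import re

# Matches the type constructors `vector<` and `set<` as whole words,
# so identifiers such as `offset` are left alone.
_OLD_CONTAINER = re.compile(r"\b(?:vector|set)\s*<")


def migrate_schema(text):
    """Rewrite vector<T> and set<T> to list<T>, including nested uses."""
    return _OLD_CONTAINER.sub("list<", text)


if __name__ == "__main__":
    # Hypothetical schema fragment, for illustration only.
    old = "type example = record{ tags: set<string>, hops: vector<addr> }"
    print(migrate_schema(old))
```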
Changelog Highlights
As always, you can find the full technical scoop of what changed in our changelog.
🎁 Features
- The default schema for Suricata has been updated to support the `suricata.ftp` and `suricata.ftp_data` event types. #1009
- VAST now writes a PID lock file on startup to prevent multiple server processes from accessing the same persistent state. The `pid.lock` file resides in the `vast.db` directory. #1001
⚠️ Changes
- The `vector` type has been renamed to `list`. In an effort to streamline the type system vocabulary, we favor `list` over `vector` because it's closer to existing terminology (e.g., Apache Arrow). This change requires updating existing schemas by changing `vector<T>` to `list<T>`. #1016
- The `set` type has been removed. Experience with the data model showed that there is no strong use case to separate sets from lists in the core. While a set type may be useful in programming languages, VAST deals with immutable data where set constraints have been enforced upstream. This change requires updating existing schemas by changing `set<T>` to `list<T>`. In the query language, the symbol for the empty `map` changed from `{-}` to `{}`, as it now unambiguously identifies `map` instances. #1010
🐞 Bug Fixes
- VAST did not terminate when a critical component failed during startup. VAST now binds the lifetime of the node to all critical components. #1028
- VAST would overwrite existing on-disk state data when encountering a partial read during startup. This state-corrupting behavior no longer exists. #1026
- Incomplete reads were not handled properly, which manifested for files larger than 2 GB. On macOS, writing files larger than 2 GB may have failed previously. VAST now respects OS-specific constraints on the maximum block size. #1025
- The shutdown process of the server could potentially hang forever. VAST now uses a multi-stage procedure that first attempts to terminate all components cleanly. If that fails, it attempts a hard kill, and if that also fails after another timeout, the process calls `abort(3)`. #1005
- When running VAST under heavy load, CAF stream slot ids could wrap around after a few days and deadlock the system. As a workaround, we extended the slot id bit width to make the time until this happens unrealistically large. #1020
- Some file descriptors remained open when they weren't needed any more. This descriptor leak has been fixed. #1018
- Importing JSON no longer fails for JSON fields containing `null` when the corresponding VAST type in the schema is a non-trivial type like `vector<string>`. #1009
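To see why widening the slot id works as a stopgap for the wrap-around issue listed above, it helps to do the arithmetic. The sketch below is purely illustrative: the bit widths and the allocation rate are assumptions chosen for the example, not figures taken from VAST or CAF.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_wrap(bits, ids_per_second):
    """Days until a counter of `bits` bits has handed out every value."""
    return (2 ** bits) / ids_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = 0.25  # assumed: one new slot id every four seconds
    for bits in (16, 32, 64):  # assumed widths, for comparison only
        print(f"{bits}-bit ids wrap after {days_until_wrap(bits, rate):.1f} days")
```

Each additional bit doubles the time until the counter wraps, so moving to a wider integer pushes the problem far beyond any realistic uptime without changing how ids are allocated.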