Welcome to the repository of PreS Compaction.
This repository contains the server code for the research project "Pre-Select files for compaction" (PreS).
We have integrated PreS into Apache IoTDB, an open-source, high-performance time series database built on an LSM-tree.
The Pre-Select files for compaction strategy (PreS) is a dynamic compaction strategy, tailored for time series databases, that predicts query patterns and uses them to select files for compaction. PreS captures incoming queries, analyzes historical access information, extracts the features of the captured queries, and generates samples using the temporal features of the time series. A machine learning model then predicts the query patterns that users are expected to issue, which alleviates the problem of static methods failing to track query trends. The predicted query patterns guide the compaction process of the LSM-tree: PreS evaluates the compaction benefit to find a compromise between the number of files and read amplification, which makes compaction more adaptive to queries and reduces query cost.
The architecture of PreS is shown in Figure 3 of the paper. The figure will be added here after the paper is accepted.
PreS consists of four core components: Query Collector, Query Pattern Predictor, Compaction Benefit Analyzer, and File Selector. For more details, please refer to the paper "PreS: A .....".
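The collect-then-predict flow described above can be illustrated with a small sketch. This is not code from this repository: the class name `QueryPatternSketch`, its methods, and the two features (how far a query lags behind the current time, and how wide its time range is) are assumptions made for illustration, and a simple moving average stands in for the machine learning model that PreS actually uses.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the collect-then-predict flow; not code from this repository. */
public class QueryPatternSketch {
    // Sliding window of recent query features: {lag behind "now", width of the time range}.
    private final Deque<long[]> window = new ArrayDeque<>();
    private final int capacity;

    public QueryPatternSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Collector role: extract two temporal features from a range query and keep them. */
    public void collect(long nowMs, long queryStartMs, long queryEndMs) {
        if (window.size() == capacity) {
            window.removeFirst(); // drop the oldest sample
        }
        window.addLast(new long[] {nowMs - queryEndMs, queryEndMs - queryStartMs});
    }

    /**
     * Predictor role (stand-in): average the window instead of calling a trained model.
     * Returns {predictedLagMs, predictedSpanMs}.
     */
    public long[] predict() {
        if (window.isEmpty()) {
            throw new IllegalStateException("no queries collected yet");
        }
        long lagSum = 0;
        long spanSum = 0;
        for (long[] feature : window) {
            lagSum += feature[0];
            spanSum += feature[1];
        }
        return new long[] {lagSum / window.size(), spanSum / window.size()};
    }

    public static void main(String[] args) {
        QueryPatternSketch sketch = new QueryPatternSketch(2);
        sketch.collect(1000, 400, 900);   // lag 100, span 500 (evicted later)
        sketch.collect(2000, 1200, 1800); // lag 200, span 600
        sketch.collect(3000, 2000, 2700); // lag 300, span 700
        long[] p = sketch.predict();
        System.out.println("predicted lag=" + p[0] + " ms, span=" + p[1] + " ms");
        // prints: predicted lag=250 ms, span=650 ms
    }
}
```

A bounded window keeps the sketch responsive to recent query trends, which is the property the description above contrasts with static methods. The real predictor may use different features and a different model.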
About Apache IoTDB:
IoTDB (Internet of Things Database) is a data management system for time series data that provides users with services such as data collection, storage, and analysis. Thanks to its lightweight structure, high performance, and usable features, together with its seamless integration with the Hadoop and Spark ecosystems, IoTDB meets the requirements of massive dataset storage, high-throughput data input, and complex data analysis in the industrial IoT field.
Main features of IoTDB are as follows:
- Flexible deployment strategy. IoTDB provides users with a one-click installation tool for either the cloud platform or terminal devices, and a data synchronization tool that bridges the data on the cloud platform and the terminals.
- Low hardware cost. IoTDB can reach a high compression ratio for disk storage.
- Efficient directory structure. IoTDB supports efficient organization of complex time series data structures from intelligent networking devices, organization of time series data from devices of the same type, and a fuzzy search strategy for massive and complex directories of time series data.
- High-throughput read and write. IoTDB supports strong-connection data access for millions of low-power devices, and high-speed data read and write for intelligent networking devices and the mixed devices mentioned above.
- Rich query semantics. IoTDB supports time alignment for time series data across devices and measurements, computation in the time series field (frequency domain transformation), and rich aggregation functions in the time dimension.
- Easy to get started. IoTDB supports a SQL-like language, the JDBC standard API, and import/export tools that are easy to use.
- Seamless integration with the state-of-the-practice open-source ecosystem. IoTDB supports analysis ecosystems such as Hadoop and Spark, and visualization tools such as Grafana.
For the latest information about IoTDB, please visit the official IoTDB website (https://iotdb.apache.org/).
Please refer to the paper for the design concepts of the components. Further details will be added here after the paper is accepted.
For the specific code, refer to the Java class QueryMonitorYaos in package org.apache.iotdb.db.engine.compaction.
For the specific code, refer to the Java class MLQueryAnalyzerYaos in package org.apache.iotdb.db.engine.compaction.
For the specific code, refer to the Java class YaosSizeCompactionSelector in package org.apache.iotdb.db.engine.compaction.inner.sizetiered.
The function "selectLevelTask_byYaos_V1()" in this class describes how the compaction evaluator selects files from the disk and submits them for compaction.
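To make the idea of selecting files under a predicted query pattern more concrete, here is a minimal sketch. It is not the logic of "selectLevelTask_byYaos_V1()": the class `CompactionSelectionSketch`, the `FileInfo` type, and the size budget `maxBytes` are hypothetical names introduced for illustration. The sketch assumes that files are listed in time order and that a predicted query touching several files is a sign of read amplification worth removing by merging those files.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of query-aware file selection; not code from this repository. */
public class CompactionSelectionSketch {

    /** Minimal stand-in for a data file's metadata (not IoTDB's own resource class). */
    public static final class FileInfo {
        public final String name;
        public final long startTime;
        public final long endTime;
        public final long sizeBytes;

        public FileInfo(String name, long startTime, long endTime, long sizeBytes) {
            this.name = name;
            this.startTime = startTime;
            this.endTime = endTime;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Files whose time range intersects the predicted query range [qStart, qEnd]. */
    public static List<FileInfo> overlapping(List<FileInfo> files, long qStart, long qEnd) {
        List<FileInfo> hit = new ArrayList<>();
        for (FileInfo f : files) {
            if (f.startTime <= qEnd && f.endTime >= qStart) {
                hit.add(f);
            }
        }
        return hit;
    }

    /**
     * Picks overlapping files in order until the size budget would be exceeded.
     * Returns an empty list when fewer than two files qualify, because merging
     * a single file would not reduce the number of files a query has to read.
     */
    public static List<FileInfo> select(List<FileInfo> files, long qStart, long qEnd, long maxBytes) {
        List<FileInfo> picked = new ArrayList<>();
        long total = 0;
        for (FileInfo f : overlapping(files, qStart, qEnd)) {
            if (total + f.sizeBytes > maxBytes) {
                break; // stop at the budget to keep the compaction task bounded
            }
            picked.add(f);
            total += f.sizeBytes;
        }
        return picked.size() >= 2 ? picked : new ArrayList<>();
    }

    public static void main(String[] args) {
        List<FileInfo> files = new ArrayList<>();
        files.add(new FileInfo("a", 0, 99, 10));
        files.add(new FileInfo("b", 100, 199, 10));
        files.add(new FileInfo("c", 200, 299, 10));
        files.add(new FileInfo("d", 300, 399, 10));
        // Predicted query range [150, 320] overlaps b, c and d; a budget of 25 admits b and c.
        for (FileInfo f : select(files, 150, 320, 25)) {
            System.out.println(f.name);
        }
    }
}
```

The size budget is one simple way to express the compromise between fewer files and the cost of compaction itself. The benefit evaluation in PreS is described in the paper and may weigh these factors differently.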
For the specific code, refer to the Java class YaosSizeCompactionSelector in package org.apache.iotdb.db.engine.compaction.
The compaction operation is triggered periodically. On each trigger, the function "selectLevelTask()" runs all the operations of PreS and ultimately submits the selected files for compaction.