85 changes: 50 additions & 35 deletions docs/architecture.md
# Architecture

## Overview

The diagram below illustrates the high-level components of the Timeplus core engine. The following sections explain how these components work together as a unified system.

![Architecture](/img/proton-high-level-arch.gif)

## Data Flow

### Ingest

When data is ingested into Timeplus, it first lands in the **NativeLog**. As soon as the log commit completes, the data becomes instantly available for streaming queries.

In the background, dedicated threads continuously tail new entries from the NativeLog and flush them to the **Historical Store** in larger, optimized batches.
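To make this concrete, here is a minimal sketch of ingestion in Timeplus SQL; the stream name and columns are hypothetical:

```sql
-- Hypothetical append-only stream; names and types are illustrative.
CREATE STREAM device_metrics (
  device_id string,
  temperature float32
);

-- Each INSERT commits to the NativeLog first; once the commit returns,
-- the rows are already visible to any running streaming query. The flush
-- into the Historical Store happens asynchronously in the background.
INSERT INTO device_metrics (device_id, temperature) VALUES
  ('dev-1', 21.5),
  ('dev-2', 36.7);
```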

### Query

Timeplus supports three query models: **historical**, **streaming**, and **hybrid (streaming + historical)**. Each is sketched in SQL after this list.

- **Historical Query (Table Query)**
Works like a traditional database query. Data is read directly from the **Historical Store**, leveraging standard database optimizations for efficient lookups and scans:
- Primary index
- Skipping index
- Secondary index
- Bloom filter
- Partition pruning

- **Streaming Query**
Operates on the **NativeLog**, where records are strictly ordered. Queries run incrementally, enabling real-time workloads such as **incremental ETL**, **joins**, and **aggregations**.

- **Hybrid Query**
Combines the best of both worlds. A streaming query can automatically **backfill** from the Historical Store when:
1. Data has expired from the NativeLog (due to retention).
2. Reading from the Historical Store is faster than rewinding and replaying from the NativeLog.

This eliminates the need for an external batch system, avoiding the extra **latency, inconsistency, and cost** usually associated with maintaining separate batch and streaming pipelines.
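As a rough sketch, the three models differ mainly in how the source is addressed. The queries below reuse the hypothetical `device_metrics` stream from the ingest example; `table()` wraps a stream as a bounded table, and `_tp_time` is the event-time column Timeplus attaches to every stream:

```sql
-- Historical (table) query: a bounded scan over the Historical Store.
SELECT device_id, avg(temperature)
FROM table(device_metrics)
GROUP BY device_id;

-- Streaming query: unbounded and incremental, computed over the
-- strictly ordered NativeLog (5-second tumbling windows).
SELECT window_start, device_id, avg(temperature)
FROM tumble(device_metrics, 5s)
GROUP BY window_start, device_id;

-- Hybrid query: the filter reaches one hour into the past, so the
-- engine can backfill that range from the Historical Store and then
-- keep tailing new events from the NativeLog.
SELECT device_id, count()
FROM device_metrics
WHERE _tp_time > now() - 1h
GROUP BY device_id;
```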

## Dual Storage

### NativeLog

The **Timeplus NativeLog** is the system’s write-ahead log (WAL) or journal: an append-only, high-throughput store optimized for low-latency, highly concurrent data ingestion. In a cluster deployment, it is replicated using **Multi-Raft** for fault tolerance. By enforcing a strict ordering of records, NativeLog forms the backbone of streaming processing in **Timeplus Core**. It is also the building block for other internal components, such as the replicated metadata store in Timeplus.

NativeLog uses its own record format, consisting of two high-level types:

- **Control records** (a.k.a. meta records) – store metadata and operational information.
- **Data records** – columnar-encoded for fast serialization/deserialization and efficient vectorized streaming execution.

Each record is assigned a monotonically increasing sequence number (similar to a Kafka offset), which guarantees ordering.

Lightweight indexes are maintained to support rapid rewind and replay operations by **timestamp** or **sequence number** for streaming queries.
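For example, the open-source Proton engine exposes a `seek_to` query setting that uses these indexes to reposition a streaming query before replay begins; treat the exact values below as illustrative:

```sql
-- Rewind by timestamp: replay everything from the last two hours,
-- then keep tailing new records as they arrive.
SELECT * FROM device_metrics SETTINGS seek_to = '-2h';

-- Rewind to the earliest record still retained in the NativeLog.
SELECT * FROM device_metrics SETTINGS seek_to = 'earliest';
```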

### Historical Store

The **Historical Store** in Timeplus stores data **derived** from the **NativeLog**. It powers use cases such as:

- **Historical queries** (a.k.a. *table queries* in Timeplus)
- **Fast backfill** into streaming queries
- Acting as a **serving layer** for applications

Timeplus supports two storage encodings for the Historical Store: **columnar** and **row**.

#### 1. Columnar Encoding (*Append Stream*)
Optimized for **append-most workloads** with minimal data mutation, such as telemetry events, logs, and metrics. Benefits include:

- High data compression ratios
- Blazing-fast scans for analytical workloads
- Backed by the **ClickHouse MergeTree** engine

This format is ideal when the dataset is largely immutable and query speed over large volumes is a priority.
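A plain `CREATE STREAM` yields an append stream, so its historical part is columnar; a bounded `table()` scan then behaves like a regular analytical query (names below are illustrative):

```sql
-- Append stream: immutable events, columnar Historical Store.
CREATE STREAM page_views (
  user_id string,
  url string
);

-- Large aggregations scan compressed columns in the MergeTree-backed
-- store, which is what makes this style of query fast.
SELECT url, count() AS views
FROM table(page_views)
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```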

#### 2. Row Encoding (*Mutable Stream*)
Designed for **frequently updated datasets** where `UPSERT` and `DELETE` operations are common. Features include:

- Per-row **primary indexes**

Row encoding is the better choice when low-latency, high-frequency updates are required.
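As a minimal sketch (mutable streams are a Timeplus Enterprise feature; names are illustrative), a mutable stream declares a primary key, and writing a row whose key already exists acts as an `UPSERT`:

```sql
-- Row-encoded stream keyed by account_id.
CREATE MUTABLE STREAM account_balance (
  account_id string,
  balance float64
)
PRIMARY KEY (account_id);

-- The first INSERT creates the row; the second, sharing the same key,
-- upserts it, so a point lookup for 'acct-1' returns balance = 250.0.
INSERT INTO account_balance (account_id, balance) VALUES ('acct-1', 100.0);
INSERT INTO account_balance (account_id, balance) VALUES ('acct-1', 250.0);
```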

## External Storage

Timeplus natively connects to external storage systems through **External Streams** and **External Tables**, giving you flexibility in how data flows in and out of the platform.

- **Ingest from External Systems**
Stream data directly from Kafka, Redpanda, or Pulsar into Timeplus. Use **Materialized Views** for incremental processing (e.g., ETL, filtering, joins, aggregations).

- **Send Data to External Systems**
Push processed results downstream to systems like **ClickHouse** for analytics or long-term storage.

- **Keep Data Inside Timeplus**
Store **Materialized View outputs** in Timeplus itself to serve client queries with low latency.

- **Raw Data Pipelines**
Ingest and persist raw data in Timeplus, then build end-to-end pipelines for **filtering, transforming, and shaping** the data—serving both **real-time** and **historical** queries from a single platform.

This flexible integration model lets you decide whether Timeplus acts as a **processing engine**, a **serving layer**, or the **primary data hub** in your stack.
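As one possible end-to-end pipeline under this model, the sketch below reads from Kafka through an external stream, transforms incrementally with a materialized view, and writes results to ClickHouse through an external table; broker addresses, topic, table, and column names are all placeholders:

```sql
-- Source: external stream over a Kafka topic carrying raw JSON.
CREATE EXTERNAL STREAM kafka_orders (raw string)
SETTINGS type = 'kafka', brokers = 'kafka:9092', topic = 'orders';

-- Sink: external table pointing at an existing ClickHouse table.
CREATE EXTERNAL TABLE ch_order_totals
SETTINGS type = 'clickhouse', address = 'clickhouse:9000', table = 'order_totals';

-- Incremental ETL: the materialized view continuously parses the JSON,
-- aggregates per customer, and pushes updated results downstream.
CREATE MATERIALIZED VIEW mv_order_totals INTO ch_order_totals AS
SELECT raw:customer_id AS customer_id,
       sum(raw:amount::float64) AS total_amount
FROM kafka_orders
GROUP BY customer_id;
```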

## References

[How Timeplus Unifies Streaming and Historical Data Processing](https://www.timeplus.com/post/unify-streaming-and-historical-data-processing)
1 change: 1 addition & 0 deletions docs/cluster.md
# Cluster