Fix a couple of typos in the design doc #1350

Merged
merged 1 commit into from Jul 24, 2019

14 changes: 7 additions & 7 deletions docs/design.md
@@ -29,7 +29,7 @@ The sidecar implements the gRPC service on top of Prometheus' [HTTP and remote-r

#### Metric Data Backup

Data sources that persist their data for long-term storage do so via the Prometheus 2.0 storage engine. The storage engine periodically produces immutable blocks of data for a fixed time range. A block is a directory with a handful of larger files containing all sample data and persisted indices that are required to retrieve the data:

```bash
01BX6V6TY06G5MFQ0GPH7EMXRH
```

@@ -69,19 +69,19 @@ The meta.json is updated during upload time on sidecars.

A store node acts as a gateway to block data that is stored in an object storage bucket. It implements the same gRPC API as data sources to provide access to all metric data found in the bucket.

It continuously synchronizes which blocks exist in the bucket and translates requests for metric data into object storage requests. It implements various strategies to minimize the number of requests to the object storage, such as filtering relevant blocks by their metadata (e.g. time range and labels) and caching frequent index lookups.
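
To illustrate the idea, here is a minimal Go sketch of metadata-based block filtering. `blockMeta`, its fields, and the example blocks are simplified stand-ins for the information carried in a block's meta.json, not the actual Thanos types; the point is only that blocks whose time range or external labels cannot match a query never generate object storage requests.

```go
package main

import "fmt"

// blockMeta is an illustrative stand-in for a block's meta.json contents:
// its time range (Unix milliseconds) and external labels.
type blockMeta struct {
	ULID    string
	MinTime int64
	MaxTime int64
	Labels  map[string]string
}

// selectBlocks keeps only blocks whose time range overlaps [mint, maxt) and
// whose external labels do not contradict the requested equality matchers.
func selectBlocks(blocks []blockMeta, mint, maxt int64, matchers map[string]string) []blockMeta {
	var out []blockMeta
	for _, b := range blocks {
		if b.MaxTime <= mint || b.MinTime >= maxt {
			continue // no time overlap, skip without touching the bucket
		}
		ok := true
		for k, v := range matchers {
			if lv, exists := b.Labels[k]; exists && lv != v {
				ok = false // label present but contradicts the matcher
				break
			}
		}
		if ok {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	// Example blocks; the ULIDs and labels are made up for illustration.
	blocks := []blockMeta{
		{ULID: "01BX6V6TY06G5MFQ0GPH7EMXRH", MinTime: 0, MaxTime: 7_200_000, Labels: map[string]string{"region": "eu-1"}},
		{ULID: "01BX6V6TY06G5MFQ0GPH7EMXRJ", MinTime: 7_200_000, MaxTime: 14_400_000, Labels: map[string]string{"region": "us-1"}},
	}
	fmt.Println(selectBlocks(blocks, 0, 3_600_000, map[string]string{"region": "eu-1"}))
}
```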

The Prometheus 2.0 storage layout is optimized for minimal read amplification. For example, sample data for the same time series is sequentially aligned in a chunk file. Similarly, series for the same metric name are sequentially aligned as well.
The store node is aware of the files' layout and translates data requests into a plan with a minimal number of object storage requests. Each request may fetch up to hundreds of thousands of chunks at once. This is essential to satisfy even large queries with a limited number of requests to the object storage.
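
The request planning can be sketched roughly as follows. `chunkRef`, `byteRange`, and the fixed gap heuristic are illustrative assumptions rather than the store node's real implementation, but they show how nearby chunk reads within one chunk file can be coalesced into a single ranged GET against the bucket.

```go
package main

import (
	"fmt"
	"sort"
)

// chunkRef points at one chunk inside a single chunk file in the bucket.
// The reference format is simplified for illustration.
type chunkRef struct {
	Offset int64
	Length int64
}

// byteRange is one ranged GET against the object storage, covering [Start, End).
type byteRange struct {
	Start, End int64
}

// planRanges coalesces chunk reads that lie within maxGap bytes of each other
// into one ranged request, trading a little over-read for far fewer round trips.
func planRanges(refs []chunkRef, maxGap int64) []byteRange {
	if len(refs) == 0 {
		return nil
	}
	sort.Slice(refs, func(i, j int) bool { return refs[i].Offset < refs[j].Offset })

	ranges := []byteRange{{Start: refs[0].Offset, End: refs[0].Offset + refs[0].Length}}
	for _, r := range refs[1:] {
		last := &ranges[len(ranges)-1]
		if r.Offset-last.End <= maxGap {
			if end := r.Offset + r.Length; end > last.End {
				last.End = end // extend the current ranged GET
			}
			continue
		}
		ranges = append(ranges, byteRange{Start: r.Offset, End: r.Offset + r.Length})
	}
	return ranges
}

func main() {
	refs := []chunkRef{{0, 512}, {600, 512}, {100_000, 512}}
	fmt.Println(planRanges(refs, 1024)) // two ranged requests instead of three
}
```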

Currently only index data is cached. Chunk data could be cached as well but is orders of magnitude larger in size. In the current state, fetching chunk data from the object storage already accounts for only a small fraction of end-to-end latency. Thus, there is currently no incentive to increase the store node's resource requirements or limit its scalability by adding chunk caching.

### Stores & Data Sources - It's all the same

Since store nodes and data sources expose the same gRPC Store API, clients can largely treat them as equivalent and don't have to be concerned with which specific component they are querying.
Each implementer of the Store API advertises metadata about the data it provides. This allows clients to minimize the set of nodes they have to fan out to in order to satisfy a particular data query.

In its essence, the Store API allows looking up data by a set of label matchers (as known from PromQL) and a time range. It returns compressed chunks of samples as they are found in the block data. It is purely a data retrieval API and does _not_ provide complex query execution.
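
To make the shape of such a lookup concrete, the sketch below models a Series-style request and response in plain Go structs. The type and field names are illustrative stand-ins, not the actual generated Store API definitions; they only mirror the idea of "label matchers plus a time range in, compressed chunks out".

```go
package main

import (
	"fmt"
	"time"
)

// MatcherType distinguishes equality and regex matchers, as known from PromQL.
type MatcherType int

const (
	MatchEqual MatcherType = iota
	MatchRegexp
)

// LabelMatcher selects series by one label.
type LabelMatcher struct {
	Type  MatcherType
	Name  string
	Value string
}

// SeriesRequest selects series by label matchers and a time range (Unix ms).
type SeriesRequest struct {
	MinTime  int64
	MaxTime  int64
	Matchers []LabelMatcher
}

// SeriesResponse carries one series: its labels plus the compressed chunks
// exactly as stored in the block data; no samples are decoded and no PromQL
// is evaluated at this layer.
type SeriesResponse struct {
	Labels map[string]string
	Chunks [][]byte
}

func main() {
	now := time.Now()
	// Roughly equivalent to selecting up{job="prometheus"} over the last hour.
	req := SeriesRequest{
		MinTime: now.Add(-1 * time.Hour).UnixMilli(),
		MaxTime: now.UnixMilli(),
		Matchers: []LabelMatcher{
			{Type: MatchEqual, Name: "__name__", Value: "up"},
			{Type: MatchEqual, Name: "job", Value: "prometheus"},
		},
	}
	fmt.Printf("select series matching %v between %d and %d\n", req.Matchers, req.MinTime, req.MaxTime)
}
```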

```
┌──────────────────────┐ ┌────────────┬─────────┐ ┌────────────┐
```

@@ -133,7 +133,7 @@ Based on the metadata of store and source nodes, they attempt to minimize the re

### Compactor

The compactor is a singleton process that does not participate in the Thanos cluster. Instead, it is only pointed at an object storage bucket and continuously consolidates multiple smaller blocks into larger ones. This significantly reduces the total storage size in the bucket, the load on store nodes, and the number of requests required to fetch data for a query from the bucket.
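
A rough sketch of the grouping step is shown below, assuming simplified block metadata and a fixed target time window; the real compactor's planning logic is more involved, but the essence is that several small, adjacent blocks falling into the same window become one larger block.

```go
package main

import "fmt"

// block is an illustrative stand-in for a block's meta.json time range (Unix ms).
type block struct {
	ULID    string
	MinTime int64
	MaxTime int64
}

// planGroups buckets blocks into fixed, aligned time windows of size rangeMs.
// Every window holding more than one block is a candidate for compaction
// into a single larger block.
func planGroups(blocks []block, rangeMs int64) map[int64][]block {
	groups := map[int64][]block{}
	for _, b := range blocks {
		window := (b.MinTime / rangeMs) * rangeMs // align to window start
		groups[window] = append(groups[window], b)
	}
	return groups
}

func main() {
	two := int64(2 * 60 * 60 * 1000)  // 2h source blocks
	day := int64(24 * 60 * 60 * 1000) // target compaction window
	blocks := []block{
		{ULID: "01...A", MinTime: 0, MaxTime: two},
		{ULID: "01...B", MinTime: two, MaxTime: 2 * two},
		{ULID: "01...C", MinTime: 2 * two, MaxTime: 3 * two},
	}
	for window, group := range planGroups(blocks, day) {
		if len(group) > 1 {
			fmt.Printf("window %d: compact %d blocks into one\n", window, len(group))
		}
	}
}
```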

In the future, the compactor may do additional batch processing such as down-sampling and applying retention policies.

@@ -143,7 +143,7 @@ None of the Thanos components provides any means of sharding. The only explicitl

Store, rule, and compactor nodes are all expected to scale significantly within a single instance or high availability pair. Similar to Prometheus, functional sharding can be applied for rare cases in which this does not hold true.

For example, rule sets can be divided across multiple HA pairs of rule nodes. Store nodes are likely subject to functional sharding regardless, by assigning dedicated buckets per region/datacenter.

Overall, first-class horizontal sharding is possible but will not be considered for the time being since there's no evidence that it is required in practical setups.
