# Aggregator Verification Benchmark

This benchmark tests our data aggregation system to ensure it's accurate,
consistent, and fast.

#### Accuracy

We check whether the final aggregated data is correct.

Here's how it works: As new events are generated, we keep a live, in-memory
count of the metrics, which serves as our "ground truth." Once all events have
been processed, we query the final aggregated data and compare it against our
ground truth count. If the numbers don't match, this report will flag the
discrepancies.
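Below is a minimal sketch of this ground-truth bookkeeping. It assumes each event carries a source, a type, and a flat payload. The `BenchEvent` shape, the metric-key format, and the `GroundTruth`/`compareWithEngine` names are hypothetical and do not come from `agg_verification.ts`:

```typescript
// Hypothetical event shape; the real benchmark's events may differ.
interface BenchEvent {
  source: string;
  type: string;
  payload: Record<string, string | number | boolean>;
}

// Live, in-memory "ground truth": one counter per metric key,
// incremented as each event is generated.
class GroundTruth {
  readonly counts = new Map<string, number>();

  record(event: BenchEvent): void {
    // One count per (source, type) pair (illustrative key scheme)...
    this.bump(`${event.source}:${event.type}`);
    // ...and one per (source, type, attribute, value) tuple.
    for (const [attr, value] of Object.entries(event.payload)) {
      this.bump(`${event.source}:${event.type}:${attr}=${String(value)}`);
    }
  }

  private bump(key: string): void {
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }
}

// Compare the engine's final counts with the ground truth and return
// every key whose values differ (a missing key counts as 0).
function compareWithEngine(
  truth: Map<string, number>,
  engine: Map<string, number>,
): string[] {
  const keys = new Set([...truth.keys(), ...engine.keys()]);
  return [...keys].filter((k) => (truth.get(k) ?? 0) !== (engine.get(k) ?? 0));
}
```

Calling `record` once per generated event keeps the counts current. If `compareWithEngine` returns an empty array, the engine agrees with the ground truth.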

#### Consistency

The final metrics in the database must perfectly match the live, in-memory
count.

If the values are different, it likely points to a bug causing the system to
write data inconsistently.

Note: The test runs on a single machine, which can sometimes cause a temporary
processing backlog. If you see an inconsistency, please wait a couple of minutes
for the queue to catch up before re-running the check.
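The wait-and-re-check step can be automated with a polling loop. The sketch below is minimal and assumes `check` is any async function that resolves to the list of mismatched keys. `waitForConsistency` and its default interval and timeout are hypothetical choices, not part of the benchmark:

```typescript
// Re-run a consistency check until it passes or the time budget runs out.
// `check` resolves to the keys that still disagree with the ground truth.
async function waitForConsistency(
  check: () => Promise<string[]>,
  intervalMs = 10_000, // pause between attempts
  timeoutMs = 180_000, // give up after about 3 minutes
): Promise<string[]> {
  const deadline = Date.now() + timeoutMs;
  let mismatches = await check();
  while (mismatches.length > 0 && Date.now() < deadline) {
    // Give the processing backlog time to drain before re-checking.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    mismatches = await check();
  }
  return mismatches; // an empty array means the data became consistent
}
```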

#### Read/Write Speed

Write speed isn't a primary concern, because the engine uses a fast in-memory
buffer for real-time data, which is sufficient for most situations.

Read speed is crucial, especially for building reports that pull from several
aggregated metrics.

Current performance: the system aggregates about 1 million records in ~700 ms
to 2 s, depending on query granularity.

Data volume: depending on the events' content, 100,000 events can produce
anywhere from 1 million to 15 million aggregated metrics.
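For scale, the figures above work out as follows. This is plain arithmetic on the quoted numbers, not additional measurements:

$$
\begin{aligned}
\frac{10^6\ \text{records}}{0.7\ \text{s}} &\approx 1.4 \times 10^6\ \text{records/s}, &
\frac{10^6\ \text{records}}{2\ \text{s}} &= 5 \times 10^5\ \text{records/s} \\
\frac{10^6\ \text{metrics}}{10^5\ \text{events}} &= 10\ \text{metrics/event}, &
\frac{1.5 \times 10^7\ \text{metrics}}{10^5\ \text{events}} &= 150\ \text{metrics/event}
\end{aligned}
$$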

This performance is for the core engine alone. The API also offers pre-caching
options for time-series data that give a speedup of roughly 50x, depending on
the strategy you choose. With "cache all", as used here, queries run at close to
plain read speed, at the cost of a very large storage amplification factor. With
partial hits (ordinal, immutable data), query time also includes the non-cached
portion and an O(n^2) in-memory merge. The maximum cache entry size is bounded
by MongoDB's 16 MB document size limit.
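The following hypothetical sketch makes the partial-hit case concrete. It is not the engine's implementation. It combines cached time buckets with buckets freshly aggregated for the non-cached range, and it assumes each bucket is `{ t, value }` keyed by its start timestamp. A naive merge that searches the cached list for every fresh bucket is O(n^2). Indexing the buckets in a `Map` first, as below, brings that down to O(n log n), dominated by the final sort:

```typescript
// Hypothetical bucket shape: start timestamp (ms) plus an aggregated value.
interface Bucket {
  t: number;
  value: number;
}

// Merge cached buckets with buckets computed for the non-cached range.
// On overlap the cached bucket is kept, since it covers immutable data.
function mergeBuckets(cached: Bucket[], fresh: Bucket[]): Bucket[] {
  const byTime = new Map<number, Bucket>();
  for (const b of fresh) byTime.set(b.t, b);
  for (const b of cached) byTime.set(b.t, b); // overwrite any overlap
  // Map lookups avoid a pairwise scan; the sort restores time order.
  return [...byTime.values()].sort((a, b) => a.t - b.t);
}
```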

In [1]:
// Confirm OpenTelemetry is enabled and the service name is set for this run.
[Deno.env.get("OTEL_DENO"), Deno.env.get("OTEL_SERVICE_NAME")];

[ "true", "bench" ]

## Run the benchmark

In [2]:
import { Engine } from "@/core/mod.ts";
import { BenchConfig, runBench } from "./agg_verification.ts";

const MONGODB_URI =
  "mongodb://root:example@localhost:27017/quant_bench?authSource=admin";

const engine = new Engine({
  mongoUri: MONGODB_URI,
  bufferAgeMs: 1000 * 60 * 5, // 5 minutes
  cache: {
    enabled: true,
    ttlSeconds: 600, // 10 minutes
  },
});

const playground: BenchConfig = {
  numSources: 15,
  eventTypesPerSource: 200,
  numEvents: 100000,
  partition: {
    granularity: "second",
    length: 100000, // Number of spans per partition
  },
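  // Note: every entry uses the same key "user", so Object.fromEntries keeps
  // only the last one and this evaluates to { user: ["user6"] }.
  attributions: Object.fromEntries(
    new Array(7).fill(0).map((_, i) => ["user", ["user" + String(i)]]),
  ),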
  payloadSchema: {
    "price": "number",
    "volume": "number",
    "asset": "string",
    "market": "string",
    "time_of_day": "boolean",
  },
};

const result = await runBench(engine, playground);

Starting aggregator service...
Starting stats service for instance 829b9011-f8d7-4fae-a19b-7e963a171455
Starting LifecycleManager service...
LifecycleManager: Running retention policy checks...
---= Starting Bench =---
Setting up event sources and report...
Generating 100000 events...
  ... 10000 / 100000 events generated.
  ... 20000 / 100000 events generated.
  ... 30000 / 100000 events generated.
  ... 40000 / 100000 events generated.
  ... 50000 / 100000 events generated.
  ... 60000 / 100000 events generated.
  ... 70000 / 100000 events generated.
  ... 80000 / 100000 events generated.
  ... 90000 / 100000 events generated.
  ... 100000 / 100000 events generated.
Event generation finished in 582.77 seconds.


### Verify Event Writes

In [3]:
await engine.getTotalRawEventCount();

100000

### Verify Aggregated Metrics

*Should equal the ground truth.*

In [4]:
const engineReport = await result.getEngineReport("day");
// Print only the metrics whose engine value differs from the ground truth.
for (const [key, value] of Object.entries(engineReport)) {
  const expected = result.groundTruthResults[key];
  if (expected !== value) {
    console.log(key, "❌ KO", value, "!=", expected);
  }
}

Query time: 5.546s
$boolean_groups ❌ KO 0[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
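`===` matches objects and arrays only by reference. As a result, a non-scalar metric can show up as KO in the cell above even when its contents agree with the ground truth. The following minimal sketch uses a structural comparison instead. It assumes both reports are plain JSON-like objects keyed by metric name. `deepEqual` and `diffReports` are hypothetical helpers; in Deno, the `equal` function from `@std/assert` could stand in for `deepEqual`:

```typescript
// JSON-like values, as returned by a report query (an assumption).
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

const hasOwn = (o: object, k: string): boolean =>
  Object.prototype.hasOwnProperty.call(o, k);

// Structural equality: arrays compare element by element (order matters),
// objects compare key by key (key order does not matter).
function deepEqual(a: Json, b: Json): boolean {
  if (a === b) return true;
  if (a === null || b === null || typeof a !== "object" || typeof b !== "object") {
    return false;
  }
  if (Array.isArray(a) || Array.isArray(b)) {
    if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) {
      return false;
    }
    for (let i = 0; i < a.length; i++) {
      if (!deepEqual(a[i], b[i])) return false;
    }
    return true;
  }
  const aKeys = Object.keys(a);
  if (aKeys.length !== Object.keys(b).length) return false;
  for (const k of aKeys) {
    if (!hasOwn(b, k) || !deepEqual(a[k], b[k])) return false;
  }
  return true;
}

// Return the metric keys that are missing on either side or whose
// engine value differs structurally from the ground truth.
function diffReports(
  engine: Record<string, Json>,
  truth: Record<string, Json>,
): string[] {
  const keys = new Set([...Object.keys(engine), ...Object.keys(truth)]);
  return [...keys].filter(
    (k) => !(hasOwn(engine, k) && hasOwn(truth, k) && deepEqual(engine[k], truth[k])),
  );
}
```

If you use `deepEqual(value, result.groundTruthResults[key])` in place of the `===` check, the loop reports only genuine content mismatches.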

Cached query (the same report, requested again):

In [5]:
const _ = await result.getEngineReport("day");


Query time: 0.073s
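For this run, the cached query returns roughly 76 times faster than the first, uncached one. This is arithmetic on the two measured query times:

$$
\frac{5.546\ \text{s}}{0.073\ \text{s}} \approx 76
$$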


Approximate number of aggregated metrics (depends on storage granularity):

In [None]:
const NUM_EVENTS = 100000;
const NUM_PAYLOAD = 5; // 5 payload attributes are counted as metrics
// Second granularity with 100000 spans per partition yields 9 partitions (read from the DB).
const NUM_PARTITIONS = 9;

console.log(`~ ${NUM_EVENTS * (NUM_PAYLOAD / 2) * NUM_PARTITIONS} aggregated metrics`);

~ 2250000 aggregated metrics
