
Feature/1 machine learning feature pipeline app#2

Merged
gangtao merged 2 commits into main from feature/1-machine-learning-feature-pipeline-app
May 15, 2026

Conversation

@gangtao
Contributor

@gangtao gangtao commented May 15, 2026

add ml feature demo

@gangtao gangtao linked an issue May 15, 2026 that may be closed by this pull request
@gangtao gangtao merged commit 7bb4cdd into main May 15, 2026
@gangtao gangtao deleted the feature/1-machine-learning-feature-pipeline-app branch May 19, 2026 16:54
gangtao added a commit that referenced this pull request May 21, 2026
Closes #27.

Extends the existing alpha-101 app to also implement WorldQuant
Alpha #2:

    -1 * correlation(rank(delta(log(volume), 2)),
                     rank((close - open) / open),
                     6)
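
For readers who want to check the formula outside the app, here is a minimal plain-Python sketch of it. It is not the app's SQL: the function names, the `bars` layout (one dict per bucket mapping stock to `(open, close, volume)`), and the exact rank scaling (first sorted index, scaled to [-0.5, 0.5]) are assumptions made for illustration.

```python
import math
from statistics import fmean


def mean_zero_rank(values):
    """Cross-sectional rank scaled to [-0.5, 0.5] (assumed scaling).

    Uses the first index of each value in the sorted list, so ties share a rank.
    """
    ordered = sorted(values)
    n = len(values)
    if n < 2:
        return [0.0] * n
    return [ordered.index(v) / (n - 1) - 0.5 for v in values]


def pearson(xs, ys):
    """Pearson correlation from explicit covariance and variances.

    Returns None when either series is constant (correlation undefined).
    """
    mx, my = fmean(xs), fmean(ys)
    cov = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    var_x = fmean([(x - mx) ** 2 for x in xs])
    var_y = fmean([(y - my) ** 2 for y in ys])
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def alpha_2(bars, window=6):
    """Return one {stock: alpha or None} dict per bucket that has a full window."""
    stocks = sorted(bars[0])
    rank_vol, rank_ret = [], []  # one {stock: rank} dict per bucket, from t = 2
    for t in range(2, len(bars)):
        # delta(log(volume), 2) and (close - open) / open for every stock
        dlog = [math.log(bars[t][s][2]) - math.log(bars[t - 2][s][2]) for s in stocks]
        ret = [(bars[t][s][1] - bars[t][s][0]) / bars[t][s][0] for s in stocks]
        rank_vol.append(dict(zip(stocks, mean_zero_rank(dlog))))
        rank_ret.append(dict(zip(stocks, mean_zero_rank(ret))))

    out = []
    for i in range(window - 1, len(rank_vol)):
        row = {}
        for s in stocks:
            xs = [rank_vol[j][s] for j in range(i - window + 1, i + 1)]
            ys = [rank_ret[j][s] for j in range(i - window + 1, i + 1)]
            corr = pearson(xs, ys)
            row[s] = None if corr is None else -corr  # the leading -1 * ...
        out.append(row)
    return out
```

With `window=6` and a 2-bucket volume delta, the first alpha value appears at the eighth bucket; earlier buckets do not have enough history.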

Shared pipeline changes (affecting Alpha #1 too, verified non-breaking):

- random_market_data gains a `volume` field (uniform [1, 100] per tick).
- market_data + mv_market_data plumb volume through.
- v_bars now exposes `open` (earliest price in bucket), `close`
  (latest price), and `volume` (sum of tick volumes). Alpha #1's
  downstream chain works unchanged.
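
As an illustration of the bar semantics listed above (not the app's SQL), the aggregation can be sketched in plain Python. The tick tuple layout, the `build_bars` name, and the 5-second bucket width are assumptions chosen for the example.

```python
from collections import defaultdict


def build_bars(ticks, bucket_seconds=5):
    """Aggregate ticks into per-(bucket, stock) bars.

    ticks: iterable of (timestamp_seconds, stock, price, volume).
    bucket_seconds is a hypothetical width, not taken from the app.
    """
    grouped = defaultdict(list)
    for ts, stock, price, volume in ticks:
        bucket = int(ts // bucket_seconds) * bucket_seconds
        grouped[(bucket, stock)].append((ts, price, volume))

    bars = {}
    for key, rows in grouped.items():
        rows.sort(key=lambda r: r[0])  # order ticks by time within the bucket
        bars[key] = {
            "open": rows[0][1],                  # earliest price in bucket
            "close": rows[-1][1],                # latest price in bucket
            "volume": sum(r[2] for r in rows),   # sum of tick volumes
        }
    return bars
```

Sorting inside each bucket keeps `open` and `close` correct even when ticks arrive out of order.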

New views for Alpha #2:

- v_features_2: per-stock per-bucket intraday_ret = (close - open) / open,
  log_vol_delta_2 = log(volume_t) - log(volume_{t-2}), and bucket-to-bucket
  returns (plumbed for the backtest).
- v_ranks_2: per-bucket cross-sectional rank of BOTH features (using the
  same mean-zero rank pattern as Alpha #1, computed once per bucket via
  group_array + array_sort + array_first_index for each feature).
- v_alpha_2: per-stock 6-bucket rolling Pearson correlation between the
  two rank series, negated. Manual covariance / variance computation via
  array_reduce('avg', array_map(...)) over lags(_, 0, 5).
- v_backtest_2: same shape as v_backtest, branched on `strategy` config
  (linear | sign).
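
The `strategy` branch could look roughly like the sketch below. How v_backtest_2 actually sizes positions is not spelled out in this message, so the sketch assumes that `linear` holds a position proportional to the alpha, `sign` holds +1, -1, or 0, and PnL is the position times the next bucket-to-bucket return. The function name and arguments are hypothetical.

```python
def backtest_pnl(alpha, next_return, strategy="linear"):
    """PnL for one (time, stock) row under the assumed position rules."""
    if strategy == "linear":
        position = alpha                       # position proportional to the signal
    elif strategy == "sign":
        position = (alpha > 0) - (alpha < 0)   # +1, -1, or 0
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return position * next_return
```

Under these assumptions `sign` ignores the magnitude of the alpha, so it is less sensitive to how the signal is scaled.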

Two new dashboards:

- "Alpha #2 Live" (volume per bucket, leaderboard, alpha over time)
- "Alpha #2 Backtest" (t-stat tile, summary, per-stock, portfolio PnL)

End-to-end live-verified: install of the rebuilt .tpapp produces both
alphas; v_alpha_2 values bounded in [-1, 1]; v_backtest_2 emits
sensible pnl on each (time, stock) row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gangtao added a commit that referenced this pull request May 21, 2026
With consistent _alpha_N naming, the dashboard SQL can template the
alpha suffix at runtime. Each dashboard now has a single "Alpha"
dropdown (values: 1, 2) writing to `filter_alpha`, and the panel
queries reference views as `alpha_101.v_alpha_{{filter_alpha}}` and
`alpha_101.v_backtest_alpha_{{filter_alpha}}`.
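
To make the substitution concrete, here is a small Python sketch of what the templating amounts to. The dashboard engine performs the real substitution; the `render_panel_sql` helper, the allow-list check, and the sample query text are illustrative assumptions.

```python
import re

ALLOWED_ALPHAS = {"1", "2"}  # mirrors the dropdown values named above


def render_panel_sql(template, filter_alpha):
    """Replace {{filter_alpha}} in a panel query with the selected alpha number."""
    value = str(filter_alpha)
    if value not in ALLOWED_ALPHAS:
        # Refuse anything outside the dropdown values so that arbitrary
        # text never ends up inside an identifier.
        raise ValueError(f"unsupported alpha: {value!r}")
    return re.sub(r"\{\{\s*filter_alpha\s*\}\}", value, template)


# Hypothetical panel query, for illustration only.
LEADERBOARD_SQL = "SELECT * FROM alpha_101.v_alpha_{{filter_alpha}}"
```

For example, `render_panel_sql(LEADERBOARD_SQL, 2)` yields a query against `alpha_101.v_alpha_2`.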

Before: 4 dashboards (Realtime Alpha 101, Alpha #1 Backtest, Alpha #2
Live, Alpha #2 Backtest).
After: 2 dashboards (Realtime Alpha 101, Alpha 101 Backtest), each
serving both alphas via the dropdown.

The Realtime dashboard also shows price AND volume side-by-side
(both filtered by the stock dropdown), since those are shared
market-data views that don't change between alphas.

Adding Alpha #N now requires only:
- a new DDL chain (v_features_alpha_N, ..., v_backtest_alpha_N)
- appending "N" to the Alpha dropdown's inlineValues

No new dashboard files. Verified end-to-end: 2 dashboards register
with 6 panels each; switching the Alpha dropdown updates the
leaderboard / alpha-over-time / all backtest panels in place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gangtao added a commit that referenced this pull request May 21, 2026
gangtao added a commit that referenced this pull request May 21, 2026
The random source assigns each stock a fixed price band ([50, 80, 120,
…] with only ±0.5% tick noise). Because the bands never overlap, the
cross-sectional rank of raw price-level features is constant per stock.
This makes alphas #3, #4, and #6 degenerate (alpha_4 always = -1 in
steady state, etc.): the math is correct, but the data has no rank
information to extract.
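
The effect is easy to reproduce outside the app. The sketch below uses only the first three bands quoted above and assumes the tick noise is uniform within ±0.5%; the function name and the seed are arbitrary. It records which cross-sectional orderings ever occur for price levels versus tick-to-tick returns.

```python
import random


def observed_orderings(base_prices, n_ticks, noise=0.005, seed=0):
    """Return (level_orders, return_orders): the sets of rank orderings seen.

    Each ordering is a tuple of stock indices sorted from lowest to highest.
    The uniform noise model is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    level_orders, return_orders = set(), set()
    prev = None
    for _ in range(n_ticks):
        prices = [b * (1 + rng.uniform(-noise, noise)) for b in base_prices]
        level_orders.add(tuple(sorted(range(len(prices)), key=prices.__getitem__)))
        if prev is not None:
            rets = [p / q - 1 for p, q in zip(prices, prev)]
            return_orders.add(tuple(sorted(range(len(rets)), key=rets.__getitem__)))
        prev = prices
    return level_orders, return_orders


levels, returns = observed_orderings([50, 80, 120], n_ticks=200)
```

Here `levels` contains a single ordering, because 50 * 1.005 is still below 80 * 0.995 and 80 * 1.005 is below 120 * 0.995, while `returns` contains several orderings, which is why return-based alphas still vary on this feed.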

Documents which alphas DO produce meaningful varying signals on this
synthetic feed (#1, #2, #9, #12, #22, #41, #54: the ones operating
on returns, deltas, or intra-bar quantities) versus which need real
market data to demo properly.

Caught when a reviewer noticed alpha_4 = -1 everywhere.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Machine Learning Feature Pipeline App
