Centralized financial data layer for trading systems.
| Subsystem | Source | Status |
|---|---|---|
| prices — daily OHLCV | yfinance (primary) + Stooq CSV (IT fallback) | implemented |
| sentiment — superinvestor 13F | dataroma.com (HTML scraping) | implemented |
| viz — interactive Plotly charts | reads from prices via the public API | implemented |
| fundamentals | TBD (yfinance / FMP / EODHD) | scaffold only — see DESIGN.md |
| macro | TBD (FRED / ECB / ISTAT / Eurostat) | scaffold only — see DESIGN.md |
Architecture & design decisions: ARCHITECTURE.md.
uv sync --extra dev # install
uv run pytest # run the test suite
uv run python examples/external_system_usage.py # live end-to-end demo

The demo is incremental: the first run downloads ~5y of OHLCV for the seeded universe and scrapes one investor; subsequent runs are no-ops unless new bars or filings exist.
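The incremental behaviour described above can be pictured as a small date calculation. The sketch below is an assumption about how such a step might work, not the project's actual code: `next_fetch_window` and its `history_years` default are hypothetical, chosen only to mirror the "~5y on first run, no-op afterwards" description.

```python
from __future__ import annotations

from datetime import date, timedelta


# Hypothetical helper: not part of the global_data_storage API.
def next_fetch_window(
    last_stored: date | None,
    today: date,
    history_years: int = 5,
) -> tuple[date, date] | None:
    """Return the (start, end) range still to download, or None if up to date."""
    if last_stored is None:
        # First run: back-fill roughly `history_years` of daily bars.
        return today - timedelta(days=365 * history_years), today
    if last_stored >= today:
        # Nothing newer can exist yet, so the run is a no-op.
        return None
    # Later runs: only request bars after the last stored one.
    return last_stored + timedelta(days=1), today
```

With this shape, a second run on the same day returns `None` and triggers no download.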
External trading systems must depend only on these symbols.
# Prices (subsystem 1)
from global_data_storage.prices import (
    get_ohlcv,          # (ticker, start, end, freq="1d") -> pd.DataFrame
    ensure_available,   # (ticker) -> bool  # idempotent gate
    list_universe,      # () -> list[str]
    last_update,        # (ticker) -> date | None
)
# Sentiment (subsystem 4)
from global_data_storage.sentiment import (
    get_holdings,       # (investor) -> pd.DataFrame
    get_recent_moves,   # (since: date, action: str | None = None) -> pd.DataFrame
    get_consensus,      # (ticker) -> dict
    list_investors,     # () -> list[str]
)

Tickers are canonical (no .MI suffix). The yfinance source rewrites
ENI → ENI.MI internally; the DB and API never expose the suffix.
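The suffix rewrite can be illustrated with a pair of mapping helpers. This is only a sketch under stated assumptions: the function names and the per-market suffix table are hypothetical, and the real yfinance source in this project may resolve symbols differently.

```python
# Hypothetical suffix table: only the ".MI" entry is taken from the text above.
YF_SUFFIX_BY_MARKET = {"IT": ".MI", "US": ""}


def to_source_symbol(ticker: str, market: str) -> str:
    """Canonical ticker -> symbol the upstream source expects."""
    return ticker + YF_SUFFIX_BY_MARKET[market]


def to_canonical(symbol: str, market: str) -> str:
    """Upstream symbol -> canonical ticker, as stored in the DB."""
    suffix = YF_SUFFIX_BY_MARKET[market]
    if suffix and symbol.endswith(suffix):
        return symbol[: -len(suffix)]
    return symbol
```

Keeping both directions in one place means the suffix never needs to leak past the source adapter.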
The ensure_available pattern is the recommended gate — call it before
any read, and the underlying data will be fetched if missing or stale:
from global_data_storage.prices import ensure_available, get_ohlcv

if ensure_available("ENI"):
    df = get_ohlcv("ENI", start, end)

Editable hand-curated artefacts:
- config/universe.yaml — tracked tickers (5 IT seed + 11 SPDR; populate FTSE MIB via
scripts/seed_universe_it.py).
Environment variables (all optional; see .env.example):
| Var | Default | Purpose |
|---|---|---|
| GDS_DB_PATH | ~/.global_data_storage/store.duckdb | DuckDB store path |
| GDS_HTTP_CACHE_PATH | ~/.global_data_storage/http_cache | scraping cache |
| GDS_LOG_FORMAT | console | console (dev) or json (prod) |
| GDS_LOG_LEVEL | INFO | DEBUG / INFO / WARNING / ERROR |
| GDS_CONTACT_EMAIL | "" | contact appended to scraping User-Agent |
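As an illustration of how these variables and defaults could be resolved, here is a minimal sketch. The `Settings` dataclass and `load_settings` function are hypothetical names, not the project's config module; only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:  # hypothetical container, not the project's config class
    db_path: Path
    http_cache_path: Path
    log_format: str
    log_level: str
    contact_email: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the GDS_* variables, falling back to the documented defaults."""
    base = "~/.global_data_storage"
    return Settings(
        db_path=Path(env.get("GDS_DB_PATH", f"{base}/store.duckdb")).expanduser(),
        http_cache_path=Path(env.get("GDS_HTTP_CACHE_PATH", f"{base}/http_cache")).expanduser(),
        log_format=env.get("GDS_LOG_FORMAT", "console"),
        log_level=env.get("GDS_LOG_LEVEL", "INFO"),
        contact_email=env.get("GDS_CONTACT_EMAIL", ""),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in tests.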
Single DuckDB file, four schemas (prices, sentiment, fundamentals, macro).
Cross-domain joins are first-class:
-- "OHLCV of stocks Buffett bought in 2025Q4"
SELECT p.*
FROM prices.equity_ohlcv p
JOIN sentiment.moves m ON m.ticker = p.ticker
WHERE m.investor_slug = 'BRK'
  AND m.action IN ('BUY', 'ADD')
  AND m.period_quarter = '2025Q4'
  AND p.date >= today() - INTERVAL 90 DAY;

External trading systems should open the DB read-only:
from global_data_storage.storage import read_only
with read_only() as con:
    df = con.execute("SELECT ...").fetchdf()

config/                          # universe.yaml (hand-edited)
src/global_data_storage/
    prices/                      # subsystem 1: full
    sentiment/                   # subsystem 4: full
    fundamentals/                # subsystem 2: DESIGN.md only
    macro/                       # subsystem 3: DESIGN.md only
    storage/                     # DuckDB connection + schema
    common/                      # logging, config, retry, http (cache + politeness)
scripts/
    seed_universe_it.py          # one-shot Borsa Italiana → YAML
    backup.py                    # cold-copy snapshot of the DuckDB file
tests/
    fixtures/sentiment/          # captured Dataroma HTML for parser tests
examples/
    external_system_usage.py     # canonical consumer pattern
Backup
uv run python scripts/backup.py # default: ~/.global_data_storage/backups
uv run python scripts/backup.py --keep 30 # keep last 30 snapshots

Refreshing data
Prices are refreshed lazily by ensure_available(ticker). Sentiment is
refreshed manually, on demand (13F filings are quarterly):
from global_data_storage.sentiment.ingest import refresh_all
refresh_all() # ~80 investors x 2 pages each, politeness-bound

Checking what's new
SELECT ticker, last_date, status, rows_added
FROM prices.update_log
ORDER BY last_run_at DESC
LIMIT 20;

- Single-user, single-machine, batch ingestion. DuckDB is single-writer.
- External trading systems are read-only consumers: open the DB with read_only=True.
- No intraday support today; the freq column on equity_ohlcv is intraday-ready, but its value is always '1d'.
- Scraping politeness: 2.5s ± 0.5s per request, 24h on-disk cache, robots.txt enforced. User-Agent identifies the project.
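The "2.5s ± 0.5s per request" politeness rule can be sketched as a jittered delay. The snippet assumes the jitter is drawn uniformly, which the text does not state, and `polite_delay` is a hypothetical name rather than a function from the project's common/ module.

```python
import random

BASE_DELAY_S = 2.5  # base wait between requests, from the text above
JITTER_S = 0.5      # maximum deviation either side, from the text above


def polite_delay(rng: random.Random) -> float:
    """Seconds to wait before the next request, between 2.0 and 3.0."""
    # Assumption: uniform jitter; the real distribution is not documented here.
    return BASE_DELAY_S + rng.uniform(-JITTER_S, JITTER_S)
```

A scraper loop would typically call `time.sleep(polite_delay(rng))` before each uncached request.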
Local-only interactive charts (Plotly + kaleido). No web server. Each call
returns a fresh go.Figure; HTML and PNG land in output/viz/.
from global_data_storage.viz import quick
# Single ticker, candles + indicators (range selector, range slider, hover OHLC)
fig = quick.candles("ENI", period="1Y", indicators=["sma_50", "sma_200", "bbands"])
fig.show()
# Multi-ticker comparison (default: cumulative log returns — honest on long horizons)
fig = quick.compare(["ENI", "ENEL", "ISP", "UCG"], mode="log_returns", period="2Y")
# Cross-ticker analytics
fig = quick.correlation_heatmap(["XLK","XLF","XLE","XLV","XLI","XLY","XLP","XLU","XLB","XLRE","XLC"], period="1Y")
fig = quick.pair_scatter("XLK", "XLY", period="1Y") # OLS line + R² annotation
fig = quick.rolling_correlation("XLK", "XLY", window=60) # rolling Pearson on log returns
# Sub-panel indicators (RSI, MACD, ATR)
fig = quick.overlay("ENI", indicators=["rsi_14", "macd"], period="1Y")
# Programmatic annotations (survive HTML/PNG round-trip; live drawn shapes don't)
quick.add_horizontal_level(fig, price=22.5, label="resistance")
quick.add_trendline(fig, start=("2025-01-15", 13.0), end=("2026-04-30", 24.0), label="uptrend")
quick.add_vertical_event(fig, when="2025-04-07", label="vol spike")
quick.add_fibonacci(fig, high_date="2026-04-15", high_price=25.0,
                    low_date="2026-05-08", low_price=22.5)
# Slide-ready
quick.apply_presentation_theme(fig, title="ENI", subtitle="1Y daily")
quick.save_html(fig, "eni_1y") # ~80 KB, opens offline (Plotly via CDN)
quick.save_image(fig, "eni_1y_slide") # PNG @ 1920x1080, scale=2

Design choices (Okabe-Ito palette, 16:9 default, footer + theme constants) live in VIZ_DESIGN.md. Five end-to-end examples are in examples/viz_examples.py.
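The comparison chart's default mode, cumulative log returns, uses the standard definition ln(P_t / P_0). The sketch below shows that calculation on a plain list of closes; it assumes quick.compare does something equivalent internally, which the source does not confirm, and `cumulative_log_returns` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cumulative_log_returns(closes: Sequence[float]) -> List[float]:
    """ln(P_t / P_0) for each close, so every series starts at 0.0."""
    first = closes[0]
    return [math.log(price / first) for price in closes]
```

Log returns are symmetric: a doubling and a halving land at +ln 2 and -ln 2, which is why series with very different price levels stay comparable over long periods.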
Local Flask app: refresh prices and generate any chart in the viz module
without writing a Python script. Loopback only (127.0.0.1); the browser
opens automatically.
uv run python -m global_data_storage.dashboard # or: gds-dashboard

On Windows, double-click dashboard.bat instead: it sets up
PATH, launches the server, and opens your browser automatically.
What it gives you:
- Sidebar: every active ticker from config/universe.yaml with its last_update date. Tick the ones you want and click Refresh selected; the dashboard runs prices.ensure_available(...) for each (incremental download, no duplicate work).
- Tabs for every chart type: Candles, Compare, Correlation heatmap, Pair scatter, Rolling correlation, Overlay. Each form picks tickers from the universe (single dropdown or multi-select); you can't request a ticker that isn't in universe.yaml. To add one, edit the YAML and reload the page.
- Compare tab: pick ≥ 2 tickers, then choose a mode (log returns / normalized / rolling / drawdown) and a period; the chart re-renders inline. This is the part the static HTML exports don't give you.
- Result is rendered as a Plotly chart inside an iframe; pop it out for a full window or use quick.save_html / save_image from a script for slide exports.
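The "Refresh selected" behaviour in the sidebar can be sketched as a loop over the ticked tickers that refuses anything outside the universe. This is an illustration only: `refresh_selected` is a hypothetical name, the gate is passed in as a callable standing in for prices.ensure_available, and the dashboard's real handler may look different.

```python
from typing import Callable, Dict, Iterable, Set


def refresh_selected(
    selected: Iterable[str],
    universe: Set[str],
    ensure: Callable[[str], bool],
) -> Dict[str, bool]:
    """Run the gate for each ticked ticker; reject tickers outside the universe."""
    results: Dict[str, bool] = {}
    for ticker in selected:
        if ticker not in universe:
            raise ValueError(f"{ticker!r} is not in universe.yaml")
        # `ensure` stands in for prices.ensure_available (assumed to return bool).
        results[ticker] = ensure(ticker)
    return results
```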
uv (deps), ruff (lint+format), mypy --strict (types), pytest (tests). All wired in pyproject.toml.