From 3658eba8ff0f347229c5fcf52512ca23a0fc590a Mon Sep 17 00:00:00 2001 From: iuwqyir Date: Mon, 28 Jul 2025 15:11:25 +0300 Subject: [PATCH] update readme and config examples --- README.md | 742 ++++++++++++++++++------------------ configs/config.example.yml | 239 ++++++++++-- configs/secrets.example.yml | 83 +++- 3 files changed, 646 insertions(+), 418 deletions(-) diff --git a/README.md b/README.md index 2779fcb..5f7589f 100644 --- a/README.md +++ b/README.md @@ -1,464 +1,444 @@ # Insight -Insight is a blockchain data processing tool designed to fetch, process, and store on-chain data. It provides a solution for indexing blockchain data, facilitating efficient querying of transactions and logs through a simple API. +**Insight** is a high-performance, modular blockchain indexer and data API for EVM chains. It fetches, processes, and stores on-chain dataβ€”making it easy to query blocks, transactions, logs, token balances, and more via a robust HTTP API. -Insight's architecture consists of five main components: -architecture-diagram +## πŸš€ Getting Started -1. **Poller**: The Poller is responsible for continuously fetching new blocks from the configured RPC and processing them. It uses multiple worker goroutines to concurrently retrieve block data, handles successful and failed results, and stores the processed block data and any failures. - -2. **Worker**: The Worker is responsible for processing batches of block numbers, fetching block data, logs, and traces (if supported) from the configured RPC. It divides the work into chunks, processes them concurrently, and returns the results as a collection of WorkerResult structures, which contain the block data, transactions, logs, and traces for each processed block. +**Quickstart (Local Development):** -3. **Committer**: The Committer is responsible for periodically moving data from the staging storage to the main storage. 
It ensures that blocks are committed sequentially, handling any gaps in the data, and updates various metrics while performing concurrent inserts of blocks, logs, transactions, and traces into the main storage. - -4. **Failure Recoverer**: The FailureRecoverer is responsible for recovering from block processing failures. It periodically checks for failed blocks, attempts to reprocess them using a worker, and either removes successfully processed blocks from the failure list or updates the failure count for blocks that continue to fail. - -5. **Orchestrator**: The Orchestrator is responsible for coordinating and managing the poller, failure recoverer, and committer. It initializes these components based on configuration settings and starts them concurrently, ensuring they run independently while waiting for all of them to complete their tasks. - -Insight's modular architecture and configuration options allow for adaptation to various EVM chains and use cases. - - -## Getting started - -### Pre-requites -1. Golang v1.23 -2. Clickhouse instance (`docker-compose` includes it for local development) - -### Usage -To run insight and the associated API, follow these steps: -1. Clone the repo -``` +```bash +# 1. Clone the repo git clone https://github.com/thirdweb-dev/insight.git -``` -2. ~~Run the migration scripts here~~ -2. Apply the migrations from [here](internal/tools/) -3. Create `config.yml` from `config.example.yml` and set the values by following the [config guide](#supported-configurations) -4. Create `secrets.yml` from `secrects.example.yml` and set the needed credentials -5. Build an instance -``` -go build -o main -tags=production -``` -6. Run insight -``` -./main orchestrator -``` -7. Run the Data API -``` -./main api -``` -8. API is available at `http://localhost:3000` +cd insight -## Metrics +# 2. 
Copy example configs and secrets
+cp configs/config.example.yml configs/config.yml
+cp configs/secrets.example.yml configs/secrets.yml
-Insight node exposes Prometheus metrics at `http://localhost:2112/metrics`. Here the exposed metrics [metrics.go](https://github.com/thirdweb-dev/insight/blob/main/internal/metrics/metrics.go)
+# 3. (Optional) Start dependencies with Docker Compose
+docker-compose up -d clickhouse
-## Configuration
+# 4. Apply ClickHouse migrations
+cat internal/tools/*.sql | docker-compose exec -T clickhouse clickhouse-client --user admin --password password --multiquery
-You can configure the application in 3 ways.
-The order of priority is
-1. Command line arguments
-2. Environment variables
-3. Configuration files
-
-### Configuration using command line arguments
-You can configure the application using command line arguments.
-For example to configure the `rpc.url` configuration, you can use the `--rpc-url` command line argument.
-Only select arguments are implemented. More on this below.
-
-### Configuration using environment variables
-You can also configure the application using environment variables. You can configure any configuration in the `config.yml` file using environment variables by replacing the `.` in the configuration with `_` and making the variable name uppercase.
-For example to configure the `rpc.url` configuration to `https://my-rpc.com`, you can set the `RPC_URL` environment variable to `https://my-rpc.com`.
-
-### Configuration using configuration files
-The default configuration should live in `configs/config.yml`. Copy `configs/config.example.yml` to get started.
-Or you can use the `--config` flag to specify a different configuration file.
-If you want to add secrets to the configuration file, you can copy `configs/secrets.example.yml` to `configs/secrets.yml` and add the secrets. They won't be committed to the repository or the built image.
-
-### Supported configurations:
-
-#### RPC URL
-URL to use as the RPC client.
+# 5. 
Build and run Insight +go build -o main -tags=production +./main orchestrator # Starts the indexer +./main api # Starts the API server -cmd: `--rpc-url` -env: `RPC_URL` -yaml: -```yaml -rpc: - url: https://rpc.com +# 6. Access the API +# Default: http://localhost:3000 ``` -#### RPC Blocks Per Request -How many blocks at a time to fetch from the RPC. Default is 1000. +--- -cmd: `--rpc-blocks-blocksPerRequest` -env: `RPC_BLOCKS_BLOCKSPERREQUEST` -yaml: -```yaml -rpc: - blocks: - blocksPerRequest: 1000 -``` +## πŸ— How It Works -#### RPC Blocks Batch Delay -Milliseconds to wait between batches of blocks when fetching from the RPC. Default is 0. +Insight's architecture consists of five main components that work together to continuously index blockchain data: -cmd: `--rpc-blocks-batchDelay` -env: `RPC_BLOCKS_BATCHDELAY` -yaml: -```yaml -rpc: - blocks: - batchDelay: 100 -``` +### 1. **Poller** +The Poller continuously fetches new blocks from the configured RPC endpoint. It uses multiple worker goroutines to concurrently retrieve block data, handles successful and failed results, and stores the processed block data and any failures in staging storage. -#### RPC Logs Blocks Per Request -How many blocks at a time to query logs for from the RPC. Default is 100. -Has no effect if it's larger than RPC blocks per request. +### 2. **Worker** +The Worker processes batches of block numbers, fetching block data, logs, and traces (if supported) from the configured RPC. It divides the work into chunks, processes them concurrently, and returns the results as a collection of WorkerResult structures containing block data, transactions, logs, and traces for each processed block. -cmd: `--rpc-logs-blocksPerRequest` -env: `RPC_LOGS_BLOCKSPERREQUEST` -yaml: -```yaml -rpc: - logs: - blocksPerRequest: 100 -``` +### 3. **Committer** +The Committer periodically moves data from staging storage to main storage. 
It ensures blocks are committed sequentially, handling any gaps in the data, and updates various metrics while performing concurrent inserts of blocks, logs, transactions, and traces into the main storage. -#### RPC Logs Batch Delay -Milliseconds to wait between batches of logs when fetching from the RPC. Default is 0. +### 4. **Failure Recoverer** +The FailureRecoverer recovers from block processing failures. It periodically checks for failed blocks, attempts to reprocess them using a worker, and either removes successfully processed blocks from the failure list or updates the failure count for blocks that continue to fail. -cmd: `--rpc-logs-batchDelay` -env: `RPC_LOGS_BATCHDELAY` -yaml: -```yaml -rpc: - logs: - batchDelay: 100 -``` +### 5. **Orchestrator** +The Orchestrator coordinates and manages the poller, failure recoverer, and committer. It initializes these components based on configuration settings and starts them concurrently, ensuring they run independently while waiting for all of them to complete their tasks. -#### RPC Block Receipts Enabled -If this is `true`, will use `eth_getBlockReceipts` instead of `eth_getLogs` if the RPC supports it. Allows getting receipt data for transactions, but is not supported by every RPC. Default is `false`. +### Data Flow +1. **Polling**: The Poller continuously checks for new blocks on the blockchain +2. **Processing**: Workers fetch and process block data, transactions, logs, and traces +3. **Staging**: Processed data is stored in staging storage for validation +4. **Commitment**: The Committer moves validated data to main storage +5. **Recovery**: Failed blocks are retried by the Failure Recoverer +6. 
**API**: The HTTP API serves queries from the main storage -cmd: `--rpc-block-receipts-enabled` -env: `RPC_BLOCKRECEIPTS_ENABLED` -yaml: -```yaml -rpc: - blockReceipts: - enabled: true -``` +### Work Modes +Insight operates in two distinct work modes that automatically adapt based on how far behind the chain head the indexer is: -#### RPC Block Receipts Blocks Per Request -How many blocks at a time to fetch block receipts for from the RPC. Default is 250. -Has no effect if it's larger than RPC blocks per request. +**Backfill Mode** (Catching Up): +- Used when the indexer is significantly behind the latest block +- Processes blocks in large batches for maximum throughput +- Optimized for an error-free indexing process over speed +- Automatically switches to live mode when caught up -cmd: `--rpc-block-receipts-blocksPerRequest` -env: `RPC_BLOCKRECEIPTS_BLOCKSPERREQUEST` -yaml: -```yaml -rpc: - blockReceipts: - blocksPerRequest: 100 -``` +**Live Mode** (Real-time): +- Used when the indexer is close to the chain head (within ~500 blocks by default) +- Processes blocks as they arrive with minimal latency +- Optimized for real-time data availability +- Switches back to backfill mode if falling behind -#### RPC Block Receipts Batch Delay -Milliseconds to wait between batches of block receipts when fetching from the RPC. Default is 0. +The work mode threshold and check interval are configurable via `workMode.liveModeThreshold` and `workMode.checkIntervalMinutes` settings. -cmd: `--rpc-block-receipts-batchDelay` -env: `RPC_BLOCKRECEIPTS_BATCHDELAY` -yaml: -```yaml -rpc: - blockReceipts: - batchDelay: 100 -``` +This modular architecture allows for adaptation to various EVM chains and use cases, with configurable batch sizes, delays, and processing strategies. -#### RPC Traces Enabled -Whether to enable fetching traces from the RPC. Default is `true`, but it will try to detect if the RPC supports traces automatically. 
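Assuming the key names named above, the work-mode settings might look like this in `configs/config.yml` (values are illustrative; the 500-block threshold mirrors the default described above, and the check interval is a hypothetical value):

```yaml
workMode:
  # Switch to live mode when within this many blocks of the chain head
  liveModeThreshold: 500
  # How often (in minutes) to re-check which mode should be active
  checkIntervalMinutes: 10
```
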
+--- -cmd: `--rpc-traces-enabled` -env: `RPC_TRACES_ENABLED` -yaml: -```yaml -rpc: - traces: - enabled: true -``` +## βš™οΈ Installation / Setup -#### RPC Traces Blocks Per Request -How many blocks at a time to fetch traces for from the RPC. Default is 100. -Has no effect if it's larger than RPC blocks per request. +### Prerequisites -cmd: `--rpc-traces-blocksPerRequest` -env: `RPC_TRACES_BLOCKSPERREQUEST` -yaml: -```yaml -rpc: - traces: - blocksPerRequest: 100 -``` +- **Go** 1.23+ +- **ClickHouse** database (Docker Compose included) +- (Optional) **Redis**, **Kafka**, **Prometheus**, **Grafana** for advanced features -#### RPC Traces Batch Delay -Milliseconds to wait between batches of traces when fetching from the RPC. Default is 0. +### Environment Variables & Secrets -cmd: `--rpc-traces-batchDelay` -env: `RPC_TRACES_BATCHDELAY` -yaml: -```yaml -rpc: - traces: - batchDelay: 100 -``` +Insight supports configuration via environment variables, which is especially useful for containerized deployments and CI/CD pipelines. -#### Log Level -Log level for the logger. Default is `warn`. +**Environment Variable Naming Convention:** +- Use uppercase letters and underscores +- Nested YAML paths become underscore-separated variables +- Example: `rpc.url` becomes `RPC_URL` +- Example: `storage.main.clickhouse.host` becomes `STORAGE_MAIN_CLICKHOUSE_HOST` -cmd: `--log-level` -env: `LOG_LEVEL` -yaml: -```yaml -log: - level: debug -``` - -#### Prettify logs -Whether to print logs in a prettified format. Affects performance. Default is `false`. +**Common Environment Variables:** -cmd: `--log-prettify` -env: `LOG_PRETTIFY` -yaml: -```yaml -log: - prettify: true +**RPC Configuration:** +```bash +RPC_URL=https://1.rpc.thirdweb.com/your-client-id +RPC_CHAIN_ID=1 +RPC_BLOCKS_BLOCKS_PER_REQUEST=500 +RPC_BLOCKS_BATCH_DELAY=100 +RPC_LOGS_BLOCKS_PER_REQUEST=250 +RPC_LOGS_BATCH_DELAY=100 +RPC_TRACES_ENABLED=false ``` -#### Poller -Whether to enable the poller. Default is `true`. 
- -cmd: `--poller-enabled` -env: `POLLER_ENABLED` -yaml: -```yaml -poller: - enabled: true +**Storage Configuration:** +```bash +STORAGE_MAIN_CLICKHOUSE_HOST=localhost +STORAGE_MAIN_CLICKHOUSE_PORT=9000 +STORAGE_MAIN_CLICKHOUSE_USERNAME=admin +STORAGE_MAIN_CLICKHOUSE_PASSWORD=your-password +STORAGE_MAIN_CLICKHOUSE_DATABASE=main +STORAGE_MAIN_CLICKHOUSE_DISABLE_TLS=false ``` -#### Poller Interval -Poller trigger interval in milliseconds. Default is `1000`. - -cmd: `--poller-interval` -env: `POLLER_INTERVAL` -yaml: -```yaml -poller: - interval: 3000 +**API Configuration:** +```bash +API_HOST=0.0.0.0 +API_PORT=3000 +API_THIRDWEB_CLIENT_ID=your-client-id +API_BASIC_AUTH_USERNAME=admin +API_BASIC_AUTH_PASSWORD=your-api-password ``` -#### Poller Blocks Per Poll -How many blocks to poll each interval. Default is `10`. +**Logging Configuration:** +```bash +LOG_LEVEL=info +LOG_PRETTIFY=false +``` -cmd: `--poller-blocks-per-poll` -env: `POLLER_BLOCKSPERPOLL` -yaml: -```yaml -poller: - blocksPerPoll: 3 +**Poller Configuration:** +```bash +POLLER_ENABLED=true +POLLER_INTERVAL=1000 +POLLER_BLOCKS_PER_POLL=500 +POLLER_FROM_BLOCK=0 ``` -#### Poller From Block -From which block to start polling. Default is `0`. +**Complete Example:** +```bash +# Set all configuration via environment variables +export RPC_URL="https://1.rpc.thirdweb.com/your-client-id" +export RPC_CHAIN_ID=1 +export STORAGE_MAIN_CLICKHOUSE_HOST="your-clickhouse-host" +export STORAGE_MAIN_CLICKHOUSE_PASSWORD="your-password" +export API_BASIC_AUTH_USERNAME="admin" +export API_BASIC_AUTH_PASSWORD="your-api-password" +export LOG_LEVEL="info" -cmd: `--poller-from-block` -env: `POLLER_FROMBLOCK` -yaml: -```yaml -poller: - fromBlock: 20000000 +# Run without config files +./main orchestrator +./main api ``` -#### Poller Force Start Block -From which block to start polling. Default is `false`. 
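The naming rule above (dots become underscores, camelCase segments split on word boundaries, everything uppercased) can be applied mechanically. A small shell sketch of the mapping — `yaml_path` is an arbitrary example input, and the project's actual config loader may bind names slightly differently:

```shell
# Illustrative sketch of the YAML-path -> environment-variable mapping.
yaml_path="storage.main.clickhouse.disableTLS"
# 1) split camelCase segments (disableTLS -> disable_TLS)
# 2) turn dots into underscores
# 3) uppercase everything
env_var=$(echo "$yaml_path" | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' | tr '.' '_' | tr '[:lower:]' '[:upper:]')
echo "$env_var"   # -> STORAGE_MAIN_CLICKHOUSE_DISABLE_TLS
```

When in doubt, prefer the exact variable names listed in the reference tables below over mechanically derived ones.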
+**Secrets Management:** +- For sensitive credentials, you can use environment variables instead of `configs/secrets.yml` +- Environment variables take precedence over config files +- See `configs/secrets.example.yml` for the complete structure -cmd: `--poller-force-from-block` -env: `POLLER_FORCEFROMBLOCK` -yaml: -```yaml -poller: - forceFromBlock: false -``` +### Docker -#### Poller Until Block -Until which block to poll. If not set, it will poll until the latest block. +- `docker-compose.yml` provides ClickHouse, Redis, Prometheus, and Grafana for local development. +- Exposes: + - ClickHouse: `localhost:8123` (web UI), `localhost:9440` (native) + - Prometheus: `localhost:9090` + - Grafana: `localhost:4000` + - Redis: `localhost:6379` -cmd: `--poller-until-block` -env: `POLLER_UNTILBLOCK` -yaml: -```yaml -poller: - untilBlock: 20000010 -``` +### Database Migrations -#### Committer -Whether to enable the committer. Default is `true`. +- SQL migration scripts are in `internal/tools/`. +- Apply them to your ClickHouse instance before running the indexer. -cmd: `--committer-enabled` -env: `COMMITTER_ENABLED` -yaml: -```yaml -committer: - enabled: true -``` +--- -#### Committer Interval -Committer trigger interval in milliseconds. Default is `250`. +## πŸ’‘ Usage -cmd: `--committer-interval` -env: `COMMITTER_INTERVAL` -yaml: -```yaml -committer: - interval: 3000 -``` +### CLI Commands -#### Committer Blocks Per Commit -How many blocks to commit each interval. Default is `10`. +- **Indexer (Orchestrator):** + `./main orchestrator` + Starts the block poller, committer, and failure recovery. -cmd: `--committer-blocks-per-commit` -env: `COMMITTER_BLOCKSPERCOMMIT` -yaml: -```yaml -committer: - blocksPerCommit: 1000 -``` +- **API Server:** + `./main api` + Serves the HTTP API at `http://localhost:3000`. -#### Committer From Block -From which block to start committing. Default is `0`. 
+- **Validation & Utilities:** + Additional commands: `validate`, `validate-and-fix`, `migrate-valid` (see `cmd/`). -cmd: `--committer-from-block` -env: `COMMITTER_FROMBLOCK` -yaml: -```yaml -committer: - fromBlock: 20000000 -``` +### API Endpoints -#### Reorg Handler -Whether to enable the reorg handler. Default is `true`. +All endpoints require HTTP Basic Auth (see `configs/secrets.yml`). -cmd: `--reorgHandler-enabled` -env: `REORGHANDLER_ENABLED` -yaml: -```yaml -reorgHandler: - enabled: true -``` +- **Blocks:** + `GET /{chainId}/blocks` + Query blocks with filters, sorting, pagination, and aggregation. -#### Reorg Handler Interval -Reorg handler trigger interval in milliseconds. Default is `1000`. +- **Transactions:** + `GET /{chainId}/transactions` + `GET /{chainId}/transactions/{to}` + `GET /{chainId}/transactions/{to}/{signature}` + `GET /{chainId}/wallet-transactions/{wallet_address}` -cmd: `--reorgHandler-interval` -env: `REORGHANDLER_INTERVAL` -yaml: -```yaml -reorgHandler: - interval: 3000 -``` +- **Logs/Events:** + `GET /{chainId}/events` + `GET /{chainId}/events/{contract}` + `GET /{chainId}/events/{contract}/{signature}` -#### Reorg Handler Blocks Per Scan -How many blocks to scan for reorgs. Default is `100`. +- **Token Balances & Holders:** + `GET /{chainId}/balances/{owner}/{type}` + `GET /{chainId}/holders/{address}` + `GET /{chainId}/tokens/{address}` -cmd: `--reorgHandler-blocks-per-scan` -env: `REORGHANDLER_BLOCKSPERSCAN` -yaml: -```yaml -reorgHandler: - blocksPerScan: 1000 -``` +- **Token Transfers:** + `GET /{chainId}/transfers` -#### Reorg Handler From Block -From which block to start scanning for reorgs. Default is `0`. +- **Search:** + `GET /{chainId}/search/{input}` + Search by block number, hash, address, or function signature. 
-cmd: `--reorgHandler-from-block` -env: `REORGHANDLER_FROMBLOCK` -yaml: -```yaml -reorgHandler: - fromBlock: 20000000 -``` +- **Health:** + `GET /health` -#### Reorg Handler Force From Block -Whether to force the reorg handler to start from the block specified in `reorgHandler-from-block`. Default is `false`. +- **Swagger/OpenAPI:** + `GET /swagger/index.html` + `GET /openapi.json` -cmd: `--reorgHandler-force-from-block` -env: `REORGHANDLER_FORCEFROMBLOCK` -yaml: -```yaml -reorgHandler: - forceFromBlock: true -``` +See the [OpenAPI spec](docs/swagger.yaml) for full details. -#### Failure Recoverer -Whether to enable the failure recoverer. Default is `true`. +--- -cmd: `--failure-recoverer-enabled` -env: `FAILURERECOVERER_ENABLED` -yaml: -```yaml -failureRecoverer: - enabled: true -``` +## πŸ›  Configuration -#### Failure Recoverer Interval -Failure recoverer trigger interval in milliseconds. Default is `1000`. +Insight supports configuration via multiple methods with the following priority order: -cmd: `--failure-recoverer-interval` -env: `FAILURERECOVERER_INTERVAL` -yaml: -```yaml -failureRecoverer: - interval: 3000 -``` +1. **Command-line flags** (highest priority) +2. **Environment variables** +3. **YAML config files** (`configs/config.yml`) -#### Failure Recoverer Blocks Per Run -How many blocks to recover each interval. Default is `10`. +### Configuration Methods -cmd: `--failure-recoverer-blocks-per-run` -env: `FAILURERECOVERER_BLOCKSPERRUN` -yaml: +**1. YAML Config Files (Recommended for Development):** ```yaml -failureRecoverer: - blocksPerRun: 100 -``` +# configs/config.yml +rpc: + url: https://1.rpc.thirdweb.com/your-thirdweb-client-id + blocks: + blocksPerRequest: 1000 + batchDelay: 0 + logs: + blocksPerRequest: 400 + batchDelay: 100 + traces: + enabled: true + blocksPerRequest: 200 + batchDelay: 100 -#### Storage -This application has 3 kinds of storage: `main`, `staging` and `orchestrator`. 
-Each of them takes similar configuration, slightly depending on the driver you want to use. -There are no defaults, so this needs to be configured. +log: + level: debug + pretty: true -For example, this can be a part of `config.yml`: -```yaml -storage: - main: - clickhouse: - port: 3000 - database: "base" - disableTLS: true - staging: - clickhouse: - port: 3000 - database: "staging" -``` -With the corresponding `secrets.yml`: -```yaml storage: main: clickhouse: host: localhost - user: admin - password: password - staging: - clickhouse: - host: localhost + port: 9440 + database: "default" username: admin password: password -``` \ No newline at end of file + disableTLS: true +``` + +**2. Environment Variables (Recommended for Production):** +```bash +# Set configuration via environment variables +export RPC_URL="https://1.rpc.thirdweb.com/your-client-id" +export RPC_BLOCKS_BLOCKS_PER_REQUEST=1000 +export RPC_BLOCKS_BATCH_DELAY=0 +export LOG_LEVEL="debug" +export LOG_PRETTIFY=true +export STORAGE_MAIN_CLICKHOUSE_HOST="localhost" +export STORAGE_MAIN_CLICKHOUSE_PASSWORD="your-password" +``` + +**3. 
Command-line Flags:** +```bash +# Override specific settings via CLI flags +./main orchestrator --rpc-url="https://1.rpc.thirdweb.com/your-client-id" --log-level=info +./main api --api-host=0.0.0.0 --api-port=8080 +``` + +### Environment Variable Reference + +**RPC Configuration:** +| YAML Path | Environment Variable | Description | Default | +|-----------|---------------------|-------------|---------| +| `rpc.url` | `RPC_URL` | RPC endpoint URL | - | +| `rpc.chainId` | `RPC_CHAIN_ID` | Blockchain network ID | 1 | +| `rpc.blocks.blocksPerRequest` | `RPC_BLOCKS_BLOCKS_PER_REQUEST` | Blocks per RPC request | 500 | +| `rpc.blocks.batchDelay` | `RPC_BLOCKS_BATCH_DELAY` | Delay between batches (ms) | 100 | +| `rpc.logs.blocksPerRequest` | `RPC_LOGS_BLOCKS_PER_REQUEST` | Logs per RPC request | 250 | +| `rpc.logs.batchDelay` | `RPC_LOGS_BATCH_DELAY` | Log batch delay (ms) | 100 | +| `rpc.traces.enabled` | `RPC_TRACES_ENABLED` | Enable trace fetching | false | +| `rpc.traces.blocksPerRequest` | `RPC_TRACES_BLOCKS_PER_REQUEST` | Traces per RPC request | 500 | + +**Storage Configuration:** +| YAML Path | Environment Variable | Description | Default | +|-----------|---------------------|-------------|---------| +| `storage.main.clickhouse.host` | `STORAGE_MAIN_CLICKHOUSE_HOST` | ClickHouse host | localhost | +| `storage.main.clickhouse.port` | `STORAGE_MAIN_CLICKHOUSE_PORT` | ClickHouse port | 9000 | +| `storage.main.clickhouse.username` | `STORAGE_MAIN_CLICKHOUSE_USERNAME` | Database username | admin | +| `storage.main.clickhouse.password` | `STORAGE_MAIN_CLICKHOUSE_PASSWORD` | Database password | password | +| `storage.main.clickhouse.database` | `STORAGE_MAIN_CLICKHOUSE_DATABASE` | Database name | main | +| `storage.main.clickhouse.disableTLS` | `STORAGE_MAIN_CLICKHOUSE_DISABLE_TLS` | Disable TLS | false | + +**API Configuration:** +| YAML Path | Environment Variable | Description | Default | +|-----------|---------------------|-------------|---------| +| `api.host` | 
`API_HOST` | API server host | localhost | +| `api.port` | `API_PORT` | API server port | 3000 | +| `api.thirdweb.clientId` | `API_THIRDWEB_CLIENT_ID` | ThirdWeb client ID | - | +| `api.basicAuth.username` | `API_BASIC_AUTH_USERNAME` | API username | admin | +| `api.basicAuth.password` | `API_BASIC_AUTH_PASSWORD` | API password | admin | + +**Logging Configuration:** +| YAML Path | Environment Variable | Description | Default | +|-----------|---------------------|-------------|---------| +| `log.level` | `LOG_LEVEL` | Log level (debug, info, warn, error) | debug | +| `log.prettify` | `LOG_PRETTIFY` | Pretty print logs | true | + +**Poller Configuration:** +| YAML Path | Environment Variable | Description | Default | +|-----------|---------------------|-------------|---------| +| `poller.enabled` | `POLLER_ENABLED` | Enable block polling | true | +| `poller.interval` | `POLLER_INTERVAL` | Polling interval (ms) | 1000 | +| `poller.blocksPerPoll` | `POLLER_BLOCKS_PER_POLL` | Blocks per poll | 500 | +| `poller.fromBlock` | `POLLER_FROM_BLOCK` | Starting block number | 0 | + +### Configuration Best Practices + +**Development:** +- Use YAML config files for easy configuration management +- Keep sensitive data in `configs/secrets.yml` (gitignored) + +**Production:** +- Use environment variables for security and flexibility +- Set sensitive credentials via environment variables +- Use container orchestration secrets management + +**Docker/Kubernetes:** +```yaml +# docker-compose.yml example +environment: + - RPC_URL=https://1.rpc.thirdweb.com/your-client-id + - STORAGE_MAIN_CLICKHOUSE_HOST=clickhouse + - STORAGE_MAIN_CLICKHOUSE_PASSWORD=${CLICKHOUSE_PASSWORD} + - API_BASIC_AUTH_PASSWORD=${API_PASSWORD} +``` + +- See `configs/config.example.yml` and `configs/secrets.example.yml` for complete configuration options. +- All config options can be overridden by environment variables or CLI flags. 
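The priority order (CLI flags over environment variables over YAML values) can be illustrated with a tiny shell sketch — `resolve` is a hypothetical helper for demonstration, not part of Insight:

```shell
# Hypothetical illustration of the configuration priority order:
# CLI flag > environment variable > YAML file value.
resolve() {
  flag="$1"; env_val="$2"; yaml_val="$3"
  if [ -n "$flag" ]; then echo "$flag"
  elif [ -n "$env_val" ]; then echo "$env_val"
  else echo "$yaml_val"
  fi
}
resolve ""     "info" "debug"   # no flag set: env wins -> info
resolve "warn" "info" "debug"   # flag set: flag wins -> warn
```

In practice this means you can keep safe defaults in `configs/config.yml`, override per-environment values with variables, and still force a one-off setting with a flag.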
+ +--- + +## πŸ“ Project Structure + +``` +insight/ + api/ # API layer + cmd/ # CLI commands (orchestrator, api, validation, etc.) + configs/ # Config and secrets templates + docs/ # Swagger/OpenAPI docs + internal/ + common/ # Core blockchain models/utilities + handlers/ # HTTP API handlers + log/ # Logging setup + metrics/ # Prometheus metrics + middleware/ # API middleware (auth, CORS, logging) + orchestrator/# Indexer orchestration logic + publisher/ # Kafka publisher (optional) + rpc/ # RPC client logic + storage/ # ClickHouse connectors + tools/ # SQL migration scripts + validation/ # Data validation logic + worker/ # Block processing workers + test/ # Mocks and test helpers + main.go # Entrypoint + Dockerfile # Container build + docker-compose.yml +``` + +--- + +## 🀝 Contributing + +1. **Fork & clone** the repo. +2. **Install dependencies:** + `go mod download` +3. **Set up local ClickHouse:** + `docker-compose up -d clickhouse` +4. **Apply migrations** (see above). +5. **Run tests:** + `go test ./...` +6. **Open a PR** with your changes! + +--- + +## πŸ§ͺ Testing + +- All core logic is covered by unit tests (see `test/` and `internal/handlers/*_test.go`). +- Run the full suite: + ```bash + go test ./... + ``` + +--- + +## πŸ“š Documentation + +- **API Reference:** + - [Swagger UI](http://localhost:3000/swagger/index.html) (when running) + - [OpenAPI Spec](docs/swagger.yaml) +- **Metrics:** + - Prometheus metrics at [http://localhost:2112/metrics](http://localhost:2112/metrics) + - See `internal/metrics/metrics.go` for all exposed metrics. +- **Architecture & Design:** + - See the top of this README and code comments for architectural details. + +--- + +**License:** Apache 2.0 + +--- + +Let me know if you want this written to your `README.md` or if you want to further tailor any section! 
\ No newline at end of file diff --git a/configs/config.example.yml b/configs/config.example.yml index 3a66b59..33c46d6 100644 --- a/configs/config.example.yml +++ b/configs/config.example.yml @@ -1,59 +1,246 @@ +# RPC Configuration - Settings for connecting to blockchain RPC endpoints rpc: + # URL of the blockchain RPC endpoint to connect to + url: https://1.rpc.thirdweb.com + + # Block fetching configuration blocks: - blocksPerRequest: 1000 - batchDelay: 0 + # Number of blocks to request in a single RPC call + blocksPerRequest: 500 + # Delay in milliseconds between batch requests to avoid rate limiting + batchDelay: 100 + + # Log fetching configuration logs: - blocksPerRequest: 400 + # Number of blocks to fetch logs for in a single request + blocksPerRequest: 250 + # Delay in milliseconds between log batch requests batchDelay: 100 + + # Block receipts fetching configuration blockReceipts: + # Whether to fetch block receipts (transaction receipts for all transactions in a block) enabled: true - blocksPerRequest: 500 + # Number of blocks to fetch receipts for in a single request + blocksPerRequest: 250 + # Delay in milliseconds between receipt batch requests batchDelay: 100 + + # Trace fetching configuration (for debugging and detailed transaction analysis) traces: - enabled: true - blocksPerRequest: 200 + # Whether to fetch transaction traces (disabled by default due to high resource usage) + enabled: false + # Number of blocks to fetch traces for in a single request + blocksPerRequest: 500 + # Delay in milliseconds between trace batch requests batchDelay: 100 + + # Blockchain network identifier (1 = Ethereum mainnet). Optional. If RPC URL is omitted, will use thirdweb RPC with this chainId. 
+ chainId: 1 +# Logging configuration log: + # Log level: debug, info, warn, error level: debug - pretty: true + # Whether to format logs in a human-readable way (vs JSON) + prettify: true +# Poller configuration - Controls how blocks are fetched from the blockchain poller: + # Whether the poller is enabled enabled: true - interval: 3000 - blocksPerPoll: 10000 + # Interval in milliseconds between polling cycles + interval: 1000 + # Number of blocks to fetch in each polling cycle + blocksPerPoll: 500 + # Starting block number to begin polling from (0 = start from latest) + fromBlock: 0 + # Whether to force starting from fromBlock even if a cursor exists + forceFromBlock: false + # Ending block number (0 = poll indefinitely) + untilBlock: 0 + # Number of parallel poller instances to run + parallelPollers: 1 +# Committer configuration - Controls how data is committed to storage committer: + # Whether the committer is enabled enabled: true + # Interval in milliseconds between commit cycles interval: 1000 - blocksPerCommit: 10000 + # Number of blocks to commit in each cycle + blocksPerCommit: 1000 + # Starting block number for commits (0 = start from latest) + fromBlock: 0 +# Failure recovery configuration - Handles failed block processing failureRecoverer: - enabled: true + # Whether failure recovery is enabled + enabled: false + # Interval in milliseconds between recovery attempts interval: 10000 - blocksPerRun: 100 + # Number of failed blocks to process in each recovery cycle + blocksPerRun: 50 +# Reorganization handler configuration - Detects and handles blockchain reorganizations reorgHandler: + # Whether reorg detection is enabled enabled: true - interval: 1000 - blocksPerScan: 50 - -validation: - mode: strict # "disabled", "minimal", or "strict" + # Interval in milliseconds between reorg scans + interval: 10000 + # Number of blocks to scan for reorgs in each cycle + blocksPerScan: 200 + # Starting block number for reorg detection (0 = start from latest) + 
   fromBlock: 1000000
+  # Whether to force starting from fromBlock for reorg detection
+  forceFromBlock: false
 
+# Storage configuration for different components
 storage:
+  # Main storage configuration - Primary data storage
   main:
     clickhouse:
-      port: 9440
-      database: "default"
-      disableTLS: true
+      # ClickHouse database host
+      host: localhost
+      # ClickHouse database port
+      port: 9000
+      # Database username
+      username: admin
+      # Database password
+      password: password
+      # Database name
+      database: main
+      # Whether to disable TLS encryption
+      disableTLS: false
+      # Whether to use async inserts for better performance
+      asyncInsert: true
+      # Maximum number of rows to insert in a single query
+      maxRowsPerInsert: 1000
+      # Maximum number of open database connections
+      maxOpenConns: 100
+      # Maximum number of idle database connections
+      maxIdleConns: 100
+      # Chain-specific configuration
+      chainBasedConfig:
+        "1": # Chain ID 1 (Ethereum mainnet)
+          # Table name for this chain's data
+          tableName: main
+          # Default fields to select when querying blocks
+          defaultSelectFields:
+            - block_number
+            - block_hash
+            - block_timestamp
+            - block_gas_limit
+            - block_gas_used
+          # Whether to enable parallel processing of materialized views
+          enableParallelViewProcessing: true
+          # Maximum query execution time in seconds
+          maxQueryTime: 60
+
+  # Staging storage configuration - Temporary storage for data processing
   staging:
     clickhouse:
-      port: 9440
-      database: "default"
-      disableTLS: true
+      host: localhost
+      port: 9000
+      username: admin
+      password: password
+      database: main
+      disableTLS: false
+      asyncInsert: true
+
+  # Orchestrator storage configuration - Storage for orchestration metadata
   orchestrator:
     clickhouse:
-      port: 9440
-      database: "default"
-      disableTLS: true
\ No newline at end of file
+      host: localhost
+      port: 9000
+      username: admin
+      password: password
+      database: main
+      disableTLS: false
+      asyncInsert: true
+
+# API configuration - Settings for the HTTP API server
+api:
+  # Host address to bind the API server to
+  host: localhost
+  # ThirdWeb contract API endpoint for contract metadata
+  thirdwebContractApi: https://contract.thirdweb.com
+  # HTTP client configuration for contract API requests
+  contractApiRequest:
+    # Maximum number of idle connections in the connection pool
+    maxIdleConns: 100
+    # Maximum number of idle connections per host
+    maxIdleConnsPerHost: 10
+    # Maximum number of connections per host
+    maxConnsPerHost: 10
+    # Timeout for idle connections in seconds
+    idleConnTimeout: 60
+    # Whether to disable HTTP compression
+    disableCompression: false
+    # Request timeout in seconds
+    timeout: 120
+  # Whether to enable ABI decoding for contract events
+  abiDecodingEnabled: true
+  # ThirdWeb client configuration
+  thirdweb:
+    # ThirdWeb client ID for API access
+    clientId: 123abc
+
+# Publisher configuration - Settings for publishing data to message queues
+publisher:
+  # Whether the publisher is enabled
+  enabled: true
+  # Kafka broker addresses (comma-separated)
+  brokers: localhost:9092
+
+  # Block publishing configuration
+  blocks:
+    # Whether to publish block data
+    enabled: true
+    # Kafka topic name for block data
+    topicName: blocks
+
+  # Transaction publishing configuration
+  transactions:
+    # Whether to publish transaction data
+    enabled: true
+    # Kafka topic name for transaction data
+    topicName: transactions
+    # Filter transactions by destination address (empty = no filter)
+    toFilter:
+      - 0x0000000000000000000000000000000000000000
+    # Filter transactions by source address (empty = no filter)
+    fromFilter:
+      - 0x0000000000000000000000000000000000000000
+
+  # Trace publishing configuration
+  traces:
+    # Whether to publish transaction trace data
+    enabled: true
+    # Kafka topic name for trace data
+    topicName: traces
+
+  # Event publishing configuration
+  events:
+    # Whether to publish contract event data
+    enabled: true
+    # Kafka topic name for event data
+    topicName: events
+    # Filter events by contract address (empty = no filter)
+    addressFilter:
+      - 0x0000000000000000000000000000000000000000
+    # Filter events by topic0 (event signature) (empty = no filter)
+    topic0Filter:
+      - 0x0000000000000000000000000000000000000000
+
+# Work mode configuration - Controls system behavior based on blockchain state
+workMode:
+  # Interval in minutes to check if system should switch between live/historical mode
+  checkIntervalMinutes: 10
+  # Block number threshold to determine if system is in "live mode" (near chain head)
+  liveModeThreshold: 1000000
+
+# Validation configuration - Controls data validation strictness
+validation:
+  # Validation mode: "disabled" (no validation), "minimal" (basic checks), or "strict" (comprehensive validation)
+  mode: strict
diff --git a/configs/secrets.example.yml b/configs/secrets.example.yml
index fd248dd..40fcf31 100644
--- a/configs/secrets.example.yml
+++ b/configs/secrets.example.yml
@@ -1,21 +1,82 @@
-rpc:
-  url: https://1.rpc.thirdweb.com
-
-api:
-  basicAuth:
-    username: admin
-    password: admin
-  thirdweb:
-    clientId: 123abc
-
+# Storage configuration for different database connections
 storage:
+  # Main storage configuration for primary data operations
   main:
     clickhouse:
+      # Database server hostname or IP address
       host: localhost
+      # Database server port number
+      port: 9000
+      # Database username for authentication
       username: admin
+      # Database password for authentication
       password: password
+      # Database name to connect to
+      database: default
+      # Whether to disable TLS encryption (false = use TLS)
+      disableTLS: false
+      # Enable asynchronous inserts for better performance
+      asyncInsert: true
+      # Maximum number of rows to insert in a single batch
+      maxRowsPerInsert: 1000
+      # Maximum number of open database connections in the pool
+      maxOpenConns: 100
+      # Maximum number of idle connections to keep in the pool
+      maxIdleConns: 100
+      # Chain-specific configuration for different blockchain networks
+      chainBasedConfig:
+        # Configuration for chain ID 1 (Ethereum mainnet)
+        "1":
+          # Configuration for the blocks table
+          blocks:
+            # Table name to use for this chain
+            tableName: blocks_v2
+            # Default fields to select when querying blocks
+            defaultSelectFields:
+              - block_number
+              - block_hash
+              - block_timestamp
+              - block_gas_limit
+              - block_gas_used
+            # Enable parallel processing of materialized views
+            enableParallelViewProcessing: true
+            # Maximum query execution time in seconds
+            maxQueryTime: 60
+
+  # Staging storage configuration for temporary data processing
   staging:
     clickhouse:
       host: localhost
+      port: 9000
       username: admin
-      password: password
\ No newline at end of file
+      password: password
+      database: default
+      disableTLS: false
+      asyncInsert: true
+
+  # Orchestrator storage configuration for coordination data
+  orchestrator:
+    clickhouse:
+      host: localhost
+      port: 9000
+      username: admin
+      password: password
+      database: default
+      disableTLS: false
+      asyncInsert: true
+
+# API configuration for HTTP endpoints
+api:
+  # Basic authentication credentials for API access
+  basicAuth:
+    # Username for API authentication
+    username: admin
+    # Password for API authentication
+    password: admin
+
+# Publisher configuration for data publishing services
+publisher:
+  # Username for publisher service authentication
+  username: admin
+  # Password for publisher service authentication
+  password: password
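
Both example files key `chainBasedConfig` by the chain ID as a string (`"1"` for Ethereum mainnet). As a minimal sketch of how such a per-chain lookup might behave, here is a plain Python dict standing in for the parsed YAML of `config.example.yml`; the `table_for_chain` helper and its fallback default are hypothetical illustrations, not part of insight's code:

```python
# Parsed shape of storage.main.clickhouse.chainBasedConfig
# (values taken from the config.example.yml diff above)
chain_based_config = {
    "1": {  # chain ID 1 (Ethereum mainnet)
        "tableName": "main",
        "defaultSelectFields": [
            "block_number", "block_hash", "block_timestamp",
            "block_gas_limit", "block_gas_used",
        ],
        "maxQueryTime": 60,
    },
}

def table_for_chain(chain_id: int, default: str = "blocks") -> str:
    """Return the per-chain table name, falling back to a default
    for chains with no chainBasedConfig entry (hypothetical helper)."""
    cfg = chain_based_config.get(str(chain_id), {})
    return cfg.get("tableName", default)

print(table_for_chain(1))    # chain 1 has an override -> "main"
print(table_for_chain(137))  # no entry for chain 137 -> "blocks"
```

Note that chain IDs are quoted in the YAML (`"1":`), so the lookup key must be a string, not an integer.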