Remove FINAL from ClickHouse queries#315
Conversation
Remove the FINAL clause from multiple SELECT queries in internal/libs/clickhouse.go. Affected functions: GetTransactionMismatchRangeFromClickHouseV2, GetLogsMismatchRangeFromClickHouseV2, getBlocksFromV2, getTransactionsFromV2, getLogsFromV2, and getTracesFromV2. Queries now read directly from the respective tables (blocks, transactions, logs, traces) without the FINAL modifier.
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
Disabled knowledge base sources:
WalkthroughRemoved the ClickHouse Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~7 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/libs/clickhouse.go`:
- Line 563: getBlocksFromV2, getTransactionsFromV2, getLogsFromV2, and
getTracesFromV2 now behave inconsistently after removing FINAL (while
queryBlocksByBlockNumbers still uses FINAL), so restore a deterministic dedup
policy across all loaders: either re-add FINAL to the SELECT queries in
getBlocksFromV2/getTransactionsFromV2/getLogsFromV2/getTracesFromV2 to match
queryBlocksByBlockNumbers, or replace FINAL with an explicit dedup expression
(e.g., select latest row per key via argMax/anyLast/ROW_NUMBER filter) in each
function so blocks, transactions, logs and traces all dedupe the same way;
update all four functions (getBlocksFromV2, getTransactionsFromV2,
getLogsFromV2, getTracesFromV2) consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5cdcc2ff-2c8b-4646-9014-73ab68dca937
📒 Files selected for processing (1)
internal/libs/clickhouse.go
| blocksRaw := make([]common.Block, length) | ||
|
|
||
| query := fmt.Sprintf("SELECT %s FROM %s.blocks FINAL WHERE chain_id = %d AND block_number BETWEEN %d AND %d order by block_number", | ||
| query := fmt.Sprintf("SELECT %s FROM %s.blocks WHERE chain_id = %d AND block_number BETWEEN %d AND %d order by block_number", |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# In a ClickHouse-connected environment, verify duplicate-key presence in target ranges.
# Required env vars: CH_DB, CHAIN_ID, START_BLOCK, END_BLOCK
clickhouse-client --query "
SELECT 'blocks' AS tbl, count() FROM (
SELECT block_number FROM ${CH_DB}.blocks
WHERE chain_id=${CHAIN_ID} AND block_number BETWEEN ${START_BLOCK} AND ${END_BLOCK}
GROUP BY block_number HAVING count() > 1
)
UNION ALL
SELECT 'transactions', count() FROM (
SELECT block_number, transaction_index FROM ${CH_DB}.transactions
WHERE chain_id=${CHAIN_ID} AND block_number BETWEEN ${START_BLOCK} AND ${END_BLOCK}
GROUP BY block_number, transaction_index HAVING count() > 1
)
UNION ALL
SELECT 'logs', count() FROM (
SELECT block_number, transaction_index, log_index FROM ${CH_DB}.logs
WHERE chain_id=${CHAIN_ID} AND block_number BETWEEN ${START_BLOCK} AND ${END_BLOCK}
GROUP BY block_number, transaction_index, log_index HAVING count() > 1
)
UNION ALL
SELECT 'traces', count() FROM (
SELECT block_number, transaction_index, trace_address FROM ${CH_DB}.traces
WHERE chain_id=${CHAIN_ID} AND block_number BETWEEN ${START_BLOCK} AND ${END_BLOCK}
GROUP BY block_number, transaction_index, trace_address HAVING count() > 1
);
"Repository: thirdweb-dev/insight
Length of output: 106
🏁 Script executed:
# First, let's look at the clickhouse.go file around the specified lines
wc -l internal/libs/clickhouse.goRepository: thirdweb-dev/insight
Length of output: 95
🏁 Script executed:
# Read the relevant sections around lines 563, 592, 621, 650
sed -n '560,565p' internal/libs/clickhouse.goRepository: thirdweb-dev/insight
Length of output: 367
🏁 Script executed:
# Read all four locations mentioned
sed -n '560,565p;589,594p;618,623p;647,652p' internal/libs/clickhouse.goRepository: thirdweb-dev/insight
Length of output: 1331
🏁 Script executed:
# Search for FINAL keyword in the file to see if it's present anywhere
grep -n "FINAL" internal/libs/clickhouse.goRepository: thirdweb-dev/insight
Length of output: 544
🏁 Script executed:
# Search for getBlocksFromV2 function
rg -n "getBlocksFromV2" --type goRepository: thirdweb-dev/insight
Length of output: 534
🏁 Script executed:
# Search for tx/log/trace loader functions
rg -n "append.*row|overwrite|last-seen" --type go internal/libs/Repository: thirdweb-dev/insight
Length of output: 46
🏁 Script executed:
# Read the function around line 558 to see the full structure
sed -n '558,680p' internal/libs/clickhouse.goRepository: thirdweb-dev/insight
Length of output: 3915
🏁 Script executed:
# Let's see the full context - what functions use FINAL queries
sed -n '260,330p' internal/libs/clickhouse.goRepository: thirdweb-dev/insight
Length of output: 2356
🏁 Script executed:
# Let's understand the git history - was FINAL removed recently?
git log --oneline -n 20 -- internal/libs/clickhouse.goRepository: thirdweb-dev/insight
Length of output: 108
🏁 Script executed:
# Check if there's a pattern - let's see all the V2 functions and their current state
grep -n "func.*FromV2\|FINAL" internal/libs/clickhouse.go | head -20Repository: thirdweb-dev/insight
Length of output: 1018
🏁 Script executed:
# Let's see what the commit changed
git show a92cf92 --statRepository: thirdweb-dev/insight
Length of output: 4326
🏁 Script executed:
# Let's see the actual diff for the changed lines
git show a92cf92 -- internal/libs/clickhouse.go | head -100Repository: thirdweb-dev/insight
Length of output: 3891
🏁 Script executed:
# Check how the data flows - where are getBlocksFromV2 and friends called?
rg -n "getBlocksFromV2|getTransactionsFromV2|getLogsFromV2|getTracesFromV2" --type goRepository: thirdweb-dev/insight
Length of output: 1309
🏁 Script executed:
# Let's check the data flow - see where these functions are called from
sed -n '200,250p' internal/libs/clickhouse.goRepository: thirdweb-dev/insight
Length of output: 1433
🏁 Script executed:
# Check the structure of the data - are blocks supposed to be unique per number?
sed -n '400,450p' internal/libs/clickhouse.goRepository: thirdweb-dev/insight
Length of output: 1867
🏁 Script executed:
# Let's see if there's any comment explaining why FINAL was removed
git show a92cf92 -- internal/libs/clickhouse.go | grep -A 5 -B 5 "FINAL\|getBlocksFromV2\|getTransactionsFromV2"Repository: thirdweb-dev/insight
Length of output: 4648
Range loaders lack consistent dedup strategy after removing FINAL.
Commit a92cf92 removed FINAL from getBlocksFromV2, getTransactionsFromV2, getLogsFromV2, and getTracesFromV2 (lines 563, 592, 621, 650). This creates inconsistent behavior: blocks use index-based overwrite (implicitly deduping to the last row), while transactions, logs, and traces use append (accumulating all rows). Without explicit dedup logic, outputs become dependent on merge state—if tables contain duplicates from out-of-order inserts, results vary between loaders.
Meanwhile, queryBlocksByBlockNumbers and related functions retain FINAL, further inconsistency. Establish an explicit, deterministic dedup policy across all loaders: either add FINAL back, filter for latest versions, or document the reliance on merge order.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/libs/clickhouse.go` at line 563, getBlocksFromV2,
getTransactionsFromV2, getLogsFromV2, and getTracesFromV2 now behave
inconsistently after removing FINAL (while queryBlocksByBlockNumbers still uses
FINAL), so restore a deterministic dedup policy across all loaders: either
re-add FINAL to the SELECT queries in
getBlocksFromV2/getTransactionsFromV2/getLogsFromV2/getTracesFromV2 to match
queryBlocksByBlockNumbers, or replace FINAL with an explicit dedup expression
(e.g., select latest row per key via argMax/anyLast/ROW_NUMBER filter) in each
function so blocks, transactions, logs and traces all dedupe the same way;
update all four functions (getBlocksFromV2, getTransactionsFromV2,
getLogsFromV2, getTracesFromV2) consistently.
internal/libs/clickhouse.go: remove the FINAL clause from SELECT queries in queryBlocksByBlockNumbers, queryTransactionsByBlockNumbers, queryLogsByBlockNumbers and queryTracesByBlockNumbers. The change drops "FINAL" from queries against blocks, transactions, logs and traces tables (keeps same ordering and fields). This avoids using ClickHouse's FINAL merge behavior in these lookups, simplifying the queries and addressing related performance/consistency concerns.
Remove the FINAL clause from multiple SELECT queries in internal/libs/clickhouse.go. Affected functions: GetTransactionMismatchRangeFromClickHouseV2, GetLogsMismatchRangeFromClickHouseV2, getBlocksFromV2, getTransactionsFromV2, getLogsFromV2, and getTracesFromV2. Queries now read directly from the respective tables (blocks, transactions, logs, traces) without the FINAL modifier.
Summary by CodeRabbit