# Realized Cap
Author: [@typerbole](https://twitter.com/typerbole), [GitHub](https://github.com/ty-perbole), [stack-stats.com](http://www.stack-stats.com)

This is the third notebook in the [Stack Stats](http://www.stack-stats.com) repository. See the [Stack Stats readme](https://github.com/ty-perbole/stack-stats/blob/master/README.md) for more about this project and further instructions on how to run the Jupyter notebook. This tutorial assumes a basic familiarity with Python and SQL.

In the [previous tutorial](https://ty-perbole.github.io/stack-stats/02_HODLWavesPart1.html) we dove into working with on-chain data and generated the HODL wave charts popularized by [Dhruv Bansal at Unchained Capital](https://unchained-capital.com/blog/hodl-waves-1/). We experimented with using different weightings for calculating the relative width of each age band:
1. BTC value weighted (original HODL waves chart)
2. Flat weighting: total number of UTXO in band (UTXO count)
3. Flat weighting, filtered: Total number of non-dust UTXO in band (UTXO count > 0.01 BTC)
4. Realized cap weighted: USD value (at market price from time of UTXO creation)

In this notebook we're going to tackle the fourth weighting, which is a little more complicated.

## Realized Cap methodology

The history and methodology of Realized Cap is discussed in the [original blog post](https://coinmetrics.io/realized-capitalization/) by CoinMetrics. I suggest you read that post before proceeding with this notebook so you have a thorough understanding of what we're calculating.

Realized cap is a Market Cap analogue that values each UTXO at the price when it was created on-chain, rather than the current price. It is roughly an estimate of the cost basis of all Bitcoin hodlers.

## $\text{Realized Cap}=\sum_{\text{UTXO}}\text{UTXO Value (BTC)}\times\text{BTC price at UTXO creation}$
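
To make the formula concrete, here is a minimal sketch with three made-up UTXOs. The values, prices, and the `current_price_usd` figure are illustrative assumptions, not real chain data. It contrasts Realized Cap with Market Cap for the same coins.

```python
# Toy UTXO set: (value in BTC, BTC price in USD when the UTXO was created).
# All numbers are made up for illustration.
utxos = [
    (2.0, 100.0),
    (0.5, 5000.0),
    (1.0, 9000.0),
]

current_price_usd = 9500.0  # hypothetical current market price

# Realized cap: value each UTXO at the price when it was created.
realized_cap = sum(value_btc * price_at_creation
                   for value_btc, price_at_creation in utxos)

# Market cap: value every coin at the current price.
market_cap = sum(value_btc for value_btc, _ in utxos) * current_price_usd

print(realized_cap)  # 11700.0
print(market_cap)    # 33250.0
```

The 2 BTC created at $100 barely moves Realized Cap but counts at full price in Market Cap, which is why the two can diverge so much.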

Realized Cap is a useful input into some of the valuation ratios discussed in the earlier [ratios tutorial](https://ty-perbole.github.io/stack-stats/01_BitcoinNetworkRatios.html).

## Calculation

The calculation for Realized Cap is an extension of the work we did for [HODL Waves](https://ty-perbole.github.io/stack-stats/02_HODLWaves.html), so I recommend going through that tutorial first before trying this one.

Realized Cap requires a BTC price feed, which isn't available on BigQuery public data. We'll need to upload the data ourselves. Luckily this is pretty easy.

First we need to download the latest CoinMetrics community data:

In [1]:
import os
try:
    os.remove("btc.csv")
except FileNotFoundError:
    pass
!wget https://coinmetrics.io/newdata/btc.csv

--2020-05-17 11:27:44--  https://coinmetrics.io/newdata/btc.csv
Resolving coinmetrics.io (coinmetrics.io)... 104.26.14.66, 104.26.15.66
Connecting to coinmetrics.io (coinmetrics.io)|104.26.14.66|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2482079 (2.4M) [application/octet-stream]
Saving to: ‘btc.csv’


2020-05-17 11:27:45 (24.9 MB/s) - ‘btc.csv’ saved [2482079/2482079]



The CoinMetrics data should now be located in the Stack Stats directory on your computer as btc.csv.

Now you just have to create a new BigQuery table via the [web UI](https://bigquery.cloud.google.com/) as such: 

![Transactions Table Schema](img/cm_bigquery.png)

Once you hit "Create new table", upload the btc.csv file, select "Automatically detect" under Schema, then hit Create Table at the bottom. This uploads the CSV and gives you a BigQuery table with the CoinMetrics community data. I named my table "cm_btc" under my private "bitcoin" BigQuery dataset.

Now that we have that set up, we can start to build our query. Since we are working off the query we built for the HODL Waves tutorial, I will not explain anything that was already covered there.

In [2]:
QUERY = '''
WITH

-- Outputs subquery: contains relevant information about a given output.
-- A TXO is created when it is an output of a transaction, so this contains
-- metadata about the TXO creation
output AS (
  SELECT
    transactions.HASH AS transaction_hash,
    transactions.block_number AS created_block_number,
    transactions.block_timestamp AS created_block_ts,
    outputs.index AS output_index,
    outputs.value AS output_value
  FROM
    `bigquery-public-data.crypto_bitcoin.transactions` AS transactions,
    transactions.outputs AS outputs
    ),

-- Inputs subquery: contains relevant information about a given input.
-- A TXO is consumed when it is the input to a transaction, so this metadata
-- tells us about when a TXO is spent or destroyed
input AS (
  SELECT
    transactions.hash AS spending_transaction_hash,
    inputs.spent_transaction_hash AS spent_transaction_hash,
    transactions.block_number AS destroyed_block_number,
    transactions.block_timestamp AS destroyed_block_ts,
    inputs.spent_output_index,
    inputs.value AS input_value
  FROM
    `bigquery-public-data.crypto_bitcoin.transactions` AS transactions,
    transactions.inputs AS inputs
    ),
'''

Below we define our CoinMetrics data table and use it to get the price. Replace the BigQuery project name so that it points at your own private CoinMetrics table.

In [3]:
QUERY += '''
-- Now we can add the table we created and get the daily USD price of bitcoin
cm AS (
SELECT
  date,
  PriceUSD
FROM
-- ** YOU WILL HAVE TO REPLACE THE PROJECT NAME HERE TO REFLECT YOUR OWN BIGQUERY TABLE **
  `replace_this_project.bitcoin.cm_btc`),
'''

The TXO table is similar to the one we defined in the HODLWavesPart1 tutorial, except we now incorporate the price data and use that to calculate the output cost basis.

In [4]:
QUERY += '''
-- txo subquery: joins outputs to inputs so that we know when/if a TXO is spent.
-- NEW: we also join the price data and calculate the cost basis of each TXO
txo AS (
  SELECT
    output.transaction_hash,
    output.created_block_number,
    DATETIME(output.created_block_ts) AS created_block_ts,
    -- Any field from the input table will be NULL if the TXO remains unspent.
    input.spending_transaction_hash,
    input.spent_transaction_hash,
    input.destroyed_block_number,
    DATETIME(input.destroyed_block_ts) AS destroyed_block_ts,
    output.output_value,
    output.output_value * cm.PriceUSD / 100000000 AS output_cost_basis_usd,
    cm.PriceUSD AS output_cost_basis_price
  FROM
    output
  -- Use Left Join, as not all outputs will be linked as inputs in future transactions if they remain unspent.
  LEFT JOIN
    input
  ON
    -- Join an output to a future input based on the output transaction hash
    -- matching the spent transaction hash of the input
    output.transaction_hash = input.spent_transaction_hash
    -- Also make sure the output index matches within the transaction hash
    AND output.output_index = input.spent_output_index
  -- Get the price data from our cm table with coinmetrics price data
  LEFT JOIN
    cm
  ON
  -- Join the price data onto the output creation block ts, to get the price at the time of output creation (cost basis)
    DATE(output.created_block_ts) = cm.date
  ),
'''
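
If the two LEFT JOINs are hard to picture, here is a rough Python sketch of the same logic on a couple of hand-made rows. The dictionaries, field names, and numbers are all hypothetical stand-ins for the BigQuery tables; the point is only to show how an output is matched to its spending input and how the cost basis is derived from satoshis.

```python
from datetime import date

SATOSHIS_PER_BTC = 100_000_000

# Hypothetical sample rows; hashes, values and prices are made up.
outputs = [
    {"tx": "aa", "index": 0, "created": date(2020, 1, 1), "value_sat": 150_000_000},
    {"tx": "aa", "index": 1, "created": date(2020, 1, 1), "value_sat": 50_000_000},
]
inputs = [
    # Spends output 0 of transaction "aa" in block 700.
    {"spent_tx": "aa", "spent_index": 0, "destroyed_block": 700},
]
price_by_date = {date(2020, 1, 1): 7200.0}

# Index inputs by the (transaction hash, output index) pair they spend.
spends = {(i["spent_tx"], i["spent_index"]): i for i in inputs}

txo = []
for out in outputs:
    # Like a LEFT JOIN: unspent outputs simply get None on the input side.
    spend = spends.get((out["tx"], out["index"]))
    # Price on the creation date; None if we have no price for that day.
    price = price_by_date.get(out["created"])
    cost_basis = None if price is None else out["value_sat"] * price / SATOSHIS_PER_BTC
    txo.append({
        "tx": out["tx"],
        "index": out["index"],
        "destroyed_block": spend["destroyed_block"] if spend else None,
        "cost_basis_usd": cost_basis,
    })

for row in txo:
    print(row)
```

Matching on the hash alone would not be enough, since one transaction can have many outputs; the output index is what makes the pair unique.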

The blocks table is the same as HODLWavesPart1, except we also get the price on the date that block occurred.

In [5]:
QUERY += '''
-- blocks subquery: for each date get the final block for that date
-- NEW: we also join the price data so that we can see the price at each block
blocks AS (
  SELECT
    DATE(blocks.timestamp) AS date,
    -- Get last block per day
    MAX(blocks.number) AS block_number,
    MAX(DATETIME(blocks.timestamp)) AS block_ts,
    cm.PriceUSD AS price_usd
  FROM
    `bigquery-public-data.crypto_bitcoin.blocks` AS blocks
  LEFT JOIN
    cm
  ON
    cm.date = DATE(blocks.timestamp)
  GROUP BY
    date, price_usd)
'''
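
For intuition, the "last block per day" step can be sketched in plain Python like this. The block rows are hypothetical. One difference from the SQL is worth flagging: the query takes `MAX(number)` and `MAX(timestamp)` independently, whereas this sketch keeps the timestamp of the highest-numbered block, so the two could disagree on days where block timestamps are not in block order.

```python
from datetime import datetime

# Hypothetical (block number, block timestamp) rows.
blocks = [
    (14, datetime(2009, 1, 9, 4, 33, 9)),
    (15, datetime(2009, 1, 10, 0, 5, 0)),
    (75, datetime(2009, 1, 10, 23, 57, 2)),
]

# Keep, for each calendar date, the block with the highest block number.
last_block_per_day = {}
for number, ts in blocks:
    day = ts.date()
    best = last_block_per_day.get(day)
    if best is None or number > best[0]:
        last_block_per_day[day] = (number, ts)

for day, (number, ts) in sorted(last_block_per_day.items()):
    print(day, number, ts)
```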

Final aggregation and metric calculation query.

In [6]:
QUERY += '''
-- final data aggregation query: join txo with blocks, keeping only txo
-- that were created and unspent as of that block, then bucket the txo
-- by age and sum the txo value per bucket per that day
-- NEW: Last grouping of SUM() columns, where we sum the output_cost_basis_usd column
--      from the txo table to get realized cap!
SELECT
  -- Time series metadata
  blocks.date AS date,
  blocks.block_number AS block_number,
  blocks.block_ts AS block_ts,
  blocks.price_usd AS price_usd,

-- BTC Value Weighting
  -- Total UTXO value on that date
  SUM(txo.output_value) AS total_utxo_value,
  -- Our HODL Waves buckets, counting value of UTXO
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 1, txo.output_value, 0)) AS utxo_value_under_1d,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 1
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 7,
         txo.output_value, 0)) AS utxo_value_1d_1w,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 7
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28,
         txo.output_value, 0)) AS utxo_value_1w_1m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 3,
         txo.output_value, 0)) AS utxo_value_1m_3m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 3
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 6,
         txo.output_value, 0)) AS utxo_value_3m_6m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 6
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12,
         txo.output_value, 0)) AS utxo_value_6m_12m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 18,
         txo.output_value, 0)) AS utxo_value_12m_18m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 18
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 24,
         txo.output_value, 0)) AS utxo_value_18m_24m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 2
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12 * 3,
         txo.output_value, 0)) AS utxo_value_2y_3y,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 3
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12 * 5,
         txo.output_value, 0)) AS utxo_value_3y_5y,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 5
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12 * 8,
         txo.output_value, 0)) AS utxo_value_5y_8y,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 8,
         txo.output_value, 0)) AS utxo_value_greater_8y,

-- Flat Weighting
  -- Total UTXO count on that date
  SUM(1) AS total_utxo_count,
  -- Our HODL Waves buckets, counting number of UTXO
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 1, 1, 0)) AS utxo_count_under_1d,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 1
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 7,
         1, 0)) AS utxo_count_1d_1w,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 7
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28,
         1, 0)) AS utxo_count_1w_1m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 3,
         1, 0)) AS utxo_count_1m_3m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 3
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 6,
         1, 0)) AS utxo_count_3m_6m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 6
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12,
         1, 0)) AS utxo_count_6m_12m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 18,
         1, 0)) AS utxo_count_12m_18m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 18
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 24,
         1, 0)) AS utxo_count_18m_24m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 2
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12 * 3,
         1, 0)) AS utxo_count_2y_3y,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 3
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12 * 5,
         1, 0)) AS utxo_count_3y_5y,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 5
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12 * 8,
         1, 0)) AS utxo_count_5y_8y,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 8,
         1, 0)) AS utxo_count_greater_8y,

-- Flat weighting, filtered
  -- Total UTXO count on that date (>= 0.01 BTC, matching the bucket filters below)
  SUM(IF(txo.output_value / 100000000 >= 0.01, 1, 0)) AS total_utxo_count_filter,
  -- Our HODL Waves buckets, counting number of UTXO (> 0.01 BTC)
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 1
         AND txo.output_value / 100000000 >= 0.01,
         1, 0)) AS utxo_count_filter_under_1d,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 1
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 7
         AND txo.output_value / 100000000 >= 0.01,
         1, 0)) AS utxo_count_filter_1d_1w,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 7
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28
         AND txo.output_value / 100000000 >= 0.01,
         1, 0)) AS utxo_count_filter_1w_1m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 3
         AND txo.output_value / 100000000 >= 0.01,
         1, 0)) AS utxo_count_filter_1m_3m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 3
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 6
         AND txo.output_value / 100000000 >= 0.01,
         1, 0)) AS utxo_count_filter_3m_6m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 6
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12
         AND txo.output_value / 100000000 >= 0.01,
         1, 0)) AS utxo_count_filter_6m_12m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 18
         AND txo.output_value / 100000000 >= 0.01,
         1, 0)) AS utxo_count_filter_12m_18m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 18
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 24
         AND txo.output_value / 100000000 >= 0.01,
         1, 0)) AS utxo_count_filter_18m_24m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 2
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12 * 3
         AND txo.output_value / 100000000 >= 0.01,
         1, 0)) AS utxo_count_filter_2y_3y,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 3
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12 * 5
         AND txo.output_value / 100000000 >= 0.01,
         1, 0)) AS utxo_count_filter_3y_5y,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 5
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12 * 8
         AND txo.output_value / 100000000 >= 0.01,
         1, 0)) AS utxo_count_filter_5y_8y,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 8
         AND txo.output_value / 100000000 >= 0.01,
         1, 0)) AS utxo_count_filter_greater_8y,

-- BTC USD Value (Realized Cap) Weighting
  -- Realized Cap on that date
  SUM(txo.output_cost_basis_usd) AS realized_cap,
  -- Our HODL Waves buckets, counting value of UTXO
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 1, txo.output_cost_basis_usd, 0)) AS utxo_realcap_under_1d,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 1
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 7,
         txo.output_cost_basis_usd, 0)) AS utxo_realcap_1d_1w,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 7
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28,
         txo.output_cost_basis_usd, 0)) AS utxo_realcap_1w_1m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 3,
         txo.output_cost_basis_usd, 0)) AS utxo_realcap_1m_3m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 3
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 6,
         txo.output_cost_basis_usd, 0)) AS utxo_realcap_3m_6m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 6
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12,
         txo.output_cost_basis_usd, 0)) AS utxo_realcap_6m_12m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 18,
         txo.output_cost_basis_usd, 0)) AS utxo_realcap_12m_18m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 18
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 24,
         txo.output_cost_basis_usd, 0)) AS utxo_realcap_18m_24m,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 2
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12 * 3,
         txo.output_cost_basis_usd, 0)) AS utxo_realcap_2y_3y,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 3
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12 * 5,
         txo.output_cost_basis_usd, 0)) AS utxo_realcap_3y_5y,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 5
         AND DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) < 28 * 12 * 8,
         txo.output_cost_basis_usd, 0)) AS utxo_realcap_5y_8y,
  SUM(IF(DATETIME_DIFF(blocks.block_ts, txo.created_block_ts, DAY) >= 28 * 12 * 8,
         txo.output_cost_basis_usd, 0)) AS utxo_realcap_greater_8y
         
FROM
  blocks
CROSS JOIN
  txo
WHERE
  -- Only include TXOs that were created at or before the given block
  blocks.block_number >= txo.created_block_number
  -- Only include TXOs that were unspent as of the given block
  AND (
    -- TXOs that are spent after the given block, so they are included
    blocks.block_number < txo.destroyed_block_number
    -- TXOs that are never spent, so they are included
    OR txo.destroyed_block_number IS NULL)
GROUP BY
  date, block_number, block_ts, price_usd
ORDER BY
  date ASC;
'''
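
The long list of `SUM(IF(...))` columns all follow one pattern, which can be sketched in Python as below. The function names are my own, and `age_days` is assumed to be the integer day difference the query computes with `DATETIME_DIFF`. The bucket edges mirror the query's 28-day "month".

```python
# Bucket edges in days, mirroring the 28-day "month" used in the query.
M = 28
BUCKETS = [
    ("under_1d", 0, 1),
    ("1d_1w", 1, 7),
    ("1w_1m", 7, M),
    ("1m_3m", M, M * 3),
    ("3m_6m", M * 3, M * 6),
    ("6m_12m", M * 6, M * 12),
    ("12m_18m", M * 12, M * 18),
    ("18m_24m", M * 18, M * 24),
    ("2y_3y", M * 12 * 2, M * 12 * 3),
    ("3y_5y", M * 12 * 3, M * 12 * 5),
    ("5y_8y", M * 12 * 5, M * 12 * 8),
    ("greater_8y", M * 12 * 8, None),  # no upper edge
]


def age_bucket(age_days):
    """Return the bucket label for a TXO age in whole days (lower edge inclusive)."""
    for label, lower, upper in BUCKETS:
        if age_days >= lower and (upper is None or age_days < upper):
            return label
    raise ValueError("age_days must be non-negative")


def is_unspent_at(block_number, created_block, destroyed_block):
    """Mirror the WHERE clause: created at or before the block, not yet spent."""
    return created_block <= block_number and (
        destroyed_block is None or block_number < destroyed_block)


print(age_bucket(27), age_bucket(28))
print(is_unspent_at(100, created_block=50, destroyed_block=100))
```

Because every "month" is 28 days, the bucket labels are approximate: the `greater_8y` bucket starts at 28 * 12 * 8 = 2688 days, which is closer to 7.4 calendar years.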

In [7]:
print(QUERY)



That is our full query to get the HODL waves & Realized Cap data!

I recommend copy/pasting the query into the [BigQuery web UI](https://bigquery.cloud.google.com/) and running from there. You can also run the query from this notebook using [Pandas BigQuery API](https://pandas-gbq.readthedocs.io/en/latest/install.html) if desired. Be sure to set the query dialect to Standard SQL.
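
If you'd rather run it from the notebook, a minimal sketch using `pandas_gbq.read_gbq` could look like the following. The `run_query` wrapper is a name I made up for illustration, and it assumes you have pandas-gbq installed and Google Cloud credentials set up for a project you can bill queries to.

```python
def run_query(query, project_id):
    """Run a Standard SQL query on BigQuery and return a pandas DataFrame."""
    # Imported lazily so this cell can be defined without pandas-gbq installed.
    import pandas_gbq
    return pandas_gbq.read_gbq(query, project_id=project_id, dialect="standard")


# Example call (needs credentials; the project name is a placeholder):
# waves = run_query(QUERY, project_id="replace_this_project")
```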

The query takes ~50 minutes to run, and when it's done you'll have a time series with the HODL waves distribution. You can then save the query output as a CSV, which we'll now be loading into the notebook and plotting.

# Plot the waves

In [8]:
import pandas as pd
import numpy as np
import os

%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'retina'

import chart_utils

In [9]:
# Save your own version of the HODL waves query output, or use the version from the repo
waves = pd.read_csv("./data/03_hodl_waves_real_cap.csv")

# Load in CoinMetrics BTC data to get daily price
price = pd.read_csv("btc.csv",
                    usecols=['date', 'PriceUSD', 'SplyCur', 'CapRealUSD'])

# Join the price data onto the waves dataframe
waves = waves.merge(price, on='date')

In [10]:
print(waves.columns.values)
waves.head()

['date' 'block_number' 'block_ts' 'price_usd' 'total_utxo_value'
 'utxo_value_under_1d' 'utxo_value_1d_1w' 'utxo_value_1w_1m'
 'utxo_value_1m_3m' 'utxo_value_3m_6m' 'utxo_value_6m_12m'
 'utxo_value_12m_18m' 'utxo_value_18m_24m' 'utxo_value_2y_3y'
 'utxo_value_3y_5y' 'utxo_value_5y_8y' 'utxo_value_greater_8y'
 'total_utxo_count' 'utxo_count_under_1d' 'utxo_count_1d_1w'
 'utxo_count_1w_1m' 'utxo_count_1m_3m' 'utxo_count_3m_6m'
 'utxo_count_6m_12m' 'utxo_count_12m_18m' 'utxo_count_18m_24m'
 'utxo_count_2y_3y' 'utxo_count_3y_5y' 'utxo_count_5y_8y'
 'utxo_count_greater_8y' 'total_utxo_count_filter'
 'utxo_count_filter_under_1d' 'utxo_count_filter_1d_1w'
 'utxo_count_filter_1w_1m' 'utxo_count_filter_1m_3m'
 'utxo_count_filter_3m_6m' 'utxo_count_filter_6m_12m'
 'utxo_count_filter_12m_18m' 'utxo_count_filter_18m_24m'
 'utxo_count_filter_2y_3y' 'utxo_count_filter_3y_5y'
 'utxo_count_filter_5y_8y' 'utxo_count_filter_greater_8y' 'realized_cap'
 'utxo_realcap_under_1d' 'utxo_realcap_1d_1w' 'utxo_r

| | date | block_number | block_ts | price_usd | total_utxo_value | utxo_value_under_1d | utxo_value_1d_1w | utxo_value_1w_1m | utxo_value_1m_3m | utxo_value_3m_6m | ... | utxo_realcap_6m_12m | utxo_realcap_12m_18m | utxo_realcap_18m_24m | utxo_realcap_2y_3y | utxo_realcap_3y_5y | utxo_realcap_5y_8y | utxo_realcap_greater_8y | CapRealUSD | PriceUSD | SplyCur |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2009-01-03 | 0 | 2009-01-03T18:15:05 | | 5000000000 | 5000000000 | 0 | 0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | 0.0 |
| 1 | 2009-01-09 | 14 | 2009-01-09T04:33:09 | | 75000000000 | 70000000000 | 5000000000 | 0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | 950.0 |
| 2 | 2009-01-10 | 75 | 2009-01-10T23:57:02 | | 380000000000 | 305000000000 | 70000000000 | 5000000000 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | 4000.0 |
| 3 | 2009-01-11 | 168 | 2009-01-11T23:39:41 | | 845000000000 | 465000000000 | 375000000000 | 5000000000 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | 8650.0 |
| 4 | 2009-01-12 | 262 | 2009-01-12T23:45:47 | | 1315000000000 | 475000000000 | 835000000000 | 5000000000 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | 13350.0 |


As a quick sanity check, let's compare our realized_cap to the CapRealUSD column from CoinMetrics. We would expect these to match each other closely; otherwise it's likely we made a mistake in our calculations.

In [11]:
# Check against CoinMetrics as a sanity check

waves['OurCapRealUSD'] = waves['realized_cap']
waves['CapRealUSDDelta'] = waves['OurCapRealUSD'] - waves['CapRealUSD']

In [12]:
# See what the delta is between our data and CoinMetrics

chart_utils.two_axis_chart(
    waves[:-5], x_series='date', y1_series=['OurCapRealUSD', 'CapRealUSD'], y2_series='CapRealUSDDelta',
    title='Our calculated Realized Cap vs CoinMetrics', 
    y1_series_axis_type='linear',
    y2_series_axis_type='linear', y2_series_axis_format="{n}")

Our Realized Cap matches CoinMetrics' decently well, but there are still some outlier days when the difference gets up to about $100 million, and one day where it gets up to about $400 million. This is a little weird but not unexpected, since the discrepancies occur during periods of maximum BTC price volatility. It's possible they use intra-day data to calculate cost basis, for example.

All in all, the occasional $100 million difference is not that large (about 0.1%) when measured against a Realized Cap of roughly $100 billion. You can see this by comparing OurCapRealUSD and CapRealUSD, where you can barely tell they diverge at all.
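
To put those gaps in proportion, here is the arithmetic as a tiny sketch. The `relative_delta` helper is my own name, and the inputs are the round figures quoted above rather than values read from the data.

```python
def relative_delta(ours, reference):
    """Difference between our value and the reference, as a fraction of the reference."""
    return (ours - reference) / reference


reference_cap = 100_000_000_000  # roughly $100 billion

# A $100 million gap and a $400 million gap, relative to $100 billion.
print(f"{relative_delta(reference_cap + 100_000_000, reference_cap):.2%}")  # 0.10%
print(f"{relative_delta(reference_cap + 400_000_000, reference_cap):.2%}")  # 0.40%
```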

Now we can get to the good stuff and plot the Realized Cap HODL Wave using the chart_utils.hodl_waves_chart() function as in the HODLWavesPart1 tutorial.

In [13]:
chart_utils.hodl_waves_chart(waves.dropna(), version='realcap')

See [here](https://ty-perbole.github.io/stack-stats/RealCapHODLWaves.html) for the live version of this chart!

In the previous tutorial we calculated something I called the Stack Rate, which was the daily change in share of UTXO held for under 6 months. We can calculate that on a RealizedCap weighted basis now, but we'll use 3 months instead since that data series seems more sensitive for RealCap.

In [14]:
# Calculate share of Realized Cap held for under 3 months

waves['short_held_realcap_pct'] = (
    waves['utxo_realcap_under_1d'] + waves['utxo_realcap_1d_1w'] + waves['utxo_realcap_1w_1m']
    + waves['utxo_realcap_1m_3m']
    ) / waves['realized_cap']

chart_utils.two_axis_chart(
    waves, x_series='date', y1_series='short_held_realcap_pct', y2_series='PriceUSD',
    title='Share of RealizedCap held for under 3 months', 
    y1_series_axis_type='linear', y1_series_axis_range=[0, 1], y1_series_axis_format=",.0%",
    y2_series_axis_type='log', y2_series_axis_range=[-2, 6], data_source=None)

This looks like a decent market timing indicator. Pretty crazy that at bubble tops > 80% of RealizedCap value was created in the previous 3 months.

Again, let's calculate the first derivative of this series to get the rate of change, AKA the Stack Rate.

In [15]:
# Take the 14-day change in the short-held RealCap share and divide by 14 to get
# the average daily change (slope) of the series over that window
waves['short_held_realcap_pct_DoD_delta'] = (
    waves['short_held_realcap_pct'] - waves['short_held_realcap_pct'].shift(14)) / 14

chart_utils.two_axis_chart(
    waves, x_series='date', y1_series='short_held_realcap_pct_DoD_delta', y2_series='PriceUSD',
    title='Stack Rate: Daily change in percent of RealCap held for under 3 months', 
    y1_series_axis_type='linear', y1_series_axis_format=",.2%",
    y2_series_axis_type='log', y2_series_axis_range=[-2, 6])
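
For anyone unsure what `shift(14)` followed by a division by 14 is doing, here is a plain-Python sketch of the same calculation. `lagged_daily_change` is a hypothetical helper name and the input series is made up. The result equals the average of the 14 individual day-over-day changes in the window, because the intermediate terms cancel out.

```python
def lagged_daily_change(series, lag=14):
    """(x[t] - x[t - lag]) / lag for each t; None where there is no lagged value."""
    out = [None] * len(series)
    for i in range(lag, len(series)):
        out[i] = (series[i] - series[i - lag]) / lag
    return out


# Made-up share series that rises by one percentage point per day.
share = [0.01 * day for day in range(20)]
stack_rate = lagged_daily_change(share, lag=14)

print(stack_rate[13])  # None: fewer than 14 days of history
print(round(stack_rate[14], 6))
```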

# The end

Thanks for following along and I hope you learned something.

If you enjoyed this tutorial you can follow me on twitter [@typerbole](https://twitter.com/typerbole), where I will continue to publish tutorials and Bitcoin data science content. Feel free to DM me with feedback or suggestions, or email me at [my twitter handle] at pm.me.