# ERC-4626: vault ecosystem comparison across chains

- In this notebook, we examine ERC-4626 vaults across different EVM blockchains
    - Currently we do not scan non-ERC-4626 vaults like Enzyme Finance, or any protocol-native vaults like Hyperliquid HLP. This is not an inherent limitation; it is just not implemented yet.
- We assemble various data tables out of the vault data to show and compare the blockchain ecosystems
- The analysis focuses on USD-stablecoin denominated vaults
    - Currently missing are e.g. WETH vaults and staking vaults for various small cap tokens
    - There is no ERC standard for vault fees. For some protocols we have manually added fee reading support.
- The list of chains is somewhat randomly selected and very easy to extend to contain any chain supported by [Envio's HyperSync](https://docs.envio.dev/docs/HyperSync/hypersync-supported-networks)
- Everything is open source: you can run this notebook and the associated scripts on your local computer. A full run takes around an hour.

## Usage

See the `ERC-4626 scanning all vaults onchain` example in the tutorials first to learn how to build the vault database as a local `vault-db.pickle` file.




## Setup

- Set up notebook rendering parameters

In [None]:
import pandas as pd

pd.options.display.float_format = '{:,.2f}'.format
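
As a quick illustration of what this setting does, the sketch below applies the same format callable to a couple of made-up numbers (they are not real vault data). The format string `{:,.2f}` adds thousands separators and rounds to two decimals:

```python
import pandas as pd

# The same format callable as in the setup cell
fmt = '{:,.2f}'.format
pd.options.display.float_format = fmt

# Made-up sample values, not real vault data
large = fmt(1234567.891)
small = fmt(0.5)

print(large)
print(small)
```

With the option set, every float that pandas renders in the notebook goes through this callable, which keeps large NAV figures readable.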

## Read scanned data

- Read the Pickle database our scanning script produced earlier 

In [1]:
import pickle
from pathlib import Path

import pandas as pd

from eth_defi.token import is_stablecoin_like

output_folder = Path("~/.tradingstrategy/vaults").expanduser()
vault_db_path = output_folder / "vault-db.pickle"
assert vault_db_path.exists(), "Run the vault scanner script first"

# Close the file handle after loading instead of leaving it open
with vault_db_path.open("rb") as f:
    vault_db = pickle.load(f)

print(f"We have data for {len(vault_db)} vaults")

We have data for 6976 vaults


## Transform data

- Convert the raw pickled vault data into a Pandas DataFrame for analysis

In [3]:
import pandas as pd
from eth_defi.erc_4626.hypersync_discovery import ERC4262VaultDetection
from eth_defi.chain import get_chain_name
from eth_defi.token import is_stablecoin_like

data = list(vault_db.values())
df = pd.DataFrame(data)

# Build useful columns out of raw pickled Python data
# _detection_data contains entries as ERC4262VaultDetection class
entry: ERC4262VaultDetection
df["Chain"] = df["_detection_data"].apply(lambda entry: get_chain_name(entry.chain))
df["Protocol identified"] = df["_detection_data"].apply(lambda entry: entry.is_protocol_identifiable())
# Missing denomination token data may show up as NaN (a float) rather than None, so check for a dict
df["Stablecoin denominated"] = df["_denomination_token"].apply(lambda token_data: is_stablecoin_like(token_data["symbol"]) if isinstance(token_data, dict) else False)
df["ERC-7540"] = df["_detection_data"].apply(lambda entry: entry.is_erc_7540())
# notna() treats both None and NaN as missing, unlike an `is not None` check
df["Fee detected"] = df["Mgmt fee"].notna() | df["Perf fee"].notna()
# Event counts
df["Deposit count"] = df["_detection_data"].apply(lambda entry: entry.deposit_count)
df["Redeem count"] = df["_detection_data"].apply(lambda entry: entry.redeem_count)
df["Total events"] = df["Deposit count"] + df["Redeem count"] 
df = df.sort_values(["Chain", "Address"])
df = df.set_index(["Chain", "Address"])

print("DataFrame MultiIndex is:", ", ".join(x for x in df.index.names))
print("DataFrame columns are:", ", ".join(x for x in df.columns))

display(df.head())
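
One pitfall when building columns from pickled records is missing data. The sketch below uses invented records (the column names `Mgmt fee` and `Perf fee` mirror the ones above, the values are made up) to show that pandas stores a missing numeric value as `NaN`. `NaN` passes an `is not None` check, whereas `notna()` treats `None` and `NaN` alike:

```python
import pandas as pd

# Invented records; the second vault has no fee data at all
records = [
    {"Name": "Vault A", "Mgmt fee": 0.02, "Perf fee": 0.10},
    {"Name": "Vault B", "Mgmt fee": None, "Perf fee": None},
]
toy = pd.DataFrame(records)

# None in a numeric column becomes NaN, which is a float and therefore "not None"
naive = toy.apply(lambda row: (row["Mgmt fee"] is not None) or (row["Perf fee"] is not None), axis=1)

# notna() flags both None and NaN as missing
robust = toy["Mgmt fee"].notna() | toy["Perf fee"].notna()

print(naive.tolist())
print(robust.tolist())
```

The naive check reports fee data for both vaults, while the `notna()` version only reports it for the vault that actually has fees.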


## Vault deployment history

- Show how much history we have for each chain


In [None]:
# Earliest and latest "First seen" timestamp per chain
seen_df = df.groupby(level='Chain')['First seen'].agg(['min', 'max']).reset_index()

# Rename columns for clarity
seen_df.columns = ['Chain', 'First vault deployed', 'Last vault deployed']

display(seen_df)

## Vaults per chain summary

- Get a summary of which vaults each scanned chain has
- *Generic* status means that we do not have classification rules to determine the protocol to which a particular ERC-4626 vault belongs
- *Broken* status means that we could not correctly extract ERC-4626 information out of a smart contract

To detect the protocol of a vault, we need to maintain a [manual rule list here](https://github.com/tradingstrategy-ai/web3-ethereum-defi/blob/master/eth_defi/erc_4626/classification.py). Not all protocols are supported at the moment, as there are too many protocols to examine and identify manually. Open source contributions are welcome.




In [8]:
summary_df = df.groupby(level='Chain').size().reset_index(name='Count')
summary_df = summary_df.sort_values(by='Count', ascending=False)
display(summary_df)

nav_threshold = 10_000

# Build boolean masks for the different vault categories
identified_filter = df["Protocol identified"] == True
stablecoin_denominated = df["Stablecoin denominated"] == True
notable_nav = df["Stablecoin denominated"] & (df["NAV"] >= nav_threshold)
erc_7540 = df["ERC-7540"] == True 
fee_detected = df["Fee detected"] == True 

# Create the summary DataFrame.
# Each Series is indexed by chain, so the result is indexed by Chain as well.
summary_df = pd.DataFrame({
    'Total vaults detected': df.groupby(level='Chain').size(),
    'Protocol identified': df[identified_filter].groupby(level='Chain').size(),
    'Stablecoin denominated': df[stablecoin_denominated].groupby(level='Chain').size(),
    f'Notable stablecoin NAV (min {nav_threshold} USD)': df[notable_nav].groupby(level='Chain').size(),
    'ERC-7540': df[erc_7540].groupby(level='Chain').size(),
    'Fee data supported': df[fee_detected].groupby(level='Chain').size(),
}).fillna(0).astype(int)

display(summary_df)

| | Chain | Count |
|---|---|---|
| 4 | Ethereum | 2074 |
| 0 | Arbitrum | 1911 |
| 8 | Polygon | 906 |
| 3 | Binance | 314 |
| 1 | Avalanche | 249 |
| 2 | Berachain | 215 |
| 7 | Mode | 56 |
| 6 | Mantle | 37 |
| 9 | Unichain | 7 |
| 5 | Hyperliquid | 6 |
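
To make the mask-and-count pattern above concrete, here is a small self-contained sketch on invented data (chain names, addresses and flags are placeholders, not scan results). It shows why `fillna(0)` is needed: a chain with no rows matching a mask is absent from that mask's group sizes and would otherwise appear as `NaN`.

```python
import pandas as pd

# Invented vault rows, indexed the same way as the real DataFrame
toy = pd.DataFrame(
    {
        "Chain": ["Ethereum", "Ethereum", "Mode"],
        "Address": ["0x01", "0x02", "0x03"],
        "Stablecoin denominated": [True, False, False],
    }
).set_index(["Chain", "Address"])

stable_mask = toy["Stablecoin denominated"]

# The two Series are aligned on chain name; Mode has no stablecoin vaults,
# so its entry is NaN until fillna(0) replaces it
toy_summary = pd.DataFrame({
    "Total vaults detected": toy.groupby(level="Chain").size(),
    "Stablecoin denominated": toy[stable_mask].groupby(level="Chain").size(),
}).fillna(0).astype(int)

print(toy_summary)
```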


## Largest USD vaults

- Show the stablecoin-denominated vaults with the largest NAV (in USD) across the different chains

In [None]:
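largest_threshold = 20
# Only stablecoin-denominated vaults have a NAV that is comparable in USD.
# Sort descending so that the largest vaults come first.
largest_df = df[df["Stablecoin denominated"]].sort_values("NAV", ascending=False)

# Chain and Address live in the MultiIndex, so turn them back into columns
largest_df = largest_df.reset_index()[["Chain", "Address", "Name", "Denomination", "NAV"]]

display(largest_df.head(largest_threshold))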

## Most active vaults

- Determine vault activity by number of deposit and redeem events
- Events may be driven by bots, so this may not reflect the popularity of a vault among users


In [None]:
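largest_threshold = 20
# Sort descending so that the most active vaults come first
largest_df = df.sort_values("Total events", ascending=False)

# Chain and Address live in the MultiIndex, so turn them back into columns
largest_df = largest_df.reset_index()[["Chain", "Address", "Name", "Denomination", "NAV", "Total events", "Deposit count", "Redeem count"]]

display(largest_df.head(largest_threshold))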

### Most active vault per chain

- Display the most active vault on each chain
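
A minimal sketch of one way to pick the top vault per chain, assuming the `Chain`/`Address` MultiIndex and the `Total events` column built earlier. The data below is invented so that the snippet runs on its own; with the real data, the same `idxmax` call could be applied to `df`.

```python
import pandas as pd

# Invented vault rows, indexed the same way as the real DataFrame
toy = pd.DataFrame(
    {
        "Chain": ["Arbitrum", "Arbitrum", "Ethereum"],
        "Address": ["0x0a", "0x0b", "0x0c"],
        "Name": ["Vault A", "Vault B", "Vault C"],
        "Total events": [10, 250, 42],
    }
).set_index(["Chain", "Address"])

# idxmax() returns the (Chain, Address) label of the busiest vault within each chain
top_labels = toy.groupby(level="Chain")["Total events"].idxmax()

# Look the winning rows up by their full index labels
top_per_chain = toy.loc[top_labels.tolist()]

print(top_per_chain)
```

Ties are resolved by `idxmax` in favour of the first matching row, which is usually acceptable for a ranking like this one.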