Classification of Smart Contracts
- a) Separate EOAs and smart contracts
    - A multisig has no outgoing transactions
    - It may have a contract creation event, but not necessarily (e.g. when a proxy contract controls the EOA)
        - see 1: Has a contract creation event - https://etherscan.io/address/0x8392f6669292fa56123f71949b52d883ae57e225
        - see 2: Has no contract creation event - https://etherscan.io/address/0x9e2b6378ee8ad2a4a95fe481d63caba8fb0ebbf9
- b) Filter out multisigs
- c) Classify the remaining smart contracts using ABI, bytecode, return values, and manual code review.
    
> Ref.: https://ieeexplore.ieee.org/document/9730412

> Ref.: https://arxiv.org/pdf/2106.15497.pdf 

> Ref.: https://ieeexplore.ieee.org/document/9019682 
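
Steps a) and b) can be sketched offline as a small function that labels an account from its deployed bytecode. This is a minimal sketch, not the method used in this notebook: the function name `classify_account` and the set `known_multisig_hashes` are hypothetical, and SHA-256 from the standard library is only a stand-in fingerprint (Ethereum itself hashes code with Keccak-256). It assumes the bytecode has already been fetched from a node as raw bytes.

```python
import hashlib


def classify_account(code: bytes, known_multisig_hashes: set) -> str:
    """Label an account from its deployed bytecode (hypothetical helper).

    code: raw bytecode returned by the node; empty for accounts without code.
    known_multisig_hashes: assumed, manually curated fingerprints of multisig bytecode.
    """
    # Step a): no deployed code at the queried block means we treat it as an EOA
    if len(code) == 0:
        return "eoa"
    # Step b): flag bytecode whose fingerprint matches a known multisig
    fingerprint = hashlib.sha256(code).hexdigest()
    if fingerprint in known_multisig_hashes:
        return "multisig"
    # Step c) (ABI, return values, manual review) would refine this label further
    return "contract"
```

Note that empty bytecode only reflects the state at the queried block, so a self-destructed contract would also be labeled `"eoa"` by this sketch.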


## EOA Parser

In [13]:
import pandas as pd
import numpy as np
import dask.dataframe as dd
from os.path import join
import os

from dotenv import load_dotenv
load_dotenv()  

path = os.environ['PROJECT_PATH']

In [14]:
col_name=['hash', 'nonce', "block_hash",'block_number',"transaction_index",'from_address', 'to_address', 'value', 'gas', 'gas_price',"input",'block_timestamp', "max_fee_per_gas","max_priority_fee_per_gas","transaction_type"]
dd_tx = dd.read_csv(join(path,'tx_all_uniq_addresses2.csv'), dtype='str', header=None, names=col_name)

In [None]:
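from web3 import Web3

df_ua = pd.read_csv(join(path, 'df_unique_addresses2.csv'))

w3 = Web3(Web3.HTTPProvider(os.environ['ETHEREUM_NODE_ENDPOINT']))

# Note: the camelCase names (isAddress, toChecksumAddress, getCode) are the web3.py v5 API;
# web3.py v6+ renames them to is_address, to_checksum_address, get_code.
for a in df_ua.unique_addresses:

    # Validate first: toChecksumAddress raises on malformed input
    if w3.isAddress(a):
        address = w3.toChecksumAddress(a)
        # getCode returns bytes, so compare by length rather than against the string '0x'
        if len(w3.eth.getCode(address)) == 0:
            print("This is an EOA.")
        else:
            print("This is a contract.")
    else:
        print("Invalid Ethereum address")

    break  # stop after the first address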

In [30]:
# dd_tx = dd.read_csv(join(path,'transactions_merged_all.csv'), dtype='str')
# dd_tx.head()

## Smart Contract Classification

## External sources of address labeling 

### LP pair addresses
> Note: For simplicity, we assume a balanced 50-50 LP pool.

1. Filter the LP addresses that are in our set of addresses
2. Extract all holders of the LP pool share token
3. Sum the LP share token balances at the snapshot date
4. Divide by 2
5. Add the result to the direct wallet holdings of the given token at snapshot height
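
Steps 3 to 5 above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the notebook's actual implementation: the function `add_lp_exposure` and the column names `address`, `balance`, `holder`, and `lp_value` are hypothetical, and `lp_value` is assumed to already be the holder's LP position expressed in units of the token of interest at the snapshot.

```python
import pandas as pd


def add_lp_exposure(direct: pd.DataFrame, lp_positions: pd.DataFrame) -> pd.DataFrame:
    """Combine direct holdings with half of each holder's LP position.

    direct: columns ['address', 'balance'] (hypothetical names).
    lp_positions: columns ['holder', 'lp_value'] at the snapshot (hypothetical names).
    """
    # Step 3: sum LP share balances per holder at the snapshot
    lp_total = lp_positions.groupby('holder')['lp_value'].sum()
    # Step 4: under the 50-50 assumption, half of the position is the token of interest
    lp_half = lp_total / 2
    # Step 5: add to direct holdings; holders present on only one side count as 0 on the other
    direct_bal = direct.set_index('address')['balance']
    combined = direct_bal.add(lp_half, fill_value=0)
    return combined.rename('total_balance').rename_axis('address').reset_index()


# Toy data: A holds 100 directly plus two LP positions, C holds only an LP position
direct_demo = pd.DataFrame({'address': ['A', 'B'], 'balance': [100.0, 50.0]})
lp_demo = pd.DataFrame({'holder': ['A', 'A', 'C'], 'lp_value': [40.0, 20.0, 10.0]})
result_demo = add_lp_exposure(direct_demo, lp_demo)
```

Using `fill_value=0` keeps addresses that hold the token only through an LP position, which a plain `+` would turn into `NaN`.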

In [19]:
df_ua = pd.read_csv(join(path, 'df_unique_addresses2.csv'))
df_ua.drop(columns=['Unnamed: 0'], inplace=True)
df_lp_pairs = pd.read_csv('assets/address_labels/dex_lp_pair_addresses.csv')

In [20]:
df_lp_pairs['pair_address_f'] = df_lp_pairs['hex(pair_address)'].apply(lambda x: w3.toChecksumAddress('0x' + x))
df_ua['unique_addresses_f'] = df_ua.unique_addresses.apply(lambda x: w3.toChecksumAddress(x))

In [38]:
df_lp_pairs_relevant = df_lp_pairs[df_lp_pairs.pair_address_f.isin(df_ua.unique_addresses_f)]

In [None]:
## merged df includes LP pair data where relevant
df_ua.merge(df_lp_pairs_relevant, left_on='unique_addresses_f', right_on='pair_address_f', how='outer', suffixes=('','dex'))
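
To make it explicit which addresses matched an LP pair, one option is pandas' `indicator=True` flag on `merge`. The sketch below uses toy frames rather than the real data; the names `df_ua_demo`, `df_lp_demo`, `is_lp_pair`, and the sample `dex` value are assumptions made for illustration only.

```python
import pandas as pd

# Toy stand-ins for df_ua and df_lp_pairs_relevant (hypothetical data)
df_ua_demo = pd.DataFrame({'unique_addresses_f': ['0xA', '0xB', '0xC']})
df_lp_demo = pd.DataFrame({'pair_address_f': ['0xB'], 'dex': ['uniswap_v2']})

# A left merge keeps every address; the '_merge' column records whether a pair row matched
merged_demo = df_ua_demo.merge(
    df_lp_demo,
    left_on='unique_addresses_f',
    right_on='pair_address_f',
    how='left',
    indicator=True,
)
merged_demo['is_lp_pair'] = merged_demo['_merge'] == 'both'
```

Because the LP frame was already filtered to addresses in the address set, a left merge here yields the same rows as an outer merge.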