### TODO:


- Shift 'high' and 'low' by one period to avoid look-ahead bias in the volume / volatility computation.
- Plot the ACF and PACF to choose an appropriate number of lags, and complete the Overleaf section.
- Use GridSearchCV to find the optimal number of trees for the random-forest regressor.
- Should we consider the log of volume and the log of moving averages?
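
The first item above can be sketched as follows. This is only a minimal illustration, assuming a daily price frame with `high` and `low` columns as in the off-chain data; the helper name `lag_intraday_extremes` and the `_lag` suffix are hypothetical and do not appear elsewhere in this notebook.

```python
import pandas as pd

def lag_intraday_extremes(prices, columns=("high", "low"), periods=1):
    """Return a copy of `prices` with lagged versions of `columns` appended.

    Row t then only carries the high/low observed at t - periods, so a
    regression on day t's target no longer sees same-day extremes.
    """
    lagged = prices.copy()
    for col in columns:
        lagged[col + "_lag"] = lagged[col].shift(periods)
    lag_names = [col + "_lag" for col in columns]
    # The first `periods` rows have no lagged value and are dropped.
    return lagged.dropna(subset=lag_names)

# Toy data (made up for illustration only).
prices_demo = pd.DataFrame(
    {
        "high": [1.2, 1.5, 1.4, 1.6],
        "low": [1.0, 1.1, 1.2, 1.3],
        "volume": [10.0, 12.0, 9.0, 11.0],
    }
)
lagged_demo = lag_intraday_extremes(prices_demo)
print(lagged_demo)
```

In the design matrices, `high_lag` and `low_lag` would then replace `high` and `low`.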

### Assumptions made | completed tasks:
- Re-queried the data at daily instead of hourly intervals
    - updated the burn-in period to match the new frequency
    - updated the rolling plots to match the new frequency
    - TODO: update the Overleaf report accordingly
    - re-queried on-chain data as of 21/12/19; the data should match at the daily level
    
- Added an intercept to the design matrix
- Removed all observations with a zero 'close' and the first two weeks of each trading period
    - Assumption: on-chain transactions prior to quotation are left aside.
    - Upside: prices equal to 0 are dropped first, then the first two weeks of trading are removed (to avoid possible extreme outliers)
- Re-included on-chain transactions FROM exchanges and added a dummy variable identifying whether a transaction comes FROM or TO an exchange
    - added a count variable for transactions from one exchange's wallet to another
    - the underlying data in its current iteration (as of 21/12/19) only contains transactions TO exchanges' wallets, not FROM them (e.g. someone sending funds FROM an exchange TO an unknown wallet)
    - the previous iteration missed transactions from one exchange to another, although they were legitimate
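
The dummy variable and the exchange-to-exchange count described above can be illustrated on toy data. This is a sketch under assumptions: the addresses are invented, and `flag_from_exchange` is a hypothetical helper name. Because the queried data only contains transfers TO exchange wallets, a row with `from_exchange == 1` is an exchange-to-exchange transfer.

```python
import pandas as pd

def flag_from_exchange(transfers, exchange_addresses):
    """Add a 0/1 column marking transfers whose sender is a known exchange wallet."""
    flagged = transfers.copy()
    flagged["from_exchange"] = (
        flagged["from_address"].isin(exchange_addresses).astype(int)
    )
    return flagged

# Invented addresses, for illustration only.
transfers_demo = pd.DataFrame(
    {
        "from_address": ["0xaaa", "0xex1", "0xbbb"],
        "to_address": ["0xex1", "0xex2", "0xex2"],
        "value": [10.0, 5.0, 2.5],
    }
)
flagged_demo = flag_from_exchange(transfers_demo, ["0xex1", "0xex2"])

# Every row goes TO an exchange, so the sum counts exchange-to-exchange transfers.
exchange_to_exchange_count = int(flagged_demo["from_exchange"].sum())
print(flagged_demo)
print(exchange_to_exchange_count)
```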

We consider the GARCH(1,1) model with Student-t innovations for the log-returns process $\{y_t,\ t \in \mathbb{Z}\}$, observed for $t = 1, \dots, T$:

$$    y_t = \epsilon_t \cdot h_t^{1/2} $$

where $\epsilon_t$ is a sequence of i.i.d. variables $\sim t(\nu)$, and

$$ h_t = \omega + \alpha \cdot y^2_{t-1} + \beta \cdot h_{t-1}$$

for $\omega > 0$ and $\alpha,\beta \geq 0$; $t(\nu)$ denotes the Student's t distribution with $\nu$ degrees of freedom. The restrictions on the GARCH parameters $\omega$, $\alpha$, $\beta$ guarantee that the conditional variance is positive.

The term $(\alpha + \beta)$ measures the persistence of the autocorrelation in the squared returns and controls the intensity of the clustering in the variance process. When it is close to one, past shocks and past variances have a longer-lasting impact on the future conditional variance.
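
To make the recursion concrete, it can be simulated in a few lines. This is a sketch under stated assumptions, not the estimation code used below: the innovations are rescaled to unit variance (which needs $\nu > 2$), and $\alpha + \beta < 1$ is imposed so that the simulation can start from the unconditional variance $\omega / (1 - \alpha - \beta)$. The function name and the parameter values are purely illustrative.

```python
import numpy as np

def simulate_garch11(omega, alpha, beta, nu, n_obs, seed=0):
    """Simulate y_t = eps_t * sqrt(h_t), h_t = omega + alpha*y_{t-1}^2 + beta*h_{t-1}."""
    if not (omega > 0 and alpha >= 0 and beta >= 0):
        raise ValueError("need omega > 0 and alpha, beta >= 0")
    if not (alpha + beta < 1 and nu > 2):
        raise ValueError("this sketch assumes alpha + beta < 1 and nu > 2")

    rng = np.random.default_rng(seed)
    # Student-t draws have variance nu / (nu - 2); rescale them to unit variance.
    eps = rng.standard_t(nu, size=n_obs) * np.sqrt((nu - 2.0) / nu)

    h = np.empty(n_obs)
    y = np.empty(n_obs)
    h[0] = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    y[0] = eps[0] * np.sqrt(h[0])
    for t in range(1, n_obs):
        h[t] = omega + alpha * y[t - 1] ** 2 + beta * h[t - 1]
        y[t] = eps[t] * np.sqrt(h[t])
    return y, h

# Illustrative parameter values (assumed, not estimated from data).
y_sim, h_sim = simulate_garch11(omega=1e-4, alpha=0.10, beta=0.85, nu=5.0, n_obs=500)
print(y_sim[:5], h_sim[:5])
```

Plotting `h_sim` for different values of `alpha + beta` shows how higher persistence produces longer volatility clusters.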


In [1]:
import numpy as np 
import pandas as pd 
import os

In [2]:
token_data = pd.read_csv('ethereum_erc20_tokens.csv')

In [3]:
def compute_for_token(token_address,
    token_initials,
    initial_date = None
    ):
    
    import pandas as pd
    import numpy as np
    import os
    from datetime import datetime
    import matplotlib.pyplot as plt
    import scipy
    import sklearn
    import time
    import os.path
    from arch import arch_model
    from arch.univariate import ConstantMean, GARCH, StudentsT
    
    from sklearn.model_selection import train_test_split
        
    """######################## Load the dataframe of on-chain transactions #############################"""

    # Read the .csv file containing all exchanges relevant data
    exchange_data = pd.read_csv('all_exchanges.csv')
    
    df = pd.read_csv('./data/{}/raw_big_query.csv'.format(token_initials))
    display(df.shape)
    df.block_timestamp = df.block_timestamp.apply(lambda x: datetime.strptime(x[:-6],'%Y-%m-%d %H:%M:%S') )

    # transactions towards the different exchanges
    exchange_txn_count = (
        df
        .groupby('to_address')
        .count()
        .reset_index()
        .loc[:,['to_address','token_address']]
        .merge(exchange_data, right_on='Address',left_on='to_address')
        .drop(['Address','to_address', 'Txn Count', 'Balance'], axis = 1)
        .rename(columns={'token_address':'transaction_count'})
        .sort_values('transaction_count', ascending=False)
    )
    
    #display(exchange_txn_count.head(20))
    tokens.loc[tokens.initials == token_initials,'most active exchange'] = exchange_txn_count.iloc[0,1]

    # We convert the values (stored as strings) to floats
    df.value=df.value.astype(float)

    exchange_data.Address

    # We create a dummy variable to identify transactions
    # from one exchange's wallet to another   
    df['from_exchange'] = 1 
    # we subselect the dataframe where the 'from_address' is not in the list of exchange addresses 
    df.loc[~df['from_address'].isin(exchange_data.Address), 'from_exchange'] = 0
    
    trans = df
    print('We found {} transactions towards exchanges.'.format( df.shape[0]) )
    tokens.loc[tokens.initials == token_initials,'transactions to exchanges'] = df.shape[0]
        
    address_tran_counts = (
        trans
        .groupby('from_address')
        .count()
        .sort_values(by='value', ascending=False)
        .to_address
    )
    
    trans = (
        df
        .sort_values('block_timestamp')
        .reset_index()
        .drop('index', axis=1)
    )
    
    trans = trans.rename(columns={'block_timestamp': 'time'})

    # We transform the times UTC to naive times
    trans.time = trans.time.values

    min_date = datetime.timestamp(trans.time[0])
    max_date = datetime.timestamp(trans.time.iloc[-1])
    max_date, min_date
            
    """######################## Load the dataframe of off-chain transactions #############################"""  
    
    price_raw = pd.read_csv('./data/{}/raw_crypto_compare.csv'.format(token_initials))
    price = price_raw
    price = price.sort_values('time').reset_index()

    price.loc[:,'time_readable'] = (
        price
        .time
        .apply(lambda x:datetime.utcfromtimestamp(x).strftime('%Y-%m-%d %H:%M:%S') )
    )
    
    # We convert the price timestamps (Unix seconds) into naive UTC datetimes,
    # so that they line up with the on-chain block timestamps (also naive UTC)
    price.time = pd.to_datetime(price.time, unit='s')

    if initial_date:
        price = price.loc[price.time >initial_date]

    # "volumeto" means the volume in the currency that is being traded
    # "volumefrom" means the volume in the base currency that things are traded into.
    price = (
        price
        .drop(
            ['time_readable', 'conversionSymbol','conversionType', 'open', 'volumefrom','index'],
            axis=1
        )
    )
    
    # We consider the BTC volume as what we refer to as 'volume'
    price = (
        price
        .rename(columns={'volumeto':'volume'}) 
        .drop('Unnamed: 0', axis=1)
    )
    
    # filters out all zero prices
    price = price[(price[['close']] != 0).all(axis=1)]
    print('We have {} observations.'.format( price.shape[0]) )
    
    '''############################ COMPUTE ON-CHAIN AGGREGATES ##################################'''
    
    # We compute the log-returns of the closing prices
    price.loc[:,'returns'] = np.log(price['close']).diff()
        
    # In order to transform dates to non local time
    trans.loc[:,'time']= trans.time.values

    min_time, max_time = price.time.iloc[[0,-1]].values
    min_trans_time, max_trans_time = trans.time.iloc[[0,-1]].values

    # We only keep the transactions for which we have price data
    #trans = trans.loc[(trans['time'] >= min_time)]   
    '''
    TEST PURPOSES: 
    we only keep the transactions for which we have on-chain data
    Current maximum off_chain timestamp as of 21/12/2019 is 21/12/2019
    Current maximum on_chain time is 2019-12-21 19:29:00 (as of 21/12/2019)
    '''
    price = price.loc[(price['time'] <= max_trans_time)]   
        
    # We create bins from the daily price timestamps and aggregate the on-chain transactions in those bins
    cut = pd.cut(trans.time, bins = price.time, duplicates='drop')

    '''
    GOAL: merge on-chain and off-chain datasets.
    PROCEDURE:
    Group by the number of tokens exchanged (value) and compute the sum
    and count of this feature.
    Yields: 'onchain_volume' & 'onchain_trans'
    Perform the same by grouping by 'from_exchange' dummy variable
    and taking a count of it, to obtain the number of 
    transactions coming FROM exchanges
    '''
    data = price

    for _GROUPBY in ['value','from_exchange']:
        # sum variable of FROM exchanges transactions
        trans_cut = trans.groupby(cut)[_GROUPBY].aggregate(['count','sum'])
              
        # We show 5 examples of intervals for readability
        trans_cut.sample(5)
        trans_cut = trans_cut.reset_index()
        trans_cut.time = trans_cut.time.apply(lambda x:x.right)
        trans_cut.time = pd.to_datetime(trans_cut.time)
               
        data = (
            trans_cut
            .merge(data, left_on='time', right_on='time')
        )
        
        if _GROUPBY == 'value':
            data = data.rename(columns={'sum':'onchain_volume', 'count':'onchain_trans'})
        else:
            data = (
                data
                .rename(columns={'sum':'from_exchanges_transactions'})
                .drop('count', axis=1)
            )
    
    # if we have daily intervals: burn the first two weeks i.e. 14 days
    data = data[14:]
    
    
    '''############################ GARCH PROCEDURE ##################################'''

    am = ConstantMean(data['returns'][1:])
    am.volatility = GARCH(1, 0, 1)
    am.distribution = StudentsT()

    res = am.fit(disp = 'off', update_freq = 7)
    display(res)
    GARCH_param = (
        pd
        .DataFrame(
            {
                'Asset' : 'Stock',
                'omega' : [res.params['omega']],
                'alpha' : [res.params['alpha[1]']],
                'beta' : [res.params['beta[1]']],
                'nu' : [res.params['nu']]                            
            }
        )
    )
    
    GARCH_param.set_index('Asset', inplace = True)

    display(GARCH_param)
    display(res.summary())   


    '''############################ DATA CLEANING ##################################'''
    # We remove the rows where one value is missing
    data = (
        data
        .replace([np.inf, -np.inf], np.nan) # remove rows with infinite values
        .dropna()
        .reset_index()
        .drop('index', axis=1)
    )
    
    lower_bound = -100
    upper_bound = 100
    data = data.loc[(data['returns'] >= lower_bound) & (data['returns'] <= upper_bound)]
    data = data.sort_values('time')


    '''############################ PLOTS ##################################'''

    # The style of the figure can be set globally using the matplotlib rc parameters.
    plt.rcParams['axes.grid'] = True
    plt.rcParams["figure.figsize"] = [10,6]
    
    from statsmodels.graphics.tsaplots import plot_pacf
    from statsmodels.graphics.tsaplots import plot_acf

    plot_pacf(data.returns, lags=list(range(1,48)))
    plt.title('Partial Autocorrelation of {} (starting at 1 lag)'.format(token_initials))
    plt.show()
    plot_acf(data.returns, lags=list(range(1,48)))
    plt.title('Autocorrelation of {} (starting at 1 lag)'.format(token_initials))
    plt.show()
    

    nb_figures = 6
    
    fig, (ax1,ax11, ax5, ax2, ax3, ax4, ) = plt.subplots(nb_figures, 1, figsize=(15,10), sharex =True)
    for ax in fig.axes:
        plt.sca(ax)
        plt.xticks(rotation=90)
    plt.suptitle('{} data'.format(token_initials))
    plt.subplots_adjust(hspace = 0.25) # the amount of height reserved for space between subplots
    ax1.set_title('Price evolution in BTC')
    ax1.plot(data.time, data.close)
    
    ax11.set_title('Daily return of token price')
    ax11.plot(data.time, data.returns)
    ax11.set_ylim((-0.3,0.3))
    
    ax5.set_title('Volume evolution in BTC')
    ax5.plot(data.time, data.volume)

    # Plots if data consist of daily intervals        
    ax2.set_title('Rolling 7-day exchange volume evolution in BTC')
    ax2.plot(data.time,data.volume.rolling(7).mean())
    
    ax3.set_title('Rolling 7-day on-chain transaction count to exchanges')
    ax3.plot(data.time,data.onchain_trans.rolling(7).mean())
    
    ax4.set_title('Rolling 7-day on-chain volume evolution in BTC')
    ax4.plot(data.time,data.onchain_volume.rolling(7).mean())
    
    # Rotates and right aligns the x labels, and moves the bottom of the
    # Axes up to make room for them
    fig.autofmt_xdate()
    
    plt.savefig('./data/{}/all_plots.png'.format(token_initials))

    data['intercept'] = 1
    
    #display(data)
    
    '''############################ DESIGN MATRICES ##################################'''
    
    y_returns = data.returns.values
    y_volume = data.volume.values
    
    X_returns = (
        data
        .loc[
            :,
             [
              'intercept', 'onchain_trans', 'onchain_volume',
              'high', 'low', 'volume', 'from_exchanges_transactions'
             ]
            ]
        .values
    )
    
    X_volume = (
        data
        .loc[
            :,
            [
                'intercept', 'onchain_trans', 'onchain_volume',
                'high', 'low', 'from_exchanges_transactions'
            ]
        ]
        .values
    )
    
    '''
    The following lines create an iterator that walks through (y, X) pairs:
    first (y_returns, X_returns), then (y_volume, X_volume)
    '''
    lis = (y_returns, X_returns, y_volume, X_volume)
    it = iter(lis)
    
    for y in it:
        
        X = next(it)
        
        from sklearn.impute import SimpleImputer

        results = {}
        X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.33, random_state=2, shuffle=False)
        X_train_train, X_train_test, y_train_train, y_train_test = train_test_split(X_train,y_train,test_size=0.33, random_state=2)
        
        from sklearn.preprocessing import StandardScaler
        scaler = StandardScaler()
        scaler.fit(X_train_train)

        X_train_train = scaler.transform(X_train_train)
        X_train_test = scaler.transform(X_train_test)
        X_test = scaler.transform(X_test)
        
        '''############################ Linear Regression ##################################'''
        from sklearn.linear_model import LinearRegression

        reg = LinearRegression().fit(X_train_train, y_train_train)
              
        # If the series is returns, set col_name to 'returns'
        # Otherwise set col_name to 'off-chain volume'
        col_name = 'returns' if np.array_equiv(X, X_returns) else 'off-chain volume' 
        # col_name = 'returns' if (X == X_returns).all() else 'off-chain volume' 
        
        results['linear regression score of ' + col_name] = reg.score(X_test, y_test)
        print('linear regression score of ' + col_name, reg.score(X_test, y_test))

        reg_score = reg.score(X_test, y_test)
        tokens.loc[tokens.initials == token_initials,'linear regression score of '+col_name] = reg_score

        # Start the scatter plots here
        nb_rows = 2
        nb_columns = 1
        
        fig, (ax_linreg, ax_tree) = plt.subplots(nb_rows, nb_columns, figsize=(15,10), sharex =True)
        for ax in fig.axes:
            plt.sca(ax)
            plt.xticks(rotation=90)
        plt.suptitle('{} linear regression and RF Regressor for '.format(token_initials) + col_name)
        ax_linreg.scatter(y_test, reg.predict(X_test))
        ax_linreg.plot(y_test,y_test, 'k')
        ax_linreg.set_title(
            'Linear regression of ' + col_name +
            ' prediction vs. true validation data token {}\n score: {:.3f} '
            .format(token_initials, reg_score)
        )
        # Rotates and right aligns the x labels, and 
        # Moves the bottom of the axes up 
        # To make room for them
        fig.autofmt_xdate()
        
        '''############################ RF Regressor ##################################'''
        from sklearn.ensemble import RandomForestRegressor
        
        clf = RandomForestRegressor(n_estimators = 100)
        clf = clf.fit(X_train_train, y_train_train)
        results['RF Regressor score of ' + col_name] = clf.score(X_test,y_test)
        print('RF Regressor score of ' + col_name, clf.score(X_test,y_test) )
        clf_score = clf.score(X_test,y_test) 

        tokens.loc[tokens.initials == token_initials,'RF Regressor score of ' + col_name] = clf_score

        ax_tree.scatter(y_test, clf.predict(X_test))
        ax_tree.plot(y_test,y_test, 'k')
        ax_tree.set_title(
            'RF Regressor prediction of ' + col_name +
            ' vs. true validation data token {}\n score:{:.3f} '
            .format(token_initials, clf_score)
        )
        # Rotates and right aligns the x labels, and 
        # Moves the bottom of the axes up 
        # To make room for them
        fig.autofmt_xdate()
    
        plt.savefig('./data/{}/regression_plots.png'.format(token_initials))
        plt.savefig('./data/{}/'.format(token_initials) + col_name + '_regression_plots.png')


### Run the program

In [4]:
tokens = os.listdir('data')

In [5]:
# We keep the tokens for which we have both on-chain and off-chain data
tokens = pd.Series(tokens)
keep = (
    tokens
    .apply(lambda x: 
           os.path.exists('data/{}/raw_big_query.csv'.format(x)) 
           and 
           os.path.exists('data/{}/raw_crypto_compare.csv'.format(x))
    )
)
remove_tokens = tokens.loc[~keep]
tokens = tokens.loc[keep]
tokens = token_data.merge(tokens.to_frame(), left_on='initials', right_on=0).drop(0, axis=1) #token_data contains all info
tokens.loc[:,'linear model score'] = 0
tokens.loc[:,'transactions to exchanges'] = 0
tokens = tokens.drop(['holders','daily_volume','price'], axis = 1)
#tokens.dtypes

### Test procedure

In [6]:
# hardcode our test procedure
compute_for_token('0xe41d2489571d322189246dafa5ebde1f4699f498', 'ZRX')

(176283, 9)

We found 176283 transactions towards exchanges.
We have 863 observations.


                        Constant Mean - GARCH Model Results                         
Dep. Variable:                      returns   R-squared:                      -0.005
Mean Model:                   Constant Mean   Adj. R-squared:                 -0.005
Vol Model:                            GARCH   Log-Likelihood:                1324.32
Distribution:      Standardized Student's t   AIC:                          -2638.64
Method:                  Maximum Likelihood   BIC:                          -2614.94
                                              No. Observations:                  847
Date:                      Sat, Dec 21 2019   Df Residuals:                      842
Time:                              21:04:04   Df Model:                            5
                                   Mean Model                                  
                  coef    std err          t      P>|t|        95.0% Conf. Int.
---------------------------------------------------------------------------

Asset,omega,alpha,beta,nu
Stock,5.8e-05,0.115103,0.878178,4.497665


Dep. Variable:,returns,R-squared:,-0.005
Mean Model:,Constant Mean,Adj. R-squared:,-0.005
Vol Model:,GARCH,Log-Likelihood:,1324.32
Distribution:,Standardized Student's t,AIC:,-2638.64
Method:,Maximum Likelihood,BIC:,-2614.94
,,No. Observations:,847
Date:,"Sat, Dec 21 2019",Df Residuals:,842
Time:,21:04:04,Df Model:,5

,coef,std err,t,P>|t|,95.0% Conf. Int.
mu,-5.7160e-03,1.444e-03,-3.959,7.526e-05,"[-8.546e-03,-2.886e-03]"

,coef,std err,t,P>|t|,95.0% Conf. Int.
omega,5.7528e-05,3.608e-05,1.594,0.111,"[-1.320e-05,1.283e-04]"
alpha[1],0.1151,4.455e-02,2.583,9.781e-03,"[2.778e-02, 0.202]"
beta[1],0.8782,4.474e-02,19.630,8.606e-86,"[ 0.790, 0.966]"

,coef,std err,t,P>|t|,95.0% Conf. Int.
nu,4.4977,0.636,7.077,1.471e-12,"[ 3.252, 5.743]"
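
The ZRX estimates above can be turned into the persistence measure discussed in the model description. This is a small sketch, assuming unit-variance innovations (the summary reports a standardized Student's t) so that the long-run variance is $\omega / (1 - \alpha - \beta)$. The helper name is hypothetical, and the half-life is the number of periods after which the gap between the variance forecast and its long-run level has halved.

```python
import math

def garch11_persistence(omega, alpha, beta):
    """Persistence, long-run variance and shock half-life of a GARCH(1,1)."""
    persistence = alpha + beta
    if not 0 < persistence < 1:
        raise ValueError("requires 0 < alpha + beta < 1")
    return {
        "persistence": persistence,
        # Valid when the innovations have unit variance.
        "unconditional_variance": omega / (1 - persistence),
        # The forecast gap decays like persistence**k, so solve persistence**k = 0.5.
        "half_life": math.log(0.5) / math.log(persistence),
    }

# Point estimates copied from the summary table above.
zrx_garch = garch11_persistence(omega=5.7528e-05, alpha=0.1151, beta=0.8782)
for name, value in zrx_garch.items():
    print(name, value)
```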


<Figure size 1000x600 with 1 Axes>

<Figure size 1000x600 with 1 Axes>

linear regression score of returns -0.02471576443475687
RF Regressor score of returns -0.8485243286096709
linear regression score of off-chain volume -2.513522298584115
RF Regressor score of off-chain volume 0.581185496822494


### Run the program for all tokens

In [None]:
for i in range(len(tokens.initials)):
    try:
        print(i)
        compute_for_token(tokens.address[i], tokens.initials[i])
    except Exception as e:
        print(e)

In [8]:
tokens.to_csv('tokens_all_regressions.csv', index=False)
tokens.head(1)

Unnamed: 0,address,name,description,market_cap,initials,linear model score,transactions to exchanges,most active exchange,linear regression score of returns,RF Regressor score of returns,linear regression score of off-chain volume,RF Regressor score of off-chain volume
0,0xe41d2489571d322189246dafa5ebde1f4699f498,ZRX (ZRX),"Description: 0x is an open, permissionless pro...","$176,738,366",ZRX,0,176283,9.0,-0.024716,-0.719294,-2.513522,0.57499


### Printing results in LaTeX

In [9]:
import pandas as pd
tokens = pd.read_csv('tokens_all_regressions.csv')
tokens.head(3)

Unnamed: 0,address,name,description,market_cap,initials,linear model score,transactions to exchanges,most active exchange,linear regression score of returns,RF Regressor score of returns,linear regression score of off-chain volume,RF Regressor score of off-chain volume
0,0xe41d2489571d322189246dafa5ebde1f4699f498,ZRX (ZRX),"Description: 0x is an open, permissionless pro...","$176,738,366",ZRX,0,176283,9.0,-0.024716,-0.719294,-2.513522,0.57499
1,0x3883f5e181fccaf8410fa61e12b59bad963fb645,Theta Token (THETA),A decentralized peer-to-peer network that aims...,"$75,948,820",THETA,0,29863,9.0,-0.088587,-0.145234,-0.094154,0.269438
2,0xf629cbd94d3791c9250152bd8dfbdf380e2a3b9c,EnjinCoin (ENJ),Customizable cryptocurrency and virtual goods ...,"$48,686,825",ENJ,0,65174,9.0,-0.56764,0.03299,-7.067294,-14.555281


In [10]:
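# Convert market_cap from a string such as '$176,738,366' to an integer so it can be sorted
tokens['market_cap'] = (
    tokens['market_cap']
    .str.replace(',', '', regex=False)
    .str.replace('$', '', regex=False)  # literal '$', not the regex end-of-string anchor
    .astype(int)
)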
tokens = tokens.sort_values('market_cap', ascending=False)
tokens.head(3)

Unnamed: 0,address,name,description,market_cap,initials,linear model score,transactions to exchanges,most active exchange,linear regression score of returns,RF Regressor score of returns,linear regression score of off-chain volume,RF Regressor score of off-chain volume
0,0xe41d2489571d322189246dafa5ebde1f4699f498,ZRX (ZRX),"Description: 0x is an open, permissionless pro...",176738366,ZRX,0,176283,9.0,-0.024716,-0.719294,-2.513522,0.57499
1,0x3883f5e181fccaf8410fa61e12b59bad963fb645,Theta Token (THETA),A decentralized peer-to-peer network that aims...,75948820,THETA,0,29863,9.0,-0.088587,-0.145234,-0.094154,0.269438
2,0xf629cbd94d3791c9250152bd8dfbdf380e2a3b9c,EnjinCoin (ENJ),Customizable cryptocurrency and virtual goods ...,48686825,ENJ,0,65174,9.0,-0.56764,0.03299,-7.067294,-14.555281


In [11]:
tokens['market_cap'] = (tokens['market_cap']/1000000).apply(lambda x: '${:,.2f}M'.format(x))
tokens.head(3)

Unnamed: 0,address,name,description,market_cap,initials,linear model score,transactions to exchanges,most active exchange,linear regression score of returns,RF Regressor score of returns,linear regression score of off-chain volume,RF Regressor score of off-chain volume
0,0xe41d2489571d322189246dafa5ebde1f4699f498,ZRX (ZRX),"Description: 0x is an open, permissionless pro...",$176.74M,ZRX,0,176283,9.0,-0.024716,-0.719294,-2.513522,0.57499
1,0x3883f5e181fccaf8410fa61e12b59bad963fb645,Theta Token (THETA),A decentralized peer-to-peer network that aims...,$75.95M,THETA,0,29863,9.0,-0.088587,-0.145234,-0.094154,0.269438
2,0xf629cbd94d3791c9250152bd8dfbdf380e2a3b9c,EnjinCoin (ENJ),Customizable cryptocurrency and virtual goods ...,$48.69M,ENJ,0,65174,9.0,-0.56764,0.03299,-7.067294,-14.555281


### Print ML techniques scores

In [12]:
nb_decimals_to_round_to = 2 

print(tokens
      .drop(['address','description',
             'initials','linear model score',
             'most active exchange','transactions to exchanges'], axis=1)
      .round(
          {
              'RF Regressor score of returns': nb_decimals_to_round_to,
              'RF Regressor score of off-chain volume': nb_decimals_to_round_to,
              'linear regression score of returns': nb_decimals_to_round_to,
              'linear regression score of off-chain volume': nb_decimals_to_round_to,
          }
      )
      .reset_index(drop=True)
      .rename(columns = 
              {
                  'RF Regressor score of returns' : 'RF Regressor returns score',
                  'RF Regressor score of off-chain volume' : 'RF Regressor volume score',
                  'linear regression score of returns' : 'Lin-reg returns score',
                  'linear regression score of off-chain volume' : 'Lin-reg volume score'                  
              }
       )
      .to_latex()
)

\begin{tabular}{lllrrrr}
\toprule
{} &                 name & market\_cap &  Lin-reg returns score &  RF Regressor returns score &  Lin-reg volume score &  RF Regressor volume score \\
\midrule
0 &            ZRX (ZRX) &   \$176.74M &                  -0.02 &                       -0.72 &                 -2.51 &                       0.57 \\
1 &  Theta Token (THETA) &    \$75.95M &                  -0.09 &                       -0.15 &                 -0.09 &                       0.27 \\
2 &      EnjinCoin (ENJ) &    \$48.69M &                  -0.57 &                        0.03 &                 -7.07 &                     -14.56 \\
\bottomrule
\end{tabular}



### Print the descriptions

In [13]:
lzt = tokens.columns.tolist()
lzt.remove('description')
lzt.remove('name')
pd.options.display.max_colwidth = 100
print(tokens
      .drop(lzt, axis=1)
      .to_latex()
)

\begin{tabular}{lll}
\toprule
{} &                 name &                                                                                          description \\
\midrule
0 &            ZRX (ZRX) &  Description: 0x is an open, permissionless protocol allowing for tokens to be traded on the Ethe... \\
1 &  Theta Token (THETA) &      A decentralized peer-to-peer network that aims to offer improved video delivery at lower costs. \\
2 &      EnjinCoin (ENJ) &                                   Customizable cryptocurrency and virtual goods platform for gaming. \\
\bottomrule
\end{tabular}

