### Import libraries

In [1]:
!pip install -r requirements.txt





In [2]:
import ace_lib as ace
import helpful_functions as hf
import pandas as pd
import requests
import plotly.express as px

### Start session
Enter your credentials once. They are saved to a local folder and loaded automatically on every later run.

In [3]:
s = ace.start_session()

Complete biometrics authentication and press any key to continue: 
https://api.worldquantbrain.com/authentication/persona?inquiry=inq_NMN98x2DSskDwJnF6e9bDCCy




### Create list of alpha expressions
#### Step 1. Download datasets

In [4]:
datasets_df = hf.get_datasets(s) # by default we load all datasets USA TOP3000 delay 1
datasets_df.head() # DataFrame.head() shows first 5 rows of the dataframe 

Unnamed: 0,id,name,description,category,subcategory,region,delay,universe,coverage,turnover,valueScore,userCount,alphaCount,fieldCount
0,analyst11,ESG scores,Environmental Social Governance scores that ex...,analyst,analyst-esg,USA,1,TOP3000,0.6818,,2.0,140,10172,197
1,analyst14,Estimations of Key Fundamentals,This dataset reports many items from financial...,analyst,analyst-analyst-estimates,USA,1,TOP3000,0.6176,,2.0,238,58072,870
2,analyst15,Earnings forecasts,This dataset provides bottom-up forecast data ...,analyst,analyst-analyst-estimates,USA,1,TOP3000,0.9892,,2.0,277,138916,2538
3,analyst16,Real Time Estimates,This dataset provides real-time access to the ...,analyst,analyst-crowdsourced-estimates,USA,1,TOP3000,0.8252,,2.0,139,6904,113
4,analyst2,Analysts Estimates of Key Fundamentals,This data contains forecasts of key fundamentals.,analyst,analyst-crowdsourced-estimates,USA,1,TOP3000,0.4702,,2.0,388,14812,27


In [14]:
# select needed datasets
selected_datasets_df = datasets_df.query("""
        delay == 1 &\
        0.4 < coverage <= 1 &\
        0 < fieldCount < 10000 &\
        region == 'USA' &\
        universe == 'TOP3000' &\
        0 < userCount < 1000 &\
        1 < valueScore < 10 &\
        name.str.contains('opt', case=False) == 1 &\
        category == 'option'
    """, engine='python').sort_values(by=['valueScore'], ascending=False)
selected_datasets_df

Unnamed: 0,id,name,description,category,subcategory,region,delay,universe,coverage,turnover,valueScore,userCount,alphaCount,fieldCount
179,option11,Equity Option Ratings Data,Relative measures of the implied volatility (I...,option,option-option,USA,1,TOP3000,0.8665,,3.0,30,350,15
178,option1,Options Volatility Surfaces Data,The dataset provides standardized option data ...,option,option-option-volatility,USA,1,TOP3000,0.7799,,2.0,42,476,18
181,option4,Implied Volatility and Pricing for Equity Options,This dataset provides information of options p...,option,option-option,USA,1,TOP3000,0.8218,,2.0,274,108406,1196
182,option6,Forecasted Volatility for Equity Options,This dataset has various estimates of Options ...,option,option-option-volatility,USA,1,TOP3000,0.894,,2.0,189,34450,140
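
The same selection can also be written as a plain Python predicate, which may make each condition easier to read and to test in isolation. The sketch below is only an illustration on three rows copied from the tables above; `is_selected` is a hypothetical helper and not part of helpful_functions.

```python
# Three rows copied from the dataset tables above (only the columns the filter uses).
datasets = [
    {"id": "analyst11", "name": "ESG scores", "category": "analyst",
     "region": "USA", "delay": 1, "universe": "TOP3000", "coverage": 0.6818,
     "valueScore": 2.0, "userCount": 140, "fieldCount": 197},
    {"id": "option1", "name": "Options Volatility Surfaces Data", "category": "option",
     "region": "USA", "delay": 1, "universe": "TOP3000", "coverage": 0.7799,
     "valueScore": 2.0, "userCount": 42, "fieldCount": 18},
    {"id": "option11", "name": "Equity Option Ratings Data", "category": "option",
     "region": "USA", "delay": 1, "universe": "TOP3000", "coverage": 0.8665,
     "valueScore": 3.0, "userCount": 30, "fieldCount": 15},
]


def is_selected(row):
    """Hypothetical predicate mirroring the conditions of the query above."""
    return (
        row["delay"] == 1
        and 0.4 < row["coverage"] <= 1
        and 0 < row["fieldCount"] < 10000
        and row["region"] == "USA"
        and row["universe"] == "TOP3000"
        and 0 < row["userCount"] < 1000
        and 1 < row["valueScore"] < 10
        and "opt" in row["name"].lower()
        and row["category"] == "option"
    )


# Keep the matching rows and sort them by valueScore, highest first.
selected = sorted(filter(is_selected, datasets), key=lambda r: r["valueScore"], reverse=True)
selected_ids = [r["id"] for r in selected]
print(selected_ids)
```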


For demonstration we pick the dataset with the highest value score. This is not mandatory, but it is a recommended practice.

#### Step 2. Select the needed datafields


In [15]:
dataset_id = selected_datasets_df.id.values.tolist()[0] # take the id of the first row, i.e. the dataset with the highest valueScore
dataset_id

'option11'

In [16]:
datafields_df = hf.get_datafields(s, dataset_id=dataset_id) # download all fields of the selected dataset
datafields_df.head()

Unnamed: 0,id,description,dataset,category,subcategory,region,delay,universe,type,coverage,turnover,userCount,alphaCount
0,opt11_calendardaysuntilearnings,The number of calendar days until the next ear...,"{'id': 'option11', 'name': 'Equity Option Rati...",option,option-option,USA,1,TOP3000,VECTOR,0.8361,,9,36
1,opt11_earningscrushrate,The expected crush rate for the ATM IV of the ...,"{'id': 'option11', 'name': 'Equity Option Rati...",option,option-option,USA,1,TOP3000,VECTOR,0.8135,,15,46
2,opt11_iv30percentile,IV Percentile based on 30-day IV using one yea...,"{'id': 'option11', 'name': 'Equity Option Rati...",option,option-option,USA,1,TOP3000,VECTOR,0.875,,6,30
3,opt11_iv30rank,IV Rank based on 30-day IV using one year of h...,"{'id': 'option11', 'name': 'Equity Option Rati...",option,option-option,USA,1,TOP3000,VECTOR,0.8758,,5,14
4,opt11_iv30rating,IV Rating based on 30-day IV using one year of...,"{'id': 'option11', 'name': 'Equity Option Rati...",option,option-option,USA,1,TOP3000,VECTOR,0.8776,,7,26


#### Step 3. Create expression list, using selected datafields
Here is an example: creating expressions that assign weights according to the 120-day change (ts_delta) of the vector average (vec_avg) of each field's values

In [17]:
expression_list = [f'ts_delta(vec_avg({x}),120)' for x in datafields_df.id.values.tolist()] # create one alpha expression ts_delta(vec_avg({x}),120) per datafield id x
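
If you want to try several operators or lookback windows, the list comprehension can be generalised into a small template helper. This is a minimal sketch: `build_expressions`, its `template` argument, and the placeholder names are hypothetical and not part of ace_lib. The two field ids are taken from the datafield table above.

```python
def build_expressions(field_ids, template="ts_delta(vec_avg({field}),{window})", window=120):
    """Fill the template once per datafield id (hypothetical helper)."""
    return [template.format(field=field, window=window) for field in field_ids]


field_ids = ["opt11_calendardaysuntilearnings", "opt11_earningscrushrate"]

# Default template reproduces the expressions built in the cell above.
delta_expressions = build_expressions(field_ids)

# Swapping the template gives a different operator with the same fields.
skew_expressions = build_expressions(field_ids, template="ts_skewness(vec_avg({field}),{window})")

print(delta_expressions[0])
print(skew_expressions[1])
```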

#### Step 4. Apply generate_alpha function to the expression list
In the generate_alpha function you can specify region, universe, decay, delay and other simulation settings

In [18]:
# When you send multiple alphas for simulation, make sure all alphas in a single list share the same settings.
# Alphas with different settings should be sent in a separate list; every alpha in the list below has the same settings.

alpha_list = [ace.generate_alpha(x, region="USA", universe="TOP3000") for x in expression_list]

alpha_list[0]

{'type': 'REGULAR',
 'settings': {'nanHandling': 'OFF',
  'instrumentType': 'EQUITY',
  'delay': 1,
  'universe': 'TOP3000',
  'truncation': 0.08,
  'unitHandling': 'VERIFY',
  'pasteurization': 'ON',
  'region': 'USA',
  'language': 'FASTEXPR',
  'decay': 0,
  'neutralization': 'INDUSTRY',
  'visualization': False},
 'regular': 'ts_delta(vec_avg(opt11_calendardaysuntilearnings),120)'}

This is an example of what an alpha actually looks like when it is sent to the platform.
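
Because every list sent for simulation should share one set of settings, it can help to group payloads by their settings first. The sketch below assumes payloads shaped like the dictionary printed above; `group_by_settings` and `toy_alpha` are hypothetical helpers, and the trimmed-down settings and the TOP1000 universe are used purely for illustration.

```python
import json
from collections import defaultdict


def group_by_settings(alphas):
    """Group alpha payloads so that every group shares identical settings."""
    groups = defaultdict(list)
    for alpha in alphas:
        # Serialise the settings with sorted keys to get a hashable, order-independent key.
        key = json.dumps(alpha["settings"], sort_keys=True)
        groups[key].append(alpha)
    return list(groups.values())


def toy_alpha(expression, region, universe):
    """Hypothetical, trimmed-down payload; the real one carries more settings."""
    return {
        "type": "REGULAR",
        "settings": {"region": region, "universe": universe, "delay": 1},
        "regular": expression,
    }


alphas = [
    toy_alpha("ts_delta(vec_avg(opt11_iv30rank),120)", "USA", "TOP3000"),
    toy_alpha("ts_delta(vec_avg(opt11_iv30rating),120)", "USA", "TOP3000"),
    toy_alpha("ts_delta(vec_avg(opt11_iv30rank),120)", "USA", "TOP1000"),
]

batches = group_by_settings(alphas)
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```

Each resulting group could then be sent as its own list.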

### Simulate alpha list, get simulation result

simulate_alpha_list_multi runs a multi-simulation when the list contains more than 10 alphas. Note that the cell below sends only the first 10 alphas for demonstration.

The returned object is a list with one simulation result per alpha.
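
If you prefer to control the batch size yourself, a long list can be split into fixed-size slices before it is sent. How the library splits work internally is not shown in this notebook, so the sketch below is only an illustration: `chunked` is a hypothetical helper, and the batch size of 10 is an assumption taken from the threshold mentioned above.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder names standing in for 23 alpha payloads.
toy_alphas = [f"alpha_{i}" for i in range(23)]

batches = list(chunked(toy_alphas, 10))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```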

In [19]:
# the alpha list is sliced to its first 10 entries for demonstration purposes

result = ace.simulate_alpha_list_multi(s, alpha_list[:10])

100%|██████████| 4/4 [01:23<00:00, 20.77s/it]


In [20]:
#we received a list of 10 simulation results since we sent in alpha_list[:10]
len(result)

10

__Accessing the result of the first alpha, let's take a look at all the keys of this dictionary__

In [21]:
result[0].keys()

dict_keys(['alpha_id', 'simulate_data', 'is_stats', 'pnl', 'stats', 'is_tests'])

In [22]:
# the is_stats key holds a DataFrame with the in-sample (IS) summary statistics

result[0]['is_stats']

Unnamed: 0,pnl,bookSize,longCount,shortCount,turnover,returns,drawdown,margin,fitness,sharpe,alpha_id
0,1914834,20000000,1228,1229,0.5739,0.0195,0.1282,6.8e-05,0.08,0.41,xPrNAAm


In [14]:
#prettify_result function can be used from the helpful_functions library to take a look at IS stats of all the simulated alphas

result_st1 = hf.prettify_result(result, detailed_tests_view=False)
result_st1

Unnamed: 0,pnl,book_size,long_count,short_count,turnover,returns,drawdown,margin,fitness,sharpe,alpha_id,expression,concentrated_weight,high_turnover,is_ladder_sharpe,low_fitness,low_sharpe,low_sub_universe_sharpe,low_turnover,matches_competition
0,2338646,20000000,1233,1195,0.0525,0.0238,0.0821,0.000906,0.25,0.57,zP3wjv8,"ts_skewness(vec_avg(mws38_action),120)",FAIL,PASS,FAIL,FAIL,FAIL,PASS,PASS,PASS
1,7087469,20000000,1182,1529,0.0797,0.0721,0.602,0.001808,0.24,0.31,o3EwAlb,"ts_skewness(vec_avg(mws38_negative_freq),120)",FAIL,PASS,FAIL,FAIL,FAIL,FAIL,PASS,PASS
2,70758,20000000,1335,1169,0.0544,0.0007,0.0646,2.6e-05,0.0,0.02,X29wPnb,"ts_skewness(vec_avg(mws38_negative_score),120)",PASS,PASS,FAIL,FAIL,FAIL,PASS,PASS,PASS
3,-508710,20000000,1441,1343,0.0609,-0.0052,0.1415,-0.00017,-0.04,-0.22,7aLq012,"ts_skewness(vec_avg(mws38_headlinetones_time),...",PASS,PASS,FAIL,FAIL,FAIL,PASS,PASS,PASS
4,-793456,20000000,1484,1299,0.0602,-0.0081,0.1587,-0.000268,-0.09,-0.35,GMW0YLQ,ts_skewness(vec_avg(mws38_headlinetones_tc_tim...,PASS,PASS,FAIL,FAIL,FAIL,PASS,PASS,PASS
5,-1224111,20000000,1458,1325,0.0503,-0.0124,0.2607,-0.000495,-0.09,-0.29,zP3wQoK,ts_skewness(vec_avg(mws38_headlinetones_sg_ent...,PASS,PASS,FAIL,FAIL,FAIL,FAIL,PASS,PASS
6,-793456,20000000,1484,1299,0.0602,-0.0081,0.1587,-0.000268,-0.09,-0.35,Eg60YaJ,ts_skewness(vec_avg(mws38_headlinetones_sg_tim...,PASS,PASS,FAIL,FAIL,FAIL,PASS,PASS,PASS
7,-1224111,20000000,1458,1325,0.0503,-0.0124,0.2607,-0.000495,-0.09,-0.29,rJR6EzJ,ts_skewness(vec_avg(mws38_headlinetones_tc_ent...,PASS,PASS,FAIL,FAIL,FAIL,FAIL,PASS,PASS
8,-1916107,20000000,1421,1362,0.0499,-0.0195,0.3051,-0.000782,-0.18,-0.46,Eg60bZm,"ts_skewness(vec_avg(mws38_entitlement),120)",PASS,PASS,FAIL,FAIL,FAIL,FAIL,PASS,PASS
9,-1916107,20000000,1421,1362,0.0499,-0.0195,0.3051,-0.000782,-0.18,-0.46,LgMb6Pm,ts_skewness(vec_avg(mws38_headlinetones_entitl...,PASS,PASS,FAIL,FAIL,FAIL,FAIL,PASS,PASS


### Visualise pnl of an alpha

In [23]:
alpha_pnl = ace.get_alpha_pnl(s, result_st1['alpha_id'][0])
px.line(x = alpha_pnl.index, y = alpha_pnl.Pnl, title=f'<b>alpha_id={hf.make_clickable_alpha_id(alpha_pnl.alpha_id[0])}</b>')\
    .update_layout(xaxis_title="Date", yaxis_title="Pnl", title_x=0.5)

NameError: name 'result_st1' is not defined

### Select prospect alphas that you think are worth improving
In this example we select the alphas with a fitness above 0.2 and re-simulate them with improved logic

In [24]:
prospect_alphas = result_st1.loc[lambda x: x.fitness>0.2]['expression'].values
prospect_alphas

NameError: name 'result_st1' is not defined
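
The selection rule itself is a simple threshold on fitness. As a minimal sketch, here it is applied to a few rows copied from the IS summary table shown earlier, using plain dictionaries instead of a DataFrame; `select_prospects` is a hypothetical helper.

```python
# Expression and fitness values copied from the prettified result table above.
rows = [
    {"expression": "ts_skewness(vec_avg(mws38_action),120)", "fitness": 0.25},
    {"expression": "ts_skewness(vec_avg(mws38_negative_freq),120)", "fitness": 0.24},
    {"expression": "ts_skewness(vec_avg(mws38_negative_score),120)", "fitness": 0.0},
    {"expression": "ts_skewness(vec_avg(mws38_entitlement),120)", "fitness": -0.18},
]


def select_prospects(rows, min_fitness=0.2):
    """Return the expressions whose fitness is strictly above the threshold."""
    return [row["expression"] for row in rows if row["fitness"] > min_fitness]


prospects = select_prospects(rows)
print(prospects)
```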

### Change the expression - what would you do to improve the alpha's results?
Use your own logic to improve the alphas.

Here we will apply **group_rank** to overcome poor weight distribution.

In [17]:

new_expression_list = ['group_rank(' + a + ',sector)' for a in prospect_alphas]
new_expression_list

['group_rank(ts_skewness(vec_avg(mws38_action),120),sector)',
 'group_rank(ts_skewness(vec_avg(mws38_negative_freq),120),sector)']
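
The wrapping step can be turned into a small helper if you want to compare several groupings. This is a sketch only: `wrap_with_group_rank` is a hypothetical helper, and using industry as a second grouping is an assumption for illustration (only sector appears in this notebook).

```python
def wrap_with_group_rank(expression, group="sector"):
    """Wrap an expression in group_rank over the given grouping (hypothetical helper)."""
    return f"group_rank({expression},{group})"


prospects = [
    "ts_skewness(vec_avg(mws38_action),120)",
    "ts_skewness(vec_avg(mws38_negative_freq),120)",
]

# One variant per (expression, grouping) pair; "industry" is an assumed grouping name.
variants = [wrap_with_group_rank(e, g) for e in prospects for g in ("sector", "industry")]
for variant in variants:
    print(variant)
```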

In [18]:
# generate simulation payloads for the new expressions

new_alpha_list = [ace.generate_alpha(x, region="USA", universe="TOP3000") for x in new_expression_list]

### Re-simulation

In [19]:
new_result = ace.simulate_alpha_list_multi(s, new_alpha_list)



100%|██████████| 2/2 [01:28<00:00, 44.17s/it]


In [20]:
result_st2 = hf.prettify_result(new_result, clickable_alpha_id=False)
result_st2

Unnamed: 0,pnl,book_size,long_count,short_count,turnover,returns,drawdown,margin,fitness,sharpe,alpha_id,expression,concentrated_weight,high_turnover,is_ladder_sharpe,low_fitness,low_sharpe,low_sub_universe_sharpe,low_turnover,matches_competition
0,2302572,20000000,1214,1214,0.0512,0.0234,0.0727,0.000915,0.28,0.65,dEW8m8X,"group_rank(ts_skewness(vec_avg(mws38_action),1...",FAIL,PASS,FAIL,FAIL,FAIL,PASS,PASS,PASS
1,56156,20000000,1348,1362,0.0566,0.0006,0.1057,2e-05,0.0,0.02,ve1n9zr,group_rank(ts_skewness(vec_avg(mws38_negative_...,PASS,PASS,FAIL,FAIL,FAIL,FAIL,PASS,PASS


#### Compare alphas stats before and after changes:

In [21]:
# Join the before and after DataFrames on the datafield part of each expression (the text from vec_avg( up to the first comma)

result_st1['field'] = result_st1['expression'].apply(lambda st: st[st.find("(vec_avg(")+1:st.find(",")])
result_st2['field'] = result_st2['expression'].apply(lambda st: st[st.find("(vec_avg(")+1:st.find(",")])
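
The slicing above depends on the position of the first comma in the expression. A regular expression is one possible alternative that is less sensitive to how the expression is wrapped. The sketch below is an assumption-laden illustration: `extract_field` is a hypothetical helper, and unlike the slicing above it returns only the bare field id, without the surrounding vec_avg(...).

```python
import re

# Matches vec_avg(<identifier>) and captures the identifier.
FIELD_PATTERN = re.compile(r"vec_avg\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\)")


def extract_field(expression):
    """Return the datafield id inside vec_avg(...), or None if there is none."""
    match = FIELD_PATTERN.search(expression)
    return match.group(1) if match else None


before = "ts_skewness(vec_avg(mws38_action),120)"
after = "group_rank(ts_skewness(vec_avg(mws38_action),120),sector)"

field_before = extract_field(before)
field_after = extract_field(after)
print(field_before, field_after)
```

Both expressions map to the same key, so they can still be joined on it.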

In [22]:
#merging results before and after improvement

compare_results = pd.merge(result_st1, result_st2, on='field', suffixes=('_before', '_after'))

In [23]:
#selecting column list returned by merging the two alpha results

col_list = ['fitness_before', 'fitness_after','sharpe_before', 'sharpe_after', 'alpha_id_before', 'alpha_id_after',
                 'expression_before', 'expression_after']

compare_results[col_list]\
    .style.format({'alpha_id_before': hf.make_clickable_alpha_id, 'alpha_id_after': hf.make_clickable_alpha_id})

Unnamed: 0,fitness_before,fitness_after,sharpe_before,sharpe_after,alpha_id_before,alpha_id_after,expression_before,expression_after
0,0.25,0.28,0.57,0.65,zP3wjv8,dEW8m8X,"ts_skewness(vec_avg(mws38_action),120)","group_rank(ts_skewness(vec_avg(mws38_action),120),sector)"
1,0.24,0.0,0.31,0.02,o3EwAlb,ve1n9zr,"ts_skewness(vec_avg(mws38_negative_freq),120)","group_rank(ts_skewness(vec_avg(mws38_negative_freq),120),sector)"
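
To see at a glance which changes helped, you can compute the difference between the after and before values. A minimal sketch using the numbers from the comparison table above, with plain dictionaries; `with_deltas` is a hypothetical helper.

```python
# Values copied from the comparison table above.
comparison = [
    {"field": "mws38_action", "fitness_before": 0.25, "fitness_after": 0.28,
     "sharpe_before": 0.57, "sharpe_after": 0.65},
    {"field": "mws38_negative_freq", "fitness_before": 0.24, "fitness_after": 0.0,
     "sharpe_before": 0.31, "sharpe_after": 0.02},
]


def with_deltas(rows):
    """Return copies of the rows with after-minus-before deltas added."""
    result = []
    for row in rows:
        result.append({
            **row,
            # Round to hide floating-point noise in the subtraction.
            "fitness_delta": round(row["fitness_after"] - row["fitness_before"], 4),
            "sharpe_delta": round(row["sharpe_after"] - row["sharpe_before"], 4),
        })
    return result


deltas = with_deltas(comparison)
improved = [r["field"] for r in deltas if r["fitness_delta"] > 0 and r["sharpe_delta"] > 0]
print(improved)
```

In this run only the first alpha improved on both measures; the second one got worse after applying group_rank.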


### Check merged alpha performance

Run the following code to compare the performance of your merged pool before and after submitting the alpha

In [24]:
performance_comparison = ace.performance_comparison(s, result_st2['alpha_id'][0])
performance_comparison['score']

{'before': 0, 'after': 899.0}

### How to submit?

Create a list of submittable alphas, that is, alphas with no FAIL result in is_tests

In [25]:
#to take a look at the combined result of all new alphas

is_tests_df = hf.concat_is_tests(new_result)
is_tests_df.head()

Unnamed: 0,name,result,limit,value,matched,unmatched,year,startDate,endDate,alpha_id
0,LOW_SHARPE,FAIL,1.58,0.65,,,,,,dEW8m8X
1,LOW_FITNESS,FAIL,1.0,0.28,,,,,,dEW8m8X
2,LOW_TURNOVER,PASS,0.01,0.0512,,,,,,dEW8m8X
3,HIGH_TURNOVER,PASS,0.7,0.0512,,,,,,dEW8m8X
4,CONCENTRATED_WEIGHT,FAIL,,,,,,,,dEW8m8X


In [26]:
#making a list of failed alphas
failed_alphas = is_tests_df.query('result=="FAIL"')['alpha_id'].unique()

#making a list of passed alphas
passed_alphas = list(set(is_tests_df['alpha_id']).difference(failed_alphas))

print(f'Failed alphas:{failed_alphas}\nPassed alphas:{passed_alphas}')

Failed alphas:['dEW8m8X' 've1n9zr']
Passed alphas:[]
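
The pass/fail split is a set difference: an alpha is submittable only if none of its tests failed. The sketch below shows the same logic on plain dictionaries and also records which tests failed. The dEW8m8X rows are copied from the table above; the alpha id toyPass1 is hypothetical and added only so that the passed list is not empty.

```python
is_tests = [
    {"alpha_id": "dEW8m8X", "name": "LOW_SHARPE", "result": "FAIL"},
    {"alpha_id": "dEW8m8X", "name": "LOW_FITNESS", "result": "FAIL"},
    {"alpha_id": "dEW8m8X", "name": "LOW_TURNOVER", "result": "PASS"},
    {"alpha_id": "toyPass1", "name": "LOW_SHARPE", "result": "PASS"},    # hypothetical
    {"alpha_id": "toyPass1", "name": "LOW_TURNOVER", "result": "PASS"},  # hypothetical
]


def split_by_tests(rows):
    """Return (passed_ids, failed_tests), where failed_tests maps alpha id to failed test names."""
    failed_tests = {}
    for row in rows:
        if row["result"] == "FAIL":
            failed_tests.setdefault(row["alpha_id"], []).append(row["name"])
    all_ids = {row["alpha_id"] for row in rows}
    passed_ids = sorted(all_ids - set(failed_tests))
    return passed_ids, failed_tests


passed_ids, failed_tests = split_by_tests(is_tests)
print(passed_ids)
print(failed_tests)
```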


Once you have a list of submittable alphas, you can call the function submit_alpha()

In [27]:
# call submit_alpha on all alphas that have passed the submission tests

submit_result = {alpha_id: ace.submit_alpha(s, alpha_id) for alpha_id in passed_alphas}

In [28]:
# submit_result holds the return values of the submit_alpha function, keyed by alpha id

submit_result

{}
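
When submitting many alphas, you may want one failed call not to stop the rest. The sketch below wraps the loop in error handling. It is an illustration under assumptions: `submit_all` is a hypothetical helper, `fake_submit` is a stand-in for the real submission call, and what the real call returns or raises is not shown in this notebook.

```python
def submit_all(alpha_ids, submit_fn):
    """Call submit_fn once per alpha id and record either the return value or the error."""
    results = {}
    for alpha_id in alpha_ids:
        try:
            results[alpha_id] = {"ok": True, "value": submit_fn(alpha_id)}
        except Exception as exc:  # keep going so one failure does not stop the rest
            results[alpha_id] = {"ok": False, "error": str(exc)}
    return results


def fake_submit(alpha_id):
    """Stand-in for the real submission call; the return value is an assumption."""
    if alpha_id == "bad":
        raise RuntimeError("submission rejected")
    return True


outcome = submit_all(["good", "bad"], fake_submit)
print(outcome)
```

With a live session, `submit_fn` could be a small callable that forwards each alpha id, together with the session `s`, to the notebook's submit_alpha call.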

### Library functions

The following are some other functions that you can use for your own analysis

**get_alpha_pnl(s, alpha_id)** - to get the pnl for an alpha

**get_alpha_yearly_stats(s, alpha_id)** - to get yearly statistics for an alpha

**get_self_corr(s, alpha_id)** - to get self correlation results for an alpha

**get_prod_corr(s, alpha_id)** - to get prod correlation results for an alpha

**get_check_submission(s, alpha_id)** - to get check submission result for an alpha

**check_self_corr_test(s, alpha_id)** - to check if alpha passes self correlation test (self_corr<0.7)

**check_prod_corr_test(s, alpha_id)** - to check if alpha passes prod correlation test (prod_corr<0.7)

**performance_comparison(s, alpha_id)** - to get the before and after comparison of merged performance for an alpha
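
The two correlation checks listed above both use a limit of 0.7. As a minimal sketch of that rule on plain numbers, the hypothetical helper below takes correlation values you have already obtained; the values passed in are illustrative, not results from this notebook.

```python
def passes_corr_tests(self_corr, prod_corr, limit=0.7):
    """Return per-test results: each correlation must be strictly below the limit."""
    return {
        "self_corr": self_corr < limit,
        "prod_corr": prod_corr < limit,
    }


# Illustrative values only.
ok_case = passes_corr_tests(self_corr=0.35, prod_corr=0.62)
bad_case = passes_corr_tests(self_corr=0.71, prod_corr=0.20)
print(ok_case, bad_case)
```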