In this file we explore how a CachedSource class works, using Su as the example. Su outputs a DataFrame with both a time-series dimension (the date index) and a cross-sectional dimension (the entity columns). Pay particular attention to how an instance automatically caches the data generated during its previous calls and reuses it in later ones.

Let us start by importing all the necessary libraries. You may have to change the system path here.

In [2]:
from zpmeta.superclasses.source import Su
from zpmeta.metaclasses.singletons import MultitonMeta
from pandas import DataFrame, Series, concat, MultiIndex, date_range, IndexSlice
import numpy as np
from datetime import datetime
import logging

logging.basicConfig(level=logging.INFO)

Now let us create a subclass of Su that generates a DataFrame of random numbers. All we have to do is implement the _execute method of the superclass.

In [4]:
class RandomSu(Su, metaclass=MultitonMeta):
    '''Subclasses Su to create a DataFrame of random numbers.
    Accepts a dictionary of parameters, including:
    freq: pandas frequency string for the date index (e.g. 'B' for business days)
    '''
    def __init__(self, params: dict = None):
        super().__init__(params)
        # Mark the data as appendable along both the cross-sectional (xs, columns)
        # and the time-series (ts, rows) dimensions
        self.appendable = dict(xs=True, ts=True)

    def _execute(self, call_type=None, entities=None, period=None):
        # One column for every combination of entity values, e.g. ('A', 1), ('A', 2), ...
        cols = MultiIndex.from_product(list(entities.values()), names=list(entities.keys()))
        # One row for every date in the requested period at the configured frequency
        idx = date_range(period[0], period[1], freq=self.params['freq'])
        return DataFrame(np.random.randn(len(idx), len(cols)), columns=cols, index=idx)
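
To experiment with the body of _execute without zpmeta installed, here is a minimal standalone sketch. make_random_frame is a hypothetical helper, not part of zpmeta; it only reproduces what _execute builds: one column per combination of entity values and one row per date at the given frequency.

```python
import numpy as np
import pandas as pd
from datetime import datetime

def make_random_frame(entities: dict, period: tuple, freq: str) -> pd.DataFrame:
    """Hypothetical standalone equivalent of RandomSu._execute."""
    # Columns: the cartesian product of all entity values, e.g. ('A', 1), ('A', 2), ...
    cols = pd.MultiIndex.from_product(list(entities.values()), names=list(entities.keys()))
    # Rows: every date in [start, end] at the requested frequency
    idx = pd.date_range(period[0], period[1], freq=freq)
    return pd.DataFrame(np.random.randn(len(idx), len(cols)), index=idx, columns=cols)

df = make_random_frame(dict(Type=['A', 'B', 'C'], ID=[1, 2]),
                       (datetime(2019, 1, 12), datetime(2019, 1, 31)), freq='B')
print(df.shape)  # (14, 6)
```

With freq='B' the Saturday start date is skipped, so the first row is 2019-01-14, as in the first output further down.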
    

Now let us instantiate it. Notice how we can set the frequency of the generated data through the params passed at instantiation.

In [5]:
daily_df_source = RandomSu(dict(freq='B'))

INFO:root:args: ({'freq': 'B'},) ; kwds: {}
INFO:root:Multiton checking registry for key: (<class '__main__.RandomSu'>, '{"freq": "B"}')
INFO:root:Multiton No Instance of <class '__main__.RandomSu'> {"freq": "B"}
INFO:root:Multiton Registering Instance of <class '__main__.RandomSu'> {"freq": "B"}


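Once instantiated, the instance behaves like a function with "memory". This is a more sophisticated form of memoization: rather than reusing a result only when it is called again with identical arguments, it reuses whatever part of a request overlaps data it has already generated and computes only the rest.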
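
For contrast, plain memoization such as Python's functools.lru_cache reuses a result only when the arguments are exactly the same. In this sketch, integer ranges stand in for date periods (an assumption made to keep it short); the second, overlapping request is computed from scratch.

```python
from functools import lru_cache

calls = []  # every (start, end) request that actually ran the function body

@lru_cache(maxsize=None)
def plain_memoized(start: int, end: int) -> tuple:
    calls.append((start, end))
    return tuple(range(start, end))

plain_memoized(0, 10)  # computed
plain_memoized(0, 10)  # cache hit: identical arguments
plain_memoized(5, 12)  # overlaps the first call, yet is computed in full
print(calls)  # [(0, 10), (5, 12)]
```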

Let us call this function object to create an initial DataFrame.

In [6]:
df = daily_df_source(entities=dict(Type=['A','B','C'], ID=[1,2]), period=(datetime(2019,1,12), datetime(2019,1,31)))
print(df)

INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN INITIAL: [{'Type': ['A', 'B', 'C'], 'ID': [1, 2]}] 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:EXEC INITIAL: [{'Type': ['A', 'B', 'C'], 'ID': [1, 2]}] 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:DONE RandomSu {'freq': 'B'}


Type               A                   B                   C          
ID                 1         2         1         2         1         2
2019-01-14 -0.934348  1.348178 -1.008590 -1.009998  0.407684  0.569892
2019-01-15 -1.093173 -2.068498  0.902066 -0.057532  1.086023  1.346712
2019-01-16 -1.254036  0.152006  0.167813 -1.148591  0.282965 -0.556388
2019-01-17 -0.219429  0.431363  0.059553 -0.415651  1.046993 -0.282705
2019-01-18 -0.120632 -0.155869  0.141650  0.238249  0.609768 -1.437770
2019-01-21  1.288646 -0.151641 -1.356019 -1.797361  0.406475 -1.547839
2019-01-22 -0.364274  0.028487 -0.622089 -1.641916  0.169292 -1.176435
2019-01-23 -0.162016  1.088112  1.389108 -1.173184 -0.291912  1.619836
2019-01-24  0.079664 -0.061402 -1.677258 -1.095664  1.346008 -1.207647
2019-01-25  0.482326  0.086248  0.812211  0.265602  0.030575 -1.425877
2019-01-28 -1.219299  1.839446  0.157506 -0.617554 -0.388732  1.980366
2019-01-29 -0.964681 -1.782021 -1.168522 -1.819608 -0.056750  1.283246
2019-0

Now let us request some incremental columns. Type C is already cached but Type D is new, so the class recognizes the additional columns, generates data only for them (D/1 and D/2) and appends them to the cached result.

In [5]:
df_xs_incremental = daily_df_source(entities=dict(Type=['C', 'D'], ID=[1,2]), period=(datetime(2019,1,12), datetime(2019,1,31)))
print(df_xs_incremental)

INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['C', 'D'], 'ID': [1, 2]} 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:INCREMENTAL Items: {'Type': ['D'], 'ID': [1, 2]}
INFO:root:TOTAL Items: {'Type': ['B', 'A', 'D', 'C'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: {}
INFO:root:INCREMENTAL Period: None - None
INFO:root:TOTAL Period: 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:EXEC INCREMENTAL XS1: [{'Type': ['D'], 'ID': [1, 2]}] 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:DONE RandomSu {'freq': 'B'}


Type               A                   B                   C            \
ID                 1         2         1         2         1         2   
2019-01-14  1.844280  0.020412  1.246756  1.154022  0.734224  0.398374   
2019-01-15 -0.114657 -0.637401 -0.849754 -0.945679  0.153935 -0.035244   
2019-01-16  0.655751  0.201677 -0.830324 -0.337273  0.793627  1.544500   
2019-01-17  0.020507 -0.791126 -2.276656 -0.839476  0.066704 -1.295300   
2019-01-18 -1.641192 -1.924791  0.368263 -0.100437  1.495483  0.126473   
2019-01-21 -0.569048 -1.294371 -0.860440  2.993533  0.947108  1.632600   
2019-01-22 -0.038089  1.513185 -0.966126  0.839116  0.114526  1.954067   
2019-01-23  0.979712 -0.934915 -1.301877 -0.384024 -0.338961 -0.865196   
2019-01-24  0.780730 -0.231587  1.679492 -0.675422 -0.951835  1.861684   
2019-01-25  1.298808 -0.207203  1.145511  1.011837  0.501611  0.685172   
2019-01-28  0.392768  0.316277 -0.845618 -1.210021 -0.932847  0.594189   
2019-01-29 -0.917573 -0.614059  1.1177
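
The bookkeeping behind this step can be sketched with plain pandas. This is an illustration under assumptions, not zpmeta's implementation: the cache is just a DataFrame, random_block is a hypothetical helper, and the missing columns are found as a difference of (Type, ID) tuples, whereas the log above reports them per dimension.

```python
import numpy as np
import pandas as pd

def random_block(index, columns):
    """Hypothetical generator standing in for _execute."""
    return pd.DataFrame(np.random.randn(len(index), len(columns)), index=index, columns=columns)

# Cache state after the first call: Types A-C, business days 2019-01-12 to 2019-01-31
idx = pd.date_range('2019-01-12', '2019-01-31', freq='B')
cached = random_block(idx, pd.MultiIndex.from_product([['A', 'B', 'C'], [1, 2]], names=['Type', 'ID']))

# Second request: Types C and D
requested = pd.MultiIndex.from_product([['C', 'D'], [1, 2]], names=['Type', 'ID'])
new_cols = requested.difference(cached.columns)  # only the D columns are missing
# Generate the missing columns over the cached dates and append them
cached = pd.concat([cached, random_block(cached.index, new_cols)], axis=1)
print(list(new_cols), cached.shape)
```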

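Next, we request the same set of columns for a period, 2019-01-20 to 2019-02-05, that extends past the end of the cached data. This time it generates data for all the columns, but only over the incremental period (2019-01-31 to 2019-02-05, according to the log).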

In [6]:
df_ts_incremental = daily_df_source(entities=dict(Type=['A','B','C','D'],ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,5)))
print(df_ts_incremental)

INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['A', 'B', 'C', 'D'], 'ID': [1, 2]} 2019-01-20 00:00:00 - 2019-02-05 00:00:00
INFO:root:INCREMENTAL Items: None
INFO:root:TOTAL Items: {'Type': ['B', 'D', 'A', 'C'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: None
INFO:root:INCREMENTAL Period: 2019-01-31 00:00:00 - 2019-02-05 00:00:00
INFO:root:TOTAL Period: 2019-01-12 00:00:00 - 2019-02-05 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:EXEC INCREMENTAL TS1: [{'Type': ['B', 'A', 'D', 'C'], 'ID': [1, 2]}] 2019-01-31 00:00:00 - 2019-02-05 00:00:00
INFO:root:DONE RandomSu {'freq': 'B'}


Type               A                   B                   C            \
ID                 1         2         1         2         1         2   
2019-01-14  1.844280  0.020412  1.246756  1.154022  0.734224  0.398374   
2019-01-15 -0.114657 -0.637401 -0.849754 -0.945679  0.153935 -0.035244   
2019-01-16  0.655751  0.201677 -0.830324 -0.337273  0.793627  1.544500   
2019-01-17  0.020507 -0.791126 -2.276656 -0.839476  0.066704 -1.295300   
2019-01-18 -1.641192 -1.924791  0.368263 -0.100437  1.495483  0.126473   
2019-01-21 -0.569048 -1.294371 -0.860440  2.993533  0.947108  1.632600   
2019-01-22 -0.038089  1.513185 -0.966126  0.839116  0.114526  1.954067   
2019-01-23  0.979712 -0.934915 -1.301877 -0.384024 -0.338961 -0.865196   
2019-01-24  0.780730 -0.231587  1.679492 -0.675422 -0.951835  1.861684   
2019-01-25  1.298808 -0.207203  1.145511  1.011837  0.501611  0.685172   
2019-01-28  0.392768  0.316277 -0.845618 -1.210021 -0.932847  0.594189   
2019-01-29 -0.917573 -0.614059  1.1177
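
A minimal pandas sketch of the time-series step, assuming the cache is a plain DataFrame indexed by date. It generates rows only for requested dates that are not cached yet; missing_dates is a hypothetical name. Because it subtracts the cached index rather than comparing end points, it would also handle a request starting before the cached period.

```python
import numpy as np
import pandas as pd

def missing_dates(cached_index, start, end, freq):
    """Dates in [start, end] at freq that the cache does not hold yet (hypothetical helper)."""
    return pd.date_range(start, end, freq=freq).difference(cached_index)

# Cache state after the previous two calls: Types A-D, 2019-01-12 to 2019-01-31
cols = pd.MultiIndex.from_product([list('ABCD'), [1, 2]], names=['Type', 'ID'])
idx = pd.date_range('2019-01-12', '2019-01-31', freq='B')
cached = pd.DataFrame(np.random.randn(len(idx), len(cols)), index=idx, columns=cols)

# Request 2019-01-20 to 2019-02-05 for the same columns
new_idx = missing_dates(cached.index, '2019-01-20', '2019-02-05', freq='B')
tail = pd.DataFrame(np.random.randn(len(new_idx), len(cols)), index=new_idx, columns=cols)
cached = pd.concat([cached, tail]).sort_index()

result = cached.loc['2019-01-20':'2019-02-05']  # the requested window only
print(list(new_idx.strftime('%Y-%m-%d')))  # ['2019-02-01', '2019-02-04', '2019-02-05']
```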

Now let us request both additional columns and an additional period. As the log shows, it first generates data for only the incremental columns (Type E) over the existing period, and then for the incremental dates across all the columns, E included. This minimizes calculations: the cached data for Types A to D over the existing period is reused rather than regenerated.

In [7]:
df_xsts_incremental = daily_df_source(entities=dict(Type=['A','B','C','D','E'], ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,10)))
print(df_xsts_incremental)

INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['A', 'B', 'C', 'D', 'E'], 'ID': [1, 2]} 2019-01-20 00:00:00 - 2019-02-10 00:00:00
INFO:root:INCREMENTAL Items: {'Type': ['E'], 'ID': [1, 2]}
INFO:root:TOTAL Items: {'Type': ['B', 'D', 'E', 'A', 'C'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: None
INFO:root:INCREMENTAL Period: 2019-02-05 00:00:00 - 2019-02-10 00:00:00
INFO:root:TOTAL Period: 2019-01-12 00:00:00 - 2019-02-10 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:EXEC INCREMENTAL XS1: [{'Type': ['E'], 'ID': [1, 2]}] 2019-01-12 00:00:00 - 2019-02-05 00:00:00
INFO:root:EXEC INCREMENTAL TS1: [{'Type': ['B', 'D', 'E', 'A', 'C'], 'ID': [1, 2]}] 2019-02-05 00:00:00 - 2019-02-10 00:00:00
INFO:root:DONE RandomSu {'freq': 'B'}


Type               A                   B                   C            \
ID                 1         2         1         2         1         2   
2019-01-14  1.844280  0.020412  1.246756  1.154022  0.734224  0.398374   
2019-01-15 -0.114657 -0.637401 -0.849754 -0.945679  0.153935 -0.035244   
2019-01-16  0.655751  0.201677 -0.830324 -0.337273  0.793627  1.544500   
2019-01-17  0.020507 -0.791126 -2.276656 -0.839476  0.066704 -1.295300   
2019-01-18 -1.641192 -1.924791  0.368263 -0.100437  1.495483  0.126473   
2019-01-21 -0.569048 -1.294371 -0.860440  2.993533  0.947108  1.632600   
2019-01-22 -0.038089  1.513185 -0.966126  0.839116  0.114526  1.954067   
2019-01-23  0.979712 -0.934915 -1.301877 -0.384024 -0.338961 -0.865196   
2019-01-24  0.780730 -0.231587  1.679492 -0.675422 -0.951835  1.861684   
2019-01-25  1.298808 -0.207203  1.145511  1.011837  0.501611  0.685172   
2019-01-28  0.392768  0.316277 -0.845618 -1.210021 -0.932847  0.594189   
2019-01-29 -0.917573 -0.614059  1.1177
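
The saving can be checked with a little arithmetic, here as a plain pandas sketch assuming the cache is a DataFrame and block is a hypothetical generator. Before this call the sketched cache holds 8 columns over 17 business days. Generating Type E over those days costs 2 × 17 = 34 cells, and all 10 columns over the 3 new business days (February 6 to 8) cost 30 more, 64 in total, whereas regenerating the whole result would cost 20 × 10 = 200 cells.

```python
import numpy as np
import pandas as pd

def block(index, columns):
    """Hypothetical generator standing in for _execute."""
    return pd.DataFrame(np.random.randn(len(index), len(columns)), index=index, columns=columns)

# Cache state before this call: Types A-D, 2019-01-12 to 2019-02-05
cached = block(pd.date_range('2019-01-12', '2019-02-05', freq='B'),
               pd.MultiIndex.from_product([list('ABCD'), [1, 2]], names=['Type', 'ID']))

req_cols = pd.MultiIndex.from_product([list('ABCDE'), [1, 2]], names=['Type', 'ID'])
req_idx = pd.date_range('2019-01-20', '2019-02-10', freq='B')

# Step 1 (XS): only the new columns, over the dates already cached
xs = block(cached.index, req_cols.difference(cached.columns))
cached = pd.concat([cached, xs], axis=1)
# Step 2 (TS): all columns, over the dates not cached yet
ts = block(req_idx.difference(cached.index), cached.columns)
cached = pd.concat([cached, ts])

print(xs.size, ts.size, cached.size)  # 34 30 200
```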

As another example, let us request a subset of the previously generated data, with no incremental columns or dates. No execution should be needed; the class simply uses the previously generated data to return the requested values.

In [8]:
df_xsts_subset = daily_df_source(entities=dict(Type=['A','B','C','E'], ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,1)))
print(df_xsts_subset)

INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['A', 'B', 'C', 'E'], 'ID': [1, 2]} 2019-01-20 00:00:00 - 2019-02-01 00:00:00
INFO:root:INCREMENTAL Items: None
INFO:root:TOTAL Items: {'Type': ['B', 'D', 'E', 'A', 'C'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: {}
INFO:root:INCREMENTAL Period: None - None
INFO:root:TOTAL Period: 2019-01-12 00:00:00 - 2019-02-10 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:DONE RandomSu {'freq': 'B'}


Type               A                   B                   C            \
ID                 1         2         1         2         1         2   
2019-01-14  1.844280  0.020412  1.246756  1.154022  0.734224  0.398374   
2019-01-15 -0.114657 -0.637401 -0.849754 -0.945679  0.153935 -0.035244   
2019-01-16  0.655751  0.201677 -0.830324 -0.337273  0.793627  1.544500   
2019-01-17  0.020507 -0.791126 -2.276656 -0.839476  0.066704 -1.295300   
2019-01-18 -1.641192 -1.924791  0.368263 -0.100437  1.495483  0.126473   
2019-01-21 -0.569048 -1.294371 -0.860440  2.993533  0.947108  1.632600   
2019-01-22 -0.038089  1.513185 -0.966126  0.839116  0.114526  1.954067   
2019-01-23  0.979712 -0.934915 -1.301877 -0.384024 -0.338961 -0.865196   
2019-01-24  0.780730 -0.231587  1.679492 -0.675422 -0.951835  1.861684   
2019-01-25  1.298808 -0.207203  1.145511  1.011837  0.501611  0.685172   
2019-01-28  0.392768  0.316277 -0.845618 -1.210021 -0.932847  0.594189   
2019-01-29 -0.917573 -0.614059  1.1177
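
A sketch of how such a request can be answered from the cache alone, assuming the cache is a plain DataFrame whose columns form a full product of entity values and whose covered period is tracked separately (cached_period and covered are hypothetical names). If every requested entity and date is covered, the result is a .loc selection using pandas' IndexSlice, which the notebook imports at the top.

```python
import numpy as np
import pandas as pd
from pandas import IndexSlice

# Cache state after the previous call: Types A-E, 2019-01-12 to 2019-02-10
cols = pd.MultiIndex.from_product([list('ABCDE'), [1, 2]], names=['Type', 'ID'])
cached_period = (pd.Timestamp('2019-01-12'), pd.Timestamp('2019-02-10'))
idx = pd.date_range(*cached_period, freq='B')
cache = pd.DataFrame(np.random.randn(len(idx), len(cols)), index=idx, columns=cols)

def covered(cache, cached_period, entities, start, end):
    """True if the request needs no new data (hypothetical helper)."""
    items_ok = all(set(values) <= set(cache.columns.get_level_values(level))
                   for level, values in entities.items())
    return items_ok and cached_period[0] <= start and end <= cached_period[1]

entities = dict(Type=['A', 'B', 'C', 'E'], ID=[1, 2])
start, end = pd.Timestamp('2019-01-20'), pd.Timestamp('2019-02-01')
if covered(cache, cached_period, entities, start, end):
    # No execution: select the requested dates and (Type, ID) columns from the cache
    subset = cache.loc[start:end, IndexSlice[entities['Type'], entities['ID']]]
    print(subset.shape)  # (10, 8)
```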

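When combined with the MultitonMeta metaclass this becomes even more powerful, leading to significant efficiency gains and reuse of data in a complex simulation: every part of the code that instantiates a source with the same params shares a single instance, and therefore a single cache. Examples of using the MultitonMeta metaclass follow.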

Let us first try to instantiate another RandomSu object with the same params. As the log shows, the metaclass finds the prior instance in its registry and returns that same instance.

In [9]:
daily_df_source_new = RandomSu(params=dict(freq='B'))

INFO:root:args: () ; kwds: {'params': {'freq': 'B'}}
INFO:root:Multiton checking registry for key: (<class '__main__.RandomSu'>, '{"freq": "B"}')
INFO:root:Multiton Found Instance of <class '__main__.RandomSu'> {"freq": "B"}
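
The registry lookup in the log can be imitated with a small metaclass. This is a hypothetical sketch, not zpmeta's MultitonMeta: MultitonSketch and Source are illustrative names, and the key (class, JSON-serialized params) only mirrors the key printed in the log. Because the registry stores instances, any state such as a cache is shared by every caller passing the same params, positionally or by keyword, while different params get a separate instance.

```python
import json

class MultitonSketch(type):
    """Hypothetical metaclass: one instance per (class, params) pair."""
    _registry = {}

    def __call__(cls, params=None):
        # Serialize params so equal dicts map to the same hashable key
        key = (cls, json.dumps(params, sort_keys=True))
        if key not in MultitonSketch._registry:
            MultitonSketch._registry[key] = super().__call__(params)
        return MultitonSketch._registry[key]

class Source(metaclass=MultitonSketch):
    def __init__(self, params=None):
        self.params = params
        self.cache = {}  # stands in for the cached DataFrame

b1 = Source(dict(freq='B'))         # positional, like the first instantiation
b2 = Source(params=dict(freq='B'))  # keyword, like daily_df_source_new
a = Source(params=dict(freq='A'))   # different params: a new instance
b1.cache['x'] = 1
print(b1 is b2, b1 is a, b2.cache)  # True False {'x': 1}
```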


This prior instance already has the data in its cache. Let us check that.

In [30]:
df_xsts_subset_2 = daily_df_source_new(entities=dict(Type=['A','B','C','E'], ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,1)))
print(df_xsts_subset_2)

INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['A', 'B', 'C', 'E'], 'ID': [1, 2]} 2019-01-20 00:00:00 - 2019-02-01 00:00:00
INFO:root:INCREMENTAL Items: None
INFO:root:TOTAL Items: {'Type': ['B', 'D', 'E', 'A', 'C'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: {}
INFO:root:INCREMENTAL Period: None - None
INFO:root:TOTAL Period: 2019-01-12 00:00:00 - 2019-02-10 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:DONE RandomSu {'freq': 'B'}


Type               A                   B                   C            \
ID                 1         2         1         2         1         2   
2019-01-14  1.844280  0.020412  1.246756  1.154022  0.734224  0.398374   
2019-01-15 -0.114657 -0.637401 -0.849754 -0.945679  0.153935 -0.035244   
2019-01-16  0.655751  0.201677 -0.830324 -0.337273  0.793627  1.544500   
2019-01-17  0.020507 -0.791126 -2.276656 -0.839476  0.066704 -1.295300   
2019-01-18 -1.641192 -1.924791  0.368263 -0.100437  1.495483  0.126473   
2019-01-21 -0.569048 -1.294371 -0.860440  2.993533  0.947108  1.632600   
2019-01-22 -0.038089  1.513185 -0.966126  0.839116  0.114526  1.954067   
2019-01-23  0.979712 -0.934915 -1.301877 -0.384024 -0.338961 -0.865196   
2019-01-24  0.780730 -0.231587  1.679492 -0.675422 -0.951835  1.861684   
2019-01-25  1.298808 -0.207203  1.145511  1.011837  0.501611  0.685172   
2019-01-28  0.392768  0.316277 -0.845618 -1.210021 -0.932847  0.594189   
2019-01-29 -0.917573 -0.614059  1.1177


No additional execution was necessary: the log contains no EXEC line.

Now let us create an instance of RandomSu with a different set of params, this time for annual data generation. No instance with these params is in the registry, so a new one is created and registered.

In [11]:
annual_df_source = RandomSu(params=dict(freq='A'))

INFO:root:args: () ; kwds: {'params': {'freq': 'A'}}
INFO:root:Multiton checking registry for key: (<class '__main__.RandomSu'>, '{"freq": "A"}')
INFO:root:Multiton No Instance of <class '__main__.RandomSu'> {"freq": "A"}
INFO:root:Multiton Registering Instance of <class '__main__.RandomSu'> {"freq": "A"}


From now on, wherever in the code an annual RandomSu is instantiated, it will get this same instance, together with the data saved from all of its prior calls.