In this file we will explore how a CachedSource class works. Specifically, we will take the example of a PanelCachedSource, which outputs a DataFrame with both time-series and cross-sectional dimensions. The focus is on how the class automatically caches the data generated during previous calls to an instance of the class.

Let us start by importing all the necessary libraries. You may have to change the system path here.

In [1]:
import sys
sys.path.append("C:\\Users\\raman\\zeroth\\zeroth-meta\\")

from zpmeta.superclasses.panelcachedsource import PanelCachedSource
from zpmeta.metaclasses.singletons import MultitonMeta
from pandas import DataFrame, Series, concat, MultiIndex, date_range
import numpy as np
from datetime import datetime
import logging

logging.basicConfig(level=logging.INFO)

Now let us create a subclass of PanelCachedSource that generates a dataframe of random numbers. All we have to do is to implement the "execute" method of the superclass.

In [2]:
class RandomPanelCachedSource(PanelCachedSource, metaclass=MultitonMeta):
    '''Subclasses PanelCachedSource to create a DataFrame of random numbers.
    Accepts a dictionary of parameters, including:
    freq: pandas frequency string for the date index, e.g. 'B' for business days
    '''
    def __init__(self, params: dict = None):
        super().__init__(params)
        self.appendable = dict(xs=True, ts=True)

    def execute(self, call_type=None, entities=None, period=None):
        # One column per combination of entity values, e.g. every (Type, ID) pair
        cols = MultiIndex.from_product(list(entities.values()), names=list(entities.keys()))
        # One row per date between period[0] and period[1] at the configured frequency
        idx = date_range(period[0], period[1], freq=self.params['freq'])
        result = DataFrame(np.random.randn(len(idx), len(cols)), columns=cols, index=idx)

        return result
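
To see what the body of "execute" produces without the zpmeta package installed, the same three lines can be run on their own. This is only a standalone sketch using pandas and NumPy; the entities and period values are copied from the calls made later in this file, and dates are given as strings for brevity.

```python
import numpy as np
from pandas import DataFrame, MultiIndex, date_range

entities = dict(Type=['A', 'B', 'C'], ID=[1, 2])
period = ("2019-01-12", "2019-01-31")

# Two-level column index: every (Type, ID) combination
cols = MultiIndex.from_product(list(entities.values()), names=list(entities.keys()))
# Business-day index; 2019-01-12 is a Saturday, so the first row is 2019-01-14
idx = date_range(period[0], period[1], freq='B')
panel = DataFrame(np.random.randn(len(idx), len(cols)), columns=cols, index=idx)

print(panel.shape)                # (14, 6)
print(list(panel.columns.names))  # ['Type', 'ID']
```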
    

Now let us instantiate it. Notice how we can set the frequency of the generated data through the params passed at instantiation.

In [3]:
daily_df_source = RandomPanelCachedSource(dict(freq='B'))

INFO:root:args: ({'freq': 'B'},) ; kwds: {}
INFO:root:Multiton checking registry for key: (<class '__main__.RandomPanelCachedSource'>, '{"freq": "B"}')
INFO:root:Multiton No Instance of <class '__main__.RandomPanelCachedSource'> {"freq": "B"}
INFO:root:Multiton Registering Instance of <class '__main__.RandomPanelCachedSource'> {"freq": "B"}


Once instantiated, the instance of this class behaves like a function, but a function that has "memory". This is a more sophisticated form of memoization.
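
For comparison, plain memoization can be sketched as a callable class that stores one result per argument. The class below, MemoizedSquare, is a hypothetical toy and not part of zpmeta; it only reuses a result when the argument matches exactly, whereas the cached source in this file also reuses partially overlapping requests.

```python
class MemoizedSquare:
    """Toy callable whose instances remember earlier results (hypothetical)."""

    def __init__(self):
        self._cache = {}     # argument -> previously computed result
        self.executions = 0  # how many times the real work was done

    def execute(self, x):
        self.executions += 1
        return x * x

    def __call__(self, x):
        # Only do the work for arguments that have not been seen before
        if x not in self._cache:
            self._cache[x] = self.execute(x)
        return self._cache[x]


square = MemoizedSquare()
first = square(4)   # computed by execute
second = square(4)  # served from the cache
print(first, second, square.executions)  # 16 16 1
```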

Let us call this function object to create some initial dataframe.

In [4]:
df = daily_df_source(entities=dict(Type=['A','B','C'], ID=[1,2]), period=(datetime(2019,1,12), datetime(2019,1,31)))
print(df)

INFO:root:RUN RandomPanelCachedSource {'freq': 'B'}
INFO:root:RUN INITIAL: [{'Type': ['A', 'B', 'C'], 'ID': [1, 2]}] 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:EXEC INITIAL: [{'Type': ['A', 'B', 'C'], 'ID': [1, 2]}] 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:DONE RandomPanelCachedSource {'freq': 'B'}


Type               A                   B                   C          
ID                 1         2         1         2         1         2
2019-01-14 -0.323599 -0.021022  0.314629 -2.611771  0.065448  0.498829
2019-01-15 -0.680330 -1.223275 -0.735135  2.948655  0.930340 -0.724284
2019-01-16  0.258483  0.151749 -0.372498  0.416993 -1.550087  0.207735
2019-01-17  1.041824 -0.315572 -0.403987  0.743811 -0.032790 -1.019320
2019-01-18 -0.668419  0.349751 -1.574504 -0.661031  0.847232 -1.770366
2019-01-21 -0.347372 -0.613787 -0.027499 -0.511402  0.873496 -0.518390
2019-01-22  2.208016  0.287923 -0.201551 -0.132711 -0.308634  0.487845
2019-01-23 -0.567070 -1.748627  0.202055  1.292627 -0.656767 -1.014372
2019-01-24 -0.208159  0.247512 -0.411951 -0.961023 -0.737373 -0.320096
2019-01-25 -1.426859 -0.392819  0.832277 -0.716505  0.497544  0.330641
2019-01-28 -0.033412 -0.269330 -0.913541 -0.049269 -0.456778  0.386928
2019-01-29  1.399672 -0.796335  0.229888  1.513334 -0.520934 -0.791678
2019-0

Now let us give it some incremental columns. Notice how the class automatically recognizes which of the requested columns are new (Type 'D' in the log below), generates data only for those columns, and appends it to the cached result.

In [5]:
df_xs_incremental = daily_df_source(entities=dict(Type=['C', 'D'], ID=[1,2]), period=(datetime(2019,1,12), datetime(2019,1,31)))
print(df_xs_incremental)

INFO:root:RUN RandomPanelCachedSource {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['C', 'D'], 'ID': [1, 2]} 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:INCREMENTAL Items: {'Type': ['D'], 'ID': [1, 2]}
INFO:root:TOTAL Items: {'Type': ['C', 'B', 'D', 'A'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: {}
INFO:root:INCREMENTAL Period: None - None
INFO:root:TOTAL Period: 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:EXEC INCREMENTAL XS1: [{'Type': ['D'], 'ID': [1, 2]}] 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:DONE RandomPanelCachedSource {'freq': 'B'}


(datetime.datetime(2019, 1, 12, 0, 0), datetime.datetime(2019, 1, 31, 0, 0))
{'Type': ['C', 'D'], 'ID': [1, 2]}
Type               A                   B                   C            \
ID                 1         2         1         2         1         2   
2019-01-14 -0.323599 -0.021022  0.314629 -2.611771  0.065448  0.498829   
2019-01-15 -0.680330 -1.223275 -0.735135  2.948655  0.930340 -0.724284   
2019-01-16  0.258483  0.151749 -0.372498  0.416993 -1.550087  0.207735   
2019-01-17  1.041824 -0.315572 -0.403987  0.743811 -0.032790 -1.019320   
2019-01-18 -0.668419  0.349751 -1.574504 -0.661031  0.847232 -1.770366   
2019-01-21 -0.347372 -0.613787 -0.027499 -0.511402  0.873496 -0.518390   
2019-01-22  2.208016  0.287923 -0.201551 -0.132711 -0.308634  0.487845   
2019-01-23 -0.567070 -1.748627  0.202055  1.292627 -0.656767 -1.014372   
2019-01-24 -0.208159  0.247512 -0.411951 -0.961023 -0.737373 -0.320096   
2019-01-25 -1.426859 -0.392819  0.832277 -0.716505  0.497544  0.330641   
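
The INCREMENTAL and TOTAL items in the log above can be reproduced with simple set arithmetic on each entity level. The helper below, incremental_entities, is a hypothetical sketch and not the zpmeta implementation. It assumes both dictionaries share the same levels and that new values appear in only one level at a time; for a level with no new values it keeps the full list, as the log does for ID.

```python
def incremental_entities(cached, requested):
    """Return (incremental, total) entity dicts; hypothetical helper."""
    total, new = {}, {}
    for level in cached:
        total[level] = sorted(set(cached[level]) | set(requested[level]))
        new[level] = sorted(set(requested[level]) - set(cached[level]))
    if not any(new.values()):
        return None, total  # everything requested is already cached
    # Levels without new values keep their full list of values
    incremental = {level: new[level] if new[level] else total[level] for level in cached}
    return incremental, total


cached = dict(Type=['A', 'B', 'C'], ID=[1, 2])
requested = dict(Type=['C', 'D'], ID=[1, 2])
incremental, total = incremental_entities(cached, requested)
print(incremental)  # {'Type': ['D'], 'ID': [1, 2]}
print(total)        # {'Type': ['A', 'B', 'C', 'D'], 'ID': [1, 2]}
```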


Now we give it the same set of columns but a longer time period. This time it generates data only for the "incremental" period.

In [6]:
df_ts_incremental = daily_df_source(entities=dict(Type=['A','B','C','D'],ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,5)))
print(df_ts_incremental)

INFO:root:RUN RandomPanelCachedSource {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['A', 'B', 'C', 'D'], 'ID': [1, 2]} 2019-01-20 00:00:00 - 2019-02-05 00:00:00
INFO:root:INCREMENTAL Items: None
INFO:root:TOTAL Items: {'Type': ['C', 'D', 'A', 'B'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: None
INFO:root:INCREMENTAL Period: 2019-01-31 00:00:00 - 2019-02-05 00:00:00
INFO:root:TOTAL Period: 2019-01-12 00:00:00 - 2019-02-05 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:EXEC INCREMENTAL TS1: [{'Type': ['C', 'B', 'D', 'A'], 'ID': [1, 2]}] 2019-01-31 00:00:00 - 2019-02-05 00:00:00
INFO:root:DONE RandomPanelCachedSource {'freq': 'B'}


(datetime.datetime(2019, 1, 20, 0, 0), datetime.datetime(2019, 2, 5, 0, 0))
{'Type': ['A', 'B', 'C', 'D'], 'ID': [1, 2]}
Type               A                   B                   C            \
ID                 1         2         1         2         1         2   
2019-01-14 -0.323599 -0.021022  0.314629 -2.611771  0.065448  0.498829   
2019-01-15 -0.680330 -1.223275 -0.735135  2.948655  0.930340 -0.724284   
2019-01-16  0.258483  0.151749 -0.372498  0.416993 -1.550087  0.207735   
2019-01-17  1.041824 -0.315572 -0.403987  0.743811 -0.032790 -1.019320   
2019-01-18 -0.668419  0.349751 -1.574504 -0.661031  0.847232 -1.770366   
2019-01-21 -0.347372 -0.613787 -0.027499 -0.511402  0.873496 -0.518390   
2019-01-22  2.208016  0.287923 -0.201551 -0.132711 -0.308634  0.487845   
2019-01-23 -0.567070 -1.748627  0.202055  1.292627 -0.656767 -1.014372   
2019-01-24 -0.208159  0.247512 -0.411951 -0.961023 -0.737373 -0.320096   
2019-01-25 -1.426859 -0.392819  0.832277 -0.716505  0.497544  0.3
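
The INCREMENTAL and TOTAL periods in the log above follow from comparing the end dates of the cached and requested periods. The function below, incremental_period, is a hypothetical sketch and not the zpmeta implementation. It only handles requests that extend the cache forward in time, and it starts the incremental period at the cached end date because that is what the log shows.

```python
from datetime import datetime


def incremental_period(cached, requested):
    """Return (incremental, total) periods as (start, end) tuples; hypothetical helper."""
    cached_start, cached_end = cached
    req_start, req_end = requested
    total = (min(cached_start, req_start), max(cached_end, req_end))
    # Only dates after the cached end need to be generated
    incremental = (cached_end, req_end) if req_end > cached_end else None
    return incremental, total


cached = (datetime(2019, 1, 12), datetime(2019, 1, 31))
requested = (datetime(2019, 1, 20), datetime(2019, 2, 5))
incremental, total = incremental_period(cached, requested)
print(incremental[0].date(), incremental[1].date())  # 2019-01-31 2019-02-05
print(total[0].date(), total[1].date())              # 2019-01-12 2019-02-05
```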

Now let us give it an example where we feed it both additional columns and an additional period. As the log shows, it first generates data for only the incremental columns over the existing period, and then for the incremental dates across all the columns. This helps minimize calculations.

In [7]:
df_xsts_incremental = daily_df_source(entities=dict(Type=['A','B','C','D','E'], ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,10)))
print(df_xsts_incremental)

INFO:root:RUN RandomPanelCachedSource {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['A', 'B', 'C', 'D', 'E'], 'ID': [1, 2]} 2019-01-20 00:00:00 - 2019-02-10 00:00:00
INFO:root:INCREMENTAL Items: {'Type': ['E'], 'ID': [1, 2]}
INFO:root:TOTAL Items: {'Type': ['C', 'D', 'A', 'E', 'B'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: None
INFO:root:INCREMENTAL Period: 2019-02-05 00:00:00 - 2019-02-10 00:00:00
INFO:root:TOTAL Period: 2019-01-12 00:00:00 - 2019-02-10 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:EXEC INCREMENTAL XS1: [{'Type': ['E'], 'ID': [1, 2]}] 2019-01-12 00:00:00 - 2019-02-05 00:00:00
INFO:root:EXEC INCREMENTAL TS1: [{'Type': ['C', 'D', 'A', 'E', 'B'], 'ID': [1, 2]}] 2019-02-05 00:00:00 - 2019-02-10 00:00:00
INFO:root:DONE RandomPanelCachedSource {'freq': 'B'}


(datetime.datetime(2019, 1, 20, 0, 0), datetime.datetime(2019, 2, 10, 0, 0))
{'Type': ['A', 'B', 'C', 'D', 'E'], 'ID': [1, 2]}
Type               A                   B                   C            \
ID                 1         2         1         2         1         2   
2019-01-14 -0.323599 -0.021022  0.314629 -2.611771  0.065448  0.498829   
2019-01-15 -0.680330 -1.223275 -0.735135  2.948655  0.930340 -0.724284   
2019-01-16  0.258483  0.151749 -0.372498  0.416993 -1.550087  0.207735   
2019-01-17  1.041824 -0.315572 -0.403987  0.743811 -0.032790 -1.019320   
2019-01-18 -0.668419  0.349751 -1.574504 -0.661031  0.847232 -1.770366   
2019-01-21 -0.347372 -0.613787 -0.027499 -0.511402  0.873496 -0.518390   
2019-01-22  2.208016  0.287923 -0.201551 -0.132711 -0.308634  0.487845   
2019-01-23 -0.567070 -1.748627  0.202055  1.292627 -0.656767 -1.014372   
2019-01-24 -0.208159  0.247512 -0.411951 -0.961023 -0.737373 -0.320096   
2019-01-25 -1.426859 -0.392819  0.832277 -0.716505  0.49754
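
The order of the two EXEC INCREMENTAL lines in the log above can be sketched as a small planning function. plan_executions is a hypothetical name and this is not the zpmeta implementation; it assumes new values in a single entity level and a request that extends the cache forward in time.

```python
from datetime import datetime


def plan_executions(cached_entities, cached_period, entities, period):
    """List the (label, entities, period) generation steps; hypothetical planner."""
    union = {lvl: sorted(set(cached_entities[lvl]) | set(entities[lvl]))
             for lvl in cached_entities}
    new = {lvl: sorted(set(entities[lvl]) - set(cached_entities[lvl]))
           for lvl in cached_entities}
    steps = []
    if any(new.values()):
        # Step 1: new columns only, over the period that is already cached
        xs_entities = {lvl: new[lvl] or union[lvl] for lvl in cached_entities}
        steps.append(("XS", xs_entities, cached_period))
    if period[1] > cached_period[1]:
        # Step 2: new dates only, but for every column (old and new)
        steps.append(("TS", union, (cached_period[1], period[1])))
    return steps


steps = plan_executions(
    cached_entities=dict(Type=['A', 'B', 'C', 'D'], ID=[1, 2]),
    cached_period=(datetime(2019, 1, 12), datetime(2019, 2, 5)),
    entities=dict(Type=['A', 'B', 'C', 'D', 'E'], ID=[1, 2]),
    period=(datetime(2019, 1, 20), datetime(2019, 2, 10)),
)
for label, ents, (start, end) in steps:
    print(label, ents, start.date(), end.date())
# XS {'Type': ['E'], 'ID': [1, 2]} 2019-01-12 2019-02-05
# TS {'Type': ['A', 'B', 'C', 'D', 'E'], 'ID': [1, 2]} 2019-02-05 2019-02-10
```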

As another example, let us ask it for data that is a subset of previously generated data, with no incremental columns or dates. It should not execute anything; it will just use the previously generated data to return the correct values.

In [8]:
df_xsts_subset = daily_df_source(entities=dict(Type=['A','B','C','E'], ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,1)))
print(df_xsts_subset)

INFO:root:RUN RandomPanelCachedSource {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['A', 'B', 'C', 'E'], 'ID': [1, 2]} 2019-01-20 00:00:00 - 2019-02-01 00:00:00
INFO:root:INCREMENTAL Items: None
INFO:root:TOTAL Items: {'Type': ['C', 'A', 'D', 'E', 'B'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: {}
INFO:root:INCREMENTAL Period: None - None
INFO:root:TOTAL Period: 2019-01-12 00:00:00 - 2019-02-10 00:00:00


(datetime.datetime(2019, 1, 20, 0, 0), datetime.datetime(2019, 2, 1, 0, 0))
{'Type': ['A', 'B', 'C', 'E'], 'ID': [1, 2]}


INFO:root:APPENDABLE XS:True TS:True
INFO:root:DONE RandomPanelCachedSource {'freq': 'B'}


Type               A                   B                   C            \
ID                 1         2         1         2         1         2   
2019-01-14 -0.323599 -0.021022  0.314629 -2.611771  0.065448  0.498829   
2019-01-15 -0.680330 -1.223275 -0.735135  2.948655  0.930340 -0.724284   
2019-01-16  0.258483  0.151749 -0.372498  0.416993 -1.550087  0.207735   
2019-01-17  1.041824 -0.315572 -0.403987  0.743811 -0.032790 -1.019320   
2019-01-18 -0.668419  0.349751 -1.574504 -0.661031  0.847232 -1.770366   
2019-01-21 -0.347372 -0.613787 -0.027499 -0.511402  0.873496 -0.518390   
2019-01-22  2.208016  0.287923 -0.201551 -0.132711 -0.308634  0.487845   
2019-01-23 -0.567070 -1.748627  0.202055  1.292627 -0.656767 -1.014372   
2019-01-24 -0.208159  0.247512 -0.411951 -0.961023 -0.737373 -0.320096   
2019-01-25 -1.426859 -0.392819  0.832277 -0.716505  0.497544  0.330641   
2019-01-28 -0.033412 -0.269330 -0.913541 -0.049269 -0.456778  0.386928   
2019-01-29  1.399672 -0.796335  0.2298

When combined with the MultitonMeta metaclass, this becomes even more powerful, leading to significant efficiencies and reusability of data in a complex simulation. Examples of using the MultitonMeta metaclass follow.

Let us first try to instantiate another RandomPanelCachedSource object with the same params. As can be seen here, it finds the prior instance in the registry and returns that same instance.

In [9]:
daily_df_source_new = RandomPanelCachedSource(params=dict(freq='B'))

INFO:root:args: () ; kwds: {'params': {'freq': 'B'}}
INFO:root:Multiton checking registry for key: (<class '__main__.RandomPanelCachedSource'>, '{"freq": "B"}')
INFO:root:Multiton Found Instance of <class '__main__.RandomPanelCachedSource'> {"freq": "B"}
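
The registry lookup in the log above uses a key made of the class and a JSON string of the params. A metaclass with that behaviour can be sketched as below. MultitonSketch and Source are hypothetical names, this is not the zpmeta MultitonMeta code, and the use of sort_keys to make the key independent of dictionary order is an assumption.

```python
import json


class MultitonSketch(type):
    """Hypothetical metaclass: one instance per (class, params) pair."""
    _registry = {}

    def __call__(cls, params=None):
        # Same class and same params always map to the same registry key
        key = (cls, json.dumps(params, sort_keys=True))
        if key not in MultitonSketch._registry:
            MultitonSketch._registry[key] = super().__call__(params)
        return MultitonSketch._registry[key]


class Source(metaclass=MultitonSketch):
    def __init__(self, params=None):
        self.params = params


a = Source(dict(freq='B'))
b = Source(params=dict(freq='B'))  # same params: the registered instance is returned
c = Source(dict(freq='A'))         # different params: a new instance is registered
print(a is b, a is c)  # True False
```

Because the second call returns the very same object, any data cached on the first instance is available through the second name as well.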


This prior instance already has the data in its cache. Let us check for that.

In [10]:
df_xsts_subset_2 = daily_df_source_new(entities=dict(Type=['A','B','C','E'], ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,1)))
print(df_xsts_subset_2)

INFO:root:RUN RandomPanelCachedSource {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['A', 'B', 'C', 'E'], 'ID': [1, 2]} 2019-01-20 00:00:00 - 2019-02-01 00:00:00
INFO:root:INCREMENTAL Items: None
INFO:root:TOTAL Items: {'Type': ['C', 'A', 'D', 'E', 'B'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: {}
INFO:root:INCREMENTAL Period: None - None
INFO:root:TOTAL Period: 2019-01-12 00:00:00 - 2019-02-10 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:DONE RandomPanelCachedSource {'freq': 'B'}


(datetime.datetime(2019, 1, 20, 0, 0), datetime.datetime(2019, 2, 1, 0, 0))
{'Type': ['A', 'B', 'C', 'E'], 'ID': [1, 2]}
Type               A                   B                   C            \
ID                 1         2         1         2         1         2   
2019-01-14 -0.323599 -0.021022  0.314629 -2.611771  0.065448  0.498829   
2019-01-15 -0.680330 -1.223275 -0.735135  2.948655  0.930340 -0.724284   
2019-01-16  0.258483  0.151749 -0.372498  0.416993 -1.550087  0.207735   
2019-01-17  1.041824 -0.315572 -0.403987  0.743811 -0.032790 -1.019320   
2019-01-18 -0.668419  0.349751 -1.574504 -0.661031  0.847232 -1.770366   
2019-01-21 -0.347372 -0.613787 -0.027499 -0.511402  0.873496 -0.518390   
2019-01-22  2.208016  0.287923 -0.201551 -0.132711 -0.308634  0.487845   
2019-01-23 -0.567070 -1.748627  0.202055  1.292627 -0.656767 -1.014372   
2019-01-24 -0.208159  0.247512 -0.411951 -0.961023 -0.737373 -0.320096   
2019-01-25 -1.426859 -0.392819  0.832277 -0.716505  0.497544  0.3

No additional execution was necessary.

Now let us create an instance of RandomPanelCachedSource with a different set of params, this time for annual data generation.

In [11]:
annual_df_source = RandomPanelCachedSource(params=dict(freq='A'))

INFO:root:args: () ; kwds: {'params': {'freq': 'A'}}
INFO:root:Multiton checking registry for key: (<class '__main__.RandomPanelCachedSource'>, '{"freq": "A"}')
INFO:root:Multiton No Instance of <class '__main__.RandomPanelCachedSource'> {"freq": "A"}
INFO:root:Multiton Registering Instance of <class '__main__.RandomPanelCachedSource'> {"freq": "A"}
