In this file we will explore how a CachedSource class works, using PanelCachedSource as the example. A PanelCachedSource outputs a DataFrame that has both time-series and cross-sectional dimensions. The focus is on how the class automatically caches the data generated during previous calls to an instance and reuses it in later calls.

Let us start by importing all the necessary libraries. You may have to change the system path here.

In [None]:
import sys
sys.path.append("C:\\Users\\raman\\zeroth\\zeroth-meta\\")

from zpmeta.superclasses.panelcachedsource import PanelCachedSource
from pandas import DataFrame, Series, concat, MultiIndex, date_range
import numpy as np
from datetime import datetime
import logging

logging.basicConfig(level=logging.INFO)

Now let us create a subclass of PanelCachedSource that generates a dataframe of random numbers. All we have to do is to implement the "execute" method of the superclass.

In [None]:
class RandomPanelCachedSource(PanelCachedSource):
    '''Subclasses PanelCachedSource to create a dataframe of random numbers.
    Accepts a dictionary of parameters, including:
    freq: pandas frequency string for the date index (e.g. 'B' for business days)
    '''
    def __init__(self, params: dict = None):
        super().__init__(params)
        self.appendable = dict(xs=True, ts=True)

    def execute(self, call_type=None, entities=None, period=None):
        # One column per combination of entity values, e.g. (Type, ID)
        cols = MultiIndex.from_product(list(entities.values()), names=list(entities.keys()))
        # One row per date in the requested period, at the configured frequency
        idx = date_range(period[0], period[1], freq=self.params['freq'])
        result = DataFrame(np.random.randn(len(idx), len(cols)), columns=cols, index=idx)

        return result
    

Now let us instantiate it. Notice how we can set the frequency of the generated data through the params passed at instantiation.

In [None]:
daily_df_source = RandomPanelCachedSource(dict(freq='B'))


Once instantiated, the instance of this class behaves like a function with "memory": it remembers the data generated by earlier calls. This is a more sophisticated form of memoization.
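
For contrast, plain memoization can be sketched as a callable object that stores results keyed by the exact argument. The class below is hypothetical and is not part of zpmeta. It only reuses a result when the same argument is repeated, whereas the cached source is meant to reuse overlapping parts of earlier requests as well.

```python
class MemoizedSquare:
    """Hypothetical callable object that remembers earlier results."""

    def __init__(self):
        self._cache = {}      # argument -> previously computed result
        self.executions = 0   # number of times the real computation ran

    def __call__(self, x):
        if x not in self._cache:
            self.executions += 1
            self._cache[x] = x * x
        return self._cache[x]


square = MemoizedSquare()
first = square(4)
second = square(4)  # served from the cache, no recomputation
```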

Let us call this function object to create some initial dataframe.

In [None]:
df = daily_df_source(entities=dict(Type=['A','B','C'], ID=[1,2]), period=(datetime(2019,1,12), datetime(2019,1,31)))
print(df)

Now let us give it some incremental columns. Notice how the class automatically recognizes the additional columns, generates data only for those columns, and appends it to the final result.

In [None]:
df_xs_incremental = daily_df_source(entities=dict(Type=['C', 'D'], ID=[1,2]), period=(datetime(2019,1,12), datetime(2019,1,31)))
print(df_xs_incremental)
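
The idea of detecting the additional columns can be sketched with a set difference over the column tuples implied by each `entities` dictionary. This is only an illustration in plain Python; the helper `expand` is hypothetical, and the actual logic inside PanelCachedSource may differ.

```python
from itertools import product


def expand(entities):
    """Return the set of column tuples implied by an entities dict."""
    return set(product(*entities.values()))


# Columns generated by the first call, and columns requested by the second.
cached = expand({"Type": ["A", "B", "C"], "ID": [1, 2]})
requested = expand({"Type": ["C", "D"], "ID": [1, 2]})

# Only these columns would need fresh data.
missing = sorted(requested - cached)
```

Here the two `C` columns are already cached, so only the two `D` columns remain to be generated.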

Now we give it the same set of columns but an additional time period. This time it generates data only for the "incremental" period.

In [None]:
df_ts_incremental = daily_df_source(entities=dict(Type=['A','B','C','D'],ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,5)))
print(df_ts_incremental)
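
A minimal sketch of how the "incremental" period could be derived from the cached and requested periods follows. The function `missing_periods` and its one-day `step` are assumptions made for illustration, not the library's API.

```python
from datetime import datetime, timedelta


def missing_periods(cached, requested, step=timedelta(days=1)):
    """Return the parts of `requested` that fall outside `cached`.

    Both arguments are (start, end) tuples with inclusive bounds.
    """
    gaps = []
    if requested[0] < cached[0]:
        # Dates requested before the cached window starts.
        gaps.append((requested[0], min(requested[1], cached[0] - step)))
    if requested[1] > cached[1]:
        # Dates requested after the cached window ends.
        gaps.append((max(requested[0], cached[1] + step), requested[1]))
    return gaps


cached = (datetime(2019, 1, 12), datetime(2019, 1, 31))
requested = (datetime(2019, 1, 20), datetime(2019, 2, 5))
gaps = missing_periods(cached, requested)
```

With these inputs the only gap runs from 1 February to 5 February 2019; the dates from 20 January to 31 January are already covered.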

Now let us give it an example where we feed it both additional columns and an additional period. As we can see, it first generates data for only the incremental columns over the existing period, and then for the incremental dates across all the columns. This helps minimize calculations.

In [None]:
df_xsts_incremental = daily_df_source(entities=dict(Type=['A','B','C','D','E'], ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,10)))
print(df_xsts_incremental)
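
The two-step fill described above can be imitated with plain pandas. The sketch assumes the cache already holds Types A to D for 12 January to 5 February 2019; the helper `make` and the `generated` counter are hypothetical and only illustrate how much work the approach saves.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
generated = 0  # counts how many values are actually produced


def make(index, columns):
    """Generate a block of random values and record its size."""
    global generated
    generated += len(index) * len(columns)
    values = rng.standard_normal((len(index), len(columns)))
    return pd.DataFrame(values, index=index, columns=columns)


names = ["Type", "ID"]
old_cols = pd.MultiIndex.from_product([["A", "B", "C", "D"], [1, 2]], names=names)
new_cols = pd.MultiIndex.from_product([["E"], [1, 2]], names=names)
old_idx = pd.date_range("2019-01-12", "2019-02-05", freq="B")
new_idx = pd.date_range("2019-02-06", "2019-02-10", freq="B")

cache = make(old_idx, old_cols)
generated = 0  # ignore the cost of the data that was already cached

# Step 1: new columns only, over the dates already in the cache.
cache = pd.concat([cache, make(old_idx, new_cols)], axis=1)
# Step 2: new dates only, across all columns.
cache = pd.concat([cache, make(new_idx, cache.columns)], axis=0)
```

In this sketch only 64 values are generated (17 × 2 for the new columns plus 3 × 10 for the new dates), compared with 200 for regenerating the full 20 × 10 frame.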

As another example, let us ask it for data that is a subset of previously generated data, with no incremental columns or dates. It should not execute at all; it will just use the previously generated data to return the correct values.

In [None]:
df_xsts_subset = daily_df_source(entities=dict(Type=['A','B','C','E'], ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,1)))
print(df_xsts_subset)
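
Whether a request is fully covered by earlier results can be sketched as a subset check on the columns plus a containment check on the period. The function `is_covered` is hypothetical, and the cached state below is assumed from the earlier calls in this file.

```python
from datetime import datetime
from itertools import product


def is_covered(cached_cols, cached_period, req_cols, req_period):
    """True when the request needs no new columns and no new dates."""
    cols_ok = set(req_cols) <= set(cached_cols)
    dates_ok = cached_period[0] <= req_period[0] and req_period[1] <= cached_period[1]
    return cols_ok and dates_ok


cached_cols = set(product(["A", "B", "C", "D", "E"], [1, 2]))
cached_period = (datetime(2019, 1, 12), datetime(2019, 2, 10))

req_cols = set(product(["A", "B", "C", "E"], [1, 2]))
req_period = (datetime(2019, 1, 20), datetime(2019, 2, 1))

covered = is_covered(cached_cols, cached_period, req_cols, req_period)
```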

When combined with the MultitonMeta metaclass, this becomes even more powerful, leading to significant efficiencies and reusability of data in a complex simulation. Examples of using the MultitonMeta metaclass follow.

In [None]:
period = (datetime(2019, 1, 1), datetime(2019, 12, 31))
print(*period)
entities = {'Type': ['C', 'D'], 'ID': [1, 2]}
print("RUN Nth: [%s] %s-%s" %(entities, *period))
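
The implementation of MultitonMeta is not shown in this file. As a rough sketch of the general multiton pattern, a metaclass can keep one instance per distinct set of constructor arguments, so that every part of a simulation asking for the same source shares the same cache. The names `SketchMultitonMeta` and `SketchSource` are hypothetical, and the real MultitonMeta may key its instances differently.

```python
class SketchMultitonMeta(type):
    """Hypothetical multiton: one instance per distinct constructor key."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Arguments must be hashable to serve as part of the registry key.
        key = (cls, args, tuple(sorted(kwargs.items())))
        if key not in SketchMultitonMeta._instances:
            SketchMultitonMeta._instances[key] = super().__call__(*args, **kwargs)
        return SketchMultitonMeta._instances[key]


class SketchSource(metaclass=SketchMultitonMeta):
    def __init__(self, freq):
        self.freq = freq
        self.cache = {}  # would be shared by every holder of this instance


daily_a = SketchSource("B")
daily_b = SketchSource("B")   # same arguments, same object
weekly = SketchSource("W")    # different arguments, new object
```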