In this file we will explore how a CachedSource class works, using PanelCachedSource as the example. A PanelCachedSource outputs a DataFrame that has both time-series and cross-sectional dimensions. The focus is on how the class automatically caches data generated during previous calls to an instance of the class.

Let us start by importing all the necessary libraries. You may have to change the system path here.

In [1]:
import sys
sys.path.append("C:\\Users\\raman\\zeroth\\zeroth-meta\\")

from zpmeta.superclasses.panelcachedsource import PanelCachedSource
from pandas import DataFrame, Series, concat, MultiIndex, date_range
import numpy as np
from datetime import datetime

Now let us create a subclass of PanelCachedSource that generates a dataframe of random numbers. All we have to do is to implement the "execute" method of the superclass.

In [2]:
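class RandomPanelCachedSource(PanelCachedSource):
    '''Subclasses PanelCachedSource to create a dataframe of random numbers.
    params (dict, passed at instantiation) includes:
    freq: pandas frequency string for the generated index, e.g. 'B'
    entities (dict, passed on each call) includes:
    cols: list of column names
    '''
    def __init__(self, params: dict = None):
        super().__init__(params)
        # Shown in the log as APPENDABLE XS/TS; presumably marks results as
        # extendable by column (xs) and by date (ts).
        self.appendable = dict(xs=True, ts=True)
    
    def execute(self, call_type=None, entities=None, period=None):
        # period is a (start, end) tuple; build the index at the configured frequency.
        period_idx = date_range(period[0], period[1], freq=self.params['freq'])
        # One column of standard-normal draws per requested column name.
        result = DataFrame(np.random.randn(len(period_idx), len(entities['cols'])),
                           columns=entities['cols'], index=period_idx)
        return result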
    

Now let us instantiate it. Notice how we can set the frequency of the generated data through the params passed at instantiation.

In [3]:
daily_df_source = RandomPanelCachedSource(dict(freq='B'))

Once instantiated, the instance of this class behaves like a function, one that has "memory". This is a more sophisticated form of memoization.

Let us call this function object to create an initial dataframe.

In [None]:
df = daily_df_source(entities=dict(cols=['A','B','C']), period=(datetime(2019,1,12), datetime(2019,1,31)))
print(df)

RUN RandomPanelCachedSource {'freq': 'B'}
RUN INITIAL:  {'cols': ['A', 'B', 'C']} (datetime.datetime(2019, 1, 12, 0, 0), datetime.datetime(2019, 1, 31, 0, 0))
EXEC INITIAL: [{'cols': ['A', 'B', 'C']}] (datetime.datetime(2019, 1, 12, 0, 0), datetime.datetime(2019, 1, 31, 0, 0))
DONE RandomPanelCachedSource {'freq': 'B'}
                   A         B         C
2019-01-14  0.441478  0.929580 -2.042512
2019-01-15  0.379893  0.831835 -1.101505
2019-01-16 -1.511699 -0.625444  0.470315
2019-01-17  0.748549 -3.339419 -0.169446
2019-01-18  1.263152 -0.367059  1.319189
2019-01-21 -1.045336  0.679184 -1.877273
2019-01-22 -2.147359 -0.189573 -1.299528
2019-01-23  0.458948 -0.550585 -0.526745
2019-01-24  0.412842 -0.891124 -0.232959
2019-01-25  1.305914  1.115277  1.229875
2019-01-28 -1.496769  0.688425 -2.200632
2019-01-29 -1.021893  0.011651  0.365392
2019-01-30 -0.554501 -0.880985 -0.947708
2019-01-31 -0.745128  0.890632  1.154647
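
For contrast, here is what plain memoization does. It caches on the exact arguments, so a request that merely overlaps an earlier one is recomputed in full. This is a minimal sketch using functools.lru_cache from the standard library; the generate function and the calls list are hypothetical stand-ins for illustration and are not part of zpmeta.

```python
from functools import lru_cache

calls = []  # records every request that actually runs the generator


@lru_cache(maxsize=None)
def generate(cols: tuple, start: str, end: str) -> str:
    """Hypothetical stand-in for an expensive data generator."""
    calls.append((cols, start, end))
    return f"data for {cols} from {start} to {end}"


generate(("A", "B", "C"), "2019-01-12", "2019-01-31")       # computed
generate(("A", "B", "C"), "2019-01-12", "2019-01-31")       # exact repeat: cache hit
generate(("A", "B", "C", "D"), "2019-01-12", "2019-01-31")  # overlap: recomputed in full

print(len(calls))  # number of times the generator body actually ran
```

The third call regenerates A, B and C even though they were already available. Avoiding that kind of repeated work is what the incremental behaviour of the cached source is for.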

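Now let us ask it for some additional columns. The class recognizes which of the requested columns are not yet cached, generates data only for those columns, and appends them to the final result. No period is passed here, so the cached period is reused. (In the log captured below, INCREMENTAL Items is None and there is no EXEC line, which suggests column D had already been cached by an earlier run of this cell.)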

In [5]:
df_xs_incremental = daily_df_source(entities=dict(cols=['C', 'D']))
print(df_xs_incremental)

RUN RandomPanelCachedSource {'freq': 'B'}
RUN Nth:  {'cols': ['C', 'D']} None
INCREMENTAL Items:  None
TOTAL Items:  {'cols': ['C', 'D', 'B', 'A']}
DECREMENTAL Items:  {}
INCREMENTAL Period:  None
TOTAL Period:  (datetime.datetime(2019, 1, 12, 0, 0), datetime.datetime(2019, 1, 31, 0, 0))
APPENDABLE XS:True TS:True
DONE RandomPanelCachedSource {'freq': 'B'}
                   A         B         C         D
2019-01-14  0.441478  0.929580 -2.042512  0.887791
2019-01-15  0.379893  0.831835 -1.101505 -0.678325
2019-01-16 -1.511699 -0.625444  0.470315  0.225756
2019-01-17  0.748549 -3.339419 -0.169446 -0.199534
2019-01-18  1.263152 -0.367059  1.319189 -0.219332
2019-01-21 -1.045336  0.679184 -1.877273 -0.914854
2019-01-22 -2.147359 -0.189573 -1.299528 -1.099639
2019-01-23  0.458948 -0.550585 -0.526745  1.170923
2019-01-24  0.412842 -0.891124 -0.232959 -0.175174
2019-01-25  1.305914  1.115277  1.229875  2.122222
2019-01-28 -1.496769  0.688425 -2.200632  0.215265
2019-01-29 -1.021893  0.01165
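
The INCREMENTAL Items and TOTAL Items lines in the log suggest that the class compares the requested columns with the cached ones. A minimal sketch of such a comparison follows. The split_columns helper is hypothetical, written only for illustration, and the actual logic inside PanelCachedSource may differ.

```python
def split_columns(cached, requested):
    """Return (incremental, total) column lists.

    incremental: requested columns that are not cached yet.
    total: cached columns followed by the incremental ones.
    """
    cached_set = set(cached)
    # dict.fromkeys drops duplicate requests while keeping their order.
    incremental = [c for c in dict.fromkeys(requested) if c not in cached_set]
    total = list(cached) + incremental
    return incremental, total


incremental, total = split_columns(["A", "B", "C"], ["C", "D"])
print(incremental)  # ['D']
print(total)        # ['A', 'B', 'C', 'D']
```

Only the incremental list would need to be passed to execute; everything else can be served from the cache.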

Next, we request the same set of columns over an extended time period. This time it generates data only for the "incremental" period.

In [7]:
df_ts_incremental = daily_df_source(entities=dict(cols=['A','B','C','D']), period=(datetime(2019,1,20), datetime(2019,2,5)))
print(df_ts_incremental)

RUN RandomPanelCachedSource {'freq': 'B'}
RUN Nth:  {'cols': ['A', 'B', 'C', 'D']} (datetime.datetime(2019, 1, 20, 0, 0), datetime.datetime(2019, 2, 5, 0, 0))
INCREMENTAL Items:  None
TOTAL Items:  {'cols': ['C', 'A', 'B', 'D']}
DECREMENTAL Items:  None
INCREMENTAL Period:  (datetime.datetime(2019, 1, 31, 0, 0), datetime.datetime(2019, 2, 5, 0, 0))
TOTAL Period:  (datetime.datetime(2019, 1, 12, 0, 0), datetime.datetime(2019, 2, 5, 0, 0))
APPENDABLE XS:True TS:True
EXEC INCREMENTAL TS1: [{'cols': ['C', 'D', 'B', 'A']}] (datetime.datetime(2019, 1, 31, 0, 0), datetime.datetime(2019, 2, 5, 0, 0))
DONE RandomPanelCachedSource {'freq': 'B'}
                   A         B         C         D
2019-01-14  0.441478  0.929580 -2.042512  0.887791
2019-01-15  0.379893  0.831835 -1.101505 -0.678325
2019-01-16 -1.511699 -0.625444  0.470315  0.225756
2019-01-17  0.748549 -3.339419 -0.169446 -0.199534
2019-01-18  1.263152 -0.367059  1.319189 -0.219332
2019-01-21 -1.045336  0.679184 -1.877273 -0.914854
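
The INCREMENTAL Period and TOTAL Period values in the log above can be reproduced with a simple comparison of (start, end) tuples. This is a sketch under stated assumptions: split_period is a hypothetical helper, it only handles a request that starts on or after the cached start, and the real implementation may treat other cases differently.

```python
from datetime import datetime


def split_period(cached, requested):
    """Return (incremental, total) periods as (start, end) tuples.

    Assumes the request starts on or after the cached start, so the
    cache can only grow forward. incremental is None when the request
    ends within the cached period.
    """
    cached_start, cached_end = cached
    _, req_end = requested
    total = (cached_start, max(cached_end, req_end))
    incremental = (cached_end, req_end) if req_end > cached_end else None
    return incremental, total


cached = (datetime(2019, 1, 12), datetime(2019, 1, 31))
requested = (datetime(2019, 1, 20), datetime(2019, 2, 5))
incremental, total = split_period(cached, requested)
print(incremental)
print(total)
```

As in the log, the incremental period starts at the cached end date itself, so an implementation along these lines would need to avoid duplicating that boundary row when appending.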


Now let us request both additional columns and an additional period. As the log shows, it first generates data for only the incremental columns over the existing period, and then for the incremental dates across all the columns. This helps minimize calculations.

In [8]:
df_xsts_incremental = daily_df_source(entities=dict(cols=['A','B','C','D','E']), period=(datetime(2019,1,20), datetime(2019,2,10)))
print(df_xsts_incremental)

RUN RandomPanelCachedSource {'freq': 'B'}
RUN Nth:  {'cols': ['A', 'B', 'C', 'D', 'E']} (datetime.datetime(2019, 1, 20, 0, 0), datetime.datetime(2019, 2, 10, 0, 0))
INCREMENTAL Items:  {'cols': ['E']}
TOTAL Items:  {'cols': ['C', 'A', 'B', 'E', 'D']}
DECREMENTAL Items:  None
INCREMENTAL Period:  (datetime.datetime(2019, 2, 5, 0, 0), datetime.datetime(2019, 2, 10, 0, 0))
TOTAL Period:  (datetime.datetime(2019, 1, 12, 0, 0), datetime.datetime(2019, 2, 10, 0, 0))
APPENDABLE XS:True TS:True
EXEC INCREMENTAL XS1: [{'cols': ['E']}] (datetime.datetime(2019, 1, 12, 0, 0), datetime.datetime(2019, 2, 5, 0, 0))
EXEC INCREMENTAL TS1: [{'cols': ['C', 'A', 'B', 'E', 'D']}] (datetime.datetime(2019, 2, 5, 0, 0), datetime.datetime(2019, 2, 10, 0, 0))
DONE RandomPanelCachedSource {'freq': 'B'}
                   A         B         C         D         E
2019-01-14  0.441478  0.929580 -2.042512  0.887791 -0.196169
2019-01-15  0.379893  0.831835 -1.101505 -0.678325 -0.095286
2019-01-16 -1.511699 -0.625444
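
The two EXEC steps in the log (XS1 then TS1) can be imitated with plain pandas. The sketch below does not use zpmeta: generate is a hypothetical stand-in for execute, the cache is an ordinary DataFrame, and dropping the overlapping boundary date is an assumption about how duplicates might be handled.

```python
import numpy as np
import pandas as pd


def generate(cols, start, end, freq="B"):
    """Hypothetical stand-in for execute(): random numbers on a business-day index."""
    idx = pd.date_range(start, end, freq=freq)
    return pd.DataFrame(np.random.randn(len(idx), len(cols)), index=idx, columns=cols)


# Pretend this is the cache left behind by the earlier calls.
cache = generate(["A", "B", "C", "D"], "2019-01-12", "2019-02-05")

# Step 1 (XS): the new column E, over the cached period only.
new_cols = generate(["E"], "2019-01-12", "2019-02-05")
cache = pd.concat([cache, new_cols], axis=1)

# Step 2 (TS): all columns, over the new dates only.
new_rows = generate(list(cache.columns), "2019-02-05", "2019-02-10")
new_rows = new_rows.loc[~new_rows.index.isin(cache.index)]  # drop the overlapping boundary date
cache = pd.concat([cache, new_rows], axis=0)

print(cache.shape)
```

Doing the column step before the date step means the new dates are generated once for the full column set, rather than separately for old and new columns.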

As another example, let us ask it for data that is a subset of previously generated data, with no incremental columns or dates.

In [None]:
df_xsts_subset = daily_df_source(entities=dict(cols=['A','B','C','E']), period=(datetime(2019,1,20), datetime(2019,2,1)))
print(df_xsts_subset)
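
A request like this one can be answered from the cache alone. A minimal sketch of the check involved is shown below; is_covered is a hypothetical helper for illustration, and the cached columns and period are taken from the TOTAL Items and TOTAL Period lines of the previous call.

```python
from datetime import datetime


def is_covered(cached_cols, cached_period, req_cols, req_period):
    """True when every requested column and date already lies inside the cache."""
    cols_ok = set(req_cols) <= set(cached_cols)
    period_ok = cached_period[0] <= req_period[0] and req_period[1] <= cached_period[1]
    return cols_ok and period_ok


cached_cols = ["A", "B", "C", "D", "E"]
cached_period = (datetime(2019, 1, 12), datetime(2019, 2, 10))

subset_ok = is_covered(cached_cols, cached_period,
                       ["A", "B", "C", "E"],
                       (datetime(2019, 1, 20), datetime(2019, 2, 1)))
print(subset_ok)  # True: nothing new has to be generated
```

When the check passes, the result is just a slice of the cached frame; with a date-indexed DataFrame that would be something like cache.loc[start:end, cols].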