# Caching with PanelSource and MultitonMeta
In this file we will explore how a PanelSource class works. Specifically, we will take the example of a Su that outputs a DataFrame with both time-series and cross-sectional dimensions. The focus is on how the class automatically caches the data generated during previous calls to an instance of the class.

Let us start by importing all the necessary libraries. You may have to change the system path here.

In [1]:
from zpmeta.sources.panelsource import PanelSource
from zpmeta.singletons.singletons import MultitonMeta
from pandas import DataFrame, Series, concat, MultiIndex, date_range, IndexSlice
import numpy as np
from datetime import datetime
import logging

logging.basicConfig(level=logging.INFO)

Now let us create a subclass of PanelSource that generates a dataframe of random numbers. All we have to do is implement the "_execute" method of the superclass.

In [2]:
class RandomSu(PanelSource, metaclass=MultitonMeta):
    '''Subclasses PanelSource to create a dataframe of random numbers.
    Accepts a dictionary of parameters, including:
    freq: pandas frequency string for the date index (e.g. 'B')
    '''
    _appendable = dict(xs=True, ts=True)

    def _execute(self, call_type=None, entities=None, period=None):
        # one column per combination of entity values, e.g. (Type, ID)
        cols = MultiIndex.from_product(list(entities.values()), names=list(entities.keys()))
        # one row per date in the requested period, at the configured frequency
        idx = date_range(period[0], period[1], freq=self.params['freq'])
        result = DataFrame(np.random.randn(len(idx), len(cols)), columns=cols, index=idx)
        
        return result
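
To make the column construction concrete, the following stdlib-only sketch enumerates the combinations that a product over the entity levels yields for the entities used later in this file. It uses itertools.product as a stand-in and does not touch pandas or PanelSource; the variable names are illustrative only.

```python
from itertools import product

# Same shape of argument that is passed as `entities` in the calls below.
entities = dict(Type=['A', 'B', 'C'], ID=[1, 2])

# Cartesian product of the level values, in level order (Type varies slowest).
level_names = list(entities.keys())
column_keys = list(product(*entities.values()))

print(level_names)       # ['Type', 'ID']
print(len(column_keys))  # 6
print(column_keys[:3])   # [('A', 1), ('A', 2), ('B', 1)]
```

Three types and two IDs give six columns, which is the width of the first dataframe printed further down.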
    

Now let us instantiate it. Notice how we can set the frequency of data generated in the params while instantiating the class. 

In [3]:
daily_df_source = RandomSu(params=dict(freq='B'))

INFO:root:args: () ; kwds: {'params': {'freq': 'B'}}
INFO:root:Multiton checking registry for key: (<class '__main__.RandomSu'>, '{"params": {"freq": "B"}}')
INFO:root:Multiton No Instance of <class '__main__.RandomSu'> {"params": {"freq": "B"}}
INFO:root:Multiton Registering Instance of <class '__main__.RandomSu'> {"params": {"freq": "B"}}


Once instantiated, the instance of this class behaves like a function with "memory". This is a more sophisticated form of memoization.
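
As a rough illustration of a callable object with memory, here is a minimal sketch that is independent of PanelSource. The class name RememberingSquare and its attributes are hypothetical; it only shows the idea of computing the inputs not seen before and serving the rest from a cache.

```python
class RememberingSquare:
    """Callable object that remembers every input it has already computed."""

    def __init__(self):
        self.cache = {}      # input -> computed value
        self.executions = 0  # how many values were actually computed

    def __call__(self, xs):
        # Compute only the inputs that are not in the cache yet.
        for x in xs:
            if x not in self.cache:
                self.cache[x] = x * x
                self.executions += 1
        # Answer the full request from the cache.
        return {x: self.cache[x] for x in xs}


square = RememberingSquare()
first = square([1, 2, 3])
second = square([2, 3, 4])  # only 4 is new

print(second)             # {2: 4, 3: 9, 4: 16}
print(square.executions)  # 4
```

Plain memoization caches whole calls; the object above caches per item, so overlapping requests only pay for the part that is new.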

Let us call this function object to create an initial dataframe.

In [4]:
df = daily_df_source(entities=dict(Type=['A','B','C'], ID=[1,2]), period=(datetime(2019,1,12), datetime(2019,1,31)))
print(df)

INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN INITIAL: [{'Type': ['A', 'B', 'C'], 'ID': [1, 2]}] 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:EXEC INITIAL: [{'Type': ['A', 'B', 'C'], 'ID': [1, 2]}] 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:DONE RandomSu {'freq': 'B'}


Type               A                   B                   C          
ID                 1         2         1         2         1         2
2019-01-14 -0.113589  1.127684 -0.425210  0.090671 -0.683757 -1.602456
2019-01-15 -0.945201  0.070418 -0.655966 -0.351708  2.257358 -1.421704
2019-01-16 -0.406979  0.357922 -1.378429 -1.163527 -1.040953  1.017062
2019-01-17  0.173099  0.976578 -1.965628 -0.732950  0.186258 -0.864349
2019-01-18 -0.513780  0.343738  0.819087 -0.684189  1.479855  1.892667
2019-01-21 -0.817289 -0.133922 -0.847299  0.566611  0.569337 -0.307188
2019-01-22 -0.303030  0.147259 -0.158122  0.955804  1.199698  0.346873
2019-01-23  1.400057 -0.480437  2.117858  1.026802 -0.811281  0.218896
2019-01-24  0.221238  0.804325  0.981598 -0.370401  0.352303  0.568744
2019-01-25  0.133674 -1.121936  0.650222 -0.981805 -0.127501 -0.906032
2019-01-28  1.165822 -0.571892 -1.292981  0.145317  0.007182 -0.194428
2019-01-29  0.513804  0.487645  0.565450  0.245766 -0.256470  1.493737
2019-0

Now let us give it some incremental columns. Notice how the class automatically recognizes the additional columns, generates data only for those columns, and appends it to the final result.

In [5]:
df_xs_incremental = daily_df_source(entities=dict(Type=['C', 'D'], ID=[1,2]), period=(datetime(2019,1,12), datetime(2019,1,31)))
print(df_xs_incremental)

INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['C', 'D'], 'ID': [1, 2]} 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:INCREMENTAL Items: {'Type': ['D'], 'ID': [1, 2]}
INFO:root:TOTAL Items: {'Type': ['D', 'C', 'A', 'B'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: {}
INFO:root:INCREMENTAL Period: None - None
INFO:root:TOTAL Period: 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:EXEC INCREMENTAL XS1: [{'Type': ['D'], 'ID': [1, 2]}] 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:DONE RandomSu {'freq': 'B'}


None
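
The "INCREMENTAL Items" line in the log can be pictured as a per-level set difference between the requested and the cached entities. The sketch below is an assumption about the idea, not the library's implementation; the function name incremental_entities is hypothetical.

```python
def incremental_entities(cached, requested):
    """Return the requested entities that are not cached yet, or None."""
    fresh = {}
    for level, items in requested.items():
        seen = set(cached.get(level, []))
        new_items = [item for item in items if item not in seen]
        if new_items:
            fresh[level] = new_items
    if not fresh:
        return None
    # Levels without new items keep their full requested list, so the
    # result describes a complete block of columns to generate.
    return {level: fresh.get(level, list(items)) for level, items in requested.items()}


cached = dict(Type=['A', 'B', 'C'], ID=[1, 2])
requested = dict(Type=['C', 'D'], ID=[1, 2])

print(incremental_entities(cached, requested))                 # {'Type': ['D'], 'ID': [1, 2]}
print(incremental_entities(cached, dict(Type=['A'], ID=[1])))  # None
```

This sketch only covers the case where a single level gains new items, as in the call above. If several levels grew at once, more than one block would be needed to cover the missing columns.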


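Now we give it the same set of columns but an additional time period, so that it only needs to generate data for the "incremental" period. Note that in the run recorded below the log reports RUN INITIAL and EXEC INITIAL, i.e. the whole requested range was executed.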

In [6]:
df_ts_incremental = daily_df_source(entities=dict(Type=['A','B','C','D'],ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,5)))
print(df_ts_incremental)

INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN INITIAL: [{'Type': ['A', 'B', 'C', 'D'], 'ID': [1, 2]}] 2019-01-20 00:00:00 - 2019-02-05 00:00:00
INFO:root:EXEC INITIAL: [{'Type': ['A', 'B', 'C', 'D'], 'ID': [1, 2]}] 2019-01-20 00:00:00 - 2019-02-05 00:00:00
INFO:root:DONE RandomSu {'freq': 'B'}


Type               A                   B                   C            \
ID                 1         2         1         2         1         2   
2019-01-21  0.291468  0.144811  0.219353 -0.694899  1.221999  1.128885   
2019-01-22  1.324409 -0.415895  1.324625  1.586012 -0.548481  0.043651   
2019-01-23  0.688719 -0.814662  1.241875  0.045986 -0.447418  0.506325   
2019-01-24 -0.070301  0.019660 -0.306092  1.136743 -0.036766 -0.607202   
2019-01-25  1.005094  0.230973  1.793772  0.643682 -0.042197  0.717138   
2019-01-28  1.330388  2.484320  1.624623  0.506643  0.021312  0.193877   
2019-01-29 -0.255180 -0.050188 -1.259297  0.716407  0.187235  0.126505   
2019-01-30 -0.967916 -0.351965 -0.734921 -0.377098 -1.017045  0.008172   
2019-01-31  0.113475 -1.013812 -0.004875 -1.119156  1.189673 -0.505255   
2019-02-01 -0.759758 -1.750040  1.194653  1.464802 -0.083039 -1.105587   
2019-02-04  0.310569 -0.507014 -1.119550 -1.468665  0.674651 -0.248070   
2019-02-05 -0.263937  0.175136 -0.7980

Now, let us give it an example where we feed it both additional columns and an additional period. As the log shows, it first generates data for only the incremental columns over the existing period, and then for the incremental dates across all the columns. This helps minimize calculations.

In [7]:
df_xsts_incremental = daily_df_source(entities=dict(Type=['A','B','C','D','E'], ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,10)))
print(df_xsts_incremental)

INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['A', 'B', 'C', 'D', 'E'], 'ID': [1, 2]} 2019-01-20 00:00:00 - 2019-02-10 00:00:00
INFO:root:INCREMENTAL Items: {'Type': ['E'], 'ID': [1, 2]}
INFO:root:TOTAL Items: {'Type': ['D', 'E', 'B', 'C', 'A'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: None
INFO:root:INCREMENTAL Period: 2019-02-05 00:00:00 - 2019-02-10 00:00:00
INFO:root:TOTAL Period: 2019-01-20 00:00:00 - 2019-02-10 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:EXEC INCREMENTAL XS1: [{'Type': ['E'], 'ID': [1, 2]}] 2019-01-20 00:00:00 - 2019-02-05 00:00:00
INFO:root:EXEC INCREMENTAL TS1: [{'Type': ['D', 'E', 'B', 'C', 'A'], 'ID': [1, 2]}] 2019-02-05 00:00:00 - 2019-02-10 00:00:00
INFO:root:DONE RandomSu {'freq': 'B'}


Type               D                   E                   B            \
ID                 1         2         1         2         1         2   
2019-02-05  0.041553 -1.528800 -1.068862  0.488556 -0.070887  0.749038   
2019-02-06  0.407147  0.155818  1.868483 -0.948123  0.265276 -1.257401   
2019-02-07  0.092758  0.974562 -1.075754 -0.577542 -0.581040  0.966101   
2019-02-08 -0.856259  0.937009 -2.511874  0.226069 -0.972299 -0.323508   

Type               C                   A            
ID                 1         2         1         2  
2019-02-05  0.498114  0.227783  0.242945 -0.732580  
2019-02-06 -0.199864  0.679090  0.430896  0.419969  
2019-02-07 -0.568589  0.616249  0.168297  0.234796  
2019-02-08  1.750102 -0.715190  2.008337 -1.583508  
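
The two EXEC INCREMENTAL lines in the log above can be read as a small execution plan: back-fill the new columns over the cached period, then extend every column over the new dates. The following sketch reproduces that ordering under simplifying assumptions (the request only extends the cached period at its end); plan_executions and its arguments are hypothetical names, not part of PanelSource.

```python
from datetime import datetime


def plan_executions(cached_period, requested_period, new_entities, all_entities):
    """List the (label, entities, period) blocks that still need to be generated."""
    cached_start, cached_end = cached_period
    _, requested_end = requested_period
    steps = []
    if new_entities:
        # New columns are back-filled over the period that is already cached.
        steps.append(('XS', new_entities, (cached_start, cached_end)))
    if requested_end > cached_end:
        # New dates are generated once, for every column including the new ones.
        steps.append(('TS', all_entities, (cached_end, requested_end)))
    return steps


steps = plan_executions(
    cached_period=(datetime(2019, 1, 20), datetime(2019, 2, 5)),
    requested_period=(datetime(2019, 1, 20), datetime(2019, 2, 10)),
    new_entities=dict(Type=['E'], ID=[1, 2]),
    all_entities=dict(Type=['A', 'B', 'C', 'D', 'E'], ID=[1, 2]),
)
for label, entities, (start, end) in steps:
    print(label, entities['Type'], start.date(), end.date())
```

Splitting the work this way means each missing cell is generated once: the new columns over the old dates, and all columns over the new dates.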


As another example, let us ask it for data that is a subset of previously generated data, with no incremental columns or dates. It should not execute anything; it will just use the previously generated data to return the correct values.

In [8]:
df_xsts_subset = daily_df_source(entities=dict(Type=['A','B','C','E'], ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,1)))
print(df_xsts_subset)

INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['A', 'B', 'C', 'E'], 'ID': [1, 2]} 2019-01-20 00:00:00 - 2019-02-01 00:00:00
INFO:root:INCREMENTAL Items: None
INFO:root:TOTAL Items: {'Type': ['D', 'E', 'B', 'C', 'A'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: {}
INFO:root:INCREMENTAL Period: None - None
INFO:root:TOTAL Period: 2019-01-20 00:00:00 - 2019-02-10 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:DONE RandomSu {'freq': 'B'}


Type               D                   E                   B            \
ID                 1         2         1         2         1         2   
2019-02-05  0.041553 -1.528800 -1.068862  0.488556 -0.070887  0.749038   
2019-02-06  0.407147  0.155818  1.868483 -0.948123  0.265276 -1.257401   
2019-02-07  0.092758  0.974562 -1.075754 -0.577542 -0.581040  0.966101   
2019-02-08 -0.856259  0.937009 -2.511874  0.226069 -0.972299 -0.323508   

Type               C                   A            
ID                 1         2         1         2  
2019-02-05  0.498114  0.227783  0.242945 -0.732580  
2019-02-06 -0.199864  0.679090  0.430896  0.419969  
2019-02-07 -0.568589  0.616249  0.168297  0.234796  
2019-02-08  1.750102 -0.715190  2.008337 -1.583508  


When combined with the MultitonMeta metaclass, this becomes even more powerful, leading to significant efficiencies and reusability of data in a complex simulation. Examples of using the MultitonMeta metaclass follow.

Let us first try to instantiate another RandomSu object with the same params. As can be seen here, it finds the prior instance in the registry and returns that same instance.

In [9]:
daily_df_source_new = RandomSu(params=dict(freq='B'))

INFO:root:args: () ; kwds: {'params': {'freq': 'B'}}
INFO:root:Multiton checking registry for key: (<class '__main__.RandomSu'>, '{"params": {"freq": "B"}}')
INFO:root:Multiton Found Instance of <class '__main__.RandomSu'> {"params": {"freq": "B"}}
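
The registry lookup in the log can be sketched with a small metaclass. This is a minimal illustration of the multiton pattern, assuming the key is built from the class and a JSON dump of the constructor arguments with sorted keys (which is what the logged key looks like); MultitonSketch and Source are hypothetical names, and the real MultitonMeta may differ.

```python
import json


class MultitonSketch(type):
    """Metaclass that returns one shared instance per (class, arguments) pair."""

    _registry = {}

    def __call__(cls, *args, **kwargs):
        # Serialise the arguments deterministically so equal params give equal keys.
        key = (cls, json.dumps({'args': args, 'kwds': kwargs}, sort_keys=True))
        if key not in MultitonSketch._registry:
            # No instance yet for these arguments: build and register one.
            MultitonSketch._registry[key] = super().__call__(*args, **kwargs)
        return MultitonSketch._registry[key]


class Source(metaclass=MultitonSketch):
    def __init__(self, params=None):
        self.params = params


daily = Source(params={'freq': 'B'})
daily_again = Source(params={'freq': 'B'})
annual = Source(params={'freq': 'A'})

print(daily is daily_again)  # True
print(daily is annual)       # False
```

Because the second call returns the very same object, anything cached on the first instance is available to every later caller that uses the same arguments.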


This prior instance already has the data in its cache. Let us check that.

In [10]:
df_xsts_subset_2 = daily_df_source_new(entities=dict(Type=['A','B','C','E'], ID=[1,2]), period=(datetime(2019,1,20), datetime(2019,2,1)))
print(df_xsts_subset_2)

INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['A', 'B', 'C', 'E'], 'ID': [1, 2]} 2019-01-20 00:00:00 - 2019-02-01 00:00:00
INFO:root:INCREMENTAL Items: None
INFO:root:TOTAL Items: {'Type': ['D', 'E', 'B', 'C', 'A'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: {}
INFO:root:INCREMENTAL Period: None - None
INFO:root:TOTAL Period: 2019-01-20 00:00:00 - 2019-02-10 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:DONE RandomSu {'freq': 'B'}


Type               D                   E                   B            \
ID                 1         2         1         2         1         2   
2019-02-05  0.041553 -1.528800 -1.068862  0.488556 -0.070887  0.749038   
2019-02-06  0.407147  0.155818  1.868483 -0.948123  0.265276 -1.257401   
2019-02-07  0.092758  0.974562 -1.075754 -0.577542 -0.581040  0.966101   
2019-02-08 -0.856259  0.937009 -2.511874  0.226069 -0.972299 -0.323508   

Type               C                   A            
ID                 1         2         1         2  
2019-02-05  0.498114  0.227783  0.242945 -0.732580  
2019-02-06 -0.199864  0.679090  0.430896  0.419969  
2019-02-07 -0.568589  0.616249  0.168297  0.234796  
2019-02-08  1.750102 -0.715190  2.008337 -1.583508  


No additional execution was necessary.

Now let us create an instance of RandomSu with a different set of params, this time for annual data generation. It will not find a matching instance in the registry and will create a new one.

In [11]:
annual_df_source = RandomSu(params=dict(freq='A'))

INFO:root:args: () ; kwds: {'params': {'freq': 'A'}}
INFO:root:Multiton checking registry for key: (<class '__main__.RandomSu'>, '{"params": {"freq": "A"}}')
INFO:root:Multiton No Instance of <class '__main__.RandomSu'> {"params": {"freq": "A"}}
INFO:root:Multiton Registering Instance of <class '__main__.RandomSu'> {"params": {"freq": "A"}}


Now, wherever in the code an annual RandomSu is instantiated, it will access the same instance, which also holds the data from all the prior calls.

## Refreshing the cache
Sometimes we might be in a situation where we want to refresh the last few rows of the cached data every time we make a call. Think of a live-trading or online system, for example, where the data in the database may be updated after an initial entry. For this we set the 'caching' parameter dict while instantiating the class, specifying for how many periods the data needs to be refreshed. Let us look at an example.

In [12]:
# instantiate with caching parameters (here it will refresh the last two periods of data)
refreshed_df_source = RandomSu(params=dict(freq='B'), caching=dict(ts_anchor='cache', ts_refresh=2))

# fill initial data
df = refreshed_df_source(entities=dict(Type=['A','B','C'], ID=[1,2]), period=(datetime(2019,1,12), datetime(2019,1,31)))
print(df)


INFO:root:args: () ; kwds: {'params': {'freq': 'B'}, 'caching': {'ts_anchor': 'cache', 'ts_refresh': 2}}
INFO:root:Multiton checking registry for key: (<class '__main__.RandomSu'>, '{"caching": {"ts_anchor": "cache", "ts_refresh": 2}, "params": {"freq": "B"}}')
INFO:root:Multiton No Instance of <class '__main__.RandomSu'> {"caching": {"ts_anchor": "cache", "ts_refresh": 2}, "params": {"freq": "B"}}
INFO:root:Multiton Registering Instance of <class '__main__.RandomSu'> {"caching": {"ts_anchor": "cache", "ts_refresh": 2}, "params": {"freq": "B"}}
INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN INITIAL: [{'Type': ['A', 'B', 'C'], 'ID': [1, 2]}] 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:EXEC INITIAL: [{'Type': ['A', 'B', 'C'], 'ID': [1, 2]}] 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:DONE RandomSu {'freq': 'B'}


Type               A                   B                   C          
ID                 1         2         1         2         1         2
2019-01-14 -1.860632  0.591817  1.065688 -0.521971 -0.365940  0.174923
2019-01-15 -0.451329  1.105575 -1.302726 -0.260153 -1.644813 -1.388995
2019-01-16  0.749883 -0.658047 -0.257758 -0.100584 -0.999681  1.019733
2019-01-17 -0.782119  0.088730 -2.116978  1.109668 -0.644203 -0.228887
2019-01-18  0.569310 -0.769615 -1.136831 -0.948201  0.332714 -0.641188
2019-01-21  0.199768  0.349603  1.412867  2.387307  0.692177 -0.784905
2019-01-22 -0.570967  0.319126  0.257885  0.592910 -0.588269  1.143655
2019-01-23  1.499787  0.707194  1.178393  0.328974  0.129723  0.163481
2019-01-24  0.009799 -0.163553 -0.837591 -0.724882 -0.264518  0.670216
2019-01-25  0.718019  0.467600  0.961503  0.287439  0.602721 -0.130074
2019-01-28  0.506791 -0.789897 -0.823635  0.232176  0.684665  0.440938
2019-01-29 -0.272146 -0.813697  0.411256 -1.371706  1.563803  1.623825
2019-0

Let us check the 'period' attribute of the instance. It shows the period for which the data is cached, adjusted for the refresh period.

In [13]:
print(refreshed_df_source.period)
print(refreshed_df_source.value)

(datetime.datetime(2019, 1, 12, 0, 0), Timestamp('2019-01-29 00:00:00', freq='B'))
Type               A                   B                   C          
ID                 1         2         1         2         1         2
2019-01-14 -1.860632  0.591817  1.065688 -0.521971 -0.365940  0.174923
2019-01-15 -0.451329  1.105575 -1.302726 -0.260153 -1.644813 -1.388995
2019-01-16  0.749883 -0.658047 -0.257758 -0.100584 -0.999681  1.019733
2019-01-17 -0.782119  0.088730 -2.116978  1.109668 -0.644203 -0.228887
2019-01-18  0.569310 -0.769615 -1.136831 -0.948201  0.332714 -0.641188
2019-01-21  0.199768  0.349603  1.412867  2.387307  0.692177 -0.784905
2019-01-22 -0.570967  0.319126  0.257885  0.592910 -0.588269  1.143655
2019-01-23  1.499787  0.707194  1.178393  0.328974  0.129723  0.163481
2019-01-24  0.009799 -0.163553 -0.837591 -0.724882 -0.264518  0.670216
2019-01-25  0.718019  0.467600  0.961503  0.287439  0.602721 -0.130074
2019-01-28  0.506791 -0.789897 -0.823635  0.232176  0.684665  0.4
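
The end date printed above (2019-01-29) is consistent with dropping the last two business days from the requested range. The stdlib sketch below reproduces that arithmetic under the assumption that ts_refresh counts periods back from the last generated date. That reading is inferred from the printed period, not confirmed against the library, and the helper names are hypothetical.

```python
from datetime import date, timedelta


def business_days(start, end):
    """Weekdays (Mon-Fri) from start to end inclusive; holidays are ignored."""
    days = []
    current = start
    while current <= end:
        if current.weekday() < 5:
            days.append(current)
        current += timedelta(days=1)
    return days


def trusted_end(start, end, ts_refresh):
    """Last date that would be kept as-is, with ts_refresh periods after it."""
    days = business_days(start, end)
    return days[-1 - ts_refresh]


print(trusted_end(date(2019, 1, 12), date(2019, 1, 31), ts_refresh=2))  # 2019-01-29
```

Under this reading, a repeated call for the same period would re-execute only from that trimmed end date onward.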

In [14]:
df_refreshed = refreshed_df_source(entities=dict(Type=['A','B','C'], ID=[1,2]), period=(datetime(2019,1,12), datetime(2019,1,31)))
print(df_refreshed)

INFO:root:RUN RandomSu {'freq': 'B'}
INFO:root:RUN Nth: {'Type': ['A', 'B', 'C'], 'ID': [1, 2]} 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:INCREMENTAL Items: None
INFO:root:TOTAL Items: {'Type': ['C', 'A', 'B'], 'ID': [1, 2]}
INFO:root:DECREMENTAL Items: None
INFO:root:INCREMENTAL Period: 2019-01-29 00:00:00 - 2019-01-31 00:00:00
INFO:root:TOTAL Period: 2019-01-12 00:00:00 - 2019-01-31 00:00:00
INFO:root:APPENDABLE XS:True TS:True
INFO:root:EXEC INCREMENTAL TS1: [{'Type': ['A', 'B', 'C'], 'ID': [1, 2]}] 2019-01-29 00:00:00 - 2019-01-31 00:00:00
INFO:root:DONE RandomSu {'freq': 'B'}


Type               A                   B                   C          
ID                 1         2         1         2         1         2
2019-01-14 -1.860632  0.591817  1.065688 -0.521971 -0.365940  0.174923
2019-01-15 -0.451329  1.105575 -1.302726 -0.260153 -1.644813 -1.388995
2019-01-16  0.749883 -0.658047 -0.257758 -0.100584 -0.999681  1.019733
2019-01-17 -0.782119  0.088730 -2.116978  1.109668 -0.644203 -0.228887
2019-01-18  0.569310 -0.769615 -1.136831 -0.948201  0.332714 -0.641188
2019-01-21  0.199768  0.349603  1.412867  2.387307  0.692177 -0.784905
2019-01-22 -0.570967  0.319126  0.257885  0.592910 -0.588269  1.143655
2019-01-23  1.499787  0.707194  1.178393  0.328974  0.129723  0.163481
2019-01-24  0.009799 -0.163553 -0.837591 -0.724882 -0.264518  0.670216
2019-01-25  0.718019  0.467600  0.961503  0.287439  0.602721 -0.130074
2019-01-28  0.506791 -0.789897 -0.823635  0.232176  0.684665  0.440938
2019-01-29 -0.272146 -0.813697  0.411256 -1.371706  1.563803  1.623825
2019-0