data / persistence - make data insert more efficient #18
Hey @westonplatter, I've been researching this challenge of storing time series data a bit myself. Not sure if you've considered any of these approaches or not, but I figured it couldn't hurt to chime in with some of the stuff I've come across. This is the setup I'm currently using:

See also https://github.com/ranaroussi/pystore and pikers/piker#90
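For context, pystore persists dataframes as parquet on disk (via Dask) using a store/collection/item hierarchy. A minimal sketch of its read/write API follows; the store name, collection name, and sample data are illustrative, not anything from this repo:

```python
import pandas as pd
import pystore

pystore.set_path("~/pystore")            # where the parquet data lives on disk
store = pystore.store("mydatastore")     # a store holds many collections
collection = store.collection("NASDAQ")  # a collection holds many items

# a toy OHLCV-ish frame indexed by timestamp
df = pd.DataFrame(
    {"close": [101.2, 101.5], "volume": [1_000, 1_200]},
    index=pd.to_datetime(["2021-01-04 09:30", "2021-01-04 09:31"]),
)

# write a dataframe keyed by symbol, then read it back
collection.write("AAPL", df, metadata={"source": "example"})
aapl = collection.item("AAPL").to_pandas()
```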
@briancappello thanks for sharing the persistence strategy. I like the idea of pystore. I did something similar with caching data in parquet files. I think for now, I'll keep things as simple as possible in the pg db (performance is probably 4 versions ahead of me). How much effort was it to get the Alpaca Marketstore working for you?
Cool, makes sense. re: Marketstore, I think they have a docker setup that should work out of the box? I run it locally, which on *nix is relatively painless (at least assuming you have your go development environment configured): clone it, configure the build, and run it. It comes with some plugins for automatically fetching and storing data from various providers (both historical and realtime streaming quotes), which are all optional. I use the polygon websocket plugin for realtime-ish minutely bars, but otherwise haven't experimented with any others.

On the python side, this snippet should get you started:

```python
import pandas as pd  # needed for the type hints below
import pymarketstore as pymkts  # pip install pymarketstore
from pymarketstore.jsonrpc_client import JsonRpcClient


class Marketstore:
    def __init__(self, endpoint: str = 'http://localhost:5993/rpc'):
        self.client = JsonRpcClient(endpoint)

    def get_df(self, symbol: str, timeframe: str) -> pd.DataFrame:
        # query a single bucket, e.g. AAPL/1Min/OHLCV, and return it
        # as a dataframe indexed by timestamp
        p = pymkts.Param(symbol.upper(), timeframe, 'OHLCV')
        return self.client.query(p).first().df()

    def write_df(self, df: pd.DataFrame, symbol: str, timeframe: str):
        return self.client.rpc.call('DataService.Write', requests=[
            self._make_write_request(df, f'{symbol.upper()}/{timeframe}/OHLCV'),
        ])

    def _make_write_request(self, df: pd.DataFrame, tbk: str):
        # marketstore stores timestamps as int64 epoch seconds; the
        # DatetimeIndex holds nanoseconds, hence the division by 10 ** 9
        epoch = df.index.to_numpy(dtype='i8') // 10 ** 9
        dataset = dict(
            length=len(df),
            startindex={tbk: 0},
            lengths={tbk: len(df)},
            types=['i8'] + [dtype.str.replace('<', '') for dtype in df.dtypes],
            names=['Epoch'] + df.columns.to_list(),
            data=[bytes(memoryview(epoch))] + [
                bytes(memoryview(df[name].to_numpy()))
                for name in df.columns],
        )
        return dict(dataset=dataset, is_variable_length=False)
```

This supports pandas DataFrames with a DatetimeIndex. Supported timeframe strings can be found in the marketstore repo.
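A quick usage sketch of the wrapper above, assuming a marketstore instance is listening on the default local port and `ohlcv` is an OHLCV DataFrame with a DatetimeIndex:

```python
ms = Marketstore()                   # defaults to http://localhost:5993/rpc
ms.write_df(ohlcv, 'AAPL', '1Min')   # one write request for the whole frame
df = ms.get_df('AAPL', '1Min')       # round-trips back as a dataframe
```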
@briancappello thanks for leaving notes. Sounds like there's a learning curve to getting things set up. What are the reasons to switch to Marketstore? For now, my focus is to do indicator cohort analysis (eg, https://docs.google.com/spreadsheets/d/1CqzEjzP0m2XuylhQdHwKZ5qOK_YP7hcdbitEdsSzDWI/edit?usp=sharing).
Yea, definitely not as simple as just using a single table in postgres :) Mind you, I haven't benchmarked any of this, and I am very much thinking in terms of doing more upfront architectural work on the backtesting side of things, so that backtested algos are more or less directly deployable to live trading once desirable strategies are discovered - but the benefits as I understand them:
So from the perspective of indicator analysis,
Obviously this is much more involved than just doing analysis on a dataframe! But my hope is that it would also allow for much more sophisticated strategies to be tested. It may honestly also be trying to solve a rather different kind of problem than what you're currently focused on :)
I guess it would probably help to give an example heh. So, sticking with moving average crossovers: is it possible to add more "factors" to the analysis? For instance, also considering the slope of the slow MA. So maybe when the 20 crosses the 50, if the 50 is still trending downwards, it may not be a good cross to take; but if the 50 is flat and/or trending upwards, then maybe it is a good cross to buy on. Or perhaps you want to consider volume, or the value of another (non-MA) indicator, as a "filter" for whether or not to take crossover signals.
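As a concrete illustration, here's a minimal pandas sketch of that kind of filtered crossover; the window lengths and the slope filter come from the example above, not from anything in this repo:

```python
import pandas as pd

def filtered_cross_ups(close: pd.Series) -> pd.Series:
    """Boolean series: True on bars where the 20 MA crosses above the
    50 MA *and* the 50 MA's slope is flat-to-rising on that bar."""
    fast = close.rolling(20).mean()
    slow = close.rolling(50).mean()
    crossed_up = (fast > slow) & (fast.shift(1) <= slow.shift(1))
    slow_rising = slow.diff() >= 0  # per-bar change in the slow MA
    return crossed_up & slow_rising
```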
Within `data.load_and_cache`, we have this line. While this works, it's inefficient. Let's come up with a way to insert all the df rows into the DB at once while still respecting the unique index.
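One possible approach, sketched under assumptions: PostgreSQL's `INSERT ... ON CONFLICT DO NOTHING` lets a single multi-row insert silently skip rows that collide with the unique index, instead of inserting row by row. The `prices` table and its `(symbol, timestamp)` unique constraint below are hypothetical stand-ins for the repo's actual schema:

```python
import pandas as pd
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import insert

# Hypothetical schema with a unique constraint on (symbol, timestamp);
# names are illustrative, not this repo's models.
metadata = sa.MetaData()
prices = sa.Table(
    "prices", metadata,
    sa.Column("symbol", sa.String, nullable=False),
    sa.Column("timestamp", sa.DateTime, nullable=False),
    sa.Column("open", sa.Float),
    sa.Column("high", sa.Float),
    sa.Column("low", sa.Float),
    sa.Column("close", sa.Float),
    sa.Column("volume", sa.BigInteger),
    sa.UniqueConstraint("symbol", "timestamp", name="uq_prices_symbol_ts"),
)

def bulk_insert(engine: sa.engine.Engine, df: pd.DataFrame) -> None:
    """Insert every row of df in one statement; rows that would violate
    the unique index are skipped rather than raising IntegrityError."""
    # assumes df is indexed by timestamp, hence the reset_index()
    records = df.reset_index().to_dict(orient="records")
    stmt = (
        insert(prices)
        .values(records)
        .on_conflict_do_nothing(index_elements=["symbol", "timestamp"])
    )
    with engine.begin() as conn:  # single transaction, single round trip
        conn.execute(stmt)
```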