One way to facilitate market data study is, instead of rescraping everything each time we run a scraper, to keep the scraper running all the time. When we get a quote for an item, we schedule an event to issue another GET request for the same item one day later.
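A minimal sketch of that loop, assuming a heap-based schedule of `(due_time, url)` pairs (the `fetch` and `store` callables and the `run` name are hypothetical, injected so the sketch stays backend-agnostic):

```python
import heapq
import time

REVISIT_DELAY_S = 24 * 60 * 60  # one day between visits to the same item


def run(seed_urls, fetch, store, max_visits=None, now=time.time, sleep=time.sleep):
    """Long-running scrape loop: each fetch schedules the next fetch of
    the same item one day later. `max_visits` bounds the loop for testing;
    in production it would be None and the loop runs indefinitely."""
    queue = [(now(), url) for url in seed_urls]  # (due_time, url) min-heap
    heapq.heapify(queue)
    visits = 0
    while queue and (max_visits is None or visits < max_visits):
        due, url = heapq.heappop(queue)
        wait = due - now()
        if wait > 0:
            sleep(wait)  # idle until the next scheduled event is due
        store(fetch(url))
        visits += 1
        # Re-enqueue the same item, due one day after this visit was due.
        heapq.heappush(queue, (due + REVISIT_DELAY_S, url))
```

Injecting `now` and `sleep` also lets the schedule be simulated with a fake clock, which makes the rescheduling logic testable without waiting a day.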
A few advantages of this approach:

- facilitates market data study
- fewer botched-run clean-ups
- pushes for / could benefit from DB-based storage
- if done right, it might help with the StockX PerimeterX situation
- helps combine everything into a single (more or less idempotent) decision-making unit
The static mapping (the URLs we identified from each site that we care about) could be scraped at a lower frequency. Each time we start, we begin by sending requests to the items whose last mark we missed, then work off scheduled events from there.
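The catch-up step above could look like the following sketch, assuming we persist a `{url: last_scraped_timestamp}` mark for each static-map entry (the period and the `overdue_items` name are assumptions for illustration):

```python
import time

STATIC_MAP_PERIOD_S = 7 * 24 * 60 * 60  # assumed: rescrape the static map weekly


def overdue_items(last_scraped, period_s, now=None):
    """Given {url: last_scraped_ts}, return the URLs whose last mark is
    at least one period old, oldest first. On startup these would be
    requested immediately before resuming the scheduled events."""
    now = time.time() if now is None else now
    stale = [(ts, url) for url, ts in last_scraped.items() if now - ts >= period_s]
    return [url for _, url in sorted(stale)]
```

Sorting oldest-first means the items we have been missing the longest are refreshed before anything else after a restart.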
This unfortunately requires an overhaul of the current system. We should also use the opportunity to think the system design through more carefully.
zhehaowang changed the title from "feed: more reasonable structure for rescraping?" to "feed: more reasonable structure for rescraping: static-map sampling" on Jul 20, 2019.
feed v2 was an overhaul of feed in favor of this approach: build a static mapping (cross-referenced from multiple sources) rarely. For each item in the static mapping, retrieve a price snapshot and transaction history daily from all sources. Strategy reads the daily scraped data and makes recommendations.
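The daily cycle described above could be sketched as follows, assuming the static mapping keys an item to its per-source URL and each source exposes snapshot and transaction fetchers (all names here — `daily_pass`, the fetcher keys, the source names — are illustrative, not the actual feed v2 API):

```python
def daily_pass(static_mapping, sources, store):
    """One daily feed-v2 cycle: for each item in the static mapping,
    retrieve a price snapshot and the transaction history from every
    source, and hand each (item, source, payload) row to `store`.

    static_mapping: {item_id: {source_name: url}}
    sources:        {source_name: {"snapshot": fn, "transactions": fn}}
    """
    for item_id, urls in static_mapping.items():
        for source_name, url in urls.items():
            fetchers = sources[source_name]
            store(item_id, source_name, {
                "snapshot": fetchers["snapshot"](url),
                "transactions": fetchers["transactions"](url),
            })
```

Keeping the fetchers behind a per-source dict means adding a source is a mapping change rather than a code change, which fits the goal of a single decision-making unit reading uniformly shaped daily data.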