One way to facilitate market data study is, instead of rescraping everything each time we run a scraper, to keep the scraper running all the time. When we get a quote for an item, we schedule an event to issue another GET request for the same item one day later.
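A minimal sketch of that loop, assuming a heap-based schedule of `(due_time, url)` pairs (the `fetch` and `store` callables and the `run` name are hypothetical, injected so the sketch stays backend-agnostic):

```python
import heapq
import time

REVISIT_DELAY_S = 24 * 60 * 60  # one day between visits to the same item


def run(seed_urls, fetch, store, max_visits=None, now=time.time, sleep=time.sleep):
    """Long-running scrape loop: each fetch schedules the next fetch of
    the same item one day later. `max_visits` bounds the loop for testing;
    in production it would be None and the loop runs indefinitely."""
    queue = [(now(), url) for url in seed_urls]  # (due_time, url) min-heap
    heapq.heapify(queue)
    visits = 0
    while queue and (max_visits is None or visits < max_visits):
        due, url = heapq.heappop(queue)
        wait = due - now()
        if wait > 0:
            sleep(wait)  # idle until the next scheduled event is due
        store(fetch(url))
        visits += 1
        # Re-enqueue the same item, due one day after this visit was due.
        heapq.heappush(queue, (due + REVISIT_DELAY_S, url))
```

Injecting `now` and `sleep` also lets the schedule be simulated with a fake clock, which makes the rescheduling logic testable without waiting a day.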
A few advantages of this approach:

- facilitates market data study
- fewer botched-run clean-ups
- pushes for / could benefit from DB-based storage
- if done right, it might help with the StockX PerimeterX situation
- helps combine everything into a single (more or less idempotent) decision-making unit
The static mapping (the URLs we identified from each site that we care about) could be scraped at a lower frequency. Each time we start, we begin by sending requests to the items whose last mark we missed, then work off scheduled events from there.
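The catch-up step above could look like the following sketch, assuming we persist a `{url: last_scraped_timestamp}` mark for each static-map entry (the period and the `overdue_items` name are assumptions for illustration):

```python
import time

STATIC_MAP_PERIOD_S = 7 * 24 * 60 * 60  # assumed: rescrape the static map weekly


def overdue_items(last_scraped, period_s, now=None):
    """Given {url: last_scraped_ts}, return the URLs whose last mark is
    at least one period old, oldest first. On startup these would be
    requested immediately before resuming the scheduled events."""
    now = time.time() if now is None else now
    stale = [(ts, url) for url, ts in last_scraped.items() if now - ts >= period_s]
    return [url for _, url in sorted(stale)]
```

Sorting oldest-first means the items we have been missing the longest are refreshed before anything else after a restart.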
This unfortunately requires an overhaul of the current system. We should also use the opportunity to think the system design through more carefully.
zhehaowang changed the title from "feed: more reasonable structure for rescraping?" to "feed: more reasonable structure for rescraping: static-map sampling" on Jul 20, 2019.
feed v2 was an overhaul of feed in favor of this approach: build a static mapping (cross-referenced from multiple sources) rarely. For each item in the static mapping, retrieve a price snapshot and transaction history daily from all sources. Strategy reads the daily scraped data and makes recommendations.
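The daily cycle described above could be sketched as follows, assuming the static mapping keys an item to its per-source URL and each source exposes snapshot and transaction fetchers (all names here — `daily_pass`, the fetcher keys, the source names — are illustrative, not the actual feed v2 API):

```python
def daily_pass(static_mapping, sources, store):
    """One daily feed-v2 cycle: for each item in the static mapping,
    retrieve a price snapshot and the transaction history from every
    source, and hand each (item, source, payload) row to `store`.

    static_mapping: {item_id: {source_name: url}}
    sources:        {source_name: {"snapshot": fn, "transactions": fn}}
    """
    for item_id, urls in static_mapping.items():
        for source_name, url in urls.items():
            fetchers = sources[source_name]
            store(item_id, source_name, {
                "snapshot": fetchers["snapshot"](url),
                "transactions": fetchers["transactions"](url),
            })
```

Keeping the fetchers behind a per-source dict means adding a source is a mapping change rather than a code change, which fits the goal of a single decision-making unit reading uniformly shaped daily data.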