Can it keep track of what it has already downloaded? #3

Closed
Romanmir opened this issue Nov 8, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

Romanmir commented Nov 8, 2020

This should be able to track what it last downloaded and then download only everything after that.

That way it could be run from cron and used as a regular backup of your Reddit data.

@uxdxdev uxdxdev added the enhancement New feature or request label Nov 8, 2020
Owner

uxdxdev commented Nov 8, 2020

It seems the Reddit API's listings take before and after options, each of which expects a coded ID. For comments, for example, the ID is prefixed with t1, as in t1_e3ykxbc.
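
As a rough sketch of how those options could be used, assuming listings are served from https://oauth.reddit.com/user/<name>/<listing> and are ordered newest first (so "before" an entry means "newer than" it). The helpers toFullname and buildListingUrl are hypothetical names for illustration, not part of Orca:

```javascript
// Type prefixes used in Reddit's coded IDs (t1 = comment, t3 = link/post).
const KIND_PREFIX = { comment: "t1", link: "t3" };

// Hypothetical helper: turn a bare ID such as "e3ykxbc" into "t1_e3ykxbc".
function toFullname(kind, id) {
  const prefix = KIND_PREFIX[kind];
  if (!prefix) throw new Error(`Unknown kind: ${kind}`);
  return id.startsWith(`${prefix}_`) ? id : `${prefix}_${id}`;
}

// Hypothetical helper: build a listing URL that asks only for entries
// newer than `latestFullname`. Without a marker, the full listing is requested.
function buildListingUrl(username, listing, latestFullname) {
  const url = new URL(
    `https://oauth.reddit.com/user/${encodeURIComponent(username)}/${listing}`
  );
  url.searchParams.set("limit", "100");
  if (latestFullname) url.searchParams.set("before", latestFullname);
  return url.toString();
}
```

The URL shape and the limit of 100 are assumptions here; the point is only that the saved coded ID goes straight into the before parameter.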

To do this I'd probably need to save a metadata file locally that stores the IDs of the latest data. Subsequent runs of the tool would first read the metadata file, if it exists, and use those IDs to fetch only newer content.
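
A minimal sketch of such a metadata file, assuming a JSON file that maps each data type to the last coded ID seen. The file name orca-metadata.json and the functions loadMetadata and saveLatest are made up for illustration:

```javascript
const fs = require("fs");

// Hypothetical file name; the real tool may store this elsewhere.
const METADATA_FILE = "orca-metadata.json";

// Read the saved IDs, or return an empty object on the first run.
function loadMetadata(file = METADATA_FILE) {
  if (!fs.existsSync(file)) return {};
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// Record the newest ID seen for one data type and persist the result.
function saveLatest(dataType, fullname, file = METADATA_FILE) {
  const metadata = loadMetadata(file);
  metadata[dataType] = fullname;
  fs.writeFileSync(file, JSON.stringify(metadata, null, 2));
  return metadata;
}
```

A run would call loadMetadata() before fetching and saveLatest("comments", "t1_e3ykxbc") after a successful download.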

This would also need a new CLI option to enable the feature. Options might be:

  • -p, --persist
  • -c, --continue
  • -c, --cron
  • -c, --config <filename>
  • -l, --latest

Any ideas?

Author

Romanmir commented Nov 14, 2020

I like either "cron" or "latest". You could also use "-r, --resume".

Owner

uxdxdev commented Nov 23, 2020

I went with an --only-latest flag for this feature; it's available now.

Orca now generates an orca.config.js file in the CWD.

  • When data is downloaded, the latest entry for each data type is recorded and the data is written to the file system.
  • With the --only-latest flag, the recorded entry is used when fetching from the Reddit API, so only newer data is downloaded. The files previously written to the file system are overwritten and contain only that newer data.
  • Without the --only-latest flag, all data is downloaded and the data files on the file system are overwritten.
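
The three cases above can be sketched roughly as follows. This is not Orca's actual code: planFetch and recordLatest are hypothetical, the config shape (data type mapped to a coded ID) is assumed, and entries are assumed to arrive newest first with the coded ID in a name field.

```javascript
// Decide what to request for one data type.
// `config` stands in for whatever orca.config.js holds (shape assumed).
function planFetch(dataType, config, onlyLatest) {
  const lastEntry = config[dataType];
  if (onlyLatest && lastEntry) {
    // Only entries newer than the last one recorded.
    return { dataType, before: lastEntry };
  }
  // No flag, or nothing recorded yet: download everything.
  return { dataType };
}

// After a download, remember the newest entry for this data type.
function recordLatest(dataType, config, entries) {
  if (entries.length === 0) return config; // nothing new: keep the old marker
  return { ...config, [dataType]: entries[0].name };
}
```

Keeping the old marker when nothing new arrives means a quiet period between cron runs does not reset the next run to a full download.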

So now just run the command below in a cron job to get only the data added since the last download.

npx @mortond/orca --access-token=70162531-eWBggyup_FAKE_Usdf1cz7u-G9pM_dhrVf3g --only-latest

Make sure to copy or process the data files written to the file system before running Orca again, since each run overwrites them.
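
One way to do that copy step, sketched under the assumption that the data files sit directly in a single directory. The function archiveDataFiles and both directory arguments are hypothetical and would need adjusting to wherever Orca writes its output:

```javascript
const fs = require("fs");
const path = require("path");

// Copy every regular file in `dataDir` into a new timestamped folder
// under `archiveRoot`, so the next run can overwrite the originals safely.
function archiveDataFiles(dataDir, archiveRoot, now = new Date()) {
  // Replace ":" and "." so the timestamp is a safe folder name.
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  const target = path.join(archiveRoot, stamp);
  fs.mkdirSync(target, { recursive: true });

  const copied = [];
  for (const entry of fs.readdirSync(dataDir, { withFileTypes: true })) {
    if (!entry.isFile()) continue; // skip sub-directories, including old archives
    fs.copyFileSync(path.join(dataDir, entry.name), path.join(target, entry.name));
    copied.push(entry.name);
  }
  return { target, copied };
}
```

Running this before each scheduled Orca run would keep one folder per run instead of losing the earlier files.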

@uxdxdev uxdxdev closed this as completed Nov 23, 2020