Provide a script to fill the local development database with data from Tournesol's public database #42

Closed
lfaucon opened this issue Jun 13, 2021 · 5 comments

Comments

@lfaucon
Member

lfaucon commented Jun 13, 2021

The script or Django management command should be committed to the repository (probably in a scripts/ folder), and its usage should be documented.

@lfaucon
Member Author

lfaucon commented Jun 13, 2021

Assigned @jstainer-kleis because I believe you already had such a script lying around. Please unassign yourself otherwise :)

@jstainer-kleis jstainer-kleis moved this from Ready and selected for dev to In progress in main-project-board Jun 14, 2021
@jstainer-kleis
Collaborator

I have a quick and dirty set of scripts. It allows:

  • importing the public dataset that used to be distributed into the database with the new schema (from tournesol-backend repository)
  • fetching the metadata of referenced videos (for now I only insert the title in the DB, but the full metadata is retrieved)
  • dumping SQL commands that could be directly used to fill the DB (without going through the previous steps)
  • exporting a tarball containing the videos metadata
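
As a rough illustration of the "dumping SQL commands" step listed above, here is a minimal sketch, not the actual script. It assumes the public dataset is a CSV file with a header row; the table name `video`, the columns `video_id` and `title`, and the helper names `sql_literal` and `csv_to_inserts` are all hypothetical.

```python
import csv
import io


def sql_literal(value):
    """Render a CSV field as a SQL string literal, or NULL when missing/empty."""
    if value is None or value == "":
        return "NULL"
    # Standard SQL escapes a single quote by doubling it.
    return "'" + value.replace("'", "''") + "'"


def csv_to_inserts(csv_text, table):
    """Yield one INSERT statement per CSV row, using the header as column names.

    Hypothetical helper: the table and column names are taken as trusted input.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        columns = ", ".join(f'"{name}"' for name in row)
        values = ", ".join(sql_literal(row[name]) for name in row)
        yield f'INSERT INTO "{table}" ({columns}) VALUES ({values});'


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the output.
    sample = "video_id,title\nabc123,It's a test\n"
    for statement in csv_to_inserts(sample, "video"):
        print(statement)
```

Building statements by string formatting is only reasonable for producing a dump file from a trusted dataset; a script that writes to the database directly would more safely use the driver's parameterized queries.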

I'm not sure what and how to commit though.

  • should I put the original .csv files somewhere?
  • should I push the scripts to fill the DB from these or just the SQL file?
  • should I put the tarball with the videos metadata somewhere? just the scripts to fetch / export them?
  • should I insert more metadata in the video table (i.e. duration or things like that instead of just the title)?

If the public dataset distribution format is kept as-is for the foreseeable future, I guess the scripts should be made more robust, but the SQL file may be more convenient. However, in its current form it contains data crawled from YouTube and I'm not sure we're allowed to distribute that. The same question arises for the videos metadata tarball, and maybe even for the script that fetches it using youtube-dl...

IANAL, so before pushing anything I'll wait for an informed PoV (@mahdi ?) on this topic.

@lfaucon
Member Author

lfaucon commented Jun 18, 2021

Edited:

NOT Blocked by #45

@jstainer-kleis
Collaborator

Blocked by #45

I don't think it's a blocker: migrating from the current schema to the future one will be easier than modifying the scripts to adapt them to the new schema.

@lfaucon
Member Author

lfaucon commented Jun 19, 2021

Correct! I agree

@jstainer-kleis jstainer-kleis moved this from In progress to Under review in main-project-board Jun 19, 2021
@lfaucon lfaucon moved this from Under review to Done in main-project-board Jun 21, 2021
@lfaucon lfaucon closed this as completed Jun 21, 2021