Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should loop through all configured suites when one isn't specified #2

Closed
neilmayhew opened this issue Apr 22, 2016 · 6 comments
Closed

Comments

@neilmayhew
Copy link
Contributor

The --help option says:

Usage:
  appstream-generator <subcommand> [OPTION...] - AppStream Generator.

AppStream Metadata Generator

Subcommands:
  process [SUITE]      - Process new metadata for the given distribution suite.
  cleanup              - Cleanup old metadata and media files.
  remove-found [SUITE] - Drop all valid processed metadata and hints.

Help Options:
  -h, --help       Show help options

Application Options:
  --version        Show the program version
  --verbose        Show extra debugging information
  -w|--workspace   Define the workspace location

So [SUITE] is shown as being optional for process and remove-found. However, if SUITE is omitted, process and remove-found say:

Invalid number of parameters: You need to specify a suite name.
@ximion
Copy link
Owner

ximion commented Apr 22, 2016

The SUITE argument really isn't optional for those...
Do you want/need a process-all-like command?

@neilmayhew
Copy link
Contributor Author

Yes, please, that would be nice. It seems a bit unnecessary to have to write a bash for-loop to process the entire repo, when asgen already knows about all the suites via the config file.

However, I guess this is a bit of a wishlist item, since in production the generation run will be triggered by individual suite updates (eg when a package is uploaded) so the suite will always be specified.

Thanks for fixing the help message. I guess I saw that as the main problem.

@ximion
Copy link
Owner

ximion commented Apr 22, 2016

Yes, please, that would be nice. It seems a bit unnecessary to have to write a bash for-loop to process the entire repo, when asgen already knows about all the suites via the config file.

Thing is that you don't normally want to process everything, but only a subset of suites (the ones that are receiving new packages, not the frozen ones), so a process-all-like feature is a fringe case. We could add it anyway, there is also no real drawback of having it.

However, I guess this is a bit of a wishlist item, since in production the generation run will be triggered by individual suite updates (eg when a package is uploaded) so the suite will always be specified.

Uhh, I hope the generator is fast enough for that, since this will require loading the full package index for all architectures - at worst - once per package, which is never fast. Also, creating a new instance of the IconHandler class is also an expensive operation, which is why it makes sense to batch-process packages.
Some data for the icon-handler could probably be cached to make it faster, but there is no way around the load-all-indices phase.

Thanks for fixing the help message. I guess I saw that as the main problem.
:-)

@ximion ximion closed this as completed Apr 22, 2016
@neilmayhew
Copy link
Contributor Author

… this will require loading the full package index for all architectures - at worst - once per package, which is never fast. Also, creating a new instance of the IconHandler class is also an expensive operation, which is why it makes sense to batch-process packages.

So do you imagine this being run by cron job, eg once a day or every few hours? I don't suppose the AppStream data will change all that often, but it seems a pity for there potentially to be a significant delay in bringing the metadata up to date.

Some data for the icon-handler could probably be cached to make it faster, but there is no way around the load-all-indices phase.

How would it be if you only looked at newly-added packages? You already have data about the existing packages cached in the database, so I don't see why they all have to be rescanned. The job that processes incoming packages could tell asgen which packages to consider by passing it the .changes files to look at. That would avoid the expensive loading of indices. I don't know how the Debian and Ubuntu infrastructures work, but this would be a good option in my infrastructure where we have a cron job scanning the incoming directories for changes files.

It would still be necessary to regenerate the whole of the Components and icons files, but this would be done from cached data in the database and would be considerably less expensive than reading the indices since there are far fewer apps than packages. Alternatively, it could be done by a cron job, but that could be run more frequently since that part of the operation would be less costly.

So I'm proposing that the current process operation be split into two separate operations, scan and generate. The scan operation would take a list of .changes or .deb files to look at. There could still be a process operation to scan an entire suite, but it would use these two other operations internally.

@ximion
Copy link
Owner

ximion commented Apr 28, 2016

So do you imagine this being run by cron job, eg once a day or every few hours?

At time, yes, I expect it to be run about every 2-4h - depends on the speed at which stuff is added to the repo to set useful values there. Also depends on how many system resources the generator has to process stuff ^^

The job that processes incoming packages could tell asgen which packages to consider by passing it the .changes files to look at.

Jap, something like that would be useful. Last night I wished I had a feature like that to process just one file ^^
If whatever data is passed to the generator and describing new packages can be read quickly, then this will be incredibly fast.
Debian can't easily use it, because the archive software and the appstream-generator are not coupled (unless lots of work is done to read mails sent by dak, or have dak emit fedmsg messages (and make Debian use fedmsg in the first place ^^)).

In general though, this is a useful feature, but it will also require quite some refactoring and would need a new interface to be implemented by backends in order to support this.
Can you maybe open another bug report for this, as a feature request?

@neilmayhew
Copy link
Contributor Author

Done, see #6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants