Split processing into separate scan and generate operations #6
Comments
No ETA for this yet. It would be a rather big change, needing changes to both the backends and the engine, but it's a useful feature.
I am looking into this now, and there is no way around index-loading, because we need to know which packages are currently in a suite. What I can (and will) add is a way to process a certain package given its filename, section and suite (maybe we'll also parse a .changes file at some point). This does not require much cache loading, so it should be really fast. At the moment, there is also a lot of internal state which decides which steps do or don't happen (e.g. we don't publish metadata if we didn't just scan a package which had new data, or if the index wasn't changed). That state would have to be written to disk. So, at first, I assume the split version of asgen will be much less efficient.
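To illustrate the point about internal state having to be written to disk, here is a minimal Python sketch of how a "this suite has new data" flag could survive between a separate scan run and a later generate run. It is not how asgen does it: the JSON file, the `needs_publish` key and all three function names are assumptions made up for this example.

```python
import json
from pathlib import Path


def load_state(path):
    """Return the persisted per-suite state, or an empty dict on first run."""
    p = Path(path)
    if not p.exists():
        return {}
    return json.loads(p.read_text())


def mark_suite_changed(path, suite):
    """Hypothetical hook for the scan step: record that `suite` has new data."""
    state = load_state(path)
    state[suite] = {"needs_publish": True}
    Path(path).write_text(json.dumps(state))


def take_publish_flag(path, suite):
    """Hypothetical hook for the generate step: read the flag and clear it."""
    state = load_state(path)
    needed = state.get(suite, {}).get("needs_publish", False)
    if needed:
        state[suite]["needs_publish"] = False
        Path(path).write_text(json.dumps(state))
    return needed
```

With something like this, the generate step can skip publishing when no scan has touched the suite since the last run, which is the decision that currently lives in memory.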
See #6 This is really inefficient right now and will need to be improved.
See #6 This needs backend support now, as well as a bunch of experimental changes on how we deal with the arch:all architecture.
This is implemented now, although not very efficient (and it likely never will be). For now, you can play with this feature using the
I understand that speed and cost are an issue for generating metadata. How would it be if you only looked at newly-added packages? You already have data about the existing packages cached in the database, so they don't all need to be rescanned. The job that processes incoming packages could tell `asgen` which packages to consider by passing it the `.changes` files to look at. That would avoid the expensive loading of indices. (This would be a good option in my infrastructure, where we could call `asgen` from the cron job that scans the incoming directories for changes.)

It would still be necessary to regenerate the whole of the `Components` and `icons` files, but this would be done from cached data in the database and would be considerably less expensive than reading the indices since there are far fewer apps than packages. Alternatively, it could be done by a cron job, but that could be run more frequently since this part of the operation is less costly.

I propose that the current `process` operation be split into two separate operations, `scan` and `generate`. The `scan` operation would take a list of `.changes` or `.deb` files to look at. There would still be a `process` operation to scan an entire suite, but it would use these two other operations internally.
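To make the proposed split concrete, here is a rough Python sketch of how the three operations could relate. It is only an illustration of the idea, not asgen code: `MetadataCache`, `extract_component` and the `org.example.` component ids are hypothetical stand-ins, and the "is this an app?" check is a toy rule based on the package name.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCache:
    """Stand-in for the database of already-extracted component data."""
    components: dict = field(default_factory=dict)  # package file -> component id


def extract_component(pkg_file):
    """Hypothetical extractor: a component id, or None for non-app packages."""
    name = pkg_file.rsplit("/", 1)[-1].split("_", 1)[0]
    return f"org.example.{name}" if name.endswith("-app") else None


def scan(cache, pkg_files):
    """Look only at the given package files; return True if the cache changed."""
    changed = False
    for pkg in pkg_files:
        cid = extract_component(pkg)
        if cid is not None and cache.components.get(pkg) != cid:
            cache.components[pkg] = cid
            changed = True
    return changed


def generate(cache):
    """Rebuild the full component listing from cached data, without any indices."""
    return sorted(set(cache.components.values()))


def process(cache, suite_pkg_files):
    """Whole-suite run, expressed through the two smaller operations."""
    scan(cache, suite_pkg_files)
    return generate(cache)
```

A cron job handling incoming uploads would then call only `scan` with the new files, while `generate` could run on its own schedule against the cache.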