Updates all suites if any interesting package exists #35

Closed
iainlane opened this issue Nov 17, 2016 · 13 comments

@iainlane
Collaborator

Because there's no suite tracking, seedContentsData causes the output for all suites to be regenerated if there are any interesting packages at all, even if those haven't changed.

This is a problem because then every client will end up redownloading the output Components and icons files with every archive update even if nothing has changed. I'm a bit worried about deploying asgen with this unfixed.

Can we think of a way to avoid this happening? Like

  1. If a suite hasn't changed since we last ran, don't update it. (How do we know? Backend specific - for Debian, store a checksum of Packages?)
  2. Or, compare the output to the previous output by parsing it back in and then check if we have the same packages or if any of those have been reprocessed this time (for forget).
  3. Or, do suite tracking and then we can only regenerate suites if any packages in them have changed.
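As a rough illustration of option 1, a checksum of the index file could be stored per suite and compared on the next run. This is a minimal sketch in Python rather than asgen's D code; `file_checksum`, `index_changed`, and the plain dict standing in for a persistent cache are all hypothetical names, not part of asgen.

```python
import hashlib


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large index files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_changed(path, suite_key, cache):
    """Return True if the index at `path` differs from the checksum last
    recorded for `suite_key`, and record the new checksum in `cache`.

    `cache` is any dict-like store (hypothetical stand-in for a real cache).
    """
    current = file_checksum(path)
    if cache.get(suite_key) == current:
        return False
    cache[suite_key] = current
    return True
```

Unlike an mtime comparison, this only reports a change when the file's content differs, at the cost of reading the whole file on every run.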

Any thoughts/ideas?

@ximion
Owner

ximion commented Nov 17, 2016

Eh... 1) is already implemented - backends just need to provide a method to check whether they have changes. To store information about that, the LMDB cache can be used.
See https://github.com/ximion/appstream-generator/blob/master/src/asgen/backends/interfaces.d#L134

How the backend determines the "needs update" state is up to the backend; it could even be the archive kit just telling asgen to regenerate the data at the next best opportunity.

Furthermore, the icon tarballs should build reproducibly: timestamps etc. are disabled and I followed the recommendations for reproducible tarballs with libarchive. Still, there seem to be differences even on zero-change builds, and I haven't yet pinned down why that happens (all input is sorted).

Suite tracking would be hard with LMDB, at least if we want to keep the fast query times.

@iainlane
Collaborator Author

iainlane commented Nov 17, 2016

Nice, I didn't notice that one yet. It's even implemented for Debian.

But seedContentsData is already skipping if the index hasn't changed, so why isn't that working?

Huh, this has never been hit

ubuntu@juju-stg-ue-appstream-back-machine-1:/srv/appstream/logs/2016/11$ xzgrep "index has not changed" *.log.xz
ubuntu@juju-stg-ue-appstream-back-machine-1:/srv/appstream/logs/2016/11$

so maybe the bug is that this doesn't work properly somehow.

@ximion
Owner

ximion commented Nov 17, 2016

Yes, maybe something changes the files' mtime even if there are no changes?

@ximion
Owner

ximion commented Nov 17, 2016

Urgh, of course it does if you re-download the file every time.
So, we want some fast checksumming for this instead of mtime/ctime checks.

@iainlane
Collaborator Author

I'll first look into making the download set the mtime properly. HTTP doesn't destroy this information (the server reports a last-modified time in the headers), so it should be possible in theory.
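The idea of stamping a downloaded file with the server's modification time can be sketched as follows. This is Python for illustration only, not the asgen implementation; `set_file_mtime` is a hypothetical helper, and the timestamp is assumed to have been obtained from the server already.

```python
import os
from datetime import datetime, timezone


def set_file_mtime(path, modified):
    """Set the mtime of `path` to the timezone-aware datetime `modified`,
    leaving the file's access time unchanged."""
    if modified.tzinfo is None:
        raise ValueError("modified must be timezone-aware")
    atime = os.stat(path).st_atime
    # os.utime takes (atime, mtime) as seconds since the epoch.
    os.utime(path, (atime, modified.timestamp()))
```

With the mtime restored this way, a later "is the index newer than last time?" check compares the server's timestamp rather than the moment of download.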

@iainlane
Collaborator Author

From asgen

-rw-rw-r--  1 ubuntu ubuntu 1221756 Nov 17 11:08 Packages.xz

From wget

-rw-rw-r-- 1 ubuntu ubuntu 1221756 Oct 13 13:27 Packages.xz

@ximion
Owner

ximion commented Nov 17, 2016

I'd bet curl can do that somehow.

@iainlane
Collaborator Author

I can get the last modified time from the headers, but dlang doesn't seem to have a function to parse that into something I can make a SysTime out of :(

@iainlane
Collaborator Author

Can I add https://github.com/JackStouffer/date-parser as a submodule?

@iainlane
Collaborator Author

iainlane commented Nov 17, 2016

Meh, that depends on some dynamicarray module which I don't have and can't find in Debian

@ximion
Owner

ximion commented Nov 17, 2016

Any 3rd-party deps are fine, as long as we can easily package them (= upstream uses Automake/CMake/Meson or accepts patches for that, since using dub is a pain, or is a source-only module)

Doesn't https://dlang.org/phobos/std_datetime.html#.parseRFC822DateTime work, or is this a non-RFC822 date?

@iainlane
Collaborator Author

Actually it is, I didn't see that one at first - just trying to fix a small tz issue now then will see if it works before PRing later
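For comparison, the same parsing step looks like this in Python, where the standard library's `email.utils.parsedate_to_datetime` plays the role that `parseRFC822DateTime` plays in D. This is only a sketch: `parse_last_modified` is a hypothetical helper, the header values in the usage note are made up, and treating a zone-less value as UTC is an assumption.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_last_modified(value):
    """Parse an RFC 822-style date string (as used in the HTTP
    Last-Modified header) into a timezone-aware datetime in UTC."""
    parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # A "-0000" offset yields a naive datetime; assume UTC here.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Normalise to UTC so values with different offsets compare consistently.
    return parsed.astimezone(timezone.utc)
```

Under these assumptions, `parse_last_modified("Thu, 13 Oct 2016 13:27:00 GMT")` and `parse_last_modified("Thu, 13 Oct 2016 15:27:00 +0200")` return the same instant, which is the kind of timezone handling that has to be right before the result is written back as a file's mtime.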

iainlane pushed a commit to iainlane/appstream-generator that referenced this issue Nov 17, 2016
…t file

This only works for HTTP, not FTP.

Should hopefully fix asgen considering suites as changed each time,
which I think happened because the Packages file got the current mtime,
so it was always considered to be newer.

Fixes ximion#35
@iainlane
Collaborator Author

Yeahhh, this works

17_1601.log:2016-11-17 16:01:25 - DEBUG: Skipping contents cache update for zesty/main [amd64], index has not changed.
17_1601.log:2016-11-17 16:01:25 - DEBUG: Skipping contents cache update for zesty/main [arm64], index has not changed.
17_1601.log:2016-11-17 16:01:26 - DEBUG: Skipping contents cache update for zesty/main [armhf], index has not changed.
17_1601.log:2016-11-17 16:01:26 - DEBUG: Skipping contents cache update for zesty/main [i386], index has not changed.
17_1601.log:2016-11-17 16:01:26 - DEBUG: Skipping contents cache update for zesty/main [powerpc], index has not changed.
17_1601.log:2016-11-17 16:01:26 - DEBUG: Skipping contents cache update for zesty/main [ppc64el], index has not changed.
17_1601.log:2016-11-17 16:01:27 - DEBUG: Skipping contents cache update for zesty/main [s390x], index has not changed.

@ximion ximion closed this as completed in c287d96 Nov 17, 2016