Topic/repeatable #9

Closed
wants to merge 3 commits into from

neilmayhew (Contributor) commented May 20, 2016

These are some changes I've made for use with my own repo and workflow, but you may find some or all of them useful.

I run the generator on all suites once per day, from a cronjob. However, most days the repo hasn't changed, since this is a small, 3rd-party repo. I then copy the new metadata into the repo. However, asgen produces different output every time, even when there have been no changes to the repo. This results in a lot of churn when the master copy of the repo is pushed to the mirror that people actually have access to, which in turn produces unnecessary emails to my inbox.

I found that the churn was occurring in three areas:

  1. All gzipped files
  2. "Last updated" timestamps in the html
  3. Non-deterministic ordering of the icons archives

Each of these commits fixes one of those.

I copy the data from workspace to repo using rsync -rlHc. That changes a destination file only if the source file has a different checksum. Unlike -a, it doesn't copy across the modification time, so a destination file with the same checksum keeps its modification time, while a file with a different checksum has its modification time set to the time of the copy. I can then use find to see if there are any files in the dep11 subdirectory that are newer than the Release file, and if so the suite needs to be reexported (in my case, with reprepro export SUITE) and a mirror push initiated.
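
The "newer than the Release file" check described above can be sketched in Python as follows. This is only an illustration of the idea, not the script actually used: the function name `files_newer_than` and the directory layout in the usage note are assumptions.

```python
import os
from pathlib import Path


def files_newer_than(reference: Path, root: Path) -> list[Path]:
    """Return the files under root whose mtime is later than reference's mtime."""
    ref_mtime = reference.stat().st_mtime_ns
    newer = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Only files rewritten by the checksum-based copy get a fresh mtime.
            if path.stat().st_mtime_ns > ref_mtime:
                newer.append(path)
    return sorted(newer)
```

With a hypothetical suite directory `suite_dir`, a non-empty result from `files_newer_than(suite_dir / "Release", suite_dir / "dep11")` would mean the suite needs to be reexported and the mirror pushed.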



The gz format puts a timestamp in the compressed file. Therefore
repeated runs produce different gz files even though the input data
is the same. Use a timestamp of 0 to ensure the compressed data is
the same when the input data is the same.
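
As an illustration of the effect (in Python, not the D code this commit touches), the standard `gzip` module lets the header timestamp be fixed directly. The helper name `gzip_reproducible` is an assumption made for this sketch.

```python
import gzip


def gzip_reproducible(data: bytes) -> bytes:
    """Compress data with the gzip header timestamp (MTIME) fixed to 0."""
    # With mtime left unset, the current time is written into the header,
    # so two runs over identical input would produce different bytes.
    return gzip.compress(data, mtime=0)
```

Calling `gzip_reproducible` twice on the same bytes should give byte-identical output, which is the property the rsync checksum comparison relies on.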

Neither the gzip executable nor the libarchive library provides a way
to specify the timestamp explicitly, so we set the modification time
of the input file. The input file is going to be removed anyway, so it
doesn't matter what its modification time is set to.

Parallelism gives rise to a non-deterministic ordering of the icons and
hints. Removing parallelism makes no appreciable difference to the
speed.

ximion (Owner) commented May 20, 2016

Ha, nice! Fixing nondeterminism was on my TODO list right after fonts support, so this is very welcome!
The HTML timestamps will stay, though, and for the icons I am thinking of a different solution.
In the future, the HTML timestamp churn will be mitigated by skipping suite processing when the data hasn't changed.
I don't like the changes in the compression code much; I'll look for something better. If I recall correctly, I already made libarchive produce reproducible files a while ago.

ximion (Owner) commented May 22, 2016

Oh, btw, if you want to modify templates in a way that doesn't involve modifying upstream-shipped ones: You can supply your own templates, even partial ones, via the templates directory in the same folder where your asgen-config.json file is located.

And: removing parallelism for the icon search doesn't make any difference on small repos, but it has an impact on very large ones (once you have more than 1000 packages), like in Debian.
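
One general way to keep the parallel search and still get a stable order is to sort the results after collecting them. The Python sketch below only illustrates that idea and is not what asgen does; `find_icon` is a hypothetical stand-in for the per-package icon search, and `collect_icons` is likewise an invented name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def find_icon(package: str) -> tuple[str, str]:
    # Hypothetical stand-in for the real per-package icon search.
    return package, f"{package}.png"


def collect_icons(packages: list[str], workers: int = 4) -> list[tuple[str, str]]:
    """Search in parallel, then sort so the output order is stable."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(find_icon, pkg) for pkg in packages]
        # as_completed yields in completion order, which can vary between runs.
        results = [future.result() for future in as_completed(futures)]
    # Sorting by package name removes the dependence on scheduling.
    return sorted(results)
```

The trade-off is that all results are held in memory until the sort, which may or may not be acceptable for a very large archive.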

ximion (Owner) commented May 23, 2016

Okay, the changes done in master in the past few days should catch all the use-cases those patches would have provided (but are a bit more holistic in their approach).
Timestamps are really useful and - in case of having them in the metadata itself - were even requested by users, so dropping them is bad.
The changes in master should however make them only change now if we actually updated something.
