Integrate test data with git-lfs into the main source repo #1978

Closed
bilke wants to merge 7 commits into master from bilke:lfs

4 participants
@bilke
Member

bilke commented Nov 2, 2017

Follow-up on and closes #1972.

Usage:

  • Use git lfs clone https://github.com/ufz/ogs to speed up the download of files (the test data, currently around 175 MB, takes around 10 s on a fast internet connection)
    • To speed this up even more you can pass -I and -X parameters to git lfs clone for cloning just specific paths, see the man page
  • Use git lfs fetch or git lfs pull for updating, git lfs push for pushing
  • From time to time you may clean up with git lfs prune
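The usage list above can be sketched as a pair of small shell helpers. This is only an illustration: the helper names (run, clone_subset, update_and_prune) and the DRY_RUN switch are hypothetical, and the path passed to -I is an example, not a recommendation.

```shell
#!/bin/sh
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Clone the repository but download only the LFS objects below one path.
clone_subset() {
    run git lfs clone -I "$1" https://github.com/ufz/ogs
}

# Update the LFS objects of the current checkout, then drop old local copies.
update_and_prune() {
    run git lfs pull
    run git lfs prune
}
```

For example, DRY_RUN=1 followed by clone_subset Tests/Data/Elliptic prints the clone command without touching the network.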

Benefits

  • Much faster for downloading
  • No hassle with separate PRs for test data
  • No more checkout failures because of the current shallow clone behaviour which does not work as we expected
  • One system for different types of additional files (test data, web content, example content)
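  • If you are lazy like me and always forget to type the right thing: alias gcl='git lfs clone' or git config --global alias.lclone 'lfs clone' (Git ignores aliases that shadow built-in commands such as clone, so the alias needs its own name)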
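The git alias from the last bullet can be tried against a throwaway config file first. A minimal sketch, assuming the alias gets its own name (lclone is a hypothetical choice, since Git ignores aliases that hide built-in commands) and expands to lfs clone:

```shell
# Write the alias into a temporary config file so ~/.gitconfig stays untouched.
cfg=$(mktemp)
git config --file "$cfg" alias.lclone 'lfs clone'

# Read it back to confirm what "git lclone" would expand to.
git config --file "$cfg" --get alias.lclone
```

Replacing --file "$cfg" with --global would make the alias permanent for the current user.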

Attention

  • You may run into API rate limits of GitHub when cloning a repo with the regular clone command -> use git lfs clone
  • When updating your branches, you must not have modified files in Tests/Data -> please clean this directory before updating, merging, or rebasing
  • There is a storage limit on GitHub for LFS files (I think it is 1 GB per repo), which we will hit at some point -> we can then migrate our LFS storage somewhere else, or move entirely to our own GitLab
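Whether Tests/Data is clean can be checked before an update, merge, or rebase. A small sketch, assuming git status is an adequate test; the helper name is_clean is hypothetical:

```shell
# Succeeds only if the given path has no modified or untracked files.
is_clean() {
    [ -z "$(git status --porcelain -- "$1")" ]
}

# Possible use inside an ogs checkout (commented out so the sketch has no side effects):
# is_clean Tests/Data && git fetch origin && git rebase origin/master
```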

How to migrate from submodule to lfs

We will keep the original submodule in Tests/Data for a while. The migrated files can be found in Tests/lfs-data. In order to update an existing branch (with modifications to the test data files) follow these steps:

  • git fetch origin && git rebase origin/master
  • Run rsync -a Tests/Data/* Tests/lfs-data, then check with git status that the modified files are the intended ones before committing. When doing so, ignore the following:
    • Tests/lfs-data/FileIO/swmm_input_example.inp (change should be ignored / discarded)
    • Tests/lfs-data/Parabolic/ComponentTransport/elder/elder.prj (change should be ignored / discarded)
    • Tests/lfs-data/Jenkinsfile (can be deleted)
    • Tests/lfs-data/README.md (can be deleted)
  • From now on use Tests/lfs-data only!

After a transition period the Tests/Data submodule will be removed and Tests/lfs-data will be renamed to Tests/Data.
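The copy-and-review part of the migration could be scripted roughly as follows. This is a sketch under assumptions: it is run from the top of an ogs checkout that has already been rebased onto origin/master, and the function name sync_test_data is hypothetical.

```shell
sync_test_data() {
    # Copy modified test data from the old submodule into the LFS directory.
    rsync -a Tests/Data/* Tests/lfs-data

    # Discard the two changes that the steps above say to ignore.
    git checkout -- \
        Tests/lfs-data/FileIO/swmm_input_example.inp \
        Tests/lfs-data/Parabolic/ComponentTransport/elder/elder.prj

    # Delete the files that only belong to the old data repository.
    rm -f Tests/lfs-data/Jenkinsfile Tests/lfs-data/README.md

    # List what remains so it can be reviewed before committing.
    git status --short -- Tests/lfs-data
}
```

The final git status call only lists the remaining changes; committing is left as a manual step so the result can be checked first.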

TODO (after merge)

  • Deprecate the ogs-data repo (leave it as is but put a large warning into the README)
  • Check for wrongly committed files in Jenkins, see SO question

@bilke bilke added the please review label Nov 2, 2017

@bilke bilke requested a review from endJunction Nov 2, 2017

@bilke bilke force-pushed the bilke:lfs branch 2 times, most recently from 92a0079 to 5ae6669 Nov 2, 2017

bilke added some commits Nov 2, 2017

[Git] Removed submodule Tests/Data.
To fully remove all traces left in an existing repo run:
rm -rf .git/modules/Tests/Data
git config -f .git/config --remove-section submodule.Tests/Data

@bilke bilke force-pushed the bilke:lfs branch 7 times, most recently from 84fb09b to 6bf9c76 Nov 2, 2017

@bilke bilke force-pushed the bilke:lfs branch from aefde7c to 79e5966 Nov 2, 2017

@endJunction

Member

endJunction commented Nov 2, 2017

@bilke What shall we do to migrate our branches to git lfs?

Update: It kind of worked:

> git lfs uninstall
> rm -rf Tests/Data
> git checkout lb/lfs
> git lfs install

But then rebasing any other branch fails, because it tries to check out the submodule Tests/Data and everything that can go wrong does.


Old comment, skip it.
I tried to checkout this branch but failed:
on ufz/master branch:

> rm -rf Tests/Data
> git checkout lb/lfs
Downloading Tests/Data/Elliptic/cube_1x1x1_GroundWaterFlow/cube_1e3_bottom_neumann.vtu (23 KB)
Error downloading object: Tests/Data/Elliptic/cube_1x1x1_GroundWaterFlow/cube_1e3_bottom_neumann.vtu (4cc553a): Smudge error: Error downloading Tests/Data/Elliptic/cube_1x1x1_GroundWaterFlow/cube_1e3_bottom_neumann.vtu (4cc553a84b7b0f26a0e7631b51c25af1e8b4baebe1173ee889ee3438a6e7f371): batch response: Post /objects/batch: unsupported protocol scheme ""

Errors logged to /home/naumov/w/ogs/s/.git/lfs/objects/logs/20171102T161814.811604628.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: Tests/Data/Elliptic/cube_1x1x1_GroundWaterFlow/cube_1e3_bottom_neumann.vtu: smudge filter lfs failed
@endJunction

Member

endJunction commented Nov 3, 2017

@bilke I started collecting ideas for the migration. Until then this is WIP.

@endJunction endJunction added WIP 🏗 and removed please review labels Nov 3, 2017

@bilke

Member

bilke commented Nov 3, 2017

@endJunction I prepared a branch for testing the migration (I somehow was not able to rewrite this PR's branch history). New LFS files are in Tests/lfs-data. Can you try rebasing an existing branch? Thanks!

@endJunction

Member

endJunction commented Nov 6, 2017

@bilke The 'lfs2' branch looks good! I was able to rebase on top of it without problems. What do you suggest for synchronizing the two test data folders?

@bilke

Member

bilke commented Nov 7, 2017

@endJunction Good! For synchronisation:

Either you know which files to synchronize and simply copy them, or you run rsync from old to new (e.g. cd ogs-source-dir && rsync -a Tests/Data/* Tests/lfs-data) and then check with git status that the modified files are the intended ones before committing. When doing so, ignore the following:

  • Tests/lfs-data/FileIO/swmm_input_example.inp (change should be ignored / discarded)
  • Tests/lfs-data/Parabolic/ComponentTransport/elder/elder.prj (change should be ignored / discarded)
  • Tests/lfs-data/Jenkinsfile (can be deleted)
  • Tests/lfs-data/README.md (can be deleted)

Edit: Updated the description with migration steps.

@endJunction

Member

endJunction commented Nov 8, 2017

If there are no objections to putting tests into git lfs instead of a submodule, this can be merged tonight and the transition begins with all open PRs.

Again, the background for this change is the slow checkout of the Tests/Data submodule and shallow clones that do not work.

@endJunction endJunction added please review and removed WIP 🏗 labels Nov 8, 2017

@wenqing

Member

wenqing commented Nov 8, 2017

@TomFischer

Member

TomFischer commented Nov 8, 2017

@endJunction endJunction referenced this pull request Nov 8, 2017

Merged

Git LFS for Tests/Data #1982

0 of 2 tasks complete
@endJunction

Member

endJunction commented Nov 8, 2017

Moved to #1982

@endJunction endJunction closed this Nov 8, 2017

@bilke bilke deleted the bilke:lfs branch Jun 27, 2018
