Migrate DwC code #16

Closed
peterdesmet opened this issue Nov 5, 2014 · 21 comments

@peterdesmet
Member

Migrate Darwin Core code to GitHub

  1. Migrate SVN repo to git (locally). There should be tools for this.
  2. Push all code to GitHub
  3. Create release (as an archive)
  4. Clean up code to remove irrelevant elements (see Clean up code #19)
  5. Document this process (will be useful for other migrations, e.g. GBIF repos)
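
A minimal sketch of steps 1 and 2, assuming `git svn` is installed. The function name `migration_commands`, the placeholder SVN URL, and the assumption that the code lives under `trunk` are illustrative, not taken from the project. The function only builds the commands so they can be reviewed before anything is run.

```python
from typing import List


def migration_commands(svn_url: str, github_url: str, workdir: str = "dwc") -> List[List[str]]:
    """Build (but do not run) the commands for an SVN-to-GitHub migration."""
    return [
        # Step 1: convert the SVN history into a local git repository.
        ["git", "svn", "clone", "--trunk=trunk", svn_url, workdir],
        # Step 2: point the local repository at GitHub and push all branches.
        ["git", "-C", workdir, "remote", "add", "origin", github_url],
        ["git", "-C", workdir, "push", "--all", "origin"],
    ]


if __name__ == "__main__":
    # The SVN URL is a placeholder, not the project's real endpoint.
    for cmd in migration_commands("https://example.org/svn/darwincore",
                                  "https://github.com/tdwg/dwc.git"):
        print(" ".join(cmd))
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` once the URLs have been checked.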
@tucotuco
Member

tucotuco commented Nov 5, 2014

Let's do this as soon as I make the current release. I am about one day
away from this if the Executive can decide on the dcterms:license issue.

@mdoering
Contributor

mdoering commented Nov 5, 2014

In the 4th step we should get rid of the following directories in the current svn trunk as they only serve archival purposes which can be better achieved by using tags:
From https://code.google.com/p/darwincore/source/browse/trunk/

  • archive
  • all dated folders
    • 2009-02-12
    • 2009-02-20
    • 2009-04-29
    • 2009-05-25
    • 2009-07-06
    • 2009-09-23
    • 2009-10-08
    • 2009-12-07
    • 2011-10-26
    • 2013-10-22
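
The folders listed above follow a simple pattern (`archive` plus `YYYY-MM-DD` names), so they could be picked out of a directory listing mechanically. This is a rough sketch; the helper name `folders_to_remove` and the `terms` entry in the usage note are hypothetical.

```python
import re
from typing import Iterable, List

# Matches folder names that are exactly an ISO-style date, e.g. 2009-02-12.
DATED = re.compile(r"\d{4}-\d{2}-\d{2}")


def folders_to_remove(names: Iterable[str]) -> List[str]:
    """Return 'archive' and every YYYY-MM-DD folder name, sorted."""
    return sorted(n for n in names if n == "archive" or DATED.fullmatch(n))
```

For example, `folders_to_remove(["archive", "2009-02-12", "terms", "2013-10-22", "downloads"])` returns `['2009-02-12', '2013-10-22', 'archive']`, leaving `terms` and `downloads` untouched.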

What about downloads/old?

@tucotuco
Member

tucotuco commented Nov 5, 2014

We can store the contents of those folders as zipped archives such as DarwinCoreStandard-2013-10-22.zip.
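
As an illustration of that naming scheme, one dated folder could be zipped with Python's standard library as below. The function name `zip_dated_folder` and the output directory are assumptions; only the `DarwinCoreStandard-<date>.zip` pattern comes from the comment above.

```python
import shutil
from pathlib import Path


def zip_dated_folder(folder: Path, out_dir: Path, prefix: str = "DarwinCoreStandard") -> Path:
    """Zip one dated folder into <out_dir>/<prefix>-<folder name>.zip."""
    out_dir.mkdir(parents=True, exist_ok=True)
    base = out_dir / f"{prefix}-{folder.name}"
    # make_archive appends ".zip" itself and returns the archive path.
    return Path(shutil.make_archive(str(base), "zip", root_dir=str(folder)))
```

The output directory should sit outside the folder being zipped, so the archive does not end up containing itself.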

The files in the downloads/old folder are those that are available on the Google Code site downloads at https://code.google.com/p/darwincore/downloads/list. We want the files as supporting documents, but we don't need that folder structure.

@peterdesmet
Member Author

@mdoering, @tucotuco, can you create separate issues for what needs to be done with the old files (tagging, releasing, or zipping)?

The cleanest option would be to tag those in the git commit history and create releases (which are zip files).

@peterdesmet
Member Author

For cleanup stuff, I created #19.

@peterdesmet peterdesmet mentioned this issue Nov 5, 2014
@tucotuco
Member

Working on the release in the branch version/2014-11-08 https://github.com/tdwg/dwc/tree/version/2014-11-08

@tucotuco
Member

Moved the SVN repository in #22. There is not much to document in this process: I had SVN locally, cloned this repo, copied what I wanted to keep into the structure I wanted in a new branch of this repo, pushed the branch, and created a pull request (#22). Now on to issue #19.

@peterdesmet
Member Author

See my comment in #22. By copying, we lose the SVN history and cannot tag older versions of the standard.

@tucotuco
Member

We only need to tag the zip files. They contain everything for that version.

@peterdesmet
Member Author

Releases in GitHub work differently. You tag a certain point in your commit timeline and that becomes a release. That means that releases are serial and do not exist as parallel versions in your code. So, to do this cleanly, we would need the complete SVN history.

We could also do it the dirty way and create several releases for the same point in the commit history, each with a different binary zip file attached.

@mdoering
Contributor

Does the SVN have a single archive file for the standard that changes over time?
Is it this one?
https://code.google.com/p/darwincore/source/list?path=/trunk/archive/darwincore.zip&start=1680

If SVN has the versions in parallel (folders named by release date) in the trunk we should not really tag them all. I fear the cleanest would be to replay the versions in git based on the new structure we give to the repo/files?

@tucotuco
Member

We need to have a repository of the past versions of the standard that
people can get to easily, for historical purposes, without having to go
through revision histories. To do that, I think we should keep making the
darwincore.zip files of releases and accumulating those, as always. We do
not need the dated folders in which those versions are unzipped, nor do we
need to keep the whole history of SVN changes. To me, the snapshot from
2014-11-08 is our GitHub starting point and all other relevant history is
in the accumulated zip files in the versions directory. From here on out we
can tag releases, but I would also insist on creating a zip file of the
standard at that point to put in the versions directory with the latest on
the TDWG page for the standard (http://www.tdwg.org/standards/450/) for
downloading from there.

@peterdesmet
Member Author

@tucotuco, I think we more or less agree on how to do it in the future. Each time the standard is at a stable release, you tag and release it through GitHub. This will automatically create a zip file (for example: https://github.com/tdwg/prior-standards/releases/tag/website-archive) that can be referenced. I have added a webhook to this repository so we can even have DOIs for those. There is no need to keep those zips as version-named files IN the repository.

The main thing to decide is how to handle previous releases. I see three options:

  1. Make sure we have the complete SVN history in this repository. We then manually tag the historical releases (based on date), so they become available through GitHub here: https://github.com/tdwg/dwc/releases. This is more work, but historical and recent releases are handled equally.
  2. We cheat and replay the version history manually, by adding and committing the different versions one by one (I think this is what @mdoering proposes). We then tag those commits and we have releases. This has the advantage that historical and recent releases are handled equally, and we don't need to import the whole SVN history. All of this can be tested on a branch.
  3. We tag the current version as 2014 11 08, which includes named version files for previous releases. We indicate in the release description that historical releases can be found in a folder. We then remove the folder and use the standard method for creating releases. The disadvantage is that historical and recent releases are not treated equally.

@mdoering
Contributor

That summarises the options pretty well, Peter. I would be in favor of option 1 or 2. But like I said, I fear that we create huge accumulating archives by strictly following SVN. This is

  • due to the entire standard being redundantly included as a zip archive
  • and old versions are also part of the trunk

For that reason alone I tend to lean towards option 2. Or maybe there is a solution in between: import the complete SVN, then do the cleanup, and finally replay the proper dwc versions in rdf?

@timrobertson100
Member

For my understanding - does option 2 mean:

  1. svn co ...trunk/"release date"
  2. git commit, push, release
  3. repeat until all historical releases are done

If so, I'd vote +1 on that.

@peterdesmet
Member Author

I also prefer option 1 or 2. @timrobertson100, if I understand correctly, option 2 is, more verbosely:

  1. Copy all files from trunk locally to somewhere else
  2. Empty trunk
  3. Populate trunk with all files from first release (https://code.google.com/p/darwincore/source/browse/#svn%2Ftrunk%2F2009-02-12)
  4. Commit
  5. Push
  6. Release (= tag a commit)
  7. Start over from step 2, but this time upload files from the next version, until we are at the last version.

@tucotuco, would that be fine with you as well?

@mdoering, if so, is this something you can do? Maybe on a historical-releases branch to practice. I think you can actually do steps 5 and 6 later, since tags (and I assume releases) can be added retroactively: http://git-scm.com/book/en/v2/Git-Basics-Tagging
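
To illustrate tagging after the fact, the sketch below builds the two git commands involved: an annotated tag placed on an existing commit, and a push of that tag. The function name `retro_tag_commands` and the tag message are hypothetical, and the commit hash must come from the real history.

```python
from typing import List


def retro_tag_commands(tag: str, commit: str, remote: str = "origin") -> List[List[str]]:
    """Commands to tag an existing commit and publish only that tag."""
    return [
        # Annotated tag on an older commit, identified by its hash.
        ["git", "tag", "-a", tag, "-m", f"Release {tag}", commit],
        # Tags are not pushed by default, so push this one explicitly.
        ["git", "push", remote, tag],
    ]
```

Whether a GitHub release can also be attached to such a tag later is the assumption made in the comment above; the sketch only covers the tag itself.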

@peterdesmet peterdesmet reopened this Nov 12, 2014
@tucotuco
Member

I'm not convinced yet of the utility of capturing the commit history aside
from completeness and consistency. Not a bad thing to have, but with what
effort?

If it was just to capture the diffs between the contents of the dated
folders (as opposed to the commit history), then I would recommend a slight
variation.

  1. Make a branch off of master
  2. Copy all files from the branch locally to somewhere else
  3. Empty the branch
  4. Unzip the archive for the first version darwincore-2009-02-12.zip
    into the root of the branch
  5. Commit
  6. Push
  7. Release (= tag a commit)
  8. Start over from step 3, but this time unzipping files from the next
    version (e.g. darwincore-2009-02-20.zip), until we are at the most recent
    version.

This is tractable and I wouldn't mind this.
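
Steps 3 to 8 above could be scripted roughly as follows. This is a sketch under assumptions, not the procedure that was actually used: it assumes it runs on the throwaway branch from step 1, that the archives are named `darwincore-YYYY-MM-DD.zip` so they sort chronologically by name, and that the date is used as the tag. The names `empty_worktree`, `run_git` and `replay_versions` are hypothetical. Pushing (step 6) is left to the end, outside the loop.

```python
import shutil
import subprocess
import zipfile
from pathlib import Path
from typing import Callable, Iterable, List


def empty_worktree(repo: Path) -> None:
    """Delete everything in the working tree except the .git directory."""
    for entry in repo.iterdir():
        if entry.name == ".git":
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def run_git(repo: Path, args: List[str]) -> None:
    """Run one git command inside the repository, failing loudly on error."""
    subprocess.run(["git", "-C", str(repo), *args], check=True)


def replay_versions(repo: Path, archives: Iterable[Path],
                    git: Callable[[Path, List[str]], None] = run_git) -> List[str]:
    """Commit and tag each archive in name order; return the tags created."""
    tags = []
    for archive in sorted(archives, key=lambda p: p.name):
        # darwincore-2009-02-12.zip -> 2009-02-12
        tag = archive.stem.split("-", 1)[1]
        empty_worktree(repo)                       # step 3: empty the branch
        with zipfile.ZipFile(archive) as zf:       # step 4: unzip into the root
            zf.extractall(repo)
        git(repo, ["add", "--all"])
        # --allow-empty keeps the loop going if two versions are identical.
        git(repo, ["commit", "--allow-empty", "-m", f"Darwin Core {tag}"])  # step 5
        git(repo, ["tag", "-a", tag, "-m", f"Release {tag}"])               # step 7
        tags.append(tag)
    return tags
```

The `git` parameter makes it possible to pass in a recording function for a dry run before touching a real repository.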

@tucotuco
Member

Testing on new branch version/history starting with legacy pre-standard Darwin Core.

@peterdesmet
Member Author

@tucotuco, great! I already started a pull request, so it's easy for us to follow along. Don't forget to bring back the recent files (README, CONTRIBUTING, LICENSE) after you have committed the last release.

@mdoering
Contributor

Using the zip files sounds like a great idea, @tucotuco.

@tucotuco
Member

Finished with the migration of the standard in all of its releases.