Splitting this repository #3352

Open
1 of 5 tasks
JimMadge opened this issue Nov 9, 2023 · 19 comments
Labels
idea-for-discussion This can be used for inviting discussion from collaborators or community in general infrastructure For all issues related to book infrastructure project governance

Comments

@JimMadge
Member

JimMadge commented Nov 9, 2023

Summary

Currently, this repository is used for multiple purposes. For example, the following are all committed here:

  • The book
  • Meeting notes
  • Project management documents
  • Newsletters

Now that we have a GitHub org, we can create multiple repositories to split separable items into their own repositories.

This will have a number of benefits:

  • Smaller repositories (faster clones, faster git operations(?))
  • Greater clarity on purpose for each repository, what information belongs where
  • Easier to set appropriate permissions, requirements, and CI for each repository
  • Better navigability (easier to find what you are looking for)

What needs to be done?

  • Identify the best way to move existing data (@sgibson91 👀)
  • Decide how to divide the repository, what new repositories to create
    (For example, the-turing-way/book, the-turing-way/meeting-notes, the-turing-way/newsletters)
  • Communicate changes with the community
    We should be able to avoid rewriting history on the book repository. However, files will be deleted, so we should make sure everyone knows where things have been moved.
  • Create new repositories
  • Split existing data into new repositories

Who can help?


Updates

@JimMadge JimMadge added project governance infrastructure For all issues related to book infrastructure labels Nov 9, 2023
@JimMadge
Member Author

JimMadge commented Nov 9, 2023

@sgibson91
Member

The above docs explain how we can remove a subdirectory from the main repository without rewriting history on it, but preserve the git history of the subfolder when we add it to the new repo.
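The first half of that approach, removing the folder from the main repository without rewriting its history, amounts to an ordinary deletion commit. The following is only a minimal sketch under that assumption: the function name `remove_moved_dir` and the commit message are illustrative and are not taken from the docs referred to above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete a directory that has been moved elsewhere,
# using a normal commit so that existing history is left untouched.
# Usage: remove_moved_dir <repo-dir> <path-relative-to-repo-root>
remove_moved_dir() {
    local repo="$1" dir="$2"
    # Stage the removal of every tracked file under the directory...
    git -C "$repo" rm -r -q -- "$dir" &&
    # ...and record it as a new commit on top of the current branch.
    git -C "$repo" commit -q -m "Move $dir to its own repository"
}
```

Because nothing is rewritten, `git log -- <path>` in the main repository still lists the old commits for the removed folder after this runs.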

@aleesteele
Member

aleesteele commented Nov 10, 2023

This is great, all! Adding breadcrumbs to #3272 - in case this helps your work!

Also adding #3287 #2729 - as this is a test case for working groups decision-making!

@Arielle-Bennett
Collaborator

I like this idea - wondering if anyone has any objections/downsides apart from the effort required to map the split in the repo and transfer the data? Are we worried at all that it will make it harder for people to feel empowered to contribute and if so how would we combat this? (this is me slightly catastrophising, I still think we should do this!)

@sgibson91
Member

(Sorry, I clicked the wrong thing on my phone and turned one of the checkboxes into an issue accidentally)

@JimMadge
Member Author

That's a good question @Arielle-Bennett.

I would hope that having clear, distinct repos would help people find what they are looking for, prevent conflicts, and make each part seem less overwhelming.
However, there is a risk that siloed parts of the project could be more or less welcoming than others.
Making sure we have CoC and some governance principles at the organisational level might help avoid that, or at least make sure there is a process if there are problems.

@Arielle-Bennett
Collaborator

Thanks @JimMadge - I agree, we might also ask working groups / the wider community to help ensure each repo has a clear purpose and explicit contribution pathways outlined so that it's clear for newer people how to contribute to each 👍🏻

@sgibson91
Member

In addition to that, possibly guidelines on what kinds of repo are acceptable or not to be created in/moved to the org? JupyterHub have been working on an idea to create a new org where the indication is that the repos there are less developed, not strictly maintained by the core JupyterHub team, and may not receive regular patches or releases, so we have been drawing up guidance around "what characteristics should a repo in this org have?". Please feel free to read, obvs YMMV https://hackmd.io/@yuvipanda/B1el-jExp

@JimMadge JimMadge added the idea-for-discussion This can be used for inviting discussion from collaborators or community in general label Nov 17, 2023
@AlexandraAAJ
Collaborator

Hello all, following our January monthly meeting, @da5nsy @aleesteele and I will start working on the road map to split the repo during the Collab cafe this Wednesday. Please feel free to join us.

@Arielle-Bennett
Collaborator

I can be there for the first half @AlexandraAAJ then I have to switch to something else. :)

@da5nsy
Collaborator

da5nsy commented Jan 17, 2024

Following conversations at collab cafe today I did a technical dry run with the newsletter subfolder:
https://github.com/the-turing-way/newsletter

@sgibson91 - I did need to rename the old branch so that I could merge it into the new main (git branch -m staging) but otherwise the instructions worked well.

The summary:

conda create -n TTW-git-filter-repo python=3.10
conda activate TTW-git-filter-repo
conda install -c conda-forge git-filter-repo
git clone https://github.com/the-turing-way/the-turing-way newsletter --origin source
cd newsletter
git filter-repo --subdirectory-filter communications/newsletters --force

(make new github repo, public, and with something to initialise it - I chose a readme, which in hindsight was a poor decision since the folder already had a readme so there was a merge conflict later)

git remote add origin git@github.com:the-turing-way/newsletter.git
git fetch origin
git branch -m staging
git checkout --track origin/main
git merge staging --allow-unrelated-histories -m 'Splitting repo'
git push origin main --dry-run
git push origin main

@da5nsy
Collaborator

da5nsy commented Jan 17, 2024

If someone from @the-turing-way/infrastructure could take a look at the new repo and check that it all looks good, that would be awesome!

I think the main thing to check is that the history is preserved (LGTM).
Anything else we should be focusing on at this stage?
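One rough way to sanity-check that history was preserved is to compare how many commits touch the subfolder in the original repository with how many commits the new repository contains. This is a sketch only: `count_commits` is a hypothetical helper, and the two numbers need not match exactly, because the new repository also carries its initial commit and the merge commit, and `git filter-repo` may prune commits that become empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count commits reachable from HEAD that touch a path.
# Usage: count_commits <repo-dir> [<path>]   (path defaults to the repo root)
count_commits() {
    local repo="$1" path="${2:-.}"
    git -C "$repo" rev-list --count HEAD -- "$path"
}

# Example (paths are placeholders for local clones):
#   count_commits ./the-turing-way communications/newsletters
#   count_commits ./newsletter
```

If the second number is far smaller than the first, that would suggest history was lost somewhere in the split.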

@da5nsy
Collaborator

da5nsy commented Jan 17, 2024

I think the next steps would be to look into transferring live issues/PRs over to that new repo, to see what that process (if one exists at all?) is like

@sgibson91
Member

@da5nsy would you feel able to open a PR to add any missing steps/gotchas you learned from this experience? https://github.com/2i2c-org/infrastructure/blob/master/docs/howto/update-env.md#split-up-an-image-for-use-with-the-repo2docker-action

@da5nsy
Collaborator

da5nsy commented Jan 18, 2024

> @da5nsy would you feel able to open a PR to add any missing steps/gotchas you learned from this experience? https://github.com/2i2c-org/infrastructure/blob/master/docs/howto/update-env.md#split-up-an-image-for-use-with-the-repo2docker-action

I think the only thing would be the branch rename, and I don't know where in the original instructions that would be best put (or in fact if it's relevant?) 🤔

Otherwise, things that tripped me up were either things that I expect everyone else knows (yes, imposter syndrome etc etc) or specific to the fact that I was modifying the use case.

  • git filter-repo wants a relative path, not an absolute one
  • It didn't work for me with an uninitialised GitHub repo (but if someone is following the original instructions and using the repo template that's not an issue they'll encounter)
  • Don't initialise the GitHub repo with a readme if you already have a local readme, because the two will clash
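The first and third gotchas above can be caught before running anything destructive. The helpers below are a hypothetical pre-flight sketch: the names `check_subdir_arg` and `has_local_readme` and the error messages are made up for illustration and are not part of git filter-repo.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the path is non-empty and relative,
# which is what --subdirectory-filter was reported to want.
check_subdir_arg() {
    local path="$1"
    case "$path" in
        "") echo "error: no path given" >&2; return 1 ;;
        /*) echo "error: '$path' is absolute; use a path relative to the repository root" >&2; return 1 ;;
        *)  return 0 ;;
    esac
}

# Hypothetical check: succeed if the directory already contains a README*.
# If it does, initialise the new GitHub repository with some other file.
has_local_readme() {
    local f
    for f in "$1"/README*; do
        [ -e "$f" ] && return 0
    done
    return 1
}
```

For example, `check_subdir_arg communications/newsletters` succeeds, while passing the same folder as an absolute path fails with an error message.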

@aleesteele
Member

Maybe we could transfer some of the existing newsletter-related issues to the new repository, to test if that works @da5nsy? Thanks so much for this transition. I just took a look at the repo, which looks good to me in terms of its content.

@JimMadge
Member Author

Looks to me like the filter repo worked well 🎉

@da5nsy
Collaborator

da5nsy commented Jan 21, 2024

> Maybe we could transfer some of the existing newsletter-related issues to the new repository, to test if that works @da5nsy? Thanks so much for this transition. I just took a look at the repo, which looks good to me in terms of its content.

Just tested with https://github.com/the-turing-way/the-turing-way/issues/3465, and it seems to have worked as far as I can tell.
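It is not stated how the test transfer was performed. If several issues need moving, one option is the GitHub CLI, which has a `gh issue transfer` command. The sketch below assumes `gh` is installed and authenticated; the function name `transfer_issues` and the `RUN` switch are hypothetical, and by default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: transfer a list of issues from one repo to another.
# Prints the commands unless RUN=1 is set in the environment.
# Usage: transfer_issues <source-repo> <destination-repo> <issue-number>...
transfer_issues() {
    local src="$1" dest="$2" issue
    shift 2
    for issue in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            gh issue transfer "$issue" "$dest" --repo "$src"
        else
            echo "gh issue transfer $issue $dest --repo $src"
        fi
    done
}

# Example dry run:
#   transfer_issues the-turing-way/the-turing-way the-turing-way/newsletter 3465
```

Keeping the dry run as the default makes it easy to review the list of issues before anything is moved.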

@da5nsy
Collaborator

da5nsy commented Jan 21, 2024

I think we would have to manually transfer open PRs (e.g. #3469).

I guess if we wanted to be extra fancy we could preserve the specific relevant branches when we do the transfers, but we'd still have to make the PR again. So in most cases (there shouldn't be many, assuming we keep the book in the-turing-way/the-turing-way) it'll make sense to rebuild the PR from scratch, link to it from the old one, and close the old one.

6 participants