Code Structure - Organization into a "Many Repo" (Breaking up the monorepo) #11204
cc: @RoiEXLab and @bacrossland, y'all both expressed support for a service architecture and multi-repo, WDYT?
Got a start on this; some early examples:
I'm liking that the '.github/CONTRIBUTE.md' file can be tailored for how to contribute to that immediate repository. Similarly, the README can be very focused as well.
Discovered a significant drawback to publishing jar files to github packages: access requires an access token. It is not the end of the world, but it is another one-time setup step before the project can be built. This page has details: https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-gradle-registry
@DanVanAtta I might be missing something, but can't you use the default built-in token for this?
Regarding your plans of splitting the repository: I'm not quite sure how to interpret your figure. What do the colors mean? What do the arrows mean? Are they referring to git submodules?
I like what you have proposed @DanVanAtta, but the diagram is a little confusing. Is each color a separate repo, or is each box a repo? I 100% agree that splitting to different repos based on their role will help to roll out improvements to the overall system in smaller increments. For instance, the recent forum bug holding up release of the game engine wouldn't happen. The issue would only affect the forum client and could be worked in isolation while the game engine updates could still roll out.
### Re: access token for downloading maven packages from github's maven repository

The publishing of jar files can be done using the built-in secret token. I worked on an example to start seeing what this would look like: https://github.com/triplea-game/maps-client/blob/master/.github/workflows/publish-jar.yml

The above runs a gradle build (the jar task) and then uploads the result to the packages list: https://github.com/orgs/triplea-game/packages?repo_name=maps-client

Eventually with the capability to use it like this:
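As a rough sketch, consuming the published package from another repo's build.gradle could look something like the following (the repository URL pattern is GitHub Packages' documented maven URL scheme; the dependency coordinates and environment variable names are illustrative assumptions, not the actual published coordinates):

```groovy
repositories {
  mavenCentral()
  maven {
    // GitHub Packages maven endpoint for the publishing repo
    url = uri('https://maven.pkg.github.com/triplea-game/maps-client')
    credentials {
      // GitHub Packages requires a token even for read access;
      // these env var names are an assumption for this sketch
      username = System.getenv('GITHUB_ACTOR')
      password = System.getenv('GITHUB_TOKEN')
    }
  }
}

dependencies {
  // hypothetical coordinates for the published maps-client jar
  implementation 'triplea:maps-client:1.0'
}
```

The credentials block is the catch discussed below: every developer needs a token configured before `gradle build` can resolve the dependency.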
Notice that the above is a build dependency, and of course is not a project dependency. We need to add some small config to our list of maven repositories in our gradle build config. Access to the github maven repository requires an auth-token, so the context here is a developer machine running a build and pulling down dependencies.

### Discussion Points / Q&A

**What do the arrows mean? Are they referring to git submodules?**

Each card/item in the diagram is a repository. Each repository will have an output artifact: either a game-installer, a maven package, or a service that is deployed to prod. The lines represent a "uses" relationship between repositories; the direction of the arrow points to the item being used. For example, the 'lobby-server' will have a maven jar file dependency on 'lobby-client', and the 'game' will also have a jar file dependency on 'lobby-client'.

**What do the colors mean?**

- Green: the 'green' repositories are the front-end, generally what we would consider to be 'triplea-game/triplea'; the code in those repos would likely be what we had in 'game-core' and 'game-headed'.
- Yellow: repositories that are "stand-alone", typically hosting non-java related stuff.
- Purple: these are all servers, generally separate drop-wizard servers, each with their own JSON-over-HTTP API.
- Blue: the remaining items, the various http clients for each server. Part of the idea with the http servers is that they will be developed in parallel with a stand-alone 'client' jar that presents a full java API to the server. So all of the wire-mock & feign code will wind up in these 'blue' packages, which would present a pure java API to anything that wants to access a server.

### Game split into different repos
I generally agree, though we may have different perceptions of what "all-out" means. My initial target is to split the game-engine into 3 repositories, with the goal of forcing ourselves to partition the code. If we split the AIs further into their own repositories, one per AI, I wouldn't quite consider that 'all-out' just yet (notably because we ideally have zero dependencies between AIs and they 'should' be completely standalone). With that said, something like an 'AI-lib' is probably not desirable, and indeed we do not want to go crazy with a dependency tree that would require 5 PRs to update an AI. The repositories would be there for places where we really want to enforce a very hard line between dependencies. The game-engine has the luxury of a lot of time before we're ready to do much in the way of significant splits. I'm thinking we will continue to use sub-modules quite heavily. There is still a majority of code in the game-engine, and that will likely continue to be the case for a while. We have a lot of time to design further segregation of the game-engine code. The goal of any first moves would be to "force" breaking of code dependencies.
### Q: Is each color a separate repo or is each box a repo?

Each box is a repo. The colors are for grouping.

### Benefits of Split
Indeed! Further, our desire to move the forums into docker would even be facilitated. I just learned that github 'packages' can host docker images. Hence, our 'forums' repository could package a docker container and then trigger an update/deployment whereby the forum server pulls a docker image from github. I think a big gain would be for AIs as well. Having those in their own repository forces us not to use AI code in the game-engine, we can create more granular access permissions, and the partitioning further reinforces us to respect a larger system design.
@DanVanAtta Thanks for clarifying. I like your proposal in general. We'll have to discuss the minor details, i.e. where exactly to split the repo, once we get to that point. But for now, if we start with just some separate repositories, we can easily re-evaluate on a case-by-case basis where more segmentation is useful and where the separation is just right.
### Adding a github access token in order to run the gradle build

Here is a PR adding the dev-setup steps required to download maven packages hosted on github packages: https://github.com/triplea-game/docs/pull/2

Going down this path, every developer would have to follow those steps in order to run the gradle build.
### Re-considering multi-repo

In short, I think perhaps our best bet, instead of doing multi-repo, is to stay with a mono-repo and modularize within it. Notably, I just learned that there is a 'paths' configuration option for github actions workflows that lets CI run only when certain files change.

### Multi-Repo Status

I made it somewhat far down the multi-repo path. The 'maps-client' repository is up and publishing jar files to github packages.

### Mono vs Multi Repo - considerations after having done some experimentation

With more experience of how multi-repo would look for TripleA, there are pros and cons worth writing down. To start with the pros:

### Multi-Repo Pros
I think the noteworthy thing here is that having conditional github actions gives us most of the build-isolation benefit that separate repositories would.

### Multi-Repo: deeper insights (the cons)

**Access Token**

The biggest drawback is the access token needed to download dependencies; every developer would need to set one up before being able to build.

**Lots of Tooling De-Duplication**

Things like 'code-cov' and any other static analysis tool need to be configured again for each repository.

**Distributed Pull Requests**

This looks like it could become quite a problem. There is no single view AFAIK of all open pull requests across repositories.

**Code isolation is too extreme**

This is a pro, but multi-repo starts to go into an extreme. Either we start duplicating code between repositories, or we create shared 'lib' repositories and a deep dependency tree.

**Dev Setup Instructions & Docs get weird**

It's weird to have a repo dedicated to docs & to have each README duplicate the same setup instructions.

**Overall effort to effectuate this change**

It has not been all that quick to drop existing code into new repos and have the builds working again.

**Lack of infrastructure shared tooling & duplication of server lists**

I set up a 'systems-install' repo which is meant to be the ansible deployment; repos split this way would need to duplicate the server lists they target.

**Not a convincing model for new contributors to be impactful**

Particularly for new contributors, I fear the amount of frustration that would be experienced getting set up across so many repositories.

### Summary - Let's go for better modularization within a mono-repo?

In sum: better modularization within the mono-repo looks like the stronger path.
Using the 'path' attribute on github actions I think is a game-changer, and if we try to modularize code better/more, that will be the most important thing. Sticking to a mono-repo gives a lot of benefits: one place to view pull requests, one place for docs, one dev setup. For stronger modularization, I think the idea is going to be to try and avoid shared code between modules. At the deployment & infrastructure level, I think we should try to modularize as well.

### Next Steps

This is a longer read, so I appreciate anyone sticking with it and the full consideration. I'll leave the additional multi-repos up and available for a little bit, but will plan to wind them down after that.
@DanVanAtta Didn't know about the paths attribute; using that seems like a really good idea 👍
I don't know how new the 'paths' attribute is, but I only learned about it perhaps 2 days ago! This article: https://buttondown.email/blog/just-use-a-monorepo was linked from hacker news. I'm not sure the article is necessarily all that compelling, but it was interesting to see that we were going in the opposite direction. In the comments on the article, someone gave an example of the 'paths' attribute, which is where I then learned about it: https://news.ycombinator.com/item?id=34359736 That paths capability really does seem like a complete game changer for this whole conversation!
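For reference, a minimal sketch of the 'paths' trigger (the workflow name, module path, and gradle task here are illustrative, not the actual TripleA workflow):

```yaml
# Hypothetical workflow: runs only when files under game-app/
# (or this workflow file itself) change.
name: game-app-build
on:
  push:
    paths:
      - 'game-app/**'
      - '.github/workflows/game-app-build.yml'
  pull_request:
    paths:
      - 'game-app/**'
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: ./gradlew :game-app:check
```

A docs-only PR would not match the filters, so this job would be skipped entirely, which is the behavior described above.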
The path attribute in github actions is nice. It makes CI/CD based on changes to certain folders easier. Using it to format and deploy updated documentation is a common use case; no need for full deploys when all you need to do is fix docs.

Separate repos: I get that changing the gradle and build pipeline can be difficult. A lot of large projects that have the same dependency structure this one does (and the same issues) got around that difficulty by using git submodules. Each module was in a separate repo that git submodule knew about. Devs only needed to check out the parent repo and then run the submodule checkout. It would git-checkout the source for the other modules and they would be there in the local parent repo just as if they were stored in the repo. You make changes to the submodule, check it in, and later update the main repo to point at the commit hash of the updated submodule. The last major project I worked on managed 50+ submodules this way. If you are curious to know more, here is a 15 minute primer: https://www.youtube.com/watch?v=gSlXo2iLBro&t=71s

Docker and lowering the barrier of entry:
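The submodule workflow described above can be sketched end-to-end with throwaway local repos (the 'lobby-client' name is purely illustrative; real usage would point at a hosted repo URL):

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d); cd "$tmp"

# Stand-in for a module hosted in its own repo
git init -q lobby-client
( cd lobby-client \
  && echo 'client code' > Client.java \
  && git add . \
  && git -c user.name=demo -c user.email=d@example.com commit -qm 'init' )

# Parent repo pins the module at a specific commit
git init -q parent
cd parent
# protocol.file.allow is only needed because this demo uses a local path
git -c protocol.file.allow=always submodule add -q "$tmp/lobby-client" lobby-client
git -c user.name=demo -c user.email=d@example.com commit -qm 'add lobby-client submodule'

# A fresh clone of the parent would then fetch module sources with:
#   git clone <parent-url> && cd parent && git submodule update --init --recursive
cat .gitmodules
```

After a change lands in the module repo, the parent is updated by committing the new submodule commit hash, which is the "point at the commit hash" step above.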
When it comes to lowering the barrier to entry on a project, Docker is fantastic. With docker and docker-compose you are not forced to know the full architecture because it's already handled for you. Instead of telling a dev that they need to download, set up, and configure a database on their local system, they use one command to pull an image and launch it as a running container. Need to upgrade? One command to pull a new image. Need to wipe the whole thing? One command to remove the container. It makes working with dependent systems easier because you don't have to know and manage that dependent system.

Take NodeBB for instance. Unless you use that library or NodeJS as your daily tech stack, why would you want to install and maintain a NodeJS install, a NodeBB install, and a Mongo DB install on your dev system just so you can fix an issue between your Java application and the basic version of NodeBB? A docker-compose file that spins up NodeBB and Mongo DB using the correct NodeJS version is a single command to bring the stack up and down. No more worrying about needing to know Javascript and NodeJS. Just get on with fixing your Java application.

A 5 minute verify script sounds like a perfect candidate for a refactor. If the script is a necessary step in the build or test pipeline, then it should be looked at to see if it can be faster. Github actions can run that script on a developer's branch when they push code, before creating the PR. That will save them from needing to run it locally as part of their development process.

I've just changed jobs, which should free up some dev time for me. I can help with lowering the barrier, which will make finding more devs easier and relieve some extra load from you @DanVanAtta and @RoiEXLab.
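A compose file along the lines described might look like this sketch (image tags, ports, credentials, and the NodeBB image name are assumptions for illustration, not a tested TripleA setup):

```yaml
# Hypothetical docker-compose for a local NodeBB + MongoDB dev stack.
# Up:   docker compose up -d
# Down: docker compose down -v   (also wipes the DB volume)
services:
  mongo:
    image: mongo:6
    environment:
      MONGO_INITDB_ROOT_USERNAME: nodebb
      MONGO_INITDB_ROOT_PASSWORD: nodebb
    volumes:
      - mongo-data:/data/db
  nodebb:
    # assumed image name; NodeBB publishes official docker images
    image: ghcr.io/nodebb/nodebb:latest
    depends_on:
      - mongo
    ports:
      - "4567:4567"
volumes:
  mongo-data:
```

The point is the one-command lifecycle: a dev fixing a Java-to-forum integration never installs NodeJS, NodeBB, or Mongo on bare metal.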
I compiled a project list very quickly here: https://forums.triplea-game.org/topic/3388/triplea-dev-project-list-jan-2023 My previous post was a bit long and super detailed. It's hard to convey everything and be concise. I think there are three key problems with multi-repo (perhaps lost in the details): the access token every developer needs before they can build, the build & tooling configuration duplicated in every repo, and the loss of a single view of pull requests.
Long story short, seemingly the path-based build gives us so many of the benefits of many-repo. For reference, a sample build script for a 'client' library (something that has no other project dependencies) can be seen here: https://github.com/triplea-game/maps-client/blob/master/build.gradle. It's pretty lengthy and all boilerplate (minus the dependency list, which is not too horrible, but that's a fraction of the overall build script; also take note of the customization around the shadow jar task so that the jar file is both packaged properly & also well named). An example of a 'dependent' library (maps-server) can be seen here: https://github.com/triplea-game/maps-server/blob/master/build.gradle; there it can be noted that each dependent project has to be listed as its own repository, with config for access tokens.

### Re: docker & barrier to entry

I totally agree on the reduction of the barrier to entry provided by docker. To try and clarify, I've found a lot of contributors simply and only want to contribute to the game-app part of TripleA. They just have no interest (yet) in working on the lobby part. The issue is that it's important to be able to run a full CI verification locally, and that meant everyone had to run the full set of lobby tests & setup whether they were updating the lobby or not. That is where a lot of the time was sunk during builds. I've been working on this today; the new verify script for the game-client will be at:

The docker DB will certainly still be a factor when doing server-side work. Totally agree that is light-years nicer than each developer having to install a DB to bare metal, let alone considering the effort to destroy and rebuild it.
Indeed. Getting build times faster is generally always high-value work. It helps everyone, many times over. We have a number of tests that can simply be sped up. The lobby-side testing was a large chunk of that 5 minutes, and those tests were forced to run in sequence... I agree it's a good place to spend some effort.
One negative with path-specific builds so far is that the coverage report in PRs could be very wonky. Currently coverage is only computed for master branch builds so that we can get an overall trend. Slower master branch builds are not that important; I'm on the fence about whether it is worth it. One nice thing about multi-repo was that we could readily enforce code coverage requirements. Perhaps there is a way to configure that in a path-specific manner.
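If the coverage service in play is Codecov, path-scoped enforcement may be possible with flags; a sketch of a codecov.yml along those lines (module paths and the target percentage are assumptions, not the project's actual config):

```yaml
# Hypothetical codecov.yml: scope coverage status checks by path via flags
flags:
  game-app:
    paths:
      - game-app/
coverage:
  status:
    project:
      game-app:
        flags:
          - game-app
        target: 70%
```

This would make the coverage check for a PR reflect only the module whose files changed, mirroring what the path-filtered CI builds do.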
Good news: just landed a PR that did not update any source code, and no tests were run! We are now in a place where CI does not run if, for example, only the docs are updated.
### Let's experiment with multi-repo
I can see now a lot of benefit to going to multi-repo. I am feeling a bit burned by the lobby 2.6 rollout: issues with the 'maps+lobby' server running out of memory and that being hard to diagnose, and finally not really getting the benefit of having everything on latest while still having to maintain versioned dependencies.
I think multi-repo can help us here, get 2.6 released a bit faster, and put us on a healthier long-term path in terms of code organization and even contribution. I am planning to experiment with multi-repo in relatively easily reversible ways, and I'm happy to create additional repositories so that others can do so as well. I know there is already some support for multi-repo & a service-like architecture, so we could potentially move quickly. Please consider and provide feedback.
Going multi-repo, I think each repo should have these characteristics:
For roll-out, I'm thinking:
The 'bots' are not listed because I think next steps will also include deleting all of the headless code in favor of 'network-relay'.
Concern, roll-out delays and version 2.6:
I think we use this method to get lobby v2.6 out.
There is an issue between 2.6 clients and 2.5 & 2.6 bots. We can work on this in the meantime; theoretically there should be no issue with having a 2.6 lobby, 2.5 bots & a 2.5 client.
Next, having a 'minimum version' notification built in allows us to account for backward-compatibility issues and gives us a 'turn-off' switch for old client versions. With this, the process of breaking off client functionality to a new server could become routine (see 'roll-out' above).
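A minimum-version gate like the one described could be as simple as a numeric comparison of dotted version strings. The class and method names below are illustrative, not from the TripleA codebase:

```java
// Hypothetical sketch of a server-driven minimum-version gate.
// The server would advertise its minimum supported client version;
// clients below it get the 'turn-off' / upgrade-prompt behavior.
public class MinVersionGate {

  /** Returns true when the client version is at or above the server minimum. */
  static boolean isClientSupported(String clientVersion, String serverMinimum) {
    String[] c = clientVersion.split("\\.");
    String[] m = serverMinimum.split("\\.");
    int len = Math.max(c.length, m.length);
    for (int i = 0; i < len; i++) {
      // Missing segments count as zero, so "2.6" == "2.6.0"
      int cv = i < c.length ? Integer.parseInt(c[i]) : 0;
      int mv = i < m.length ? Integer.parseInt(m[i]) : 0;
      if (cv != mv) {
        return cv > mv;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isClientSupported("2.5", "2.6"));   // false: prompt upgrade
    System.out.println(isClientSupported("2.6.1", "2.6")); // true
  }
}
```

The key property is that the cutoff lives server-side, so breaking a piece of client functionality off to a new server only requires bumping the advertised minimum.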
Concern, loss of code history
We'll certainly lose some code history. The main game-code will hopefully mostly stay together and be where we have most of the interesting history. Losing code change history in lobby and maps-server code does not feel too horrible.
Repo Dependency Diagram / Roadmap for consideration & feedback
triplea-repos.graphml.zip