50MB GitHub max size on tweet.db file #4
Comments
|
My own |
|
@zachleat did you consider using Markdown or JSON files for each tweet, instead of an sqlite DB? It would remove this issue with a single large file. But I guess it might be more difficult to manage, or slower for the build. |
|
I think historically I moved to sqlite for performance reasons yeah, especially around memory in pagination. BUT I also made a bunch of performance/memory improvements to Eleventy pagination in 2.0 that apply very directly here so… I’m not sure 😅 It wouldn’t be a small change to move away from sqlite though. |
|
I do want to mention a short-term workaround here: run builds locally and commit your This has the side benefit of not requiring your entire twitter history in source control as a nice database for people to use 😅 |
|
Nice trick indeed! 👍 |
|
I did want to note one other path forward here that would be a smaller lift than moving away from Feels like a yearly sharding might be the least amount of work, tbh. |
|
Removing useless parts of the JSON would be nice, but I'm not sure it would be enough for people with a lot of tweets. Sharding is a good idea. 👍 Would it make some features more difficult, like assembling threads? |
|
I'd argue that tweet.db might not need to be committed to the repository, as it can be rebuilt entirely from tweets.js without any external dependency. You'd want to cache it in GitHub Actions, but possibly not commit it. As for tweets.js, which can grow big too, it could easily be split into several smaller files if needed. |
|
Yeah, I've put tweets.js and tweet.db into my gitignore. Alternatively, if you really want to commit those types of large files, I have some in my static sites (videos and such) and I just use https://git-lfs.com. For reference, my tweet.db for >16k tweets is 243.1MB, which I think is pretty reasonable? |
|
I’d also vote in favor of annual shards. |
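For anyone weighing the yearly sharding idea, here is a minimal sketch of what it could look like. It is not how this project stores tweets: the table layout, the `tweet-<year>.db` file naming, and the `shard_by_year` helper are all hypothetical, and it assumes each tweet carries a `created_at` string in the archive's "Wed Oct 10 20:19:24 +0000 2018" style.

```python
import json
import sqlite3
from collections import defaultdict
from pathlib import Path


def year_of(created_at: str) -> str:
    # Assumption: the year is the last whitespace-separated token,
    # as in "Wed Oct 10 20:19:24 +0000 2018".
    return created_at.split()[-1]


def shard_by_year(tweets, out_dir):
    """Write one SQLite file per calendar year; return {year: path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group tweets by the year they were posted.
    by_year = defaultdict(list)
    for tweet in tweets:
        by_year[year_of(tweet["created_at"])].append(tweet)

    paths = {}
    for year, rows in by_year.items():
        path = out_dir / f"tweet-{year}.db"  # hypothetical naming scheme
        con = sqlite3.connect(str(path))
        # Hypothetical schema: the full tweet is kept as a JSON blob.
        con.execute(
            "CREATE TABLE IF NOT EXISTS tweets "
            "(id TEXT PRIMARY KEY, created_at TEXT, json TEXT)"
        )
        con.executemany(
            "INSERT OR REPLACE INTO tweets VALUES (?, ?, ?)",
            [(t["id"], t["created_at"], json.dumps(t)) for t in rows],
        )
        con.commit()
        con.close()
        paths[year] = path
    return paths
```

On the thread-assembly question raised earlier: a thread that crosses New Year would live in two shards, so a lookup would need to open both files (SQLite's `ATTACH DATABASE` is one way to query across them).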
|
I do want to note for folks that are checking their |
|
mine is 123MB for >100k tweets |
GitHub has a 50MB max, which my personal archive tweet.db has hit.
https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github
We might want to shard this (yearly?) for larger archives.
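To see whether an archive is affected before pushing, a small check along these lines may help. It is only a sketch: the `oversized_files` helper is hypothetical, and it reads the 50MB figure quoted above as 50 MiB, which is an assumption.

```python
from pathlib import Path

# Assumption: the 50MB figure from this issue, interpreted as 50 MiB.
LIMIT_BYTES = 50 * 1024 * 1024


def oversized_files(root, limit=LIMIT_BYTES):
    """Return sorted (relative_path, size_in_bytes) pairs for files above `limit`."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip Git's own object store; only working-tree files matter here.
        if ".git" in rel.parts:
            continue
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                hits.append((rel.as_posix(), size))
    return sorted(hits)
```

Running `oversized_files(".")` from the repository root would list tweet.db (and tweets.js, if it has grown large enough) whenever they exceed the threshold.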