Store binary snapshots as Git LFS #4450

Closed

@mfranzke

Yes, the large .git folder is very likely caused by the large images stored in the repository. Git handles large binary files such as images poorly: it keeps every committed version of every file, so the .git directory grows each time an image is updated or replaced.

Why the .git Folder Is Huge

  • Git stores all history of your files.

  • Binary files (like .png, .jpg, .psd, etc.) delta-compress poorly, so Git usually cannot store just the difference between two versions.

  • Replacing an image means Git stores a full copy of each version.

  • Over time, this history builds up and bloats the .git folder.
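To check whether images really dominate the object store, a small helper along these lines can list the largest blobs in history. This is a sketch using standard Git plumbing; the function name `largest_blobs` is made up for illustration.

```shell
#!/usr/bin/env bash
# List the N largest blobs in the whole history as "<size-in-bytes> <path>".
# largest_blobs is a hypothetical helper name, not a Git command.
largest_blobs() {
  local count="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" { sub(/^blob /, ""); print }' |
    sort -rn |
    head -n "$count"
}

# Overall size of the object store (loose objects plus packs):
#   git count-objects -vH
# Ten largest blobs ever committed:
#   largest_blobs 10
```

If the top entries are image files, the diagnosis above most likely applies to your repository.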


Solution: Move Large Files to Git LFS

Git LFS (Large File Storage, https://git-lfs.com/) is designed for exactly this situation. It stores the content of large files outside the regular Git object database (on the LFS server of your Git host, such as GitHub), replacing them in your repo with small pointer files.
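For orientation, a pointer file is just three short text lines: a spec version, the SHA-256 of the real content, and its size in bytes. The sketch below prints the pointer text for a given file. Git LFS generates this itself when a tracked file is added, so `make_lfs_pointer` is a hypothetical helper for illustration only, and it assumes `sha256sum` is available (on macOS, `shasum -a 256` is the usual substitute).

```shell
#!/usr/bin/env bash
# Print the text of a Git LFS pointer for a file (illustration only;
# Git LFS writes the real pointer when a tracked file is staged).
# make_lfs_pointer is a hypothetical name.
make_lfs_pointer() {
  local file="$1" oid size
  oid=$(sha256sum "$file" | cut -d' ' -f1)   # SHA-256 of the real content
  size=$(wc -c < "$file" | tr -d ' ')        # size in bytes
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# Usage: make_lfs_pointer path/to/image.png
```

Only this small text is stored in the Git history; the image bytes are uploaded to the LFS server.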

Benefits of Git LFS

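  • Keeps the Git object database small, because it holds only pointer files.

  • Faster cloning and pulling, since only the LFS objects needed for the checked-out commit are downloaded.

  • Keeps Git history manageable.

  • Still versions large files, but without storing every copy in each clone.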


How to Migrate to Git LFS

  1. Install the Git LFS client on your machine (for example through your package manager or from https://git-lfs.com/)
  2. Enable Git LFS for your user account (once):
    git lfs install
    
  3. Track image types:
    Example for PNG, JPG, and PSD:

    git lfs track "*.png"
    git lfs track "*.jpg"
    git lfs track "*.psd"
    
  4. Commit the .gitattributes file that LFS generates:

    git add .gitattributes
    git commit -m "Track image files with Git LFS"
    
  5. Re-add the large files (to move them to LFS):
    You can remove and re-add the files so Git LFS picks them up:

    git rm --cached path/to/image.jpg
    git add path/to/image.jpg
    
  6. Commit the changes:

    git commit -m "Move images to Git LFS"
    
  7. Push to GitHub:

    git push origin main
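After these steps it can be worth confirming that a file is actually routed through LFS. `git lfs track "*.png"` writes a line such as `*.png filter=lfs diff=lfs merge=lfs -text` into .gitattributes, and `git check-attr` can read that back. The helper name `is_lfs_tracked` below is an assumption for this sketch.

```shell
#!/usr/bin/env bash
# Succeed if .gitattributes routes the given path through the LFS filter.
# is_lfs_tracked is a hypothetical helper name.
is_lfs_tracked() {
  [ "$(git check-attr filter -- "$1" | sed 's/.*: filter: //')" = "lfs" ]
}

# Example (run inside the repository):
#   is_lfs_tracked path/to/image.jpg && echo "handled by Git LFS"
# With Git LFS installed, `git lfs ls-files` lists files already stored as LFS objects.
```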
    

To Fully Clean the History (Optional but Recommended)

If you want to remove the image history from .git entirely (i.e., shrink its current size), you’ll need to rewrite your repo history using tools like:

  • git filter-repo (recommended)

  • bfg-repo-cleaner (easier but less flexible)

Let me know if you want step-by-step help with that — it can permanently reduce your .git folder size but will rewrite history, so proceed with caution (especially for shared/public repos).


Summary

| Step | Result |
|---|---|
| Move images to Git LFS | ✅ Prevents future .git bloat |
| Clean Git history | ✅ Shrinks .git folder significantly |
| Just push to GitHub | ❌ Doesn’t fix current .git size |

Would you like help rewriting the Git history safely?


How to rewrite the Git history safely

Let’s walk through how to rewrite your Git history to remove the large image files and fully reclaim the space. This will significantly shrink the .git folder.

We’ll use git filter-repo, the tool the Git project recommends in place of filter-branch. It is faster and safer than filter-branch, and more flexible than BFG.


⚠️ Important Before You Begin

  • This rewrites history, which changes commit hashes.
  • If you’ve pushed to GitHub, collaborators will need to re-clone or reset their local repos.
  • Back up your repo or clone it before proceeding (highly recommended).
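One lightweight way to take that backup is a Git bundle, which packs all refs and their history into a single file. This is a sketch under that assumption; `backup_repo` and the bundle path are hypothetical names.

```shell
#!/usr/bin/env bash
# Write every ref and its history into one bundle file, then verify it.
# backup_repo is a hypothetical helper; choose any bundle path you like.
backup_repo() {
  local bundle="$1"
  git bundle create "$bundle" --all && git bundle verify "$bundle"
}

# Usage (inside the repository):
#   backup_repo ../your-repo-backup.bundle
# Restore later with:
#   git clone ../your-repo-backup.bundle restored-repo
```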

✅ Step-by-Step: Remove Image History with git filter-repo

1. Install git-filter-repo

Install it through your system's package manager, or use pip install git-filter-repo (requires Python).


2. Clone a fresh copy of your repo (recommended)

To be safe and avoid corrupting your current working repo:

git clone --mirror https://github.com/your-username/your-repo.git
cd your-repo.git

This gives you a bare repo ideal for cleaning.


3. Remove images from history

Run this command to remove large image formats from the entire history:

git filter-repo --path-glob '*.png' --path-glob '*.jpg' --path-glob '*.jpeg' --path-glob '*.psd' --invert-paths

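This will strip all matching files from the full commit history, including the most recent commit. Keep a copy of any images you still need so you can re-add them afterwards.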
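Before running the rewrite, you may want to preview which paths it would delete. The sketch below lists every path in history that matches the given patterns using plain `git log`; the function name `paths_in_history` is made up for this example.

```shell
#!/usr/bin/env bash
# List every path in history that matches the given pathspecs, so you can
# review what the rewrite would remove. paths_in_history is a hypothetical name.
paths_in_history() {
  git log --all --pretty=format: --name-only -- "$@" | grep -v '^$' | sort -u
}

# Usage:
#   paths_in_history '*.png' '*.jpg' '*.jpeg' '*.psd'
```

Files that were deleted long ago still show up here, because they remain in older commits.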


4. Push the cleaned repo to GitHub

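You'll need to force-push to overwrite the remote history. git filter-repo may remove the origin remote as a safety measure, so pass the URL explicitly. On GitHub, expect the push to report rejected refs/pull/* refs; those are read-only and can be ignored.

git push --force --mirror https://github.com/your-username/your-repo.git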

5. Re-clone the cleaned repo

After the push, delete your local working copy and re-clone the fresh cleaned version:

git clone https://github.com/your-username/your-repo.git

🔁 Going Forward: Use Git LFS

Now that the history is clean, track future images with Git LFS to prevent the .git folder from growing again:

git lfs install
git lfs track "*.png"
git lfs track "*.jpg"
git lfs track "*.psd"
git add .gitattributes
git commit -m "Track images with Git LFS"

🧹 Optional: Run Garbage Collection

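git filter-repo already expires reflogs and repacks the repository it rewrites. In any other local repository where unreferenced objects remain, you can reclaim the space with: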

git gc --prune=now --aggressive

shell script for all these steps

Here’s a full script that covers all of these steps.


✅ What It Does

  1. Clones your repo as a mirror

  2. Removes:

    • All images (*.png, *.jpg, *.jpeg, *.psd) from history
    • All files > X MB (you choose the size)
  3. Force-pushes the cleaned repo to GitHub

  4. Re-clones a working copy

  5. Sets up Git LFS

  6. Re-adds currently existing large files (including re-tracked images) to LFS and commits them


🧠 Notes

  • Git LFS pointer files are committed; the large content is stored externally.
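  • The history rewrite removes the matching files from every commit, including the latest one, so they are absent from the fresh clone. The script as written does not restore them: copy the files you still need from a backup into the working copy and commit them after the script finishes (they will then be stored via LFS).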

🚀 Script: cleanup-and-lfs-migrate.sh

#!/bin/bash

# === CONFIGURATION ===
REPO_URL="https://github.com/your-username/your-repo.git"
REPO_NAME="your-repo"
MAX_FILE_SIZE_MB=5
LFS_FILE_TYPES=("*.png" "*.jpg" "*.jpeg" "*.psd")

# === DERIVED VARIABLES ===
BARE_REPO="${REPO_NAME}.git"
WORKING_REPO="${REPO_NAME}-cleaned"
MAX_BYTES=$((MAX_FILE_SIZE_MB * 1024 * 1024))

# === 1. Check for git-filter-repo (before doing any work) ===
if ! command -v git-filter-repo &> /dev/null; then
  echo "❌ git-filter-repo not found. Install it and try again: https://github.com/newren/git-filter-repo"
  exit 1
fi

# === 2. Clone mirror for safe history rewrite ===
echo "==> Cloning bare mirror of the repo..."
git clone --mirror "$REPO_URL" "$BARE_REPO" || exit 1
cd "$BARE_REPO" || exit 1

# === 3. Find large files in history ===
echo "==> Detecting files over $MAX_FILE_SIZE_MB MB in Git history..."
# Read paths line by line into an array so names containing spaces stay intact.
LARGE_FILES=()
while IFS= read -r file; do
  LARGE_FILES+=("$file")
done < <(git rev-list --objects --all | \
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' | \
  awk -v max_size="$MAX_BYTES" '$1 == "blob" && $2 > max_size { sub(/^blob [0-9]+ /, ""); print }' | sort -u)

# === 4. Prepare filter-repo arguments ===
REMOVE_ARGS=()
for type in "${LFS_FILE_TYPES[@]}"; do
  REMOVE_ARGS+=("--path-glob" "$type")
done
# Note: --path removes every version of a listed path, not only the oversized ones.
for file in "${LARGE_FILES[@]}"; do
  echo "   • Will remove large file: $file"
  REMOVE_ARGS+=("--path" "$file")
done
REMOVE_ARGS+=("--invert-paths")

# === 5. Rewrite history ===
echo "==> Rewriting history to remove large files and images..."
git filter-repo "${REMOVE_ARGS[@]}"

# === 6. Force-push cleaned history ===
echo "==> Pushing cleaned repo to GitHub..."
# Pass the URL explicitly in case git filter-repo removed the origin remote.
git push --force --mirror "$REPO_URL"

# === 7. Clone fresh working copy ===
cd ..
echo "==> Cloning cleaned repo (working copy)..."
git clone "$REPO_URL" "$WORKING_REPO"
cd "$WORKING_REPO" || exit 1

# === 8. Set up Git LFS ===
echo "==> Setting up Git LFS and tracking file types..."
git lfs install
for type in "${LFS_FILE_TYPES[@]}"; do
  git lfs track "$type"
done
git add .gitattributes
git commit -m "Track image files using Git LFS"

# === 9. Find and re-add large current files (> MAX_FILE_SIZE_MB) ===
echo "==> Scanning working tree for large files to migrate to Git LFS..."
# Only files matching a tracked pattern are converted to LFS pointers on re-add.
# -print0 with read -d '' keeps paths containing spaces intact.
while IFS= read -r -d '' file; do
  echo "   • Re-adding large file: $file"
  git rm --cached --quiet --ignore-unmatch "$file"
  git add "$file"
done < <(find . -type f -size +"${MAX_FILE_SIZE_MB}"M ! -path "./.git/*" -print0)

# === 10. Commit and push changes ===
if git diff --cached --quiet; then
  echo "✅ No large working tree files found to re-add."
else
  git commit -m "Move current large files to Git LFS"
fi
# Push in both cases so the .gitattributes commit from step 8 reaches the remote.
git push origin HEAD

echo "🎉 Done! History cleaned and Git LFS fully set up."

🛠️ How to Use

  1. Replace:

    • REPO_URL="https://github.com/your-username/your-repo.git"
    • REPO_NAME="your-repo"
    • MAX_FILE_SIZE_MB=5 (adjust size threshold as needed)
  2. Save the script as cleanup-and-lfs-migrate.sh

  3. Make it executable:

    chmod +x cleanup-and-lfs-migrate.sh
  4. Run it:

    ./cleanup-and-lfs-migrate.sh

📦 Requirements

  • Git
  • Git LFS
  • git-filter-repo
  • Bash


Migrating Existing Repositories to Use Git LFS

If a repository already contains large files, migrate them using:

git lfs migrate import --include="*.filetype"

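This rewrites the existing commits so that the matching files become LFS pointers throughout the history of the current branch (add --everything to cover all branches and tags). Commit hashes change, so a force-push is required afterwards, but unlike the filter-repo approach the files stay in the repository.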
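When several file types are involved, the patterns can be passed as one comma-separated --include value. The wrapper below is a sketch: `lfs_migrate_import` and the `DRY_RUN` switch are hypothetical names, and the dry-run mode only prints the command so you can review it first.

```shell
#!/usr/bin/env bash
# Build (and optionally run) a `git lfs migrate import` call for several patterns.
# lfs_migrate_import and DRY_RUN are hypothetical names used for this sketch.
lfs_migrate_import() {
  local include
  include=$(IFS=,; printf '%s' "$*")   # join the patterns with commas
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "git lfs migrate import --everything --include=$include"
  else
    git lfs migrate import --everything --include="$include"
  fi
}

# Preview the command without touching the repository:
#   DRY_RUN=1 lfs_migrate_import '*.png' '*.jpg' '*.psd'
```

Running `git lfs migrate info` beforehand shows which file types take up the most space, which can help in choosing the patterns.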

missing diff functionality for Git LFS files

You're encountering this error because GitHub.com currently does not support rich image diffs for files stored in Git LFS (Git Large File Storage).

Why This Happens:

When you use Git LFS, the actual image file is replaced in the Git repository with a pointer file (a small text file) that references the large file stored separately. GitHub can't display visual diffs of those pointer files because it doesn't automatically fetch and render the actual images for diffing.
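To tell whether a file in a checkout (or in a diff) is such a pointer and not the real image, you can check for the spec line that every pointer starts with. A minimal sketch, with `is_lfs_pointer` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Succeed if the file starts with the Git LFS pointer version line.
# is_lfs_pointer is a hypothetical helper name.
is_lfs_pointer() {
  head -c 42 "$1" | grep -qxF 'version https://git-lfs.github.com/spec/v1'
}

# Usage:
#   is_lfs_pointer path/to/image.png && echo "pointer only, content lives in LFS"
```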

That's why you're seeing:

Unable to render rich display

Invalid image source.

Workarounds:

  1. Use Local Tools for Image Diffing:
    Since GitHub can’t show the diff, you can check out both versions locally and compare them with an image diff tool of your choice.

  2. Avoid LFS for Images If Visual Diffs Are Essential:
    If rich image diffs are important in your workflow (e.g., for design review or QA), consider keeping key image files in the main repo (not under LFS), as long as they’re reasonably small (under GitHub’s 100MB file limit).

  3. Use External Image Diff Review Tools:
    Store LFS-tracked images externally (e.g., in an S3 bucket or CDN), and use a dedicated diff viewer integrated into your CI/CD or review workflow (some teams build internal tools or use PR bots to link to image comparisons).

  4. Push for GitHub Feature Requests:
    You’re not alone—many devs have requested this feature. If it’s important to you, consider upvoting or commenting on [GitHub Community discussions](https://github.community/) about LFS image diff support.
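For the first workaround, the two versions have to be materialized locally before any tool can compare them. The sketch below assumes Git LFS is installed: `git show REV:PATH` yields the pointer text for an LFS-tracked file, and `git lfs smudge` turns a pointer on stdin into the file content on stdout. The helper name `lfs_cat_version` and the output paths are made up for illustration.

```shell
#!/usr/bin/env bash
# Write the real content of PATH at revision REV to OUT.
# Requires Git LFS; lfs_cat_version is a hypothetical helper name.
lfs_cat_version() {
  local rev="$1" path="$2" out="$3"
  git show "$rev:$path" | git lfs smudge > "$out"
}

# Usage (inside the repository):
#   lfs_cat_version HEAD~1 path/to/image.png /tmp/old.png
#   lfs_cat_version HEAD   path/to/image.png /tmp/new.png
#   cmp -s /tmp/old.png /tmp/new.png || echo "image changed"
```

The two extracted files can then be opened in whichever image diff tool you prefer.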


Would you like a script or setup to help with local image diffing for LFS-tracked images?
