gomodfs: implement synthetic .zip file serving via synthzip #26
Conversation
Implement the $base/$module/@v/$version.zip endpoint from the goproxy protocol, using github.com/tomhjp/synthzip to synthesize zip files on-the-fly from individually-cached files without materializing the entire archive in memory upfront.

synthzip requires Name, Size, and CRC32 for each file at construction time (no I/O), then lazily reads file contents only when a byte range is actually accessed.

Key changes:

- store: Add CRC32 to the PutFile interface; add a ZipFileEntry type and a GetZipFileEntries method to the Store interface
- gitstore: Store CRC32 data as a "zipcrc" blob in the git tree during PutModule (format: "<8-hex-crc32>\t<path>\n" per file). Load it back in newModHandle. Old cache entries without zipcrc get CRC32=0.
- gomodfs: Add getZipArchive (builds a synthzip.Archive from metadata), getZipFileData (materializes the full zip), getZipFileSize (metadata only)
- path: Recognize the ".zip" extension in parsePath
- All frontends (NFS, FUSE, WebDAV, WinFsp): Serve .zip files. Stat calls use getZipFileSize to avoid materializing the full zip.
tomhjp
left a comment
Thanks! There are a few design decisions that are interesting to discuss/document, but none of them are blockers.
```go
}

// Try to load CRC32 data from the zipcrc blob.
if crcObj, err := s.getObject(context.TODO(), modTree.String()+":zipcrc", "blob"); err == nil {
```
```go
Blob  objRef
Size  int64
Mode  os.FileMode
CRC32 uint32 // CRC-32 checksum; 0 if not known (old cache entries)
```
If you care to, I think we could avoid the requirement to track CRC-32s with some library updates. It turns out most zip readers (including go's archive/zip) never cross-check the local file header CRC-32 against the CRC-32 stored in the central directory, so we should be able to zero out the CRC-32 in the central directory and only need to lazily calculate it in a trailer per-file as readers iterate through real file contents.
Unfortunately readers need to guess at the range they'll find the central directory marker within, so e.g. archive/zip reads the last 1KiB upfront (if that fails its second guess is the last 65KiB), which will inevitably include some real file content regions, but that's a small enough amount of duplicate file reading that it seems like a good trade-off for a simpler interface.
That said, given the work we already do on the way into the git store, it seems very reasonable to avoid all that complexity and just track the CRC-32s. I'm probably leaning slightly towards tracking them in gomodfs as you've got it already, WDYT?
```go
}

// ZipFileEntry describes a file within a module's zip for synthetic zip construction.
type ZipFileEntry struct {
```
We could use synthzip.File directly, but I guess you didn't want to require the library as part of the interface definition? Which seems reasonable, just thinking aloud.
```go
data := make([]byte, archive.Size())
if _, err := archive.ReadAt(data, 0); err != nil && err != io.EOF {
```
```go
})
}

// getZipFileData returns the full zip file bytes for the given module version.
```
Have you seen any pathologically large module zips? I had kind of assumed we'd want to make some interface changes to support streaming rather than reading the whole thing into memory, but I don't have a good feel for the worst-case we see in practice.
```go
// Read the synthetic .zip file.
zipPath := "cache/download/go4.org/mem/@v/v0.0.0-20240501181205-ae6ca9944745.zip"
ctx := context.Background()
```
```go
}
}

func TestZipFile(t *testing.T) {
```
```go
}
}
}
// If zipcrc doesn't exist (old cache entries), CRC32 fields stay 0.
```
Maybe expand this comment to explain the implications downstream? IIUC we'll then error when the zip gets constructed and end up reporting the zip doesn't exist, so essentially the existing behaviour we have today?
```go
// Format: "<8-hex-crc32>\t<path>\n"
var hasCRC bool
for _, f := range data.Files {
	if f.CRC32() != 0 {
```
As this is accessing data from the original zip file, the only way this could be zero is if the CRC-32 appears in the data descriptor (trailing fields after the file content) instead of the file header. If we get zip files of that form, we'll fail to extract the CRC-32. Maybe we don't care because it's always the same tool creating the zips so we probably won't see zips of that form? But if we want to catch that, we'd have to write our own data descriptor parsing logic because unfortunately the standard library doesn't expose it AFAICT.
However, we are also about to read the entire file contents in tb.addFile down below, so could calculate it on the fly very cheaply and skip any complex zip parsing.
Anyway, all that to say, there are a few alternatives to consider here, and quite a lot of subtlety behind this check. At the least, probably worth documenting the choices in more detail with a comment.
```go
fmt.Fprintf(&crcBuf, "%08x\t%s\n", f.CRC32(), f.Path())
}
crcData := crcBuf.Bytes()
if err := tb.addFile("zipcrc", static(crcData), 0644); err != nil {
```
Would we get better git store deduplication across similar module versions if we had a zipcrc blob per path? Was the decision here intentional to reduce the number of blobs? I can see it would be a lot of extra blobs to do one CRC-32 per blob, but not sure what kind of impact that would have on performance.