Skip to content

[pull] master from git:master#171

Merged
pull[bot] merged 52 commits intoturkdevops:masterfrom
git:master
Mar 3, 2026
Merged

[pull] master from git:master#171
pull[bot] merged 52 commits intoturkdevops:masterfrom
git:master

Conversation

@pull
Copy link

@pull pull bot commented Mar 3, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

agawande and others added 30 commits January 8, 2026 11:02
git allows using .netrc file to supply credentials for HTTP auth.
Three test cases are added in this patch to provide missing coverage
when cloning over HTTP using .netrc file:

  - First test case checks that the git clone is successful when credentials
    are provided via .netrc file
  - Second test case checks that the git clone fails when the .netrc file
    provides invalid credentials. The HTTP server is expected to return
    401 Unauthorized in such a case. The test checks that the user is
    provided with a prompt for username/password on 401 to provide
    the valid ones.
  - Third test case checks that the git clone fails when the .netrc file
    provides credentials that are valid but do not have permission for
    this user. For example one may have multiple tokens in GitHub
    and uses the one which was not authorized for cloning this repo.
    In such a case the HTTP server returns 403 Forbidden.
    For this test, the apache.conf is modified to return a 403
    on finding a forbidden-user. No prompt for username/password is
    expected after the 403 (unlike 401). This is because prompting may wipe
    out existing credentials or conflict with custom credential helpers.

Signed-off-by: Ashlesh Gawande <git@ashlesh.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
…object

* ps/read-object-info-improvements:
  packfile: drop repository parameter from `packed_object_info()`
  packfile: skip unpacking object header for disk size requests
  packfile: disentangle return value of `packed_object_info()`
  packfile: always populate pack-specific info when reading object info
  packfile: extend `is_delta` field to allow for "unknown" state
  packfile: always declare object info to be OI_PACKED
  object-file: always set OI_LOOSE when reading object info
…bject

* ps/packfile-store-in-odb-source:
  packfile: move MIDX into packfile store
  packfile: refactor `find_pack_entry()` to work on the packfile store
  packfile: inline `find_kept_pack_entry()`
  packfile: only prepare owning store in `packfile_store_prepare()`
  packfile: only prepare owning store in `packfile_store_get_packs()`
  packfile: move packfile store into object source
  packfile: refactor misleading code when unusing pack windows
  packfile: refactor kept-pack cache to work with packfile stores
  packfile: pass source to `prepare_pack()`
  packfile: create store via its owning source
  odb: properly close sources before freeing them
  builtin/gc: fix condition for whether to write commit graphs
Rename the `FOR_EACH_OBJECT_*` flags to have an `ODB_` prefix. This
prepares us for a new upcoming `odb_for_each_object()` function and
ensures that both the function and its flags have the same prefix.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `flags` parameter accepted by various `for_each_object()` functions
is a bitfield of multiple flags. Such parameters are typically unsigned
in the Git codebase, but we use `enum odb_for_each_object_flags` in
some places.

Adapt these function signatures to use the correct type.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract a new function that allows us to read object info for a specific
loose object via a user-supplied path. This function will be used in a
subsequent commit.

Note that this also allows us to drop `stat_loose_object()`, which is
a simple wrapper around `odb_loose_path()` plus lstat(3p).

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have multiple divergent interfaces to iterate through objects of a
specific backend:

  - `for_each_loose_object()` yields all loose objects.

  - `for_each_packed_object()` (somewhat obviously) yields all packed
    objects.

These functions have different function signatures, which makes it hard
to create a common abstraction layer that covers both of these.

Introduce a new function `odb_source_loose_for_each_object()` to plug
this gap. This function doesn't take any data specific to loose objects,
but instead it accepts a `struct object_info` that will be populated the
exact same as if `odb_source_loose_read_object()` was called.

The benefit of this new interface is that we can continue to pass
backend-specific data, as `struct object_info` contains a union for
these exact use cases. This will allow us to unify how we iterate
through objects across both loose and packed objects in a subsequent
commit.

The `for_each_loose_object()` function continues to exist for now, but
it will be removed at the end of this patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next commit we're about to introduce a new function that knows to
iterate through objects of a given packfile store. Same as with the
equivalent function for loose objects, this new function will also be
agnostic of backends by using a `struct object_info`.

Prepare for this by extracting a new shared function to iterate through
a single packfile store.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new function `packfile_store_for_each_object()`. This
function is equivalent to `odb_source_loose_for_each_object()`, except
that it:

  - Works on a single packfile store instead of working on the object
    database level. Consequently, it will only yield packed objects of a
    single object database source.

  - Passes a `struct object_info` to the callback function.

As such, it provides the same callback interface as we already provide
for loose objects now. These functions will be used in a subsequent step
to implement `odb_for_each_object()`.

The `for_each_packed_object()` function continues to exist for now, but
it will be removed at the end of this patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new function `odb_for_each_object()` that knows to iterate
through all objects part of a given object database. This function is
essentially a simple wrapper around the object database sources.

Subsequent commits will adapt callers to use this new function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-fsck(1) we have two callsites where we iterate over all objects
via `for_each_loose_object()` and `for_each_packed_object()`. Both of
these are trivially convertible with `odb_for_each_object()`.

Refactor these callsites accordingly.

Note that `odb_for_each_object()` may iterate over the same object
multiple times, for example when it exists both in packed and loose
format. But this has already been the case beforehand, so this does not
result in a change in behaviour.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have multiple callsites where we enumerate all promisor objects in
the object database via `for_each_packed_object()`. This is done by
passing the `ODB_FOR_EACH_OBJECT_PROMISOR_ONLY` flag, which causes us to
skip over all non-promisor objects.

These callsites can be trivially converted to `odb_for_each_object()` as
we know to skip enumeration of loose objects in case the `PROMISOR_ONLY`
flag was passed by the caller.

Refactor the sites accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're using `for_each_loose_object()` and `for_each_packed_object()` at
a couple of callsites to enumerate all loose and packed objects,
respectively. These functions will be removed in a subsequent commit in
favor of the newly introduced `odb_source_loose_for_each_object()` and
`packfile_store_for_each_object()` replacements.

Prepare for this by refactoring the sites accordingly.

Note that ideally, we'd convert all callsites to use the generic
`odb_for_each_object()` function already. But for some callers this is
not possible (yet), and it would require some significant refactorings
to make this work. Converting these site will thus be deferred to a
later patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are some use cases where we need to figure out the mtime for
objects. Most importantly, this is the case when we want to prune
unreachable objects. But getting at that data requires users to manually
derive the info either via the loose object's mtime, the packfiles'
mtime or via the ".mtimes" file.

Introduce a new `struct object_info::mtimep` pointer that allows callers
to request an object's mtime. This new field will be used in a
subsequent commit.

Note that the concept of "mtime" is ambiguous: given an object, it may
be stored multiple times in the object database, and each of these
instances may have a different mtime. Disambiguating these mtimes is
nothing that can happen on the generic ODB layer: the caller may search
for the oldest object, the newest object, or even the relation of object
mtimes depending on the specific source they are located in. As such, it
is the responsibility of the caller to disambiguate mtimes.

A consequence of this is that it's most likely incorrect to look up the
mtime via `odb_read_object_info()`, as this interface does not give us
enough information to disambiguate the mtime. Document this accordingly
and tell users to use `odb_for_each_object()` instead.

Even with this gotcha though it's sensible to have this request as part
of the object info, as the mtime is a property of the object storage
format. If we for example had a "black-box" storage backend, we'd still
need to be able to query it for the mtime info in a generic way.

We could introduce a safety mechanism that for example calls `BUG()` in
case we look up the mtime outside of `odb_for_each_object()`. But that
feels somewhat heavy-handed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When enumerating objects that are supposed to be stored in a new cruft
pack we use `for_each_packed_object()` and then derive each object's
mtime individually. Refactor this logic to instead use the new
`packfile_store_for_each_object()` function with an object info request
that asks for the respective mtimes.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To figure out which objects expired objects we enumerate all loose and
packed objects individually so that we can figure out their respective
mtimes. Refactor the code to instead use `odb_for_each_object()` with a
request that ask for the object mtime instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have converted all callers of `for_each_loose_object()` and
`for_each_packed_object()` to use their new replacement functions
instead. We can thus remove them now.

Do so and inline `packfile_store_for_each_object_internal()` now that it
only has a single callsite again. This makes it a bit easier to follow
the callback indirection that is happening there.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git subtree split currently validates --prefix against the working tree.
This breaks when splitting an older commit or when the working tree does
not contain the subtree, even though the commit does.

For example:

  git subtree split --prefix=pkg <commit>

fails if pkg was removed later, even though it exists in <commit>.

Fix this by validating the prefix against the specified commit using
git cat-file instead of the working tree.

Add a test to ensure this behavior does not regress.

Signed-off-by: Pushkar Singh <pushkarkumarsingh1970@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a signature is made with a valid key and that key later expires, the
signature should still be considered good.

GnuPG emits in this case something like:

	[GNUPG:] NEWSIG
	gpg: Signature made Wed 26 Nov 2014 05:56:50 AM CET
	gpg:                using RSA key FE3958F9067BC667
	[GNUPG:] KEYEXPIRED 1478449622
	[GNUPG:] KEY_CONSIDERED D783920D6D4F0C06AA4C25F3FE3958F9067BC667 0
	[GNUPG:] KEYEXPIRED 1478449622
	[GNUPG:] SIG_ID 8tAN3Fx6XB2NAoH5U8neoguQ9MI 2014-11-26 1416977810
	[GNUPG:] EXPKEYSIG FE3958F9067BC667 Jason Cooper <jason@lakedaemon.net>
	gpg: Good signature from "Jason Cooper <jason@lakedaemon.net>" [expired]
	[GNUPG:] VALIDSIG D783920D6D4F0C06AA4C25F3FE3958F9067BC667 2014-11-26 1416977810 0 4 0 1 2 00 D783920D6D4F0C06AA4C25F3FE3958F9067BC667
	gpg: Note: This key has expired!
	      D783920D6D4F0C06AA4C25F3FE3958F9067BC667

(signature and signed data in this example is taken from Linux commit
756f80c). So GnuPG is relaxed and the fact that
the key is expired is only worth a "Note" which is weaker than e.g.

	gpg: WARNING: The key's User ID is not certified with a trusted signature!
	gpg:          There is no indication that the signature belongs to the owner.

which git still considers ok.

So stop coloring the signature by an expired key red and handle it like
any other good signature.

Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wire up both gitk and git-gui in Meson as subprojects. These two
programs should be the last missing pieces for feature compatibility
with our Makefile for distributors.

Note that Meson expects subprojects to live in the "subprojects/"
directory. Create symlinks to fulfill this requirement.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Building the documentation with meson when the build directory is
not an immediate subdirectory of the source directory prints the
following error

[2/1349] Generating Documentation/mer... command (wrapped by meson to set env)
../../Documentation/generate-mergetool-list.sh: line 15: ../git-mergetool--lib.sh: No such file or directory

The build does not fail because the failure is upstream of a pipe. Fix
the error by passing the correct source directory when meson runs
"generate-mergetool-list.sh". As that script sets $MERGE_TOOLS_DIR
we do not need to set it in the environment when running the script.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are many mentions of commands using inline-verbatim or
emphasis ('). We just mention the command themselves, not specific
invocations like `git am <opts>`. Let’s link to them instead.

There are also many such mentions which then link to the command right
afterwards. Simplify to just using a link.

Also remove “see <gitlink>” phrases where they have now already
been mentioned.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The option `--message-id` was added in a078f73 (git-am: add
--message-id/--no-message-id, 2014-11-25) back when git-interpret-
trailers(1) was relatively new. Let’s spell out that it is a trailer
and link to the dedicated trailer command.

Also use inline-verbatim for `Message-ID`.

Also link to git-interpret-trailers(1) on `--signoff`.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document `--verify` and rephrase the `--[no-]verify` section to lead
with the default, in imperative mood.[1]

Historically it makes sense that only the negated forms are documented;
they are all run by default and thus you only need to use hook options
if you want to turn some of them off. But, beyond just desiring uniform
documentation,[2] it’s very much possible to have, say, a Git alias with
`--no-verify` that you might sometimes want to turn back on with
the *positive* form.

Also mention the options in the “Hooks” section and mention that
`post-applypatch` cannot be skipped.

† 1: See e.g. acffc5e (doc: convert git-remote to synopsis style,
     2025-12-20)
† 2: https://lore.kernel.org/git/xmqqcyct1mtq.fsf@gitster.g/

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `fill_missing_blobs()` receives an array of object IDs and
verifies for each of them whether the corresponding object exists. If it
doesn't exist, we add it to a set of objects and then batch-fetch all of
the objects at once.

The check for whether or not we already have the object is broken
though: we pass `OBJECT_INFO_FOR_PREFETCH`, but `odb_has_object()`
expects us to pass `HAS_OBJECT_*` flags. The flag expands to:

  - `OBJECT_INFO_QUICK`, which asks the object database to not reprepare
    in case the object wasn't found. This makes sense, as we'd otherwise
    reprepare the object database as many times as we have missing
    objects.

  - `OBJECT_INFO_SKIP_FETCH_OBJECT`, which asks the object database to
    not fetch the object in case it's missing. Again, this makes sense,
    as we want to batch-fetch the objects.

This shows that we indeed want the equivalent of this flag, but of
course represented as `HAS_OBJECT_*` flags.

Luckily, the code is already working correctly. The `OBJECT_INFO` flag
expands to `(1 << 3) | (1 << 4)`, none of which are valid `HAS_OBJECT`
flags. And if no flags are passed, `odb_has_object()` ends up calling
`odb_read_object_info_extended()` with exactly the above two flags that
we wanted to set in the first place.

Of course, this is pure luck, and this can break any moment. So let's
fix this and correct the code to not pass any flags at all.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `mark_object()` we invoke `has_object()` with a value of 1. This is
somewhat fishy given that the function expects a bitset of flags, so any
behaviour that this results in is purely coincidental and may break at
any point in time.

The call to `has_object()` was originally introduced in 9eb86f4
(fsck: do not lazy fetch known non-promisor object, 2020-08-05). The
intent here was to skip lazy fetches of promisor objects: we have
already verified that the object is not a promisor object, so if the
object is missing it indicates a corrupt repository.

The hardcoded value that we pass maps to `HAS_OBJECT_RECHECK_PACKED`,
which is probably the intended behaviour: `odb_has_object()` will not
fetch promisor objects unless `HAS_OBJECT_FETCH_PROMISOR` is passed, but
we may want to verify that no concurrent process has written the object
that we're trying to read.

Convert the code to use the named flag instead of the the hardcoded
value.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The object info flag values have a two gaps in their definitions, where
some bits are skipped over. These gaps don't really hurt, but it makes
one wonder whether anything is going on and whether a subset of flags
might be defined somewhere else.

That's not the case though. Instead, this is a case of flags that have
been dropped in the past:

  - The value 4 was used by `OBJECT_INFO_SKIP_CACHED`, removed in
    9c8a294 (sha1-file: remove OBJECT_INFO_SKIP_CACHED, 2020-01-02).

  - The value 8 was used by `OBJECT_INFO_ALLOW_UNKNOWN_TYPE`, removed in
    ae24b03 (object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag,
    2025-05-16).

Close those gaps to avoid any more confusion.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the object info flags into an enum and adapt all functions that
receive these flags as parameters to use the enum instead of an integer.
This serves two purposes:

  - The function signatures become more self-documenting, as callers
    don't have to wonder which flags they expect.

  - The compiler can warn when a wrong flag type is passed.

Note that the second benefit is somewhat limited. For example, when
or-ing multiple enum flags together the result will be an integer, and
the compiler will not warn about such use cases. But where it does help
is when a single flag of the wrong type is passed, as the compiler would
generate a warning in that case.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Following the reason in the preceding commit, convert the
`odb_has_object()` flags into an enum.

With this change, we would have catched the misuse of `odb_has_object()`
that was fixed in a preceding commit as the compiler would have
generated a warning:

  ../builtin/backfill.c:71:9: error: implicit conversion from enumeration type 'enum odb_object_info_flag' to different enumeration type 'enum odb_has_object_flag' [-Werror,-Wenum-conversion]
     70 |                 if (!odb_has_object(ctx->repo->objects, &list->oid[i],
        |                      ~~~~~~~~~~~~~~
     71 |                                     OBJECT_INFO_FOR_PREFETCH))
        |                                     ^~~~~~~~~~~~~~~~~~~~~~~~
  1 error generated.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
LemmingAvalanche and others added 22 commits February 17, 2026 10:49
Emphasize that you can pass multiple patches or diffs to this command.

git-patch-id(1) is an efficient pID–commit mapper, able to map
thousands of commits in seconds. But discussions on the command
seem to typically[1] use the standard loop-over-rev-list-and-
shell-out pattern:

    for commit in rev-list:
        prepare a diff from commit | git patch-id

This is unnecessary; we can bulk-process the patches:

    git rev-list --no-merges <ref> |
         git diff-tree --patch --stdin |
         git patch-id --stable

The first version (translated to shell) takes a little over nine
minutes for a commit history of about 78K commits.[2] The other one,
by contrast, takes slightly less than a minute.

Also drop “the” from “standard input”.

† 1: https://stackoverflow.com/a/19758159
† 2: This is `master` of this repository on 2025-10-02

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The utility and usability of git-patch-id(1) was discussed
relatively recently:[1]

    Using "git patch-id" is definitely in the "write a script for it"
    category. I don't think I've ever used it as-is from the command
    line as part of a one-liner. It's very much a command that is
    designed purely for scripting, the interface is just odd and baroque
    and doesn't really make sense for one-liners.

    The typical use of patch-id is to generate two *lists* of patch-ids,
    then sort them and use the patch-id as a key to find commits that
    look the same.

The command doc *could* use an example, and since it is a mapper command
it makes sense for that example to be a little script.

Mapping the commits of some branch to an upstream ref allows us to
demonstrate generating two lists, sorting them, joining them, and
finally discarding the patch ID lookup column with cut(1).

† 1: https://lore.kernel.org/workflows/CAHk-=wiN+8EUoik4UeAJ-HPSU7hczQP+8+_uP3vtAy_=YfJ9PQ@mail.gmail.com/

Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-cherry(1) links to this command. These two commands are similar and
we also mention it in the “Examples” section now. Let’s link to it.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `lookup_commit_reference_gently()` can be used to look up a
committish by object ID. As such, the function knows to peel for example
tag objects so that we eventually end up with the commit.

The function is used quite a lot throughout our tree. One such user is
"shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this
function is to figure out whether a shallow push is missing any objects
that are required to satisfy the ref updates, and if so, which of the
ref updates is missing objects.

This is done by painting the tree with `UNINTERESTING`. We start
painting by calling `refs_for_each_ref()` so that we can mark all
existing referenced objects as the boundary of objects that we already
have, and which are supposed to be fully connected. The reference tips
are then parsed via `lookup_commit_reference_gently()`, and the commit
is then marked as uninteresting.

But references may not necessarily point to a committish, and if a lot
of them aren't then this step takes a lot of time. This is mostly due to
the way that `lookup_commit_reference_gently()` is implemented: before
we learn about the type of the object we already call `parse_object()`
on the object ID. This has two consequences:

  - We parse all objects, including trees and blobs, even though we
    don't even need the contents of them.

  - More importantly though, `parse_object()` will cause us to check
    whether the object ID matches its contents.

Combined this means that we deflate and hash every non-committish
object, and that of course ends up being both CPU- and memory-intensive.

Improve the logic so that we first use `peel_object()`. This function
won't parse the object for us, and thus it allows us to learn about the
object's type before we parse and return it.

The following benchmark pushes a single object from a shallow clone into
a repository that has 100,000 refs. These refs were created by listing
all objects via `git rev-list(1) --objects --all` and creating refs for
a subset of them, so lots of those refs will cover non-commit objects.

  Benchmark 1: git-receive-pack (rev = HEAD~)
    Time (mean ± σ):     62.571 s ±  0.413 s    [User: 58.331 s, System: 4.053 s]
    Range (min … max):   62.191 s … 63.010 s    3 runs

  Benchmark 2: git-receive-pack (rev = HEAD)
    Time (mean ± σ):     38.339 s ±  0.192 s    [User: 36.220 s, System: 1.992 s]
    Range (min … max):   38.176 s … 38.551 s    3 runs

  Summary
    git-receive-pack . </tmp/input (rev = HEAD) ran
      1.63 ± 0.01 times faster than git-receive-pack . </tmp/input (rev = HEAD~)

This leads to a sizeable speedup as we now skip reading and parsing
non-commit objects. Before this change we spent around 40% of the time
in `assign_shallow_commits_to_refs()`, after the change we only spend
around 1.2% of the time in there. Almost the entire remainder of the
time is spent in git-rev-list(1) to perform the connectivity checks.

Despite the speedup though, this also leads to a massive reduction in
allocations. Before:

  HEAP SUMMARY:
      in use at exit: 352,480,441 bytes in 97,185 blocks
    total heap usage: 2,793,820 allocs, 2,696,635 frees, 67,271,456,983 bytes allocated

And after:

  HEAP SUMMARY:
      in use at exit: 17,524,978 bytes in 22,393 blocks
    total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated

Note that when all references refer to commits performance stays roughly
the same, as expected. The following benchmark was executed with 600k
commits:

  Benchmark 1: git-receive-pack (rev = HEAD~)
    Time (mean ± σ):      9.101 s ±  0.006 s    [User: 8.800 s, System: 0.520 s]
    Range (min … max):    9.095 s …  9.106 s    3 runs

  Benchmark 2: git-receive-pack (rev = HEAD)
    Time (mean ± σ):      9.128 s ±  0.094 s    [User: 8.820 s, System: 0.522 s]
    Range (min … max):    9.019 s …  9.188 s    3 runs

  Summary
    git-receive-pack (rev = HEAD~) ran
      1.00 ± 0.01 times faster than git-receive-pack (rev = HEAD)

This will be improved in the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next commit we will start to parse more commits via the
commit-graph. This change will lead to a segfault though because we try
to access the tree of a commit via `repo_get_commit_tree()`, but:

  - The commit has been parsed via the commit-graph, and thus its
    `maybe_tree` field is not yet populated.

  - We cannot use the commit-graph to populate the commit's tree because
    we're in the process of writing the commit-graph.

The consequence is that we'll get a `NULL` pointer for the tree in
`write_graph_chunk_data()`.

In theory we are already mindful of this situation, as we explicitly use
`repo_parse_commit_no_graph()` to parse the commit without the help of
the commit-graph. But that doesn't do the trick as the commit is already
marked as parsed, so the function will not re-populate it. And as the
commit-graph has been closed, neither will `get_commit_tree_oid()` be
able to load the tree for us.

It seems like this issue can only be hit under artificial circumstances:
the error was hit via `git_test_write_commit_graph_or_die()`, which is
run by git-commit(1) and git-merge(1) in case `GIT_TEST_COMMIT_GRAPH=1`:

  $ GIT_TEST_COMMIT_GRAPH=1 meson test t7507-commit-verbose \
      --test-args=-ix -i
  ...
  ++ git -c commit.verbose=true commit --amend
  hint: Waiting for your editor to close the file...
  ./test-lib.sh: line 1012: 55895 Segmentation fault         (core dumped) git -c commit.verbose=true commit --amend

To the best of my knowledge, this is the only case where we end up
writing a commit-graph in the same process that might have already
consulted the commit-graph to look up arbitrary objects. But regardless
of that, this feels like a bigger accident that is just waiting to
happen.

Make the code more robust by extending `repo_parse_commit_no_graph()` to
unparse a commit first in case we detect it's coming from a graph. This
ensures that we will re-read the object without it, and thus we will
populate `maybe_tree` properly.

This fix shouldn't have any performance consequences: the function is
only ever called in the "commit-graph.c" code, and we'll only re-parse
the commit at most once.

Add an exclusion to our Coccinelle rules so that it doesn't complain
about us accessing `maybe_tree` directly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit we refactored `lookup_commit_reference_gently()`
so that it doesn't parse non-commit objects anymore. This has led to a
speedup when git-receive-pack(1) accepts a shallow push into a repo
with lots of refs that point to blobs or trees.

But while this case is now faster, we still have the issue that
accepting pushes with lots of "normal" refs that point to commits are
still slow. This is mostly because we look up the commits via the object
database, and that is rather costly.

Adapt the code to use `repo_parse_commit_gently()` instead of
`parse_object()` to parse the resulting commit object. This function
knows to use the commit-graph to fill in the object, which is way more
cost efficient.

This leads to another significant speedup when accepting shallow pushes.
The following benchmark pushes a single objects from a shallow clone
into a repository with 600,000 references that all point to commits:

  Benchmark 1: git-receive-pack (rev = HEAD~)
    Time (mean ± σ):      9.179 s ±  0.031 s    [User: 8.858 s, System: 0.528 s]
    Range (min … max):    9.154 s …  9.213 s    3 runs

  Benchmark 2: git-receive-pack (rev = HEAD)
    Time (mean ± σ):      2.337 s ±  0.032 s    [User: 2.331 s, System: 0.234 s]
    Range (min … max):    2.308 s …  2.371 s    3 runs

  Summary
    git-receive-pack . </tmp/input (rev = HEAD) ran
      3.93 ± 0.05 times faster than git-receive-pack (rev = HEAD~)

Also, this again leads to a significant reduction in memory allocations.
Before this change:

  HEAP SUMMARY:
      in use at exit: 17,524,978 bytes in 22,393 blocks
    total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated

And after this change:

  HEAP SUMMARY:
      in use at exit: 11,534,036 bytes in 12,406 blocks
    total heap usage: 13,284 allocs, 878 frees, 15,521,451 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To improve code hygiene, replace direct casts from `struct
odb_transaction` and `struct odb_read_stream` to their concrete
implementations with `container_of()`.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'flags' and 'track_flags' fields in symlinks.c are used
strictly as a collection of bits (using bitwise operators including
&, |, ~). Using a signed integer for bitmasks may lead to undefined
behavior with shift operations and logic errors if the MSB is touched.

Change these fields from 'int' to 'unsigned int' to match our usage
patterns.

Signed-off-by: Tian Yuchen <a3205153416@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even though the "struct odb_transaction" member is at the beginning
of the containing "struct odb_transaction_files", i.e., at offset 0,
using container_of() to add offset 0 to a NULL pointer gets flagged
as a bad behaviour under SANITIZE=undefined.

Use container_of_or_null() to work around this issue.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Revamp object enumeration API around odb.

* ps/odb-for-each-object:
  odb: drop unused `for_each_{loose,packed}_object()` functions
  reachable: convert to use `odb_for_each_object()`
  builtin/pack-objects: use `packfile_store_for_each_object()`
  odb: introduce mtime fields for object info requests
  treewide: drop uses of `for_each_{loose,packed}_object()`
  treewide: enumerate promisor objects via `odb_for_each_object()`
  builtin/fsck: refactor to use `odb_for_each_object()`
  odb: introduce `odb_for_each_object()`
  packfile: introduce function to iterate through objects
  packfile: extract function to iterate through objects of a store
  object-file: introduce function to iterate through objects
  object-file: extract function to read object info from path
  odb: fix flags parameter to be unsigned
  odb: rename `FOR_EACH_OBJECT_*` flags
A signature on a commit that was GPG signed long time ago ought to
be still valid after the key that was used to sign it has expired,
but we showed them in alarming red.

* uk/signature-is-good-after-key-expires:
  gpg-interface: signatures by expired keys are fine
"git subtree split --prefix=P <commit>" now checks the prefix P
against the tree of the (potentially quite different from the
current working tree) given commit.

* ps/validate-prefix-in-subtree-split:
  subtree: validate --prefix against commit in split
Code clean-up.

* ty/symlinks-use-unsigned-for-bitset:
  symlinks: use unsigned int for flags
Additional tests were introduced to see the interaction with netrc
auth with auth failure on the http transport.

* ag/http-netrc-tests:
  t5550: add netrc tests for http 401/403
A couple of bugs in use of flag bits around odb API has been
corrected, and the flag bits reordered.

* ps/object-info-bits-cleanup:
  odb: convert `odb_has_object()` flags into an enum
  odb: convert object info flags into an enum
  odb: drop gaps in object info flag values
  builtin/fsck: fix flags passed to `odb_has_object()`
  builtin/backfill: fix flags passed to `odb_has_object()`
Doc update.

* kh/doc-am-xref:
  doc: am: fill out hook discussion
  doc: am: add missing config am.messageId
  doc: am: say that --message-id adds a trailer
  doc: am: normalize git(1) command links
Update build precedure for mergetool documentation in meson-based builds.

* pw/meson-doc-mergetool:
  meson: fix building mergetool docs
Plumb gitk/git-gui build and install procedure in meson based
builds.

* ps/meson-gitk-git-gui:
  meson: wire up gitk and git-gui
Doc update.

* kh/doc-patch-id-4:
  doc: patch-id: see also git-cherry(1)
  doc: patch-id: add script example
  doc: patch-id: emphasize multi-patch processing
The code to accept shallow "git push" has been optimized.

* ps/receive-pack-shallow-optim:
  commit: use commit graph in `lookup_commit_reference_gently()`
  commit: make `repo_parse_commit_no_graph()` more robust
  commit: avoid parsing non-commits in `lookup_commit_reference_gently()`
Code clean-up.

* jt/object-file-use-container-of:
  object-file.c: avoid container_of() of a NULL container
  object-file: use `container_of()` to convert from base types
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@pull pull bot locked and limited conversation to collaborators Mar 3, 2026
@pull pull bot added the ⤵️ pull label Mar 3, 2026
@pull pull bot merged commit 4805bb9 into turkdevops:master Mar 3, 2026
2 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants