Skip to content

[pull] master from git:master#183

Merged
pull[bot] merged 47 commits intoturkdevops:masterfrom
git:master
Mar 25, 2026
Merged

[pull] master from git:master#183
pull[bot] merged 47 commits intoturkdevops:masterfrom
git:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Mar 25, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

ychin and others added 30 commits March 3, 2026 08:43
After a diff algorithm has been run, the compaction phase
(xdl_change_compact()) shifts and merges change groups to produce a
cleaner output. However, this shifting could create a new matched group
where both sides now have matching lines. This results in a
wrong-looking diff output which contains redundant lines that are the
same on both files.

Fix this by detecting this situation, and re-diff the texts on each side
to find similar lines, using the fall-back Myer's diff. Only do this for
histogram diff as it's the only algorithm where this is relevant. Below
contains an example, and more details.

For an example, consider two files below:

    file1:
        A

        A
        A
        A

        A
        A
        A

    file2:
        A

        A
        x
        A

        A
        A
        A

When using Myer's diff, the algorithm finds that only the "x" has been
changed, and produces a final diff result (these are line diffs, but
using word-diff syntax for ease of presentation):

        A A[-A-]{+x+}A AAA

When using histogram diff, the algorithm first discovers the LCS "A
AAA", which it uses as anchor, then produces an intermediate diff:

        {+A Ax+}A AAA[- AAA-].

This is a longer diff than Myer's, but it's still self-consistent.
However, the compaction phase attempts to shift the first file's diff
group upwards (note that this shift crosses the anchor that histogram
had used), leading to the final results for histogram diff:

        [-A AA-]{+A Ax+}A AAA

This is a technically correct patch but looks clearly redundant to a
human as the first 3 lines should not be in the diff.

The fix would detect that a shift has caused matching to a new group,
and re-diff the "A AA" and "A Ax" parts, which results in "A A"
correctly re-marked as unchanged. This creates the now correct histogram
diff:

        A A[-A-]{+x+}A AAA

This issue is not applicable to Myer's diff algorithm as it already
generates a minimal diff, which means a shift cannot result in a smaller
diff output (the default Myer's diff in xdiff is not guaranteed to be
minimal for performance reasons, but it typically does a good enough
job).

It's also not applicable to patience diff, because it uses only unique
lines as anchor for its splits, and falls back to Myer's diff within
each split. Shifting requires both ends having the same lines, and
therefore cannot cross the unique line boundaries established by the
patience algorithm. In contrast histogram diff uses non-unique lines as
anchors, and therefore shifting can cross over them.

This issue is rare in a normal repository. Below is a table of
repositories (`git log --no-merges -p --histogram -1000`), showing how
many times a re-diff was done and how many times it resulted in finding
matching lines (therefore addressing this issue) with the fix. In
general it is fewer than 1% of diff's that exhibit this offending
behavior:

| Repo (1k commits)  | Re-diff | Found matching lines |
|--------------------|---------|----------------------|
| llvm-project       |  45     | 11                   |
| vim                | 110     |  9                   |
| git                |  18     |  2                   |
| WebKit             | 168     |  1                   |
| ripgrep            |  22     |  1                   |
| cpython            |  32     |  0                   |
| vscode             |  13     |  0                   |

Signed-off-by: Yee Cheng Chin <ychin.git@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we have a "add-patch.c" code file, its declarations are part of
"add-interactive.h". This makes it somewhat harder than necessary to
find relevant code and to identify clear boundaries between the two
subsystems.

Split up concerns and move declarations that relate to "add-patch.c"
into a new "add-patch.h" header.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `struct add_p_opt` is reused both by our infra for "git add -p" and
"git add -i". Users of `run_add_i()` for example are expected to pass
`struct add_p_opt`. This is somewhat confusing and raises the question
of which options apply to what part of the stack.

But things are even more confusing than that: while callers are expected
to pass in `struct add_p_opt`, these options ultimately get used to
initialize a `struct add_i_state` that is used by both subsystems. So we
are basically going full circle here.

Refactor the code and split out a new `struct interactive_options` that
hosts common options used by both. These options are then applied to a
`struct interactive_config` that hosts common configuration.

This refactoring doesn't yet fully detangle the two subsystems from one
another, as we still end up calling `init_add_i_state()` in the "git add
-p" subsystem. This will be fixed in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the preceding commit we have split out interactive configuration
that is used by both "git add -p" and "git add -i". But we still
initialize that configuration in the "add -p" subsystem by calling
`init_add_i_state()`, even though we only do so to initialize the
interactive configuration as well as a repository pointer.

Stop doing so and instead store and initialize the interactive
configuration in `struct add_p_state` directly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With `run_add_p()` callers have the ability to apply changes from a
specific revision to a repository's index. This infra supports several
different modes, like for example applying changes to the index,
working tree or both.

One feature that is missing though is the ability to apply changes to an
in-memory index different from the repository's index. Add a new
function `run_add_p_index()` to plug this gap.

This new function will be used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "add-patch" mode allows the user to edit hunks to apply custom
changes. This is incompatible with a new `git history split` command
that we're about to introduce in a subsequent commit, so we need a way
to disable this mode.

Add a new flag to disable editing hunks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `write_in_core_index_as_tree()` takes a repository and
writes its index into a tree object. What this function cannot do though
is to take an _arbitrary_ in-memory index.

Introduce a new `struct index_state` parameter so that the caller can
pass a different index than the one belonging to the repository. This
will be used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next commit we're about to introduce a new command that splits up
a commit into two. Most of the logic will be shared with rewording
commits, except that we also need to have control over the parents and
the old/new trees.

Extract a new function `commit_tree_with_edited_message_ext()` to
prepare for this commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is quite a common use case that one wants to split up one commit into
multiple commits by moving parts of the changes of the original commit
out into a separate commit. This is quite an involved operation though:

  1. Identify the commit in question that is to be dropped.

  2. Perform an interactive rebase on top of that commit's parent.

  3. Modify the instruction sheet to "edit" the commit that is to be
     split up.

  4. Drop the commit via "git reset HEAD~".

  5. Stage changes that should go into the first commit and commit it.

  6. Stage changes that should go into the second commit and commit it.

  7. Finalize the rebase.

This is quite complex, and overall I would claim that most people who
are not experts in Git would struggle with this flow.

Introduce a new "split" subcommand for git-history(1) to make this way
easier. All the user needs to do is to say `git history split $COMMIT`.
From hereon, Git asks the user which parts of the commit shall be moved
out into a separate commit and, once done, asks the user for the commit
message. Git then creates that split-out commit and applies the original
commit on top of it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for '-U<n>' implies that the numeric value '<n>' is
mandatory. However, the command line parser has historically accepted
'-U' without a number.

Strictly requiring a number for '-U' would break existing tests
(e.g., in 't4013') and likely disrupt user scripts relying on this
undocumented behavior.

Hence we retain this fallback behavior for backward compatibility, but
document it as such.

Signed-off-by: Tian Yuchen <cat@malon.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 6206089 (commit: write commits for both hashes, 2023-10-01),
`sign_with_header()` was removed, but its forward declaration in
"commit.h" was left. Remove the unused declaration.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `sign_commit_to_strbuf()` helper in "commit.c" provides fallback
logic to get the default configured signing key when a key is not
provided and handles generating the commit signature accordingly. This
signing operation is not really specific to commits as any arbitrary
buffer can be signed. Also, in a subsequent commit, this same logic is
reused by git-fast-import(1) when signing commits with invalid
signatures.

Remove the `sign_commit_to_strbuf()` helper from "commit.c" and extend
`sign_buffer()` in "gpg-interface.c" to support using the default key as
a fallback when the `SIGN_BUFFER_USE_DEFAULT_KEY` flag is provided. Call
sites are updated accordingly.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With git-fast-import(1), handling of signed commits is controlled via
the `--signed-commits=<mode>` option. When an invalid signature is
encountered, a user may want the option to sign the commit again as
opposed to just stripping the signature. To facilitate this, introduce a
"sign-if-invalid" mode for the `--signed-commits` option. Optionally, a
key ID may be explicitly provided in the form
`sign-if-invalid[=<keyid>]` to specify which signing key should be used
when signing invalid commit signatures.

Note that to properly support interoperability mode when signing commit
signatures, the commit buffer must be created in both the repository and
compatability object formats to generate the appropriate signatures
accordingly. As currently implemented, the commit buffer for the
compatability object format is not reconstructed and thus signing
commits in interoperability mode is not yet supported. Support may be
added in the future.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The delete_object helper currently relies on a manual sed command to
calculate object paths. This works, but it's a bit brittle and forces
us to maintain shell logic that Git's own test suite can already
handle more elegantly.

Switch to 'test_oid_to_path' to let Git handle the path logic. This
makes the helper hash independent, which is much cleaner than manual
string manipulation. While at it, use 'local' to declare helper-specific
variables and quote them to follow Git's coding style. This prevents
them from leaking into global shell scope and avoids potential naming
conflicts with other parts of the test suite.

Helped-by: Pushkar Singh <pushkarkumarsingh1970@gmail.com>
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Siddharth Shrimali <r.siddharth.shrimali@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When git-upload-pack(1) writes packfile data to the client we have some
logic in place that buffers some partial lines. When that buffer still
contains data after git-pack-objects(1) has finished we flush the buffer
so that all remaining bytes are sent out.

Curiously, when we do so we also print the string "flushed." to stderr.
This statement has been introduced in b1c71b7 (upload-pack: avoid
sending an incomplete pack upon failure, 2006-06-20), so quite a while
ago. What's interesting though is that stderr is typically spliced
through to the client-side, and consequently the client would see this
message. Munging the way how we do the caching indeed confirms this:

  $ git clone file:///home/pks/Development/linux/
  Cloning into bare repository 'linux.git'...
  remote: Enumerating objects: 12980346, done.
  remote: Counting objects: 100% (131820/131820), done.
  remote: Compressing objects: 100% (50290/50290), done.
  remote: Total 12980346 (delta 96319), reused 104500 (delta 81217), pack-reused 12848526 (from 1)
  Receiving objects: 100% (12980346/12980346), 3.23 GiB | 57.44 MiB/s, done.
  flushed.
  Resolving deltas: 100% (10676718/10676718), done.

It's quite clear that this string shouldn't ever be visible to the
client, so it rather feels like this is a left-over debug statement. The
menitoned commit doesn't mention this line, either.

Remove the debug output to prepare for a change in how we do the
buffering in the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `create_pack_file()` is responsible for sending the
packfile data to the client of git-upload-pack(1). As generating the
bytes may take significant computing resources we also have a mechanism
in place that optionally sends keepalive pktlines in case we haven't
sent out any data.

The keepalive logic is purely based poll(3p): we pass a timeout to that
syscall, and if the call times out we send out the keepalive pktline.
While reasonable, this logic isn't entirely sufficient: even if the call
to poll(3p) ends because we have received data on any of the file
descriptors we may not necessarily send data to the client.

The most important edge case here happens in `relay_pack_data()`. When
we haven't seen the initial "PACK" signature from git-pack-objects(1)
yet we buffer incoming data. So in the worst case, if each of the bytes
of that signature arrive shortly before the configured keepalive
timeout, then we may not send out any data for a time period that is
(almost) four times as long as the configured timeout.

This edge case is rather unlikely to matter in practice. But in a
subsequent commit we're going to adapt our buffering mechanism to become
more aggressive, which makes it more likely that we don't send any data
for an extended amount of time.

Adapt the logic so that instead of using a fixed timeout on every call
to poll(3p), we instead figure out how much time has passed since the
last-sent data.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the sideband in git-upload-pack(1) we know to send out
keepalive packets in case generating the pack takes too long. These
keepalives take the form of a simple empty pktline.

In the preceding commit we have adapted git-upload-pack(1) to buffer
data more aggressively before sending it to the client. This creates an
obvious optimization opportunity: when we hit the keepalive timeout
while we still hold on to some buffered data, then it makes more sense
to flush out the data instead of sending the empty keepalive packet.

This is overall not going to be a significant win. Most keepalives will
come before the pack data starts, and once pack-objects starts producing
data, it tends to do so pretty consistently. And of course we can't send
data before we see the PACK header, because the whole point is to buffer
the early bit waiting for packfile URIs. But the optimization is easy
enough to realize.

Do so and flush out data instead of sending an empty pktline.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In our production systems we have recently observed write contention in
git-upload-pack(1). The system in question was consistently streaming
packfiles at a rate of dozens of gigabits per second, but curiously the
system was neither bottlenecked on CPU, memory or IOPS.

We eventually discovered that Git was spending 80% of its time in
`pipe_write()`, out of which almost all of the time was spent in the
`ep_poll_callback` function in the kernel. Quoting the reporter:

  This infrastructure is part of an event notification queue designed to
  allow for multiple producers to emit events, but that concurrency
  safety is guarded by 3 layers of locking. The layer we're hitting
  contention in uses a simple reader/writer lock mode (a.k.a. shared
  versus exclusive mode), where producers need shared-mode (read mode),
  and various other actions use exclusive (write) mode.

The system in question generates workloads where we have hundreds of
git-upload-pack(1) processes active at the same point in time. These
processes end up contending around those locks, and the consequence is
that the Git processes stall.

Now git-upload-pack(1) already has the infrastructure in place to buffer
some of the data it reads from git-pack-objects(1) before actually
sending it out. We only use this infrastructure in very limited ways
though, so we generally end up matching one read(3p) call with one
write(3p) call. Even worse, when the sideband is enabled we end up
matching one read with _two_ writes: one for the pkt-line length, and
one for the packfile data.

Extend our use of the buffering infrastructure so that we soak up bytes
until the buffer is filled up at least 2/3rds of its capacity. The
change is relatively simple to implement as we already know to flush the
buffer in `create_pack_file()` after git-pack-objects(1) has finished.

This significantly reduces the number of write(3p) syscalls we need to
do. Before this change, cloning the Linux repository resulted in around
400,000 write(3p) syscalls. With the buffering in place we only do
around 130,000 syscalls.

Now we could of course go even further and make sure that we always fill
up the whole buffer. But this might cause an increase in read(3p)
syscalls, and some tests show that this only reduces the number of
write(3p) syscalls from 130,000 to 100,000. So overall this doesn't seem
worth it.

Note that the issue could also be fixed by adapting the write buffer
that we use in the downstream git-pack-objects(1) command, and such a
change would have roughly the same result. But the command that
generates the packfile data may not always be git-pack-objects(1) as it
can be changed via "uploadpack.packObjectsHook", so such a fix would
only help in _some_ cases. Regardless of that, we'll also adapt the
write buffer size of git-pack-objects(1) in a subsequent commit.

Helped-by: Matt Smiley <msmiley@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit we're going to add the first caller to
writev(3p). Introduce a compatibility wrapper for this syscall that we
can use on systems that don't have this syscall.

The syscall exists on modern Unixes like Linux and macOS, and seemingly
even for NonStop according to [1]. It doesn't seem to exist on Windows
though.

[1]: http://nonstoptools.com/manuals/OSS-SystemCalls.pdf
[2]: https://www.gnu.org/software/gnulib/manual/html_node/writev.html

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit we have added a compatibility wrapper for the
writev(3p) syscall. Introduce some generic wrappers for this function
that we nowadays take for granted in the Git codebase.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Every pktline that we send out via `send_sideband()` currently requires
two syscalls: one to write the pktline's length, and one to send its
data. This typically isn't all that much of a problem, but under extreme
load the syscalls may cause contention in the kernel.

Refactor the code to instead use the newly introduced writev(3p) infra
so that we can send out the data with a single syscall. This reduces the
number of syscalls from around 133,000 calls to write(3p) to around
67,000 calls to writev(3p).

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new `hashfd_ext()` function that takes an options structure.
This function will replace `hashd_throughput()` in the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `hashfd_throughput()` function is used by a single callsite in
git-pack-objects(1). In contrast to `hashfd()`, this function uses a
progress meter to measure throughput and a smaller buffer length so that
the progress meter can provide more granular metrics.

We're going to change that caller in the next commit to be a bit more
specific to packing objects. As such, `hashfd_throughput()` will be a
somewhat unfitting mechanism for any potential new callers.

Drop the function and replace it with a call to `hashfd_ext()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running `git pack-objects --stdout` we feed the data through
`hashfd_ext()` with a progress meter and a smaller-than-usual buffer
length of 8kB so that we can track throughput more granularly. But as
packfiles tend to be on the larger side, this small buffer size may
cause a ton of write(3p) syscalls.

Originally, the buffer we used in `hashfd()` was 8kB for all use cases.
This was changed though in 2ca245f (csum-file.h: increase hashfile
buffer size, 2021-05-18) because we noticed that the number of writes
can have an impact on performance. So the buffer size was increased to
128kB, which improved performance a bit for some use cases.

But the commit didn't touch the buffer size for `hashd_throughput()`.
The reasoning here was that callers expect the progress indicator to
update frequently, and a larger buffer size would of course reduce the
update frequency especially on slow networks.

While that is of course true, there was (and still is, even though it's
now a call to `hashfd_ext()`) only a single caller of this function in
git-pack-objects(1). This command is responsible for writing packfiles,
and those packfiles are often on the bigger side. So arguably:

  - The user won't care about increments of 8kB when packfiles tend to
    be megabytes or even gigabytes in size.

  - Reducing the number of syscalls would be even more valuable here
    than it would be for multi-pack indices, which was the benchmark
    done in the mentioned commit, as MIDXs are typically significantly
    smaller than packfiles.

  - Nowadays, many internet connections should be able to transfer data
    at a rate significantly higher than 8kB per second.

Update the buffer to instead have a size of `LARGE_PACKET_DATA_MAX - 1`,
which translates to ~64kB. This limit was chosen because `git
pack-objects --stdout` is most often used when sending packfiles via
git-upload-pack(1), where packfile data is chunked into pktlines when
using the sideband. Furthermore, most internet connections should have a
bandwidth signifcantly higher than 64kB/s, so we'd still be able to
observe progress updates at a rate of at least once per second.

This change significantly reduces the number of write(3p) syscalls from
355,000 to 44,000 when packing the Linux repository. While this results
in a small performance improvement on an otherwise-unused system, this
improvement is mostly negligible. More importantly though, it will
reduce lock contention in the kernel on an extremely busy system where
we have many processes writing data at once.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pack-refs tests previously used raw 'test -f' and 'test -e' checks
with negation. Update them to use Git's standard helper function
test_path_is_missing for consistency and clearer failure reporting.

As suggested in review, replaced the negated 'test_path_exists' with
test_path_is_missing to better reflect the expected absence of paths.

Signed-off-by: Ritesh Singh Jadoun <riteshjd75@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
…/cocci-do-not-pass-strbuf-by-value

* dd/list-objects-filter-options-wo-strbuf-split:
  list-objects-filter-options: avoid strbuf_split_str()
  worktree: do not pass strbuf by value
Passing a struct strbuf by value to a function copies the struct
but shares the underlying character array between caller and callee.
If the callee causes a reallocation, the caller's copy becomes a
dangling pointer, leading to a double-free when strbuf_release() is
called.  There is no coccinelle rule to catch this pattern.

Jeff King suggested adding one during review of the
write_worktree_linking_files() fix [1], and noted that a reporting
rule using coccinelle's Python scripting extensions could emit a
descriptive warning, but we do not currently require Python support
in coccinelle.

Add a transformation rule that rewrites a by-value strbuf parameter
to a pointer.  The detection is identical to what a Python-based
reporting rule would catch; only the presentation differs.  The
resulting diff will not produce compilable code on its own (callers
and the function body still need updating), but the spatch output
alerts the developer that the signature needs attention.  This is
consistent with the other rules in strbuf.cocci, which also rewrite
to the preferred form.

[1] https://lore.kernel.org/git/20260309192600.GC309867@coredump.intra.peff.net/

Signed-off-by: Deveshi Dwivedi <deveshigurgaon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
save_untracked_files() takes its 'files' parameter as struct strbuf
by value.  Passing a strbuf by value copies the struct but shares
the underlying buffer between caller and callee, risking a dangling
pointer and double-free if the callee reallocates.

The function needs both the buffer and its length for
pipe_command(), so a plain const char * is not sufficient here.
Switch the parameter to struct strbuf * and update the caller to
pass a pointer.

Signed-off-by: Deveshi Dwivedi <deveshigurgaon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update clar to e4172e3 (Merge pull request #134 from
clar-test/ethomson/const, 2026-01-10). Besides some changes to
"generate.py" which don't have any impact on us, this commit also fixes
compilation on platforms that don't have PATH_MAX, like for example
GNU/Hurd.

Reported-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The example provided has its arguments in the wrong order. The revision
should follow the pattern, and not the other way around.

Signed-off-by: Guillaume Jacob <guillaume@absolut-sensing.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mroik and others added 17 commits March 16, 2026 12:43
The "large exclude file ignored in tree" test fails. This is due to an
additional warning message that is generated in the test. "warning:
unable to access 'subdir/.gitignore': Too many levels of symbolic
links", the extra warning that is not supposed to be there, happens
because of some leftover files left by previous tests.

To fix this we improve cleanup on "symlinks not respected in-tree", and
because the tests in t0008 in general have poor cleanup, at the start of
"large exclude file ignored in tree" we search for any leftover
.gitignore and remove them before starting the test.

Improve post-test cleanup and add pre-test cleanup to make sure that we
have a workable environment for the test.

Signed-off-by: Mirko Faina <mroik@delayed.space>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace old-style path existence checks in t4200-rerere.sh with
the appropriate test_path_* helper functions. These helpers provide
clearer diagnostic messages on failure than the raw shell test
builtin.

Signed-off-by: Prashant S Bisht <prashantjee2025@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We retrieve config values with repo_config_get_string(), which will
allocate a new copy of the string for us. But we don't hold on to those
strings, since they are just fed to git_config_colorbool() and
color_parse(). But nor do we free them, which means they leak.

We can fix this by using the "_tmp" form of repo_config_get_string(),
which just hands us a pointer directly to the internal storage. This is
OK for our purposes, since we don't need it to last for longer than our
parsing calls.

Two interesting side notes here:

  1. Many types already have a repo_config_get_X() variant that handles
     this for us (e.g., repo_config_get_bool()). But neither colorbools
     nor colors themselves have such helpers. We might think about
     adding them, but converting all callers is a larger task, and out
     of scope for this fix.

  2. As far as I can tell, this leak has been there since 960786e
     (push: colorize errors, 2018-04-21), but wasn't detected by LSan in
     our test suite. It started triggering when we applied dd3693e
     (transport-helper, connect: use clean_on_exit to reap children on
     abnormal exit, 2026-03-12) which is mostly unrelated.

     Even weirder, it seems to trigger only with clang (and not gcc),
     and only with GIT_TEST_DEFAULT_REF_FORMAT=reftable. So I think this
     is another odd case where the pointers happened to be hanging
     around in stack memory, but changing the pattern of function calls
     in nearby code was enough for them to be incidentally overwritten.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fast-import" learned to optionally replace signature on
commits whose signatures get invalidated due to replaying by
signing afresh.

* jt/fast-import-sign-again:
  fast-import: add mode to sign commits with invalid signatures
  gpg-interface: allow sign_buffer() to use default signing key
  commit: remove unused forward declaration
Test clean-up.

* ss/t0410-delete-object-cleanup:
  t0410: modernize delete_object helper
"git history" learned the "split" subcommand.

* ps/history-split:
  builtin/history: implement "split" subcommand
  builtin/history: split out extended function to create commits
  cache-tree: allow writing in-memory index as tree
  add-patch: allow disabling editing of hunks
  add-patch: add support for in-memory index patching
  add-patch: remove dependency on "add-interactive" subsystem
  add-patch: split out `struct interactive_options`
  add-patch: split out header from "add-interactive.h"
Fix an example in the user-manual.

* gj/user-manual-fix-grep-example:
  doc: fix git grep args order in Quick Reference
Clar (unit testing framework) update from the upstream.

* ps/clar-wo-path-max:
  clar: update to fix compilation on platforms without PATH_MAX
Test updates.

* rj/pack-refs-tests-path-is-helpers:
  t/pack-refs-tests: use test_path_is_missing
Leakfix.

* jk/transport-color-leakfix:
  transport: plug leaks in transport_color_config()
Test clean-up.

* pb/t4200-test-path-is-helpers:
  t4200: convert test -[df] checks to test_path_* helpers
Test clean-up.

* mf/t0008-cleanup:
  t0008: improve test cleanup to fix failing test
The final clean-up phase of the diff output could turn the result of
histogram diff algorithm suboptimal, which has been corrected.

* yc/histogram-hunk-shift-fix:
  xdiff: re-diff shifted change groups when using histogram algorithm
Reduce system overhead "git upload-pack" spends on relaying "git
pack-objects" output to the "git fetch" running on the other end of
the connection.

* ps/upload-pack-buffer-more-writes:
  builtin/pack-objects: reduce lock contention when writing packfile data
  csum-file: drop `hashfd_throughput()`
  csum-file: introduce `hashfd_ext()`
  sideband: use writev(3p) to send pktlines
  wrapper: introduce writev(3p) wrappers
  compat/posix: introduce writev(3p) wrapper
  upload-pack: reduce lock contention when writing packfile data
  upload-pack: prefer flushing data over sending keepalive
  upload-pack: adapt keepalives based on buffering
  upload-pack: fix debug statement when flushing packfile data
"git diff -U<num>" was too lenient in its command line parsing and
took an empty string as a valid <num>.

* ty/doc-diff-u-wo-number:
  diff: document -U without <n> as using default context
Add a coccinelle rule to break the build when "struct strbuf" gets
passed by value.

* dd/cocci-do-not-pass-strbuf-by-value:
  stash: do not pass strbuf by value
  coccinelle: detect struct strbuf passed by value
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@pull pull bot locked and limited conversation to collaborators Mar 25, 2026
@pull pull bot added the ⤵️ pull label Mar 25, 2026
@pull pull bot merged commit ce74208 into turkdevops:master Mar 25, 2026
2 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.