Skip to content

Tags: gitgitgadget/git

Tags

pr-1879/Ethan0456/cleanup/deduplicate-repo-fwd-decl-v1

dir.h: remove duplicate forward declaration of struct repository

From: Abhijeetsingh Meena <abhijeet040403@gmail.com>

The `struct repository;` forward declaration appears twice in `dir.h`:
once at line 10 and again at line 46. This duplication is unnecessary
and likely unintentional.

Removing the second declaration has no impact on compilation, as verified
by a clean build.

Signed-off-by: Abhijeetsingh Meena <abhijeet040403@gmail.com>

Submitted-As: https://lore.kernel.org/git/pull.1879.git.1741705175922.gitgitgadget@gmail.com

v2.49.0-rc2

Verified

This tag was signed with the committer’s verified signature.
gitster Junio C Hamano
Git 2.49-rc2

pr-1819/derrickstolee/path-walk-upstream-v1

PATH WALK II: Add --path-walk option to 'git pack-objects'

Here is a full submission of the --path-walk feature for 'git pack-objects'
and 'git repack'. It's been discussed in an RFC [1], as a future application
for the path walk API [2], and is updated now that --name-hash-version=2
exists (as a replacement for the --full-name-hash option from the RFC) [3].

[1]
https://lore.kernel.org/git/pull.1813.v2.git.1729431810.gitgitgadget@gmail.com/

[2]
https://lore.kernel.org/git/pull.1818.git.1730356023.gitgitgadget@gmail.com

[3]
https://lore.kernel.org/git/pull.1813.git.1728396723.gitgitgadget@gmail.com

This patch series does the following:

 1. Add a new '--path-walk' option to 'git pack-objects' that uses the
    path-walk API instead of the revision API to collect objects for delta
    compression.

 2. Add a new '--path-walk' option to 'git repack' to pass this option along
    to 'git pack-objects'.

 3. Add a new 'pack.usePathWalk' config option to opt into this option
    implicitly, such as in 'git push'.

 4. Optimize the '--path-walk' option using threading so it better competes
    with the existing multi-threaded delta compression mechanism.

 5. Update the path-walk API with a new 'edge_aggressive' option that pairs
    close to the --edge-aggressive option in the revision API. This is
    useful when creating thin packs inside shallow clones.

This feature works by using the path-walk API to emit groups of objects that
appear at the same path. These groups are tracked so they can be tested for
delta compression with each other, and then after those groups are tested a
second pass using the name-hash attempts to find better (or first time)
deltas across path boundaries. This second pass is much faster than a fresh
pass since the existing deltas are used as a limit for the size of
potentially new deltas, short-circuiting the checks when the delta size
exceeds the current-best.

The benefits of the --path-walk feature first come into play when the name
hash functions have many collisions, so sorting by name hash value leads to
unhelpful groupings of objects. Many of these benefits are improved by
--name-hash-version=2, but collisions still exist with any hash-based
approach. There are also performance benefits in some cases due to the
isolation of delta compression testing within path groups.

All of the benefits of the --path-walk feature are less dramatic when
compared to --name-hash-version=2, but they can still exist in many cases. I
have also seen some cases where --name-hash-version=2 compresses better than
--path-walk with --name-hash-version=1, but these options can be combined to
get the best of both worlds.

Detailed statistics are provided within patch messages, but a few are
highlighted here:

The microsoft/fluentui is a public Javascript repo that suffers from many of
the name hash collisions as internal repositories I've worked with. Here is
a comparison of the compressed size and end-to-end time of the repack:

Repack Method    Pack Size       Time
---------------------------------------
Hash v1             439.4M      87.24s
Hash v2             161.7M      21.51s
Path Walk           142.5M      28.16s

Less dramatic, but perhaps more standardly structured is the nodejs/node
repository, with these stats:

Repack Method       Pack Size       Time
------------------------------------------
Hash v1                739.9M      71.18s
Hash v2                764.6M      67.82s
Path Walk              698.0M      75.10s

Even the Linux kernel repository gains some benefits, even though the number
of hash collisions is relatively low due to a preference for short
filenames:

Repack Method       Pack Size       Time
------------------------------------------
Hash v1                  2.5G     554.41s
Hash v2                  2.5G     549.62s
Path Walk                2.2G     559.00s

The drawbacks of the --path-walk feature is that it will be harder to
integrate it with bitmap features, specifically delta islands. This is not
insurmountable, but would require more work, such as a revision walk to
paint objects with reachability information before using that during delta
computations.

However, there should still be significant benefits to Git clients trying to
save space and improve local performance.

This feature was shipped with similar features in microsoft/git as of
v2.47.0.vfs.0.3 [4]. This was used in CI machines for an internal monorepo
that had significant repository growth due to constructing a batch of
beachball [5] CHANGELOG.[md|json] files and pushing them to a release
branch. These pushes were frequently 70-200 MB due to poor delta
compression. Using the 'pack.usePathWalk=true' config, these pushes dropped
in size by 100x while improving performance. Since these CI machines were
working with a shallow clone, the 'edge_aggressive' changes were required to
enable the path-walk option.

[4] https://github.com/microsoft/git/releases/tag/v2.47.0.vfs.0.3

[5] https://github.com/microsoft/beachball

This version incorporates feedback from previous RFCs and reviewed patch
series whenever possible. It also benefits from learned experience, much of
which was already applied in the original path-walk API submission.

Thanks, -Stolee

Derrick Stolee (13):
  pack-objects: extract should_attempt_deltas()
  pack-objects: add --path-walk option
  pack-objects: update usage to match docs
  p5313: add performance tests for --path-walk
  pack-objects: introduce GIT_TEST_PACK_PATH_WALK
  t5538: add tests to confirm deltas in shallow pushes
  repack: add --path-walk option
  pack-objects: enable --path-walk via config
  scalar: enable path-walk during push via config
  pack-objects: refactor path-walk delta phase
  pack-objects: thread the path-based compression
  path-walk: add new 'edge_aggressive' option
  pack-objects: allow --shallow and --path-walk

 Documentation/config/feature.adoc          |   4 +
 Documentation/config/pack.adoc             |   8 +
 Documentation/git-pack-objects.adoc        |  25 +-
 Documentation/git-repack.adoc              |  14 +-
 Documentation/technical/api-path-walk.adoc |   9 +
 builtin/pack-objects.c                     | 411 +++++++++++++++++++--
 builtin/repack.c                           |   7 +-
 pack-objects.h                             |  12 +
 path-walk.c                                |   6 +-
 path-walk.h                                |   7 +
 repo-settings.c                            |   3 +
 repo-settings.h                            |   1 +
 scalar.c                                   |   1 +
 t/README                                   |   4 +
 t/helper/test-path-walk.c                  |   2 +
 t/perf/p5313-pack-objects.sh               |  37 +-
 t/t0411-clone-from-partial.sh              |   6 +
 t/t0450/adoc-help-mismatches               |   1 -
 t/t5300-pack-object.sh                     |  19 +
 t/t5306-pack-nobase.sh                     |   5 +
 t/t5310-pack-bitmaps.sh                    |  13 +-
 t/t5316-pack-delta-depth.sh                |   9 +-
 t/t5332-multi-pack-reuse.sh                |   7 +
 t/t5538-push-shallow.sh                    |  34 ++
 t/t6601-path-walk.sh                       |  20 +
 t/t7406-submodule-update.sh                |   3 +
 26 files changed, 601 insertions(+), 67 deletions(-)

base-commit: a36e024

Submitted-As: https://lore.kernel.org/git/pull.1819.git.1741571455.gitgitgadget@gmail.com

pr-1878/jnavila/fix_empty_line_before_block-v1

doc: add a blank line around block delimiters

From: =?UTF-8?q?Jean-No=C3=ABl=20Avila?= <jn.avila@free.fr>

The documentation is using the historical mode for titles, which is a
setext-style (i.e., two-line) section title.

The issue with this mode is that starting block delimiters (e.g.,
`----`) can be confused with a section title when they are exactly the
same length as the preceding line. In the original documentation, this
is taken care of for English by the writer, but it is not the case for
translations where these delimiters are hidden. A translator can
generate a line that is exactly the same length as the following block
delimiter, which leads to this line being considered as a title.

To safeguard against this issue, add a blank line before and after
block delimiters where block is at root level, else add a "+" line
before block delimiters to link it to the preceding paragraph.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>

Submitted-As: https://lore.kernel.org/git/pull.1878.git.1741549511665.gitgitgadget@gmail.com

pr-1877/emilyyyylime/fix-archive-remote-segfault-v1

archive: error instead of triggering a segfault in `git archive --rem…

…ote=""`

From: emilylime <emilyyyylime+git@gmail.com>

Signed-off-by: emilylime <emilyyyylime+git@gmail.com>

Submitted-As: https://lore.kernel.org/git/pull.1877.git.1741515155475.gitgitgadget@gmail.com

pr-1876/newren/fix-break-follow-v1

diffcore-rename: fix BUG when break detection and --follow used together

From: Elijah Newren <newren@gmail.com>

Prior to commit 9db2ac5 (diffcore-rename: accelerate rename_dst
setup, 2020-12-11), the function add_rename_dst() resulted in quadratic
runtime since each call inserted the new entry into the array in sorted
order.  The reason for the sorted order requirement was so that
locate_rename_dst(), used when break detection is turned on, could find
the appropriate entry in logarithmic time via bisection on string
comparisons.  (It's better to be quadratic in moving pointers than
quadratic in string comparisons, so this made some sense.)  However,
since break detection always sticks the broken pairs adjacent to each
other, that commit decided to simply append entries to rename_dst, and
record the mapping of (filename) -> (index within rename_dst) via a
strintmap.  Doing this relied on the fact that when adding the source of
a broken pair via register_rename_src(), that the next item we'd process
was the other half of the same broken pair and would be added to
rename_dst via add_rename_dst().  This assumption was fine under break
detection alone, but the combination of break detection and
single_follow violated that assumption because of this code:

		else if (options->single_follow &&
			 strcmp(options->single_follow, p->two->path))
			continue; /* not interested */

which would end up skipping calling add_rename_dst() below that point.
Since I knew I was assuming that the dst pair of a break would always be
added right after the src pair of a break, I added a new BUG() directive
as part of that commit later on at time of use that would check my
assumptions held.  That BUG() didn't trip for nearly 4 years...which
sadly meant I had long since forgotten the related details.  Anyway...

When the dst half of a broken pair is skipped like this, it means that
not only could my recorded index be invalid (just past the end of the
array), it could also point to some unrelated dst that just happened to
be the next one added to the array.  So, to fix this, we need to add a
little more safety around the checks for the recorded break_idx.

It turns out that making a testcase to trigger this is a bit challenging
too.  I added a simple testcase which tickles the necessary area, but
running it normally actually passes for me.  However, running it under
valgrind shows that it is depending upon uninitialized memory.  I
suspect that to get a reliable reproduction case, I might need to have
several more paths involved, but that might make the testcase more
difficult to understand.  So, I instead just embedded a warning within
the testname that the test triggered uninitialized memory use.

In short, when these two rare options are used together, fix the
accidental find of the wrong dst entry (which would often be
uninitialized memory just past the end of the array), by adding a little
more care around the recorded indices for break_idx.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Elijah Newren <newren@gmail.com>

Submitted-As: https://lore.kernel.org/git/pull.1876.git.1741395615315.gitgitgadget@gmail.com

pr-1875/newren/endit-new-features-v1

Small new merge-ort features, prepping for deletion of merge-recursiv…

…e.[ch]

I've got 19 patches covering the work needed to prep for and allow us to
delete merge-recursive.[ch], and remap 'recursive' to 'ort', including some
clean-up along the way. I've tried to divide it up into five smaller patch
series.

These 3 patches are the first of those series, and each of these 3 patches
provide a small new feature that together will be used to allow us to
convert some callers over from recursive to ort. If the third patch,
introducing merge_ort_generic(), doesn't make sense to submit without one of
its new callers, I can extend this series to 6 patches and include the
conversion of git-am.sh.

Elijah Newren (3):
  merge-ort: add new merge_ort_generic() function
  merge-ort: allow rename detection to be disabled
  merge-ort: support having merge verbosity be set to 0

 Documentation/merge-strategies.adoc | 12 +++---
 merge-ort-wrappers.c                | 64 +++++++++++++++++++++++++++++
 merge-ort-wrappers.h                | 12 ++++++
 merge-ort.c                         | 24 ++++++++---
 merge-ort.h                         |  5 +++
 5 files changed, 106 insertions(+), 11 deletions(-)

base-commit: a36e024

Submitted-As: https://lore.kernel.org/git/pull.1875.git.1741362522.gitgitgadget@gmail.com

pr-1873/newren/endit-fix-funny-rename-v1

merge-ort: fix a crash in process_renames for a file transitively ren…

…amed to itself

Maintainer note: this is not a recent regression; it need not be included in
v2.49.0.

This is a follow-up to Dmitry's report of a case where both merge-ort and
merge-recursive fail to properly handle a very specific type of conflict.
See
https://lore.kernel.org/git/20240921024533.15249-2-dgoncharov@users.sf.net/
for the earlier thread; the first patch is an adaptation of his
demonstrating the issue, and the second patch is my fix for it.

Dmitry Goncharov (1):
  t6423: add a testcase causing a failed assertion in process_renames

Elijah Newren (1):
  merge-ort: fix slightly overzealous assertion for rename-to-self

 merge-ort.c                         |  3 ++-
 t/t6423-merge-rename-directories.sh | 41 +++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 1 deletion(-)

base-commit: f93ff17

Submitted-As: https://lore.kernel.org/git/pull.1873.git.1741275027.gitgitgadget@gmail.com

pr-1872/bgw/bgw/diff-describe-optional-locks-v1

describe and diff: implement --no-optional-locks

git describe and git diff may update the index in the background for similar
performance reasons to git-status. These patches implement the
--no-optional-locks option for these commands, which allows scripts to
bypass this behavior.

I'm implementing this as a solution to a stale lockfile issue we've
sporadically encountered due to a build script that runs git describe. We're
mitigating this issue by using libgit2 in our build script, which does not
have the same background update behavior for its git_describe_workdir
implementation, but it would be nice to have this option supported in the
git CLI.

Tests and documentation changes are included!

Benjamin Woodruff (2):
  describe: implement --no-optional-locks
  diff: implement --no-optional-locks

 Documentation/config/diff.adoc     |  4 ++-
 Documentation/git-describe.adoc    | 12 ++++++++
 Documentation/git.adoc             |  4 ++-
 builtin/describe.c                 | 12 ++++----
 builtin/diff.c                     |  4 +++
 t/meson.build                      |  1 +
 t/t4070-diff-auto-refresh-index.sh | 46 ++++++++++++++++++++++++++++++
 t/t6120-describe.sh                |  8 ++++++
 8 files changed, 84 insertions(+), 7 deletions(-)
 create mode 100755 t/t4070-diff-auto-refresh-index.sh

base-commit: e969bc8

Submitted-As: https://lore.kernel.org/git/pull.1872.git.1741240685.gitgitgadget@gmail.com

pr-1867/dscho/g4w-hot-fixes-v2

Hot fixes from Git for Windows v2.49.0-rc0

I needed many patches to make Git for Windows v2.49.0-rc0 compile and run
the many, many CI jobs successfully. These here patches even apply to
upstream Git. (Technically, the Meson sorting patch is not required to
compile, but it was the fall-out from many required adjustments to make the
Meson jobs happy.)

Changes since v1:

 * I spent two hours investigating under which circumstances the (correct)
   compiler error about a non-writable pw_gecos field triggers, and
   augmented the commit message accordingly.
 * Since -rc1, another breaking change necessitated yet another hot fix,
   which I invites to this patch fest (and the patch series hence had to be
   rebased to the current tip of Git's main branch).

Johannes Schindelin (3):
  ident: stop assuming that `gw_gecos` is writable
  meson: fix sorting
  cmake: generalize the handling of the `CLAR_TEST_OBJS` list

 contrib/buildsystems/CMakeLists.txt | 12 ++++++++----
 ident.c                             |  2 +-
 meson.build                         |  2 +-
 3 files changed, 10 insertions(+), 6 deletions(-)

base-commit: e969bc8

Submitted-As: https://lore.kernel.org/git/pull.1867.v2.git.1741256780.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1867.git.1740671049.gitgitgadget@gmail.com