Skip to content

Commit

Permalink
submodule--helper: convert the bulk of cmd_add() to C
Browse files Browse the repository at this point in the history
Introduce the 'add' subcommand to submodule--helper that does all the
work 'submodule add' past the parsing of flags.

While at it, remove the now-unused shell interface to
'resolve_relative_url()' and repurpose it as an internal helper
function.

Since the 'die()' of C prepends the word 'fatal', we modify t7400.6 to
reflect the same.

Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Helped-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
  • Loading branch information
tfidfwastaken committed Jun 17, 2021
1 parent 2de56c0 commit 3c46c10
Show file tree
Hide file tree
Showing 3 changed files with 165 additions and 116 deletions.
180 changes: 163 additions & 17 deletions builtin/submodule--helper.c
Original file line number Diff line number Diff line change
Expand Up @@ -197,34 +197,23 @@ static char *relative_url(const char *remote_url,
return strbuf_detach(&sb, NULL);
}

static int resolve_relative_url(int argc, const char **argv, const char *prefix)
static char *resolve_relative_url(const char *url)
{
char *remoteurl = NULL;
char *remote = get_default_remote();
const char *up_path = NULL;
char *res;
const char *url;
char *remote = get_default_remote();
char *remoteurl = NULL;
struct strbuf sb = STRBUF_INIT;

if (argc != 2 && argc != 3)
die("resolve-relative-url only accepts one or two arguments");

url = argv[1];
strbuf_addf(&sb, "remote.%s.url", remote);
free(remote);

if (git_config_get_string(sb.buf, &remoteurl))
/* the repository is its own authoritative upstream */
remoteurl = xgetcwd();

if (argc == 3)
up_path = argv[2];

res = relative_url(remoteurl, url, up_path);
puts(res);
free(res);
res = relative_url(remoteurl, url, NULL);
free(remoteurl);
return 0;
return res;
}

static int resolve_relative_url_test(int argc, const char **argv, const char *prefix)
Expand Down Expand Up @@ -3059,6 +3048,163 @@ static int add_config(int argc, const char **argv, const char *prefix)
return 0;
}

static void die_on_index_match(const char *path, int force)
{
struct pathspec ps;
const char *args[] = { path, NULL };
parse_pathspec(&ps, 0, PATHSPEC_PREFER_CWD, NULL, args);

/* TODO: Should this just be read_cache()? */
read_cache_preload(NULL);

This comment has been minimized.

Copy link
@tfidfwastaken

tfidfwastaken Jun 18, 2021

Author Owner

I'm not too sure about the difference between read_cache() and read_cache_preload(NULL).

I understand the latter preloads the index (in parallel) according to a pathspec argument, but does that imply the former is doing some kind of on-demand loading? Moreover, does passing NULL to read_cache_preload() preload the whole index?

Which form should be preferred for which use case?

I think I did not understand those functions correctly.

This comment has been minimized.

Copy link
@chriscool

chriscool Jun 18, 2021

Is this function doing the equivalent of the ls-files invocation in shell we already discussed?

If yes, I think it should be tried and discussed as part of your ongoing patch series on the mailing list.

This comment has been minimized.

Copy link
@tfidfwastaken

tfidfwastaken Jun 18, 2021

Author Owner

I am confident that this is doing the equivalent of the ls-files invocation, based on my observations from trying to trigger the edge cases, along with all the tests passing, including the extra one that I wrote.

However, within ls-files, repo_read_index(the_repository) is used, which is just another name for read_cache().

In Paul's patch that @sivaraam pointed me to, which had previously converted a very similar ls-files invocation, read_cache_preload()was used.

When I tried swapping the two functions, I could not find any change in behaviour.

This comment has been minimized.

Copy link
@chriscool

chriscool Jun 18, 2021

Great! Did you update the patch that you already sent to the mailing list and which needed it?

This comment has been minimized.

Copy link
@tfidfwastaken

tfidfwastaken Jun 19, 2021

Author Owner

Oh, sorry for not clarifying before.

This particular commit, which replaces the ls-files invocation has not been sent to the mailing list yet.
I thought of sending this after I got some feedback on what is already on the list[1] as I don't want to cause confusion by having multiple series that are very related on the list at the same time.

Once [1] gets a response, I'll send a series which includes this change and ask for feedback about which of the two functions is preferable here.

[1] https://lore.kernel.org/git/20210615145745.33382-1-raykar.ath@gmail.com/

This comment has been minimized.

Copy link
@sivaraam

sivaraam Jun 19, 2021

I understand the latter preloads the index (in parallel) according to a pathspec argument, but does that imply the former is doing some kind of on-demand loading?

From what I've seen so far, both do on-demand loading as repo_read_index_preload calls pthread_join to ensure all threads terminate before it returns. So, you don't have worry about things happening in the background. It seems that the difference repo_read_index_preload updates in parallel and read_cache does not.

I'm not very sure if there are an other differences to it. I think its best to ask this to the list as @chriscool suggests.

This comment has been minimized.

Copy link
@tfidfwastaken

tfidfwastaken Jun 21, 2021

Author Owner

It seems that the difference repo_read_index_preload updates in parallel and read_cache does not.

I thought the same, but when I looked at repo_read_index, it eventually calls do_read_index which is also multithreaded code.

I'll probably get to know once I ask the list.

if (ps.nr) {
int i;
char *ps_matched = xcalloc(ps.nr, 1);

/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);

This comment has been minimized.

Copy link
@tfidfwastaken

tfidfwastaken Jun 18, 2021

Author Owner

This seemed like the right thing to do after observing other parts of the code that iterated through cache entries, although I do not know the full details and the story behind this function.

https://lore.kernel.org/git/pull.906.git.1615929435.gitgitgadget@gmail.com/


/*
* Since there is only one pathspec, we just need
* need to check ps_matched[0] to know if a cache
* entry matched.
*/
for (i = 0; i < active_nr; i++) {
ce_path_match(&the_index, active_cache[i], &ps,
ps_matched);

if (ps_matched[0]) {
if (!force)
die(_("'%s' already exists in the index"),
path);
else if (!S_ISGITLINK(active_cache[i]->ce_mode))
die(_("'%s' already exists in the index "
"and is not a submodule"), path);
break;
}
}
free(ps_matched);
}
}

static void die_on_repo_without_commits(const char *path)
{
struct strbuf sb = STRBUF_INIT;
strbuf_addstr(&sb, path);
if (is_nonbare_repository_dir(&sb)) {
struct object_id oid;
if (resolve_gitlink_ref(path, "HEAD", &oid) < 0)
die(_("'%s' does not have a commit checked out"), path);
}
}

static int module_add(int argc, const char **argv, const char *prefix)
{
int force = 0, quiet = 0, progress = 0, dissociate = 0;
struct add_data add_data = ADD_DATA_INIT;

struct option options[] = {
OPT_STRING('b', "branch", &add_data.branch, N_("branch"),
N_("branch of repository to add as submodule")),
OPT__FORCE(&force, N_("allow adding an otherwise ignored submodule path"),
PARSE_OPT_NOCOMPLETE),
OPT__QUIET(&quiet, N_("print only error messages")),
OPT_BOOL(0, "progress", &progress, N_("force cloning progress")),
OPT_STRING(0, "reference", &add_data.reference_path, N_("repository"),
N_("reference repository")),
OPT_BOOL(0, "dissociate", &dissociate, N_("borrow the objects from reference repositories")),
OPT_STRING(0, "name", &add_data.sm_name, N_("name"),
N_("sets the submodule’s name to the given string "
"instead of defaulting to its path")),
OPT_INTEGER(0, "depth", &add_data.depth, N_("depth for shallow clones")),
OPT_END()
};

const char *const usage[] = {
N_("git submodule--helper add [<options>] [--] [<path>]"),

This comment has been minimized.

Copy link
@sivaraam

sivaraam Jun 20, 2021

I believe add takes two arguments: the repo to add as submodule and the path to which it should be added. So, the usage could be tweaked to clarify that.

NULL
};

argc = parse_options(argc, argv, prefix, options, usage, 0);

if (!is_writing_gitmodules_ok())
die(_("please make sure that the .gitmodules file is in the working tree"));

if (prefix && *prefix &&
add_data.reference_path && !is_absolute_path(add_data.reference_path))
add_data.reference_path = xstrfmt("%s%s", prefix, add_data.reference_path);

if (argc == 0 || argc > 2) {
usage_with_options(usage, options);

This comment has been minimized.

Copy link
@sivaraam

sivaraam Jun 20, 2021

This seems to cause an incorrect usage message from the user perspective when no arguments are passed. See below.

Message after conversion:

$ git submodule add
usage: git submodule--helper add [<options>] [--] [<path>]

    -b, --branch <branch>
                          branch of repository to add as submodule
    -f, --force           allow adding an otherwise ignored submodule path
    -q, --quiet           print only error messages
    --progress            force cloning progress
    --reference <repository>
                          reference repository
    --dissociate          borrow the objects from reference repositories
    --name <name>         sets the submodule’s name to the given string instead of defaulting to its path
    --depth <n>           depth for shallow clones

Notice how the usage mentions submodule--helper when indeed the user has invoked the submodule command.

Compare this with the message shown before conversion:

$ git submodule add
usage: git submodule [--quiet] [--cached]
   or: git submodule [--quiet] add [-b <branch>] [-f|--force] [--name <name>] [--reference <repository>] [--] <repository> [<path>]
   or: git submodule [--quiet] status [--cached] [--recursive] [--] [<path>...]
   or: git submodule [--quiet] init [--] [<path>...]
   or: git submodule [--quiet] deinit [-f|--force] (--all| [--] <path>...)
   or: git submodule [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--] [<path>...]
   or: git submodule [--quiet] set-branch (--default|--branch <branch>) [--] <path>
   or: git submodule [--quiet] set-url [--] <path> <newurl>
   or: git submodule [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
   or: git submodule [--quiet] foreach [--recursive] <command>
   or: git submodule [--quiet] sync [--recursive] [--] [<path>...]
   or: git submodule [--quiet] absorbgitdirs [--] [<path>...]

Of course, add isn't the first command that exhibits this behaviour. set-url seems to exhibit the same behaviour when no arguments are passed. So, that might need fixing too. But we could easily do this later. Just note this somewhere so that we don't forget to back to it later :)

} else if (argc == 1) {

This comment has been minimized.

Copy link
@chriscool

chriscool Jun 18, 2021

I don't think the else is needed as usage_with_options() dies.

add_data.repo = argv[0];
add_data.sm_path = guess_dir_name_from_git_url(add_data.repo, 0, 0);
} else {
add_data.repo = argv[0];

This comment has been minimized.

Copy link
@chriscool

chriscool Jun 18, 2021

Maybe you could move both add_data.repo = argv[0]; instructions outside the if (...) {...} else {...}.

add_data.sm_path = xstrdup(argv[1]);
}

if (prefix && *prefix && !is_absolute_path(add_data.sm_path))
add_data.sm_path = xstrfmt("%s%s", prefix, add_data.sm_path);

if (starts_with_dot_dot_slash(add_data.repo) ||
starts_with_dot_slash(add_data.repo)) {
if (prefix)
die(_("relative path can only be used from the toplevel "

This comment has been minimized.

Copy link
@sivaraam

sivaraam Jun 19, 2021

The original message started with a capital 'R'. We should likely keep it that way in the conversion to be consistent.

This comment has been minimized.

Copy link
@sivaraam

sivaraam Jun 19, 2021

Also, we don't get the fatal: prefix in the original. I remember Junio pointing out earlier to avoid changes in error messages. So, we could better keep it strictly consistent for now and leave error message changes for later.

This comment has been minimized.

Copy link
@sivaraam

sivaraam Jun 19, 2021

It looks like this would apply to other messages too. Like the 'some_subm' already exists in the index message that appears further above. Could you take a look at these messages and fix them?

This comment has been minimized.

Copy link
@tfidfwastaken

tfidfwastaken Jun 21, 2021

Author Owner

Also, we don't get the fatal: prefix in the original. I remember Junio pointing out earlier to avoid changes in error messages. So, we could better keep it strictly consistent for now and leave error message changes for later.

I could not find a way to die() without having the 'fatal:' prefix. It seems to be baked into the C version.

This comment has been minimized.

Copy link
@tfidfwastaken

tfidfwastaken Jun 21, 2021

Author Owner

I did figure out one way to change it—I could use set_die_routine(my_own_die_func), but adding non-trivial code just so that the error messages match (to what seems to be a less standard format, at that) during conversion does not seem right to me.

I hope you don't mind if I wait and see if there's any objection to the extra 'fatal:' once I send this patch to the list. If it is contentious, then I will make the change.

This comment has been minimized.

Copy link
@sivaraam

sivaraam Jun 21, 2021

I hope you don't mind if I wait and see if there's any objection to the extra 'fatal:' once I send this patch to the list. If it is contentious, then I will make the change.

I don't mind but as we've seen before that Junio finds it a distraction and his point is right. We could handle message conversion later after the conversion is done. Honestly, you could even do it in the last patch of the series. Just not during the conversion :)

I could not find a way to die() without having the 'fatal:' prefix. It seems to be baked into the C version.

The solution is simple: don't use die() ;-)

You likely won't find any Git helper method that help you here. fprintf with manual call to exit with an error code is the solution for this :-)

This comment has been minimized.

Copy link
@tfidfwastaken

tfidfwastaken Jun 23, 2021

Author Owner

I don't mind but as we've seen before that Junio finds it a distraction and his point is right.

Yes, I agree with his point as well. The reason this particular case felt weird though, is because a reviewer might normally expect a die in shell to be converted to a die() in C, especially because they semantically mean the same thing. So I thought explicitly making a die become an fprintf (or vreportf) followed by an exit() might cause a distraction as well, as it violates most people's default assumption that a die in shell should map to a die() in C.

Another reason is the precedent set by all previous conversions of submodule subcommands that also did a straight die->die() conversion.

You likely won't find any Git helper method that help you here. fprintf with manual call to exit with an error code is the solution for this :-)

Okay, I shall do it. I think the best case might be a middle ground, where I can somehow still have a die helper of my own that does this, so that it's less jarring. I also realized while looking at this, that the C version of die() seems to exit with a different error code than the shell version, so I can fix that as well. I'll try to make it work the same.

Edit: I re-read your comment and I had previously missed out your suggestion of adding a final patch at the end of the series that would switch over the the usual C die(). I'll weigh in that as well.

This comment has been minimized.

Copy link
@chriscool

chriscool Jun 23, 2021

Please don't spend a lot of time on this. It might be better to just do something simple even if it's not the best and just explain it a bit and ask the mailing list.

This comment has been minimized.

Copy link
@sivaraam

sivaraam Jun 23, 2021

Yes, I agree with his point as well. The reason this particular case felt weird though, is because a reviewer might normally expect a die in shell to be converted to a die() in C, especially because they semantically mean the same thing. So I thought explicitly making a die become an fprintf (or vreportf) followed by an exit() might cause a distraction as well, as it violates most people's default assumption that a die in shell should map to a die() in C.

I used to think that too. But apparently reviewers like Junio are too familiar with Git enough to clearly know the nuances of a call to die from shell and C. So, from their perspective using die appears to be an improper conversion ;-)

@chriscool has a valid point too, though. I just realized there might me many messages that need to be changed which might be wasted effort if we're going to fix that in a later patch. So, setting expectations for the reviewer by mentioning this inconsistency in the appropriate places might be enough?

This comment has been minimized.

Copy link
@sivaraam

sivaraam Jun 23, 2021

I just realized there might me many messages that need to be changed which might be wasted effort if we're going to fix that in a later patch. So, setting expectations for the reviewer by mentioning this inconsistency in the appropriate places might be enough?

I just noticed you've already achieved the change by using a custom die() routine in your v3a. So, don't mind this part :)

"of the working tree"));

/* dereference source url relative to parent's url */
add_data.realrepo = resolve_relative_url(add_data.repo);
} else if (is_dir_sep(add_data.repo[0]) || strchr(add_data.repo, ':')) {
add_data.realrepo = add_data.repo;
} else {
die(_("repo URL: '%s' must be absolute or begin with ./|../"),
add_data.repo);
}

/*
* normalize path:
* multiple //; leading ./; /./; /../;
*/
normalize_path_copy(add_data.sm_path, add_data.sm_path);
strip_dir_trailing_slashes(add_data.sm_path);

die_on_index_match(add_data.sm_path, force);
die_on_repo_without_commits(add_data.sm_path);

if (!force) {
int exit_code = -1;
struct strbuf sb = STRBUF_INIT;
struct child_process cp = CHILD_PROCESS_INIT;
cp.git_cmd = 1;
cp.no_stdout = 1;
strvec_pushl(&cp.args, "add", "--dry-run", "--ignore-missing",
"--no-warn-embedded-repo", add_data.sm_path, NULL);
if ((exit_code = pipe_command(&cp, NULL, 0, NULL, 0, &sb, 0))) {
strbuf_complete_line(&sb);
fputs(sb.buf, stderr);
return exit_code;
}
strbuf_release(&sb);
}

add_data.sm_name = add_data.sm_name ? add_data.sm_name : add_data.sm_path;
if (check_submodule_name(add_data.sm_name))
die(_("'%s' is not a valid submodule name"), add_data.sm_name);

add_data.prefix = prefix;
add_data.force = !!force;
add_data.quiet = !!quiet;
add_data.progress = !!progress;
add_data.dissociate = !!dissociate;

if (add_submodule(&add_data))
return 1;
configure_added_submodule(&add_data);
free(add_data.sm_path);

return 0;
}

#define SUPPORT_SUPER_PREFIX (1<<0)

struct cmd_struct {
Expand All @@ -3071,13 +3217,13 @@ static struct cmd_struct commands[] = {
{"list", module_list, 0},
{"name", module_name, 0},
{"clone", module_clone, 0},
{"add", module_add, SUPPORT_SUPER_PREFIX},
{"add-clone", add_clone, 0},
{"add-config", add_config, 0},
{"update-module-mode", module_update_module_mode, 0},
{"update-clone", update_clone, 0},
{"ensure-core-worktree", ensure_core_worktree, 0},
{"relative-path", resolve_relative_path, 0},
{"resolve-relative-url", resolve_relative_url, 0},
{"resolve-relative-url-test", resolve_relative_url_test, 0},
{"foreach", module_foreach, SUPPORT_SUPER_PREFIX},
{"init", module_init, SUPPORT_SUPER_PREFIX},
Expand Down
99 changes: 1 addition & 98 deletions git-submodule.sh
Original file line number Diff line number Diff line change
Expand Up @@ -145,104 +145,7 @@ cmd_add()
shift
done

if ! git submodule--helper config --check-writeable >/dev/null 2>&1
then
die "$(eval_gettext "please make sure that the .gitmodules file is in the working tree")"
fi

if test -n "$reference_path"
then
is_absolute_path "$reference_path" ||
reference_path="$wt_prefix$reference_path"

reference="--reference=$reference_path"
fi

repo=$1
sm_path=$2

if test -z "$sm_path"; then
sm_path=$(printf '%s\n' "$repo" |
sed -e 's|/$||' -e 's|:*/*\.git$||' -e 's|.*[/:]||g')
fi

if test -z "$repo" || test -z "$sm_path"; then
usage
fi

is_absolute_path "$sm_path" || sm_path="$wt_prefix$sm_path"

# assure repo is absolute or relative to parent
case "$repo" in
./*|../*)
test -z "$wt_prefix" ||
die "$(gettext "Relative path can only be used from the toplevel of the working tree")"

# dereference source url relative to parent's url
realrepo=$(git submodule--helper resolve-relative-url "$repo") || exit
;;
*:*|/*)
# absolute url
realrepo=$repo
;;
*)
die "$(eval_gettext "repo URL: '\$repo' must be absolute or begin with ./|../")"
;;
esac

# normalize path:
# multiple //; leading ./; /./; /../; trailing /
sm_path=$(printf '%s/\n' "$sm_path" |
sed -e '
s|//*|/|g
s|^\(\./\)*||
s|/\(\./\)*|/|g
:start
s|\([^/]*\)/\.\./||
tstart
s|/*$||
')
if test -z "$force"
then
git ls-files --error-unmatch "$sm_path" > /dev/null 2>&1 &&
die "$(eval_gettext "'\$sm_path' already exists in the index")"
else
git ls-files -s "$sm_path" | sane_grep -v "^160000" > /dev/null 2>&1 &&
die "$(eval_gettext "'\$sm_path' already exists in the index and is not a submodule")"
fi

if test -d "$sm_path" &&
test -z $(git -C "$sm_path" rev-parse --show-cdup 2>/dev/null)
then
git -C "$sm_path" rev-parse --verify -q HEAD >/dev/null ||
die "$(eval_gettext "'\$sm_path' does not have a commit checked out")"
fi

if test -z "$force"
then
dryerr=$(git add --dry-run --ignore-missing --no-warn-embedded-repo "$sm_path" 2>&1 >/dev/null)
res=$?
if test $res -ne 0
then
echo >&2 "$dryerr"
exit $res
fi
fi

if test -n "$custom_name"
then
sm_name="$custom_name"
else
sm_name="$sm_path"
fi

if ! git submodule--helper check-name "$sm_name"
then
die "$(eval_gettext "'$sm_name' is not a valid submodule name")"
fi

git submodule--helper add-clone ${GIT_QUIET:+--quiet} ${force:+"--force"} ${progress:+"--progress"} ${branch:+--branch "$branch"} --prefix "$wt_prefix" --path "$sm_path" --name "$sm_name" --url "$realrepo" ${reference:+"$reference"} ${dissociate:+"--dissociate"} ${depth:+"$depth"} || exit
git submodule--helper add-config ${force:+--force} ${branch:+--branch "$branch"} --url "$repo" --resolved-url "$realrepo" --path "$sm_path" --name "$sm_name"
git ${wt_prefix:+-C "$wt_prefix"} ${prefix:+--super-prefix "$prefix"} submodule--helper add ${GIT_QUIET:+--quiet} ${force:+--force} ${progress:+"--progress"} ${branch:+--branch "$branch"} ${reference_path:+--reference "$reference_path"} ${dissociate:+--dissociate} ${custom_name:+--name "$custom_name"} ${depth:+"$depth"} -- "$@"

This comment has been minimized.

Copy link
@sivaraam

sivaraam Jun 20, 2021

I believe you could drop the ${prefix:+--super-prefix "$prefix"} part as I don't think the prefix shell variable would ever be set for cmd_add.

On a related note, I've got a patch that fixes unnecessary option logic based on prefix variable. I'll try to send it soon and will Cc you in it.

}

#
Expand Down
2 changes: 1 addition & 1 deletion t/t7400-submodule-basic.sh
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ test_expect_success 'submodule update aborts on missing gitmodules url' '

test_expect_success 'add aborts on repository with no commits' '
cat >expect <<-\EOF &&
'"'repo-no-commits'"' does not have a commit checked out
'"fatal: 'repo-no-commits'"' does not have a commit checked out
EOF
git init repo-no-commits &&
test_must_fail git submodule add ../a ./repo-no-commits 2>actual &&
Expand Down

0 comments on commit 3c46c10

Please sign in to comment.