Skip to content

Update command palette search to prioritize "longest substring" match. #18700

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

e82eric
Copy link
Contributor

@e82eric e82eric commented Mar 18, 2025

Summary of the Pull Request

Addresses: #6693

Repurposed work from #16586

Feel free to close if it doesn't make sense for what ever reason (E.g. License file still an issue, wrong direction for fix, too complex, etc).

palette2

References and Relevant Issues

#6693

Detailed Description of the Pull Request / Additional comments

Validation Steps Performed

PR Checklist

  • Closes #xxx
  • Tests added/passed
  • Documentation updated
    • If checked, please file a pull request on our docs repo and link it here: #xxx
  • Schema updated (if necessary)

@lhecker lhecker self-requested a review March 26, 2025 22:04
@e82eric e82eric closed this Apr 10, 2025
@e82eric e82eric reopened this Apr 10, 2025
@e82eric
Copy link
Contributor Author

e82eric commented Apr 10, 2025

Sorry for the spam open/close. Let me know if I you guys are interested in this and I can reopen it.

@e82eric e82eric closed this Apr 10, 2025
@lhecker
Copy link
Member

lhecker commented Apr 10, 2025

Generally speaking, probably yes, but I personally haven't gotten a chance yet to look at it. Currently every member of the terminal team is unfortunately either working on another project or on sick leave, hence the overall delays since late last year. One of the things @zadjii-msft made was released just recently: https://learn.microsoft.com/en-us/windows/powertoys/command-palette/overview
The project I'm working on will be released soon as well.

I plan to review this PR after I fixed the remaining major issues the the Preview/Canary version has. (Which is what has occupied what little time I had for Windows Terminal since then.)

@e82eric
Copy link
Contributor Author

e82eric commented Apr 11, 2025

Sorry, didn't mean to rush you, was more worried I was spamming you guys with all of these fuzzy finding PRs. Please let me know if it doesn't make sense for what ever reason and I can close it.

@e82eric e82eric reopened this Apr 11, 2025
Copy link
Member

@lhecker lhecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, I'd like to thank you so much for taking care of the licensing situation!

Unfortunately, there's still some issues with this code and non-English languages.

I'm also somewhat confused by the code's dissimilarity with the original fzf to be honest (that's not an issue in on itself; it just surprised me). It prompted me to try and use Copilot to transform the original Go code to C++ though. It almost one-shotted the entire translation. I've since patched up the issues it had and here's the current state: https://gist.github.com/lhecker/dd840ed2f74e8f66d0f43fd49e999b88
Unfortunately, it doesn't work correctly, and I ran out of time working on it. If you want to play around with that code, please feel free to do so!

That aside, for the upcoming project that I've worked on, I ported VS Code's fuzzy matcher to Rust. Here's the original: https://github.com/microsoft/vscode/blob/071eafaf5ab4de3814466b616fe00a66d2739ad2/src/vs/base/common/fuzzyScorer.ts
If you'd like I could generate a C++ port of it next week based on my Rust version. The benefit of that code is that it's a lot simpler. The disadvantage is that fzf is a leagues better algorithm and if we had it, it would definitely be superior of course. I guess it's a trade-off.

If we stick with this fzf port, I really don't think it has to be perfect for us to merge it. But I do think though that we should at least get rid of the towlower calls and similar. The current version uses lstrcmpiW to implement case-insensitive comparisons which do work correctly (even if it's honestly bad to abuse that function for this purpose haha).

Comment on lines 61 to 62
Weight(_computeWeight());
HighlightedName(_computeHighlightedName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will execute the fzf matching logic twice. I think we should replace the two functions with a single one called _update (or similar) which runs ParsePattern only once and updates both members.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, I think I may have gone too far with this, I realized that ParsePattern was getting executed for every item and moved the pattern parsing outside of UpdateFilter which ended up bringing the snippets pane and suggestions UI into scope. Hopefully this isn't too invasive.

Comment on lines 19 to 25
enum CharClass
{
NonWord = 0,
CharLower = 1,
CharUpper = 2,
Digit = 3,
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finished reviewing this implementation today and it has me a little confused to be honest. It's quite different from the original fzf. Among others, the CharClass enum is missing the Delimiter, Letter, and Whitespace types.

That aside, I'd turn this enum into a enum class and assign it an uint8_t size explicitly, so that the LUT I'll suggest later is going to be 4x more compact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment on lines 29 to 30
size_t start = str.find_first_not_of(L" ");
return (start == std::wstring::npos) ? std::wstring_view() : std::wstring_view(str).substr(start);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: find_first_not_of(L" ") and find_first_not_of(L' ') are not the same and the latter will be much faster. This can also exploit the fact that npos is guaranteed to be the max. size_t value:

const auto off = str.find_first_not_of(L' ');
return str.substr(std::min(off, str.size()));

Copy link
Contributor Author

@e82eric e82eric Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment on lines 87 to 93
if (std::iswlower(ch))
return CharLower;
if (std::iswupper(ch))
return CharUpper;
if (std::iswdigit(ch))
return Digit;
return NonWord;
Copy link
Member

@lhecker lhecker Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 3 functions are documented to only work with ASCII. I recommend using u_charType and a LUT. Something like this:

static constexpr auto lut = []() {
    std::array<CharClass, U_CHAR_CATEGORY_COUNT> lut;
    lut.fill(CharClass::NonWord);

    lut[U_UPPERCASE_LETTER] = CharClass::Upper;
    lut[U_LOWERCASE_LETTER] = CharClass::Lower;
    lut[U_TITLECASE_LETTER] = CharClass::Letter;
    lut[U_MODIFIER_LETTER] = CharClass::Letter;
    lut[U_OTHER_LETTER] = CharClass::Letter;

    lut[U_DECIMAL_DIGIT_NUMBER] = CharClass::Number;

    lut[U_SPACE_SEPARATOR] = CharClass::White;
    lut[U_LINE_SEPARATOR] = CharClass::White;
    lut[U_PARAGRAPH_SEPARATOR] = CharClass::White;

    return lut;
}();

return lut[u_charType(c)];

(This code is untested, and I wrote it for the fzf port I mentioned elsewhere.)

Copy link
Contributor Author

@e82eric e82eric Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, I didn't realize those only work with ASCII, I am you glad mentioned that, I didn't understand why lstrcmpi fixed the original issue until now...

Comment on lines 122 to 124
std::ranges::transform(textCopy, textCopy.begin(), [](wchar_t c) {
return std::towlower(c);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same applies here. I believe that creating this copy is unnecessary, as you can just lowercase each char from the original text within the loop below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and then changed when I moved everything to UChar32.

// We use the same comparison method as upon sorting to guarantee consistent behavior
const WCHAR searchCharAsString[] = { currentPatternChar, L'\0' };
const WCHAR currentCharAsString[] = { currentChar, L'\0' };
auto isCurrentCharMatched = lstrcmpi(searchCharAsString, currentCharAsString) == 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple uses of lstrcmpiW throughout this file (and in the current matching logic before this PR) and I don't think that's a good way to implement this. Each call to lstrcmpiW goes through multiple layers just to arrive at CompareStringEx which is rather expensive for just comparing two characters with folding.

In this particular instance, currentChar is already lowercase so the only problem is currentPatternChar which could simply get lowercases with ICU and then compared against the currentChar. Technically we then also need to take care of the difference between lowercasing and casefolding (e.g. the lowercase of Greek "sigma" has two variants), but I think that's a secondary issue.

FWIW, we also technically need to iterate by codepoint here instead of by character. Right now, this code is limited to UCS2 (no surrogate pair support), but I think that's similarly a secondary concern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I "think" I have this in a better state now.

  • Updated to use u_foldCase for the lower casing and folding. I think this works for everything except for some of the more complex folding, ex fold results in multiple code points
  • The text and pattern both get converted to UChar32 to iterate over the codepoints.
  • The positions get converted back to Utf16 positions for the highlighting
  • I tried to add some test cases for the multi language support and surrogate pairs, but there are probably still some major holes in my understanding of Unicode, are the there other scenarios that I should be testing for?

Comment on lines 307 to 314
int whiteSpaces = 0;
for (wchar_t c : text)
{
if (c != L' ')
break;
whiteSpaces++;
}
return whiteSpaces;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use the same find_first_not_of logic as above, but without slicing the string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and then changed when I moved everything to UChar32.

while ((found = trimmedPatternStr.find(L' ', pos)) != std::wstring::npos)
{
std::wstring term = std::wstring{ trimmedPatternStr.substr(pos, found - pos) };
std::ranges::transform(term, term.begin(), ::towlower);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another 2 instances of towlower FWIW.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and then changed when I moved everything to UChar32.

@e82eric e82eric force-pushed the command-palette-search-fzf branch 2 times, most recently from b931738 to 87a9911 Compare April 27, 2025 00:09
@e82eric e82eric marked this pull request as draft April 28, 2025 16:16
@e82eric
Copy link
Contributor Author

e82eric commented Apr 29, 2025

I'm also somewhat confused by the code's dissimilarity with the original fzf to be honest (that's not an issue in on itself; it just surprised me).

I think there are a couple of things that contributed this.

  • The original telescope-native port that I ported was done back in 2022 before the delimiter class type etc, and I think he had removed some stuff like the normalization.
  • When I ported it to c++ I mangled all of the variable names trying to understand how all of the dynamic programming and back tracking work.

Should I try to update use the latest fzf logic?

@e82eric
Copy link
Contributor Author

e82eric commented Apr 29, 2025

That aside, for the upcoming project that I've worked on, I ported VS Code's fuzzy matcher to Rust. Here's the original: https://github.com/microsoft/vscode/blob/071eafaf5ab4de3814466b616fe00a66d2739ad2/src/vs/base/common/fuzzyScorer.ts If you'd like I could generate a C++ port of it next week based on my Rust version. The benefit of that code is that it's a lot simpler. The disadvantage is that fzf is a leagues better algorithm and if we had it, it would definitely be superior of course. I guess it's a trade-off.

Totally understand if the VS Code algo is more maintainable etc, let me know what direction you guys want to go. I am happy to try to port the typescript or the rust code or incorporate your port of the rust code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lhecker do you happen to know how to run these tests. When I try with runut.cmd or VS, I keep getting an error about them being "blocked" from this Error: TAEF: [HRESULT: 0x80080204] Failed to create the test host process for out of process test execution. (Failed to find executable "TestHostApp.exe" specified in custom AppX manifest.)

I am sure this something that I am doing wrong but I am struggling to figure it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for not responding to this. The last few weeks in particular, there was some crunch to get things done on my end. The project I mentioned did launch now though: https://github.com/microsoft/edit

As you can probably tell by the number of issues and PRs, I'm struggling to keep up with them still. I promise you though that I'm not touching any WT code until I got your PR sorted out. 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the failure, that's not one I've ever seen before, or remember having had in the past. I usually just build all of the Unit/Feature test projects in VS and then it "just works" when I run them via the OpenConsole.psm1 commands. I'll tag this PR up for discussion just in case I forget about this again so I can ask my colleagues.

Copy link
Contributor Author

@e82eric e82eric May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lhecker! will try building from VS and then OpenConsole.psm1. No rush on my end.

I was actually just about to mark this as draft again. I discovered the edit repo a couple hours ago and was in the process of trying to port https://github.com/microsoft/edit/blob/main/src/fuzzy.rs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note I am pretty excited to start digging through the edit code. It looks like there is a ton of stuff that I can learn from in there.

@e82eric e82eric force-pushed the command-palette-search-fzf branch from a5b854f to fbf76f2 Compare April 29, 2025 21:28
@e82eric e82eric marked this pull request as ready for review April 29, 2025 21:32
carlos-zamora added a commit that referenced this pull request May 16, 2025
@lhecker lhecker added the Needs-Discussion Something that requires a team discussion before we can proceed label May 21, 2025
@e82eric e82eric marked this pull request as draft May 22, 2025 06:50

This comment has been minimized.

This comment has been minimized.

@e82eric e82eric force-pushed the command-palette-search-fzf branch from d08f795 to ce0bf44 Compare May 24, 2025 18:22

This comment has been minimized.

@e82eric e82eric force-pushed the command-palette-search-fzf branch 3 times, most recently from 755209e to 2ae608e Compare May 25, 2025 07:42
@e82eric
Copy link
Contributor Author

e82eric commented May 27, 2025

Updated the PR to use the VsCode algo from the Edit repo, wanted to see how the results compared to the fzf port. It was more or less like you said where the matching was not quite as good but the implementation was much simpler. Let me know which direction you guys want to go.

There are a couple of differences from the rust code.

  • I didn't add the arena allocator, I had removed the Slab one from the fzf implementation to make things simpler. Should I add this back?
  • I wasn't able to get the case folding from icu::fold_case to work, so I stuck with u_foldCase. When the folding resulted in a expansion it would match a extra position or 1 fewer depending on if the expansion was in the query or the haystack.
  • I am still splitting the query into terms using the space char so that "anta sp" from the example would still match.
    • It looked like VsCode was doing the same

Difference I noted from the fzf code.

@e82eric e82eric marked this pull request as ready for review May 27, 2025 00:06
@lhecker
Copy link
Member

lhecker commented May 27, 2025

It was more or less like you said where the matching was not quite as good but the implementation was much simpler. Let me know which direction you guys want to go.

Things have finally settled down in the edit repository and I now have plenty time to review your PR. If you prefer the fzf approach, I'd be happy to review that and help you with finishing up the PR if needed. If you prefer the VS Code approach, I can also review that. What would you prefer?

I'll make sure it gets into main within this week. 🙂

I didn't add the arena allocator, I had removed the Slab one from the fzf implementation to make things simpler. Should I add this back?

Nah, it's fine this way. Of course, a slab/arena would be ideal, but it doesn't have to be used.

When the folding resulted in a expansion it would match a extra position or 1 fewer depending on if the expansion was in the query or the haystack.

Oh f... that's an issue I haven't considered. 😢 The ICU source code for case folding is here: https://github.com/unicode-org/icu/blob/454ce01479fdf4435e0a6600c38fc205a898501c/icu4c/source/common/ustrcase.cpp#L208-L304
The Edits object can be used for mapping folded offsets back to the original offsets, but that feature is unavailable via the C API. Since I probably can't implement fzf in edit (binary size), I'll probably have to port that internal C++ code to Rust.

@e82eric
Copy link
Contributor Author

e82eric commented May 28, 2025

Things have finally settled down in the edit repository and I now have plenty time to review your PR. If you prefer the fzf approach, I'd be happy to review that and help you with finishing up the PR if needed. If you prefer the VS Code approach, I can also review that. What would you prefer?

I guess my preference is to use the fzf port, having the gap penalties seemed to improve the matching quite a bit. I would totally understand if you guys would prefer to go with the VsCode one to avoid adding the extra complexity to the code base. I'll revert the PR back to the fzf version, let me know if that doesn't make sense and I can switch back to the VsCode version if needed.

I made a couple of attempts tonight at supporting the case folding expansions in the C++ port of the VsCode algo and failed each time. But I think I have a strategy that will work now, I want to say I should be able to get that working in a night or 2 and then another night to add it to the fzf version. Does that timeline work? (I don't want to miss the window where you have extra time for PR reviews).

@lhecker
Copy link
Member

lhecker commented May 28, 2025

Let's say we go with fzf, since you prefer its behavior. I just read your fzf implementation in fbf76f2 and it actually looks absolutely fine to me (it uses the per-codepoint fold-case function which is "good enough" IMO). I'd merge it as-is and improve it as needed later on. How about we

  • backup your current changes
  • revert to fbf76f2 and merge that
  • and then improve upon it later?

Or did I miss an issue with fbf76f2? Otherwise, I'd just test it locally first and then approve it. 🙂

@e82eric e82eric force-pushed the command-palette-search-fzf branch from 2ae608e to fbf76f2 Compare May 28, 2025 13:19
@e82eric
Copy link
Contributor Author

e82eric commented May 28, 2025

Sounds good! Just updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs-Discussion Something that requires a team discussion before we can proceed zBugBash-Consider
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants