wip messing around with sentence similarity #4989

Open. Wants to merge 5 commits into base branch master.

bovine3dom (Member)

Closes #936 if ever finished

Thoughts:

  1. Embedding the docs is slightly slow (~30 seconds), so the embeddings need to be cached between sessions.
  2. Computing an embedding as you type is probably too slow, but worth checking.
  3. Results often follow the pattern: irrelevant, bizarre, totally irrelevant, THING YOU ARE LOOKING FOR, irrelevant, kooky... But given that :apropos often gives zero results for the same query, maybe that's still useful. Example: searching for "version control" gives lots of random stuff, but ~5th among the results is mktridactylrc, which is probably what you want and which you might not have found otherwise.
  4. Should try it on tab URLs / titles and see what happens.
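A minimal sketch of the caching in point 1, assuming a per-text cache keyed by the model name plus the exact string that was embedded, so only new or edited docs get re-embedded on the next session. `EmbeddingCache`, `cacheKey`, `missingIndices` and `fillCache` are hypothetical names, not code from this branch.

```typescript
// Hypothetical per-text embedding cache. Keys combine the model name and the exact
// text that was embedded, so editing a doc string or switching models forces a re-embed.
// A plain object (rather than a Map) keeps it JSON-friendly for persistent storage.
type EmbeddingCache = Record<string, number[]>

function cacheKey(model: string, text: string): string {
  return `${model}\u0000${text}`
}

// Indices of texts that have no cached embedding for this model yet.
function missingIndices(cache: EmbeddingCache, model: string, texts: string[]): number[] {
  const missing: number[] = []
  texts.forEach((text, i) => {
    if (!Object.prototype.hasOwnProperty.call(cache, cacheKey(model, text))) missing.push(i)
  })
  return missing
}

// Store freshly computed embeddings (one per missing index, in the same order)
// and return the full list of embeddings aligned with `texts`.
function fillCache(
  cache: EmbeddingCache,
  model: string,
  texts: string[],
  missing: number[],
  fresh: number[][],
): number[][] {
  if (fresh.length !== missing.length) throw new Error("expected one embedding per missing text")
  missing.forEach((idx, j) => {
    cache[cacheKey(model, texts[idx])] = fresh[j]
  })
  return texts.map(text => cache[cacheKey(model, text)])
}
```

Only `missing.map(i => texts[i])` would need to go through `extractor`; assuming it returns a transformers.js Tensor, `.tolist()` should give the `number[][]` that `fillCache` expects. Persisting the object with something like `browser.storage.local.set({ embeddings: cache })` is one plausible option for an extension, but that storage choice is an assumption.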


const docEmbeds = await extractor(funcs.map(f => f[0] + ": " + f[1].doc.slice(0, 512)), { pooling: 'mean', normalize: true }) // TODO: check the model's max input length (slice(0, 512) counts characters, not tokens). Takes a few seconds

probe = 'easter egg'
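Because the `extractor` call above uses `normalize: true`, cosine similarity reduces to a plain dot product, so ranking the docs against a probe such as 'easter egg' can be a brute-force scan. A sketch, assuming the doc embeddings are available as one flat row-major array of `n * dim` values; `topK` is a hypothetical helper, not code from this branch.

```typescript
interface Scored {
  name: string
  score: number
}

// Brute-force nearest neighbours: score every doc row against the query with a
// dot product (equal to cosine similarity when both sides are L2-normalised),
// then return the k highest-scoring names.
function topK(
  query: ArrayLike<number>,
  docData: ArrayLike<number>, // n * dim values, one embedding per row
  dim: number,
  names: string[],
  k: number,
): Scored[] {
  const n = docData.length / dim
  if (!Number.isInteger(n) || n !== names.length) throw new Error("shape mismatch")
  const scored: Scored[] = []
  for (let row = 0; row < n; row++) {
    const offset = row * dim
    let score = 0
    for (let j = 0; j < dim; j++) score += query[j] * docData[offset + j]
    scored.push({ name: names[row], score })
  }
  scored.sort((a, b) => b.score - a.score)
  return scored.slice(0, k)
}
```

Assuming `extractor` returns a transformers.js Tensor exposing `.data` and `.dims`, a call might look like `topK(queryEmbed.data, docEmbeds.data, docEmbeds.dims[1], funcs.map(f => f[0]), 10)`, where `queryEmbed` is the probe embedded with the same pooling and normalisation options.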
bovine3dom (Member, Author)

Results for "easter egg": "hint", "wintitle", "snow_mouse_mode", "neo_mouse_mode", "escapehatch", "back", "markjumplocal", "yankimage", "markjump", "forward"

Results for "version control": "issue", "credits", "updatecheck", "nativeinstall", "updatenative", "containerupdate", "mktridactylrc", "firefoxsyncpull", "setnull", "markaddlocal"

bovine3dom (Member, Author) commented Jun 9, 2024

Worth checking whether we can speed up search with a nearest-neighbour index: https://unum-cloud.github.io/usearch/javascript/

I think the slowness is in embedding the search term, but I could be wrong.

Edit: that library needs WASM, and I can't get my head around it: @wasmer/sdk fails at init() before we even try to import the HNSW library. I tried another library, but it was ~100x slower than just searching everything. Probably not worth investigating any further.
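One way to test the hunch that the time goes into embedding the search term rather than into ranking is to time the two phases separately. A minimal sketch using `performance.now()`; `timed` and `profileSearch` are hypothetical helpers, and `embedQuery` / `rank` are placeholders for whatever the branch actually calls.

```typescript
// Runs one async step and returns its result alongside the elapsed
// wall-clock time in milliseconds.
async function timed<T>(step: () => Promise<T>): Promise<{ result: T; ms: number }> {
  const start = performance.now()
  const result = await step()
  return { result, ms: performance.now() - start }
}

// Splits one search into "embed the query" and "rank the docs" so the two
// costs can be compared directly.
async function profileSearch<E, R>(
  probe: string,
  embedQuery: (text: string) => Promise<E>,
  rank: (embedding: E) => R,
): Promise<{ results: R; embedMs: number; rankMs: number }> {
  const embedded = await timed(() => embedQuery(probe))
  const ranked = await timed(async () => rank(embedded.result))
  return { results: ranked.result, embedMs: embedded.ms, rankMs: ranked.ms }
}
```

If `embedMs` dominates, a faster nearest-neighbour index would not help much, since it only speeds up the ranking phase.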

For our data, it is much, much slower than just searching all the data, and it gets worse results. Committing because I might want the reshape function again at some point.
This reverts commit 5ba8438.
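The reverted commit's reshape function is not shown here. The following is only a hypothetical stand-in for that kind of helper: it splits a flat row-major embedding buffer (such as the data from a batched `extractor` call) into one vector per doc, the per-item shape that nearest-neighbour libraries often expect.

```typescript
// Hypothetical helper: split a flat row-major buffer of rows * cols values into rows.
// subarray() returns views onto the same memory, so no embedding data is copied.
function reshape(flat: Float32Array, rows: number, cols: number): Float32Array[] {
  if (flat.length !== rows * cols) {
    throw new Error(`cannot reshape ${flat.length} values into ${rows}x${cols}`)
  }
  const out: Float32Array[] = []
  for (let r = 0; r < rows; r++) out.push(flat.subarray(r * cols, (r + 1) * cols))
  return out
}
```

Returning views keeps memory flat. The trade-off is that mutating a row also mutates the original buffer, so use `slice` instead of `subarray` if independent copies are needed.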
Successfully merging this pull request may close these issues: Beyond fuzzy search
1 participant