Minimal vector based concept search #888

Closed
20 of 21 tasks
etiennedi opened this issue Jun 2, 2019 · 0 comments · Fixed by #918

Goal

A pre-release as early as possible

Limitations to speed up delivery

  • no indexing of props, i.e. only Explore->Concepts search, but without structured filters
  • a single index, no separate indices per Class/etc.
  • word occurrence is not reflected, only stopwords are removed

Todos

  • Include required ES with vector plugin in stack
  • build vectorizer to be used at import time
    • translate class instance to corpus
      • only use string props in MVP pre-release
    • use at import time
      • concept to vector
      • import to vector repo
  • adapt contextionary
    • return single centroid for class corpus text
    • remove stopwords or words not in the c11y
    • error if no words left
    • build client in weaviate
  • build vector repo
    • prepare schema once - how? when?
      • build idempotent way to manage schema
      • run for example at startup. In the future we need to run this as part of the schema manager as it depends on the schema. But for now there is just a single index, so we can just run it once
    • import single entry
    • perform search, return list of IDs
  • replace Fetch Fuzzy with Explore Concepts
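
The vectorizer todos above (class instance to corpus, string props only, stopword removal, single centroid, error if no words are left) can be sketched roughly as follows. This is only an illustration under assumptions: `stopwords`, `wordVectors`, `corpusFromProps`, `centroid` and `ErrNoWords` are hypothetical names with toy in-memory data, not the contextionary's actual API, and tokenization is a plain whitespace split.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// Hypothetical stand-ins for the contextionary's stopword list and word
// vectors; the real c11y data and client API are not shown in this issue.
var stopwords = map[string]bool{"a": true, "the": true, "of": true}

var wordVectors = map[string][]float32{
	"car":  {1, 0},
	"fast": {3, 2},
}

// ErrNoWords is returned when filtering leaves nothing to vectorize.
var ErrNoWords = errors.New("no words left after removing stopwords and unknown words")

// corpusFromProps builds a lowercase corpus from the string-valued props of a
// class instance. Non-string props are skipped, matching the MVP scope.
func corpusFromProps(props map[string]interface{}) string {
	keys := make([]string, 0, len(props))
	for k := range props {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order; Go map iteration order is random

	var parts []string
	for _, k := range keys {
		if s, ok := props[k].(string); ok {
			parts = append(parts, strings.ToLower(s))
		}
	}
	return strings.Join(parts, " ")
}

// centroid averages the vectors of all tokens that survive filtering.
// Every kept token is weighted equally.
func centroid(corpus string) ([]float32, error) {
	var sum []float32
	count := 0
	for _, w := range strings.Fields(corpus) {
		if stopwords[w] {
			continue
		}
		vec, ok := wordVectors[w]
		if !ok {
			continue // word is not in the (toy) contextionary
		}
		if sum == nil {
			sum = make([]float32, len(vec))
		}
		for i, x := range vec {
			sum[i] += x
		}
		count++
	}
	if count == 0 {
		return nil, ErrNoWords
	}
	for i := range sum {
		sum[i] /= float32(count)
	}
	return sum, nil
}

func main() {
	props := map[string]interface{}{"name": "the fast car", "doors": 4}
	vec, err := centroid(corpusFromProps(props))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(vec)
}
```

In this sketch the resulting vector is what would be handed to the vector repo at import time; a corpus consisting only of stopwords or unknown words yields `ErrNoWords` instead of a zero vector.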
etiennedi added a commit that referenced this issue Jun 2, 2019
currently we aren't doing anything yet with the result. The next step is
then to insert it into some sort of a vector backend
etiennedi added a commit that referenced this issue Jun 3, 2019
and verify with integration tests
etiennedi added a commit that referenced this issue Jun 4, 2019
otherwise we end up with a compound word that on its own will most
likely not be contained in the contextionary
etiennedi added a commit that referenced this issue Jun 5, 2019