PoC : Search #1685
@@ -34,6 +34,7 @@ public function registerBundles()
new Wallabag\ImportBundle\WallabagImportBundle(),
new Doctrine\Bundle\MigrationsBundle\DoctrineMigrationsBundle(),
new Craue\ConfigBundle\CraueConfigBundle(),
new \FOS\ElasticaBundle\FOSElasticaBundle(),
I think you don't need the first \
Nice :) I added 2 TODOs: a parameter to disable this feature, and changes in docker-compose.
And I guess that if we have questions about ES, we can ask an expert 👋 @damienalexandre
Just read this blog post: http://blog.zenika.com/2016/02/15/consolider-les-logs-docker-dans-un-elk/. You can find configuration for your docker-compose file there.
custom_french_analyzer:
    type: custom
    tokenizer: letter
    filter: ["asciifolding", "lowercase", "french_stem", "stop_fr"]
The french_stem token filter does not exist, so you don't have any stemming on this analyzer.
I suggest you use elision too, instead of stop_fr. Have a look at this example.
Also, why tokenizing for french only? I'm not sure the contents are mono-lang?
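For illustration, here is a hedged sketch of what a French analyzer with real stemming and elision could look like in the index analysis settings. It follows the general shape of Elasticsearch's built-in `french` analyzer; the filter names `french_elision`, `french_stop` and `french_stemmer` are labels chosen for this sketch, and swapping the `letter` tokenizer for `standard` is an assumption (so that the apostrophe in "l'avion" survives long enough for the elision filter to see it). Where this block nests in the bundle configuration depends on the FOSElasticaBundle version.

```yaml
# Sketch only: analysis settings for one index, not the PR's actual config.
analysis:
    filter:
        french_elision:          # hypothetical name; strips l', d', qu', ...
            type: elision
            articles: [l, m, t, qu, n, s, j, d, c]
        french_stop:             # hypothetical name; built-in French stop list
            type: stop
            stopwords: _french_
        french_stemmer:          # hypothetical name; this provides the stemming
            type: stemmer
            language: light_french
    analyzer:
        custom_french_analyzer:
            type: custom
            tokenizer: standard  # assumption: replaces "letter" from the diff
            # asciifolding runs last so the stop list and stemmer see accents
            filter: [french_elision, lowercase, french_stop, french_stemmer, asciifolding]
```

Keeping the stop filter here is optional; dropping `french_stop` from the `filter` list gives the elision-only variant.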
Also, why tokenizing for french only? I'm not sure the contents are mono-lang?
Not at all. It was only to test the results. Should we add analyzers for all languages, or is there another way?
There is no way you will be able to get a good analysis if you don't know the language of the content, so my best guess would be to build a triGram analyzer, with html_strip char filter etc... and use multi-fields.
Each "searchable field" could be mapped:
- with a triGram analyzer (to collect a lot of contents)
- with standard analyzer (to improve the relevance)
It will not be perfect 😞 and will need some tuning once it's set up (there are so many ways to do analysis, you have to find one that fits your contents and how you want to search them).
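A minimal sketch of that idea, assuming an Elasticsearch 2.x-era setup (hence `type: string`) and a hypothetical searchable field named `content`. The names `trigrams_filter` and `trigrams` are chosen for this sketch, and the exact nesting under the bundle configuration depends on the FOSElasticaBundle version.

```yaml
# Sketch only: index analysis settings.
analysis:
    filter:
        trigrams_filter:         # hypothetical name; emits 3-character grams
            type: ngram
            min_gram: 3
            max_gram: 3
    analyzer:
        trigrams:                # hypothetical name
            type: custom
            char_filter: [html_strip]
            tokenizer: standard
            filter: [lowercase, trigrams_filter]

# Sketch only: mapping for one searchable field, using multi-fields.
content:                         # hypothetical field name
    type: string                 # assumption: ES 2.x; later versions use "text"
    analyzer: standard           # main field: better relevance
    fields:
        trigram:                 # sub-field "content.trigram": broader recall
            type: string
            analyzer: trigrams
```

A query can then target both `content` and `content.trigram`, typically with a higher boost on the main field so that exact word matches rank above partial trigram matches.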
The library we use can sometimes give us the language of the content. Could that help too?
It can help, yes: you could choose to support some strong elision for a list of languages (the ones already in ES core are a good starting point), and search on those on a per-content basis.
Have a quick look at this to learn more about the recommended config: https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html#_use_n_grams
Strange error.
provider: ~
listener: ~
finder: ~
properties:
This should be part of the "mappings"; why is it here?
How-to :
composer update
php bin/console fos:elastica:populate
TODO :
Future :
Should fix #18