PoC : Search #1685

Closed
wants to merge 8 commits into from

tcitworld
Member

How-to:

  • Launch Elasticsearch (tested with version 1.7.4)
  • composer update
  • php bin/console fos:elastica:populate
  • add entries
  • search for entries
  • profit

TODO:

  • a parameter to enable/disable this feature (for those who can't install ES)
  • Do things properly
  • use RabbitMQ to index without a performance hit
  • tests
  • add ES in docker-compose

Future:

  • find a way to display searched tags
  • search in comments too

Should fix #18

@@ -34,6 +34,7 @@ public function registerBundles()
             new Wallabag\ImportBundle\WallabagImportBundle(),
             new Doctrine\Bundle\MigrationsBundle\DoctrineMigrationsBundle(),
             new Craue\ConfigBundle\CraueConfigBundle(),
+            new \FOS\ElasticaBundle\FOSElasticaBundle(),
Member

I don't think you need the leading \

@nicosomb
Member

Nice :)

I added two TODOs: a parameter to disable this feature, and changes in docker-compose.

@j0k3r
Member

j0k3r commented Feb 18, 2016

And I guess that if we have questions about ES, we can ask an expert 👋 @damienalexandre

@nicosomb
Member

I just read this blog post: http://blog.zenika.com/2016/02/15/consolider-les-logs-docker-dans-un-elk/. It has configuration you can use for your docker-compose file.

custom_french_analyzer:
    type: custom
    tokenizer: letter
    filter: ["asciifolding", "lowercase", "french_stem", "stop_fr"]

The french_stem token filter does not exist, so this analyzer does no stemming at all.

I suggest you use elision too, instead of stop_fr. Have a look at this example.

Also, why tokenize for French only? I doubt the content is monolingual.
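
A rough sketch of that suggestion, for illustration only (it is neither the linked example nor code from this PR): the analyzer declares its own elision and stemmer filters instead of referencing `french_stem` and `stop_fr`. The settings are built as a Python dict so they can be inspected or serialized. The filter names `french_elision` and `french_stemmer`, the article list, the `light_french` stemmer and the filter order are all assumptions. The `standard` tokenizer is swapped in for `letter`, because `letter` splits on the apostrophe before the elision filter can see it.

```python
import json


def french_analysis_settings():
    """Build an Elasticsearch `analysis` settings body (illustrative only)."""
    return {
        "analysis": {
            "filter": {
                # Strips elided articles such as l' or d' from the token start.
                "french_elision": {
                    "type": "elision",
                    "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c"],
                },
                # The generic `stemmer` filter, configured for French.
                "french_stemmer": {
                    "type": "stemmer",
                    "language": "light_french",
                },
            },
            "analyzer": {
                "custom_french_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "french_elision",
                        "lowercase",
                        "french_stemmer",
                        "asciifolding",
                    ],
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(french_analysis_settings(), indent=2))
```

Every custom filter named in the analyzer chain is defined in the same body, which is the property the original `french_stem` reference was missing.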

Member Author

> Also, why tokenize for French only? I doubt the content is monolingual.

Not at all; it was only to test the results. Should we add analyzers for every language, or is there another way?

You cannot get a good analysis without knowing the language of the content, so my best guess would be to build a trigram analyzer (with the html_strip char filter, etc.) and use multi-fields.

Each "searchable field" could be mapped:

  • with a trigram analyzer (to match a lot of content)
  • with the standard analyzer (to improve relevance)

It will not be perfect 😞 and will need some tuning once it's set up (there are so many ways to do analysis; you have to find one that fits your content and how you want to search it).
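
To make the idea concrete, here is a minimal sketch, not a tested configuration. `trigrams` shows roughly which tokens a trigram analyzer emits, and `index_body` builds settings plus a multi-field mapping in which each searchable field is indexed with both the standard analyzer and a trigram analyzer. The names `trigram`, `trigram_filter`, the mapping type `entry` and the field names are assumptions, and the `string` field type matches the 1.x series used in this PR (later versions use `text` and drop mapping types).

```python
def trigrams(text):
    """Return the lowercase 3-character windows of each word in `text`."""
    grams = []
    for word in text.lower().split():
        # Words shorter than three characters produce no trigram.
        grams.extend(word[i:i + 3] for i in range(len(word) - 2))
    return grams


def index_body(fields):
    """Build settings and a multi-field mapping for each name in `fields`."""
    settings = {
        "analysis": {
            "filter": {
                "trigram_filter": {"type": "ngram", "min_gram": 3, "max_gram": 3}
            },
            "analyzer": {
                "trigram": {
                    "type": "custom",
                    "char_filter": ["html_strip"],  # drop markup first
                    "tokenizer": "standard",
                    "filter": ["lowercase", "trigram_filter"],
                }
            },
        }
    }
    properties = {
        name: {
            "type": "string",
            "analyzer": "standard",  # main field: improves relevance
            "fields": {
                # sub-field: matches a lot of content, whatever the language
                "trigram": {"type": "string", "analyzer": "trigram"}
            },
        }
        for name in fields
    }
    return {"settings": settings, "mappings": {"entry": {"properties": properties}}}
```

For instance, `trigrams("Search")` yields `sea`, `ear`, `arc` and `rch`, which is why a partial or misspelled query can still hit the trigram sub-field.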

Member Author

The library we use can sometimes give us the language of the content. Would that help too?

It can help, yes.
You could choose to support some strong elision for a list of languages (the ones already in ES core are a good starting point), and search on those on a per-content basis.

Have a quick look at this to learn more about the recommended config: https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html#_use_n_grams
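
One way this could look on the query side, as a hedged sketch rather than a recommendation: when the language is known and supported, a language-specific sub-field is searched in addition to the general ones. The sub-field names (`content.fr`, `content.trigram`), the `SUPPORTED_LANGUAGES` set and the `search_query` helper are hypothetical and assume the mapping defines one sub-field per supported language.

```python
# Hypothetical subset of languages that get a dedicated sub-field.
SUPPORTED_LANGUAGES = {"en", "fr", "de", "es"}


def search_query(text, language=None):
    """Build a multi_match query, adding a language sub-field when known."""
    fields = ["content", "content.trigram"]
    if language in SUPPORTED_LANGUAGES:
        # Search the language-specific analysis first when we can.
        fields.insert(0, "content." + language)
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
                # Combine the scores of every field that matches.
                "type": "most_fields",
            }
        }
    }
```

When the language is missing or unsupported, the query falls back to the general and trigram fields only.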

@tcitworld
Member Author

Strange error.

Wallabag\CoreBundle\Tests\Controller\EntryControllerTest::testQuickstart
InvalidArgumentException: Unreachable field "entry"
Wallabag/CoreBundle/Tests/Controller/EntryControllerTest.php:44

provider: ~
listener: ~
finder: ~
properties:

This should be part of the "mappings"; why is it here?

nicosomb removed this from the 2.0.0 milestone Apr 3, 2016
j0k3r mentioned this pull request Apr 11, 2016
nicosomb closed this Apr 18, 2016
j0k3r deleted the v2-es branch October 3, 2016 08:48
j0k3r mentioned this pull request Oct 16, 2016

4 participants