New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support full text search for all languages #700

Closed
wants to merge 20 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@kou
Contributor

kou commented Apr 24, 2016

GitHub: #615

It uses PGroonga instead of built-in PostgreSQL full text search
feature. Because built-in PostgreSQL full text search doesn't support
languages that don't put space between terms such as Japanese, Chinese
and so on. PGroonga supports all languages including Japanese and
Chinese.

This change doesn't enable PGroonga by default. It means that this
change doesn't change the current behavior.

You need to set True to USING_PGROONGA in zrpoject/settings.py to enable
PGroonga.

@smarx

This comment has been minimized.

Show comment
Hide comment
@smarx

smarx Apr 24, 2016

Automated message from Dropbox CLA bot

@kou, thanks for the pull request! It looks like you haven't yet signed the Dropbox CLA. Please sign it here.

smarx commented Apr 24, 2016

Automated message from Dropbox CLA bot

@kou, thanks for the pull request! It looks like you haven't yet signed the Dropbox CLA. Please sign it here.

@smarx

This comment has been minimized.

Show comment
Hide comment
@smarx

smarx Apr 24, 2016

Automated message from Dropbox CLA bot

@kou, thanks for signing the CLA!

smarx commented Apr 24, 2016

Automated message from Dropbox CLA bot

@kou, thanks for signing the CLA!

Show outdated Hide outdated puppet/zulip/files/postgresql/process_fts_updates
@@ -43,6 +43,8 @@ sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../.
try:
os.environ['DJANGO_SETTINGS_MODULE'] = 'zproject.settings'
from django.conf import settings
if settings.USING_PGROONGA:
exit()

This comment has been minimized.

@timabbott

timabbott Apr 25, 2016

Member

@kou can you explain a bit more what's going on here? Is the story that with PGROONGA we're not logging and deferring updates to the full-text search index, and so we don't need this script?

@timabbott

timabbott Apr 25, 2016

Member

@kou can you explain a bit more what's going on here? Is the story that with PGROONGA we're not logging and deferring updates to the full-text search index, and so we don't need this script?

This comment has been minimized.

@kou

kou Apr 26, 2016

Contributor

Sure.
PGroonga supports read lock free updating. It means that search performance isn't decreased while we add new messages. So we don't need to defer full-text search index update.

(I don't know why Zulip defers full-text search index update. So the answer may not describe what you want to know.)

@kou

kou Apr 26, 2016

Contributor

Sure.
PGroonga supports read lock free updating. It means that search performance isn't decreased while we add new messages. So we don't need to defer full-text search index update.

(I don't know why Zulip defers full-text search index update. So the answer may not describe what you want to know.)

This comment has been minimized.

@timabbott

timabbott Apr 26, 2016

Member

Right, so the story here is that Zulip does this deferral as a performance optimization. Updating the full-text search index is a substantial portion of the total work involved in do_send_message, so rather than blocking on doing that before finishing the message sending operation, we do it in the background after the message is sent. So this issue isn't search performance, it's message sending performance.

I think it would be good to use the same mechanism with PGroonga if there isn't a technical constraint that makes this difficult, since otherwise sites using this will experience substantially slower message sending performance.

@timabbott

timabbott Apr 26, 2016

Member

Right, so the story here is that Zulip does this deferral as a performance optimization. Updating the full-text search index is a substantial portion of the total work involved in do_send_message, so rather than blocking on doing that before finishing the message sending operation, we do it in the background after the message is sent. So this issue isn't search performance, it's message sending performance.

I think it would be good to use the same mechanism with PGroonga if there isn't a technical constraint that makes this difficult, since otherwise sites using this will experience substantially slower message sending performance.

This comment has been minimized.

@kou

kou Apr 27, 2016

Contributor

Thanks for the information.

PGroonga has "newer data are more important" policy. So PGroonga's index update is faster than other full-text search implementation such as GIN. But it's slower than "do nothing".

I hope that sent messages are searchable as soon as possible. (It also respects PGroonga policy.)

But this is Zulip project. Zulip project prefers deferring index update, we should do. It should be done in this pull request? Can we separate it as another issue? Because the current implementation works.

@kou

kou Apr 27, 2016

Contributor

Thanks for the information.

PGroonga has "newer data are more important" policy. So PGroonga's index update is faster than other full-text search implementation such as GIN. But it's slower than "do nothing".

I hope that sent messages are searchable as soon as possible. (It also respects PGroonga policy.)

But this is Zulip project. Zulip project prefers deferring index update, we should do. It should be done in this pull request? Can we separate it as another issue? Because the current implementation works.

This comment has been minimized.

@timabbott

timabbott Apr 27, 2016

Member

The deferred index update usually only adds <10ms of delay to the data being available in the index -- it just avoids blocking the critical send_message code path on updating the index.

I definitely understand the desire to get something finished and merged and then move onto building improvements and optimizations. I'm OK with fixing this being a follow-up issue, but I consider it a blocker for PGroonga being available in Zulip in production, because I don't want to support two models here (and I think it'll actually be very little work to make the PGroonga integration use the existing infrastructure for doing deferred updates we have for postgres full text search).

@timabbott

timabbott Apr 27, 2016

Member

The deferred index update usually only adds <10ms of delay to the data being available in the index -- it just avoids blocking the critical send_message code path on updating the index.

I definitely understand the desire to get something finished and merged and then move onto building improvements and optimizations. I'm OK with fixing this being a follow-up issue, but I consider it a blocker for PGroonga being available in Zulip in production, because I don't want to support two models here (and I think it'll actually be very little work to make the PGroonga integration use the existing infrastructure for doing deferred updates we have for postgres full text search).

This comment has been minimized.

@kou

kou Apr 29, 2016

Contributor

OK. I've implemented and pushed it.

In the implementation, the default full-text search index is also updated. Should we disable index update for the default full-text search?

@kou

kou Apr 29, 2016

Contributor

OK. I've implemented and pushed it.

In the implementation, the default full-text search index is also updated. Should we disable index update for the default full-text search?

Show outdated Hide outdated zproject/settings.py
# USING_PGROONGA = True
USING_PGROONGA = False
else:
USING_PGROONGA = False

This comment has been minimized.

@timabbott

timabbott Apr 25, 2016

Member

This user-configurable setting for USING_PGROONGA should be added to zproject/local_settings_template.py (installed as /etc/zulip/settings.py) so users don't need to patch Zulip to set this setting. You should add its default value of False to DEFAULT_SETTINGS in this file.

@timabbott

timabbott Apr 25, 2016

Member

This user-configurable setting for USING_PGROONGA should be added to zproject/local_settings_template.py (installed as /etc/zulip/settings.py) so users don't need to patch Zulip to set this setting. You should add its default value of False to DEFAULT_SETTINGS in this file.

This comment has been minimized.

@timabbott

timabbott Apr 25, 2016

Member

You should add some docs in a new file docs/full-text-search.md explaining how to setup pgroonga, and linking to the upstream broader docs on pgroonga. (We should probably also have some text in there explaining how the default Zulip full-text search works, but I'm happy to write that).

Also, the section for this that you add in zproject/local_settings_template.py should have comments linking to that documentation.

@timabbott

timabbott Apr 25, 2016

Member

You should add some docs in a new file docs/full-text-search.md explaining how to setup pgroonga, and linking to the upstream broader docs on pgroonga. (We should probably also have some text in there explaining how the default Zulip full-text search works, but I'm happy to write that).

Also, the section for this that you add in zproject/local_settings_template.py should have comments linking to that documentation.

This comment has been minimized.

@kou

kou Apr 26, 2016

Contributor

This user-configurable setting for USING_PGROONGA should be added to zproject/local_settings_template.py (installed as /etc/zulip/settings.py) so users don't need to patch Zulip to set this setting. You should add its default value of False to DEFAULT_SETTINGS in this file.

Done.

You should add some docs in a new file docs/full-text-search.md explaining how to setup pgroonga, and linking to the upstream broader docs on pgroonga. (We should probably also have some text in there explaining how the default Zulip full-text search works, but I'm happy to write that).

I've added it with some TODO marks. Could you resolve TODOs for full-text search feature summary and the default full-text search feature description after this pull request is merged?

Also, the section for this that you add in zproject/local_settings_template.py should have comments linking to that documentation.

Done.

@kou

kou Apr 26, 2016

Contributor

This user-configurable setting for USING_PGROONGA should be added to zproject/local_settings_template.py (installed as /etc/zulip/settings.py) so users don't need to patch Zulip to set this setting. You should add its default value of False to DEFAULT_SETTINGS in this file.

Done.

You should add some docs in a new file docs/full-text-search.md explaining how to setup pgroonga, and linking to the upstream broader docs on pgroonga. (We should probably also have some text in there explaining how the default Zulip full-text search works, but I'm happy to write that).

I've added it with some TODO marks. Could you resolve TODOs for full-text search feature summary and the default full-text search feature description after this pull request is merged?

Also, the section for this that you add in zproject/local_settings_template.py should have comments linking to that documentation.

Done.

Show outdated Hide outdated provision.py
run(["sudo", "add-apt-repository", "-y", "universe"])
run(["sudo", "add-apt-repository", "-y", "ppa:groonga/ppa"])
run(["sudo", "apt-get", "update"])

This comment has been minimized.

@timabbott

timabbott Apr 25, 2016

Member

This only suffices to install pgroonga in a development environment; we'll also need some code for installing it in a production environment. I think the right way to do this would be to add a new (optional) puppet module under puppet/zulip/manifests/ intended to be used alongside postgres_appdb.pp that adds this PPA and depends on the package. Let me know if you need help understanding how to do this properly.

@timabbott

timabbott Apr 25, 2016

Member

This only suffices to install pgroonga in a development environment; we'll also need some code for installing it in a production environment. I think the right way to do this would be to add a new (optional) puppet module under puppet/zulip/manifests/ intended to be used alongside postgres_appdb.pp that adds this PPA and depends on the package. Let me know if you need help understanding how to do this properly.

This comment has been minimized.

@kou

kou Apr 26, 2016

Contributor

How about separating the task a separated issue? I want to work on minimum task in one pull request.

@kou

kou Apr 26, 2016

Contributor

How about separating the task a separated issue? I want to work on minimum task in one pull request.

This comment has been minimized.

@timabbott

timabbott Apr 27, 2016

Member

I think that addressing this is required to make pgroonga a supported feature for Zulip, but I'm OK with making this a follow-up issue if you like. But we should clearly mark this feature as experimental in the docs/ page and be clear it only works in the Zulip development environment for now. Does that seem reasonable? I don't want to have people think they can use this in production when that's not done yet.

@timabbott

timabbott Apr 27, 2016

Member

I think that addressing this is required to make pgroonga a supported feature for Zulip, but I'm OK with making this a follow-up issue if you like. But we should clearly mark this feature as experimental in the docs/ page and be clear it only works in the Zulip development environment for now. Does that seem reasonable? I don't want to have people think they can use this in production when that's not done yet.

This comment has been minimized.

@kou

kou Apr 29, 2016

Contributor

OK. I've pushed puppet support.

@kou

kou Apr 29, 2016

Contributor

OK. I've pushed puppet support.

Show outdated Hide outdated zerver/views/messages.py
keywords).label("subject_matches"))
condition = or_(column("subject").op("@@")(operand),
column("rendered_content").op("@@")(operand))
return query.where(maybe_negate(condition))

This comment has been minimized.

@timabbott

timabbott Apr 25, 2016

Member

Does highlighting of the matched words work with the pgroonga version here?

@timabbott

timabbott Apr 25, 2016

Member

Does highlighting of the matched words work with the pgroonga version here?

This comment has been minimized.

@kou

kou Apr 26, 2016

Contributor

Sure. I've added a new function that compatible with ts_match_locs_array in tsearch_extras to PGroonga.

We can use the existing code for highlighting.

@kou

kou Apr 26, 2016

Contributor

Sure. I've added a new function that compatible with ts_match_locs_array in tsearch_extras to PGroonga.

We can use the existing code for highlighting.

This comment has been minimized.

@timabbott

timabbott Apr 26, 2016

Member

Cool, that's great! Do you need to add any code to make that work?

@timabbott

timabbott Apr 26, 2016

Member

Cool, that's great! Do you need to add any code to make that work?

This comment has been minimized.

@kou

kou Apr 27, 2016

Contributor

Do you need to add any code to make that work?

No. The current code is enough.

@kou

kou Apr 27, 2016

Contributor

Do you need to add any code to make that work?

No. The current code is enough.

This comment has been minimized.

@timabbott

timabbott Apr 27, 2016

Member

great!

@timabbott
Show outdated Hide outdated zerver/migrations/0001_initial.py
"""),
]
""")
zulip_postgres_migrations = [full_text_search_migration]

This comment has been minimized.

@timabbott

timabbott Apr 25, 2016

Member

This approach of modifying the initial migration to use PGROONGA isn't ideal, since it makes it fairly unpleasant for an existing Zulip installation to switch to using PGROONGA. I'm not sure what the right answer is (maybe we just provide a separate script the user can use when they toggle PGROONGA it to create the migration), but wanted to flag this as a potential issue.

@timabbott

timabbott Apr 25, 2016

Member

This approach of modifying the initial migration to use PGROONGA isn't ideal, since it makes it fairly unpleasant for an existing Zulip installation to switch to using PGROONGA. I'm not sure what the right answer is (maybe we just provide a separate script the user can use when they toggle PGROONGA it to create the migration), but wanted to flag this as a potential issue.

This comment has been minimized.

@kou

kou Apr 26, 2016

Contributor

OK. Should we create a new issue about it? It may be a large task.

@kou

kou Apr 26, 2016

Contributor

OK. Should we create a new issue about it? It may be a large task.

This comment has been minimized.

@timabbott

timabbott Apr 26, 2016

Member

I'm not sure it needs to be a large task. Assuming we're OK with the old full-text search still having its implementation present, it seems to me like we need the script to just do a few things:

  • Do an ALTER ROLE to update the default search path to the appropriate value
  • Run the CREATE INDEX command above

Or am I missing something?

@timabbott

timabbott Apr 26, 2016

Member

I'm not sure it needs to be a large task. Assuming we're OK with the old full-text search still having its implementation present, it seems to me like we need the script to just do a few things:

  • Do an ALTER ROLE to update the default search path to the appropriate value
  • Run the CREATE INDEX command above

Or am I missing something?

This comment has been minimized.

@kou

kou Apr 27, 2016

Contributor

Sorry. My description didn't have enough information.

What the script needs to do is what you list. It's not a large task as you said.

I think that we need to decide the followings things. It may be a large task because we may need many things or discussions.

  • Where should we put the script?
  • How do users use the script?
@kou

kou Apr 27, 2016

Contributor

Sorry. My description didn't have enough information.

What the script needs to do is what you list. It's not a large task as you said.

I think that we need to decide the followings things. It may be a large task because we may need many things or discussions.

  • Where should we put the script?
  • How do users use the script?

This comment has been minimized.

@timabbott

timabbott Apr 27, 2016

Member

I'd put the script with a path like scripts/setup/enable-pgroonga, and the documentation for the USING_PGROONGA option would suggest that users need to run that script in order to use it.

Does that seem reasonable?

@timabbott

timabbott Apr 27, 2016

Member

I'd put the script with a path like scripts/setup/enable-pgroonga, and the documentation for the USING_PGROONGA option would suggest that users need to run that script in order to use it.

Does that seem reasonable?

This comment has been minimized.

@kou

kou Apr 29, 2016

Contributor

Thanks for deciding them. It's reasonable.

I've put scripts/setup/enable-pgroonga and scripts/setup/disable-pgroonga and removed related codes from tools/postgres-init-deb-db and zerver/migrations/0001_initial.py. docs/full-text-search.md is updated.

@kou

kou Apr 29, 2016

Contributor

Thanks for deciding them. It's reasonable.

I've put scripts/setup/enable-pgroonga and scripts/setup/disable-pgroonga and removed related codes from tools/postgres-init-deb-db and zerver/migrations/0001_initial.py. docs/full-text-search.md is updated.

Show outdated Hide outdated tools/postgres-init-dev-db
@@ -24,7 +24,7 @@ if [[ $# == 0 ]]; then
USERNAME=zulip
PASSWORD=$("$(dirname "$0")/../bin/get-django-setting" LOCAL_DATABASE_PASSWORD)
DBNAME=zulip
SEARCH_PATH="$USERNAME",public
SEARCH_PATH="$USERNAME",public,pgroonga,pg_catalog

This comment has been minimized.

@timabbott

timabbott Apr 25, 2016

Member

why do we need to add pg_catalog here as well?

@timabbott

timabbott Apr 25, 2016

Member

why do we need to add pg_catalog here as well?

This comment has been minimized.

@kou

kou Apr 26, 2016

Contributor

PostgreSQL searches pg_catalog first if it's not specified in search_path.

http://www.postgresql.org/docs/9.5/static/runtime-config-client.html :

The system catalog schema, pg_catalog, is always searched, whether it is mentioned in the path or not. If it is mentioned in the path then it will be searched in the specified order. If pg_catalog is not in the path then it will be searched before searching any of the path items.

pgroonga schema provides @@ operator. It's also defined in pg_catalog. If we just specify only pgroonga, @@ operator is never used. Because pg_catalog is searched first.

It causes a problem when sequential scan is used by PostgreSQL. Both of @@ operator in pg_catalog and @@ operator in pgroonga provide full-text search feature. But @@ operator in pg_catalog doesn't support languages that doesn't put space between words such as Japanese and Chinese. In sequential scan, we need to use @@ operator in pgroonga to support all languages.

@kou

kou Apr 26, 2016

Contributor

PostgreSQL searches pg_catalog first if it's not specified in search_path.

http://www.postgresql.org/docs/9.5/static/runtime-config-client.html :

The system catalog schema, pg_catalog, is always searched, whether it is mentioned in the path or not. If it is mentioned in the path then it will be searched in the specified order. If pg_catalog is not in the path then it will be searched before searching any of the path items.

pgroonga schema provides @@ operator. It's also defined in pg_catalog. If we just specify only pgroonga, @@ operator is never used. Because pg_catalog is searched first.

It causes a problem when sequential scan is used by PostgreSQL. Both of @@ operator in pg_catalog and @@ operator in pgroonga provide full-text search feature. But @@ operator in pg_catalog doesn't support languages that doesn't put space between words such as Japanese and Chinese. In sequential scan, we need to use @@ operator in pgroonga to support all languages.

This comment has been minimized.

@timabbott

timabbott Apr 26, 2016

Member

OK, makes sense. Can you add a comment explaining this in the relevant section of zproject/settings.py?

I'm thinking something like:

We need to have pgroonga before pg_catalog in the postgres search path, because pgroonga overrides the @@ operator from pg_catalog, and pg_catalog is searched first if not specified in the search path.

Also, you'll need to put this in scripts/setup/postgres-init-db as well (the version that runs for setting up production instances).

@timabbott

timabbott Apr 26, 2016

Member

OK, makes sense. Can you add a comment explaining this in the relevant section of zproject/settings.py?

I'm thinking something like:

We need to have pgroonga before pg_catalog in the postgres search path, because pgroonga overrides the @@ operator from pg_catalog, and pg_catalog is searched first if not specified in the search path.

Also, you'll need to put this in scripts/setup/postgres-init-db as well (the version that runs for setting up production instances).

This comment has been minimized.

@kou

kou Apr 27, 2016

Contributor

OK, makes sense. Can you add a comment explaining this in the relevant section of zproject/settings.py?

Done.

Also, you'll need to put this in scripts/setup/postgres-init-db as well (the version that runs for setting up production instances).

We should not add pgroonga to search when we don't use PGroonga. If PGroonga is installed but Zulip doesn't use PGroonga, @@ operator in pgroonga schema is used. It's a problem.

We need to know whether does the user want to use PGroonga or not in scripts/setup/postgres-init-db. If the user wants to use PGroonga, we should setup search path. How do we know about it?

@kou

kou Apr 27, 2016

Contributor

OK, makes sense. Can you add a comment explaining this in the relevant section of zproject/settings.py?

Done.

Also, you'll need to put this in scripts/setup/postgres-init-db as well (the version that runs for setting up production instances).

We should not add pgroonga to search when we don't use PGroonga. If PGroonga is installed but Zulip doesn't use PGroonga, @@ operator in pgroonga schema is used. It's a problem.

We need to know whether does the user want to use PGroonga or not in scripts/setup/postgres-init-db. If the user wants to use PGroonga, we should setup search path. How do we know about it?

This comment has been minimized.

@kou

kou Apr 27, 2016

Contributor

Ah, should we use ADDITIONAL_PACKAGES environment variable introduced at 68c6d51 ?

@kou

kou Apr 27, 2016

Contributor

Ah, should we use ADDITIONAL_PACKAGES environment variable introduced at 68c6d51 ?

This comment has been minimized.

@timabbott

timabbott Apr 27, 2016

Member

Hmm, this is tricky -- postgres-init-db is run before we give the user the opportunity to put things into settings.py. Let me think about this a bit more and get back to you. It might work to move postgres-init-db to being run a bit later in the installation process.

@timabbott

timabbott Apr 27, 2016

Member

Hmm, this is tricky -- postgres-init-db is run before we give the user the opportunity to put things into settings.py. Let me think about this a bit more and get back to you. It might work to move postgres-init-db to being run a bit later in the installation process.

This comment has been minimized.

@timabbott

timabbott Apr 27, 2016

Member

From our discussion on provision.py, it sounds like you're planning to defer enabling this feature in production for a follow-up PR; since this is related to the same goal, it's fine for this to be bundled with that follow-up issue.

@timabbott

timabbott Apr 27, 2016

Member

From our discussion on provision.py, it sounds like you're planning to defer enabling this feature in production for a follow-up PR; since this is related to the same goal, it's fine for this to be bundled with that follow-up issue.

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Apr 25, 2016

Member

Cool, thanks for working on this @kou! I posted a bunch of comments and questions; let me know when you have a new version ready!

Member

timabbott commented Apr 25, 2016

Cool, thanks for working on this @kou! I posted a bunch of comments and questions; let me know when you have a new version ready!

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou Apr 26, 2016

Contributor

I've pushed some commits to apply your comments.

Contributor

kou commented Apr 26, 2016

I've pushed some commits to apply your comments.

Show outdated Hide outdated docs/full-text-search.md
TODO: Describe how to enable PGroonga on installed Zulip.
Now, you can use full-text search against all languages.

This comment has been minimized.

@timabbott

timabbott Apr 26, 2016

Member

Does the user need to add any dictionary files or do other setup to make PGROONGA work?

@timabbott

timabbott Apr 26, 2016

Member

Does the user need to add any dictionary files or do other setup to make PGROONGA work?

This comment has been minimized.

@kou

kou Apr 27, 2016

Contributor

There are no more setup.
PGroonga can use dictionary based full-text search and not dictionary based full-text search. In this pull request, we use not dictionary based full-text search. Because we need dictionaries for all languages to support all languages with dictionary based full-text search.

@kou

kou Apr 27, 2016

Contributor

There are no more setup.
PGroonga can use dictionary based full-text search and not dictionary based full-text search. In this pull request, we use not dictionary based full-text search. Because we need dictionaries for all languages to support all languages with dictionary based full-text search.

This comment has been minimized.

@timabbott

timabbott Apr 27, 2016

Member

I see -- maybe we should add documentation on how to add a dictionary for a particular language to PGroonga?

@timabbott

timabbott Apr 27, 2016

Member

I see -- maybe we should add documentation on how to add a dictionary for a particular language to PGroonga?

This comment has been minimized.

@kou

kou Apr 27, 2016

Contributor

I think that we don't need it. Because we use dictionary based full-text search, we can't support all languages. If the user who doesn't need to support all languages, the user will use the current (default) full-text search feature.

@kou

kou Apr 27, 2016

Contributor

I think that we don't need it. Because we use dictionary based full-text search, we can't support all languages. If the user who doesn't need to support all languages, the user will use the current (default) full-text search feature.

Show outdated Hide outdated docs/full-text-search.md
Then, you grant `USAGE` privilege on `pgroonga` schema to `zulip`
user:
GRANT USAGE ON SCHEMA pgroonga TO zulip;

This comment has been minimized.

@timabbott

timabbott Apr 26, 2016

Member

Can we move the GRANT USAGE part into the pgroonga setup script?

@timabbott

timabbott Apr 26, 2016

Member

Can we move the GRANT USAGE part into the pgroonga setup script?

This comment has been minimized.

@kou

kou Apr 27, 2016

Contributor

Yes when we decide how to provide the PGroonga setup script. It will be discussed at https://github.com/zulip/zulip/pull/700/files/3532f455a1bff2fb8aa413cceaa6b1aba9fdb8de#r60951735 or new issue.

@kou

kou Apr 27, 2016

Contributor

Yes when we decide how to provide the PGroonga setup script. It will be discussed at https://github.com/zulip/zulip/pull/700/files/3532f455a1bff2fb8aa413cceaa6b1aba9fdb8de#r60951735 or new issue.

This comment has been minimized.

@timabbott

timabbott Apr 27, 2016

Member

OK.

@timabbott
@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Apr 26, 2016

Member

@kou thanks for addressing those comments! I posted a few more issues.

The other thing that I think would be really good to have here is some sort of automated testing to confirm the pgroonga implementation doesn't get broken the next time we do refactoring in the search code paths. I'm not totally sure what the best way to do this is. https://docs.djangoproject.com/en/1.8/topics/testing/tools/#overriding-settings would help with letting us set different settings for a test case, but since this involves a modified database schema, we may need to do something more complex to do the testing.

Member

timabbott commented Apr 26, 2016

@kou thanks for addressing those comments! I posted a few more issues.

The other thing that I think would be really good to have here is some sort of automated testing to confirm the pgroonga implementation doesn't get broken the next time we do refactoring in the search code paths. I'm not totally sure what the best way to do this is. https://docs.djangoproject.com/en/1.8/topics/testing/tools/#overriding-settings would help with letting us set different settings for a test case, but since this involves a modified database schema, we may need to do something more complex to do the testing.

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou Apr 27, 2016

Contributor

The other thing that I think would be really good to have here is some sort of automated testing to confirm the pgroonga implementation doesn't get broken the next time we do refactoring in the search code paths.

I've added it: 062afa6

Contributor

kou commented Apr 27, 2016

The other thing that I think would be really good to have here is some sort of automated testing to confirm the pgroonga implementation doesn't get broken the next time we do refactoring in the search code paths.

I've added it: 062afa6

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Apr 27, 2016

Member

Posted a few more comments. You have a lint failure that travis CI has flagged. The remaining big issue from my perspective is patching the initial migration as the model for supporting USING_PGROONGA. I think it will make it quite difficult for anyone to switch between using pgroonga and the default postgres full text search.

The simplest fix would be to add a scripts/setup/enable-pgroonga script (and maybe a corresponding disable script) to setup pgroonga and add the PGroonga index, which is outside the zerver migration model. Perhaps the nicest approach might be to make a separate Django app for PGroonga with its own migration sequence; the user would add the new app to INSTALLED_APPS and run manage.py migrate to enable PGroonga; this would leave us the flexibility to update the PGroonga database schema in a natural way by adding new migrations.

Maybe the simplest path to getting this merged would be to move the PGroonga setup code out of the first migration and instead just put it into the docs/ file along with the GRANT statement, and then we can put it into a script as we get this productionizable?

What do you think @kou?

Member

timabbott commented Apr 27, 2016

Posted a few more comments. You have a lint failure that travis CI has flagged. The remaining big issue from my perspective is patching the initial migration as the model for supporting USING_PGROONGA. I think it will make it quite difficult for anyone to switch between using pgroonga and the default postgres full text search.

The simplest fix would be to add a scripts/setup/enable-pgroonga script (and maybe a corresponding disable script) to setup pgroonga and add the PGroonga index, which is outside the zerver migration model. Perhaps the nicest approach might be to make a separate Django app for PGroonga with its own migration sequence; the user would add the new app to INSTALLED_APPS and run manage.py migrate to enable PGroonga; this would leave us the flexibility to update the PGroonga database schema in a natural way by adding new migrations.

Maybe the simplest path to getting this merged would be to move the PGroonga setup code out of the first migration and instead just put it into the docs/ file along with the GRANT statement, and then we can put it into a script as we get this productionizable?

What do you think @kou?

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou Apr 29, 2016

Contributor

Posted a few more comments.

I think that I've applied all your comments.

You have a lint failure that travis CI has flagged.

Sorry. There was a code for debug. I've fixed it.

Contributor

kou commented Apr 29, 2016

Posted a few more comments.

I think that I've applied all your comments.

You have a lint failure that travis CI has flagged.

Sorry. There was a code for debug. I've fixed it.

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott May 3, 2016

Member

OK cool; I'm working on getting a Zulip 1.3.11 release out the door and then will re-review and hopefully merge this just after (since I think it's worth having a bit of time on master befoer this goes into a release)

Member

timabbott commented May 3, 2016

OK cool; I'm working on getting a Zulip 1.3.11 release out the door and then will re-review and hopefully merge this just after (since I think it's worth having a bit of time on master befoer this goes into a release)

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou May 3, 2016

Contributor

OK. I wait for your review.

Contributor

kou commented May 3, 2016

OK. I wait for your review.

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott May 19, 2016

Member

@kou just wanted to update you that I still really want to review this, but I simply haven't had time to do it and I'm about to go on a 2-week vacation, so it may be a little while. I'm really sorry about the delay here!

Member

timabbott commented May 19, 2016

@kou just wanted to update you that I still really want to review this, but I simply haven't had time to do it and I'm about to go on a 2-week vacation, so it may be a little while. I'm really sorry about the delay here!

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou May 19, 2016

Contributor

Thanks for notifying your status. Have a nice vacation!

Contributor

kou commented May 19, 2016

Thanks for notifying your status. Have a nice vacation!

@Jianchun1

This comment has been minimized.

Show comment
Hide comment
@Jianchun1

Jianchun1 Jun 2, 2016

Contributor

hi all, I have enable the Chinese full-text search in my Zulip production environment. It works well for some days. I hope it would be a feature of the next release. Thanks for all your works.

Contributor

Jianchun1 commented Jun 2, 2016

hi all, I have enable the Chinese full-text search in my Zulip production environment. It works well for some days. I hope it would be a feature of the next release. Thanks for all your works.

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou Jun 7, 2016

Contributor

@timabbott Could you restart this pull request?

Contributor

kou commented Jun 7, 2016

@timabbott Could you restart this pull request?

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Jun 10, 2016

Member

@kou yep I've recently returned from my trip, and am trying to catch up on a large backlog of open pull requests. I'm hopefully going to be able to review this again properly soon. Sorry again for the delays @kou!

And thanks @Jianchun1 for the report that this is working for you in production! It's very helpful data.

Member

timabbott commented Jun 10, 2016

@kou yep I've recently returned from my trip, and am trying to catch up on a large backlog of open pull requests. I'm hopefully going to be able to review this again properly soon. Sorry again for the delays @kou!

And thanks @Jianchun1 for the report that this is working for you in production! It's very helpful data.

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Jun 17, 2016

Member

@kou I found time to take another detailed look at this, and tested it out; things went smoothly except for a couple small issues:

  • It seems the enable/disable scripts need to run as root, but they aren't run under sudo in the docs; can you update the docs?
  • Do you need to add the universe repository in the provision.py file?

I have a few questions about how pgroonga's release and packaging process works:

  • Is there a plan for PGroonga being included in Debian upstream (and thus eventually Ubuntu)?
  • To what extent is PGroonga in a stable state of development? E.g. can we assume that it won't have major backwards-incompatible upgrades in the future? If so, we should at least document this status.
  • If we're going to be using the PPA in all Zulip production environments, we're putting all Zulip users at (hopefully small) risk of apt-get upgrade breaking their system, so I'd like to understand the tradeoffs there. An alternative, more conservative approach could be to only add the Pgroonga PPA and package in production if the feature is enabled, so problems with PGroonga can only affect those sites that are using it. We can do easily that by making a separate puppet manifest for it. (I think it's fine to have it installed in development for everyone).
  • To what extent can we expect the PGroonga PPA to have packages for all major versions of Ubuntu (+ supported Postgres versions), and to be kept up to date with any security updates?
  • Is there benchmark data on the performance of PGroonga compares with Postgres' built-in full-text search? I think I'd want to have that before switching our full-text search to use it for everything (as opposed to just an option), which seems like the long-term path (since full-text search working in all languages is clearly better than just working in some).

In the meantime I can work on writing the rest of the docs on our full-text search system, so that we can remove the TODOs from this pull request.

Member

timabbott commented Jun 17, 2016

@kou I found time to take another detailed look at this, and tested it out; things went smoothly except for a couple small issues:

  • It seems the enable/disable scripts need to run as root, but they aren't run under sudo in the docs; can you update the docs?
  • Do you need to add the universe repository in the provision.py file?

I have a few questions about how pgroonga's release and packaging process works:

  • Is there a plan for PGroonga being included in Debian upstream (and thus eventually Ubuntu)?
  • To what extent is PGroonga in a stable state of development? E.g. can we assume that it won't have major backwards-incompatible upgrades in the future? If so, we should at least document this status.
  • If we're going to be using the PPA in all Zulip production environments, we're putting all Zulip users at (hopefully small) risk of apt-get upgrade breaking their system, so I'd like to understand the tradeoffs there. An alternative, more conservative approach could be to only add the Pgroonga PPA and package in production if the feature is enabled, so problems with PGroonga can only affect those sites that are using it. We can do easily that by making a separate puppet manifest for it. (I think it's fine to have it installed in development for everyone).
  • To what extent can we expect the PGroonga PPA to have packages for all major versions of Ubuntu (+ supported Postgres versions), and to be kept up to date with any security updates?
  • Is there benchmark data on the performance of PGroonga compares with Postgres' built-in full-text search? I think I'd want to have that before switching our full-text search to use it for everything (as opposed to just an option), which seems like the long-term path (since full-text search working in all languages is clearly better than just working in some).

In the meantime I can work on writing the rest of the docs on our full-text search system, so that we can remove the TODOs from this pull request.

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Jun 17, 2016

Member

BTW, thanks for your work on this and the upstream PGroonga tool -- I'm definitely excited about being able to offer full-text search not just in English!

Member

timabbott commented Jun 17, 2016

BTW, thanks for your work on this and the upstream PGroonga tool -- I'm definitely excited about being able to offer full-text search not just in English!

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou Jun 18, 2016

Contributor
  • It seems the enable/disable scripts need to run as root, but they aren't run under sudo in the docs; can you update the docs?

Sure. I've added the following description:

The following processes should be executed as the root user. Run:

sudo -i

The description is based on https://zulip.org/server.html :

The installation process should be executed as the root user. Run:

sudo -i


  • Do you need to add the universe repository in the provision.py file?

Yes. postgresql-9.4-pgroonga package depends on liblz4-1 package. It's in the universe repository in Ubuntu 14.04: http://packages.ubuntu.com/trusty/liblz4-1

It's in the main repository since Ubuntu 16.04: http://packages.ubuntu.com/xenial/liblz4-1

Contributor

kou commented Jun 18, 2016

  • It seems the enable/disable scripts need to run as root, but they aren't run under sudo in the docs; can you update the docs?

Sure. I've added the following description:

The following processes should be executed as the root user. Run:

sudo -i

The description is based on https://zulip.org/server.html :

The installation process should be executed as the root user. Run:

sudo -i


  • Do you need to add the universe repository in the provision.py file?

Yes. postgresql-9.4-pgroonga package depends on liblz4-1 package. It's in the universe repository in Ubuntu 14.04: http://packages.ubuntu.com/trusty/liblz4-1

It's in the main repository since Ubuntu 16.04: http://packages.ubuntu.com/xenial/liblz4-1

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou Jun 18, 2016

Contributor
  • Is there a plan for PGroonga being included in Debian upstream (and thus eventually Ubuntu)?

Unfortunately, no.

  • To what extent is PGroonga in a stable state of development? E.g. can we assume that it won't have major backwards-incompatible upgrades in the future? If so, we should at least document this status.

PGroonga 1.x keeps backward compatibility. (The current version is 1.1.0.)

But PGroonga 2.0.0 introduces backward incompatible changes. PGroonga users need to recreate PGroonga indexes and change SQL (custom operators in WHERE). PGroonga 2.0.0 will be released within a year.

I'll send a pull request when PGroonga 2.0.0 is released.

  • If we're going to be using the PPA in all Zulip production environments, we're putting all Zulip users at (hopefully small) risk of apt-get upgrade breaking their system, so I'd like to understand the tradeoffs there. An alternative, more conservative approach could be to only add the Pgroonga PPA and package in production if the feature is enabled, so problems with PGroonga can only affect those sites that are using it. We can do easily that by making a separate puppet manifest for it. (I think it's fine to have it installed in development for everyone).

Here are risks:

  • If the PPA site is down, apt-get update will be failed.
  • If the PPA introduces wrong dependency, apt-get upgrade will be failed. (It'll be fixed soon when it's occurred. Because Groonga (full text search engine used in PGroonga) and Mroonga (storage engine based on Groonga for MySQL) users are also using the PPA. They will find the problem before Zulip users find the problem. If the problem is found, we'll fix the problem soon.)
  • When PGroonga 2.0.0 is released but Zulip doesn't support PGroonga 2.0.0 yet, apt-get upgrade will break Zulip. (It may be better that we use Pin: version 1.* in /etc/apt/preferences.)

There is no Zulip breakage risk by apt-get upgrade for Zulip users who install PGroonga but doesn't use PGroonga. PGroonga is never used until Zulip users run CREATE EXTENSION pgroonga.

  • To what extent can we expect the PGroonga PPA to have packages for all major versions of Ubuntu (+ supported Postgres versions), and to be kept up to date with any security updates?

The PGroonga PPA will support Ubuntu 14.04 or later (= PostgreSQL 9.3 or later) at least until the next few years. We may drop Ubuntu 14.04 before Ubuntu 14.04 EOL when we need to require any features that aren't in Ubuntu 14.04. We'll keep supporting Ubuntu 14.04 (= PostgreSQL 9.3) as much as possible.

The PGroonga PPA is updated new PGroonga version is released. If any security issues are found, new PGroonga version is released. The new package is also released.

  • Is there benchmark data on the performance of PGroonga compares with Postgres' built-in full-text search? I think I'd want to have that before switching our full-text search to use it for everything (as opposed to just an option), which seems like the long-term path (since full-text search working in all languages is clearly better than just working in some).

No.

There is benchmark data for PGroonga and pg_bigm. pg_bigm is a full text search extension based on GIN. The benchmark uses Japanese Wikipedia data: http://groonga.org/en/blog/2015/10/29/pgroonga-1.0.0.html

Contributor

kou commented Jun 18, 2016

  • Is there a plan for PGroonga being included in Debian upstream (and thus eventually Ubuntu)?

Unfortunately, no.

  • To what extent is PGroonga in a stable state of development? E.g. can we assume that it won't have major backwards-incompatible upgrades in the future? If so, we should at least document this status.

PGroonga 1.x keeps backward compatibility. (The current version is 1.1.0.)

But PGroonga 2.0.0 introduces backward incompatible changes. PGroonga users need to recreate PGroonga indexes and change SQL (custom operators in WHERE). PGroonga 2.0.0 will be released within a year.

I'll send a pull request when PGroonga 2.0.0 is released.

  • If we're going to be using the PPA in all Zulip production environments, we're putting all Zulip users at (hopefully small) risk of apt-get upgrade breaking their system, so I'd like to understand the tradeoffs there. An alternative, more conservative approach could be to only add the Pgroonga PPA and package in production if the feature is enabled, so problems with PGroonga can only affect those sites that are using it. We can do easily that by making a separate puppet manifest for it. (I think it's fine to have it installed in development for everyone).

Here are risks:

  • If the PPA site is down, apt-get update will be failed.
  • If the PPA introduces wrong dependency, apt-get upgrade will be failed. (It'll be fixed soon when it's occurred. Because Groonga (full text search engine used in PGroonga) and Mroonga (storage engine based on Groonga for MySQL) users are also using the PPA. They will find the problem before Zulip users find the problem. If the problem is found, we'll fix the problem soon.)
  • When PGroonga 2.0.0 is released but Zulip doesn't support PGroonga 2.0.0 yet, apt-get upgrade will break Zulip. (It may be better that we use Pin: version 1.* in /etc/apt/preferences.)

There is no Zulip breakage risk by apt-get upgrade for Zulip users who install PGroonga but doesn't use PGroonga. PGroonga is never used until Zulip users run CREATE EXTENSION pgroonga.

  • To what extent can we expect the PGroonga PPA to have packages for all major versions of Ubuntu (+ supported Postgres versions), and to be kept up to date with any security updates?

The PGroonga PPA will support Ubuntu 14.04 or later (= PostgreSQL 9.3 or later) at least until the next few years. We may drop Ubuntu 14.04 before Ubuntu 14.04 EOL when we need to require any features that aren't in Ubuntu 14.04. We'll keep supporting Ubuntu 14.04 (= PostgreSQL 9.3) as much as possible.

The PGroonga PPA is updated new PGroonga version is released. If any security issues are found, new PGroonga version is released. The new package is also released.

  • Is there benchmark data on the performance of PGroonga compares with Postgres' built-in full-text search? I think I'd want to have that before switching our full-text search to use it for everything (as opposed to just an option), which seems like the long-term path (since full-text search working in all languages is clearly better than just working in some).

No.

There is benchmark data for PGroonga and pg_bigm. pg_bigm is a full text search extension based on GIN. The benchmark uses Japanese Wikipedia data: http://groonga.org/en/blog/2015/10/29/pgroonga-1.0.0.html

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Jun 21, 2016

Member

OK, cool. It'd be great to have a benchmark against the default postgres full-text search, but that's certianly plenty of evidence that this is efficient :). My main concern is the PGroona 2.0 compatibility issue:

When PGroonga 2.0.0 is released but Zulip doesn't support PGroonga 2.0.0 yet, apt-get upgrade will break Zulip. (It may be better that we use Pin: version 1.* in /etc/apt/preferences.)

I think probably we can reasonably include with the Zulip PGroonga configuration an /etc/apt/preferences.d file to ping PGroonga to 1.0.x. When PGroonga 2.0.0 comes out, will PGroonga 1.0.x still be available in the PGroonga PPA repository? If not, we may need to build PGroonga packages for our own PPA, so we can make sure old versions of Zulip remain installable.

Member

timabbott commented Jun 21, 2016

OK, cool. It'd be great to have a benchmark against the default postgres full-text search, but that's certianly plenty of evidence that this is efficient :). My main concern is the PGroona 2.0 compatibility issue:

When PGroonga 2.0.0 is released but Zulip doesn't support PGroonga 2.0.0 yet, apt-get upgrade will break Zulip. (It may be better that we use Pin: version 1.* in /etc/apt/preferences.)

I think probably we can reasonably include with the Zulip PGroonga configuration an /etc/apt/preferences.d file to ping PGroonga to 1.0.x. When PGroonga 2.0.0 comes out, will PGroonga 1.0.x still be available in the PGroonga PPA repository? If not, we may need to build PGroonga packages for our own PPA, so we can make sure old versions of Zulip remain installable.

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou Jun 23, 2016

Contributor

When PGroonga 2.0.0 comes out, will PGroonga 1.0.x still be available in the PGroonga PPA repository?

Ah, no. PPA provides only the latest versions.

OK. I change my plan. PGroonga 2.x keeps 1.x API and make 1.x API deprecated.
I'll send a pull request to use PGroonga 2.x API after I release PGroonga 2.0.0.
Zulip can use PGroonga 1.x API while PGroonga 2.x. I drop PGroonga 1.x API when I release PGroonga 3.0.0. All Zulip users who use PGroonga should upgrade to PGroonga 2.0.0 or later until PGroonga 3.0.0 is released.

What do you think about this plan?

Contributor

kou commented Jun 23, 2016

When PGroonga 2.0.0 comes out, will PGroonga 1.0.x still be available in the PGroonga PPA repository?

Ah, no. PPA provides only the latest versions.

OK. I change my plan. PGroonga 2.x keeps 1.x API and make 1.x API deprecated.
I'll send a pull request to use PGroonga 2.x API after I release PGroonga 2.0.0.
Zulip can use PGroonga 1.x API while PGroonga 2.x. I drop PGroonga 1.x API when I release PGroonga 3.0.0. All Zulip users who use PGroonga should upgrade to PGroonga 2.0.0 or later until PGroonga 3.0.0 is released.

What do you think about this plan?

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Jun 23, 2016

Member

That plan sounds great! I imagine it'd help a lot of other projects that might use PGroonga as well.

Member

timabbott commented Jun 23, 2016

That plan sounds great! I imagine it'd help a lot of other projects that might use PGroonga as well.

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou Jun 24, 2016

Contributor

Thanks for confirming the new plan.
Is there any concern to merge this pull request?

Contributor

kou commented Jun 24, 2016

Thanks for confirming the new plan.
Is there any concern to merge this pull request?

Show outdated Hide outdated zproject/local_settings_template.py
# feature. See
# https://zulip.readthedocs.org/en/latest/full-text-search.html how to
# enable PGroonga based full-text search feature.
# USING_PGROONGA = True

This comment has been minimized.

@timabbott

timabbott Jun 26, 2016

Member

maybe change this to just read USING_PGROONGA = False and then the user can change False -> True to enable it? The instructions would be simpler this way.

@timabbott

timabbott Jun 26, 2016

Member

maybe change this to just read USING_PGROONGA = False and then the user can change False -> True to enable it? The instructions would be simpler this way.

This comment has been minimized.

@kou

kou Jun 27, 2016

Contributor

Done.

@kou

kou Jun 27, 2016

Contributor

Done.

Show outdated Hide outdated docs/full-text-search.md
After:
# USING_PGROONGA = True

This comment has been minimized.

@timabbott

timabbott Jun 26, 2016

Member

See my comment below about changing how this setting appears in local_settings_template.py, but I think we can simplify these instructions substantially once it's just "change False to True" or "True to False".

@timabbott

timabbott Jun 26, 2016

Member

See my comment below about changing how this setting appears in local_settings_template.py, but I think we can simplify these instructions substantially once it's just "change False to True" or "True to False".

This comment has been minimized.

@kou

kou Jun 27, 2016

Contributor

Done.

@kou

kou Jun 27, 2016

Contributor

Done.

Show outdated Hide outdated scripts/setup/enable-pgroonga
UPDATE zerver_message SET search_pgroonga = subject || ' ' || rendered_content;
CREATE INDEX zerver_message_search_pgroonga ON zerver_message

This comment has been minimized.

@timabbott

timabbott Jun 26, 2016

Member

We probably want this to use CREATE INDEX CONCURRENTLY; otherwise, nobody will be able to send messages while this is running.

@timabbott

timabbott Jun 26, 2016

Member

We probably want this to use CREATE INDEX CONCURRENTLY; otherwise, nobody will be able to send messages while this is running.

This comment has been minimized.

@kou

kou Jun 27, 2016

Contributor

Done.

@kou

kou Jun 27, 2016

Contributor

Done.

Show outdated Hide outdated scripts/setup/enable-pgroonga
)
# Clear memcached to avoid contamination from previous database state
sh "$(dirname "$0")/flush-memcached"

This comment has been minimized.

@timabbott

timabbott Jun 26, 2016

Member

We should probably run this just after the ALTER TABLE line, before creating the new index, since the contamination issue is primarily around the new column on zerver_message.

@timabbott

timabbott Jun 26, 2016

Member

We should probably run this just after the ALTER TABLE line, before creating the new index, since the contamination issue is primarily around the new column on zerver_message.

This comment has been minimized.

@kou

kou Jun 27, 2016

Contributor

I think that we don't need to run the line hurry.

Because the new column on zerver_message isn't used until we set True to USING_PGROONGA and run restart-server. We'll do them after enable-pgroonga is finished.

@kou

kou Jun 27, 2016

Contributor

I think that we don't need to run the line hurry.

Because the new column on zerver_message isn't used until we set True to USING_PGROONGA and run restart-server. We'll do them after enable-pgroonga is finished.

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Jun 26, 2016

Member

I posted a few smaller comments; the one larger issue is that I'd prefer to be using the Django migrations system for managing Zulip's database migrations, rather than some scripts off to the side. Here's what I think we could do that would work nicely:

  • We create a new (very small) Django app in the Zulip tree called "pgroonga", which just contains an single migration containing the code from enable-pgroonga (with a "reverse" migration with the code from disable-pgroonga).
  • We modify INSTALLED_APPS in zproject/settings.py to have pgroonga added if and only if USING_PGROONGA=True.
  • The instructions to install it become "set USING_PGROONGA and run ./manage.py migrate"
  • The instructions to remove it become "run ./manage.py migrate pgroonga zero and unset USING_PGROONGA"

Since that's basically just rearranging the code you've already written, I think this shouldn't be a lot of work.

Member

timabbott commented Jun 26, 2016

I posted a few smaller comments; the one larger issue is that I'd prefer to be using the Django migrations system for managing Zulip's database migrations, rather than some scripts off to the side. Here's what I think we could do that would work nicely:

  • We create a new (very small) Django app in the Zulip tree called "pgroonga", which just contains an single migration containing the code from enable-pgroonga (with a "reverse" migration with the code from disable-pgroonga).
  • We modify INSTALLED_APPS in zproject/settings.py to have pgroonga added if and only if USING_PGROONGA=True.
  • The instructions to install it become "set USING_PGROONGA and run ./manage.py migrate"
  • The instructions to remove it become "run ./manage.py migrate pgroonga zero and unset USING_PGROONGA"

Since that's basically just rearranging the code you've already written, I think this shouldn't be a lot of work.

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Jun 26, 2016

Member

Also I pushed ea611c0; so you can rebase and merge your new docs with that, to eliminate the TODOs.

Member

timabbott commented Jun 26, 2016

Also I pushed ea611c0; so you can rebase and merge your new docs with that, to eliminate the TODOs.

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou Jul 2, 2016

Contributor

I created "pgroonga" Django application and moved PGroonga related SQL to the application.

Here are limitations by this change:

  • Users need to add SUPERUSER permission to zulip role in PostgreSQL temporary while enabling/disabling PGroonga. Because CREATE EXTENSION requires SUPERUSER permission.
  • We can't use CREATE INDEX CONCURRENTLY. Because we can't use CREATE INDEX CONCURRENTLY in transaction and Django migration uses transaction.
Contributor

kou commented Jul 2, 2016

I created "pgroonga" Django application and moved PGroonga related SQL to the application.

Here are limitations by this change:

  • Users need to add SUPERUSER permission to zulip role in PostgreSQL temporary while enabling/disabling PGroonga. Because CREATE EXTENSION requires SUPERUSER permission.
  • We can't use CREATE INDEX CONCURRENTLY. Because we can't use CREATE INDEX CONCURRENTLY in transaction and Django migration uses transaction.
@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Jul 6, 2016

Member

Urgh, those are annoying limitations. The CREATE INDEX CONCURRENTLY issue is apparently fixed in Django 1.10, but that doesn't help us quite yet. The other issue we may need to resolve another way (don't want to change the permissions model). Sigh. I guess we could have a script run as root do the CREATE EXTENSION and the rest in the migration? Or perhaps better, we can do the CREATE EXTENSION as part of the installation of the extension...

I'll need to find more time to think about this later...

Member

timabbott commented Jul 6, 2016

Urgh, those are annoying limitations. The CREATE INDEX CONCURRENTLY issue is apparently fixed in Django 1.10, but that doesn't help us quite yet. The other issue we may need to resolve another way (don't want to change the permissions model). Sigh. I guess we could have a script run as root do the CREATE EXTENSION and the rest in the migration? Or perhaps better, we can do the CREATE EXTENSION as part of the installation of the extension...

I'll need to find more time to think about this later...

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou Jul 9, 2016

Contributor

I'll need to find more time to think about this later...

Does it mean that this pull request isn't merged soon...?

Contributor

kou commented Jul 9, 2016

I'll need to find more time to think about this later...

Does it mean that this pull request isn't merged soon...?

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Jul 10, 2016

Member

No, I think this is important and I really want to get this merged soon, I'm just extremely busy right now and need to find the time. Sorry again about the delays!

Member

timabbott commented Jul 10, 2016

No, I think this is important and I really want to get this merged soon, I'm just extremely busy right now and need to find the time. Sorry again about the delays!

@brainwane

This comment has been minimized.

Show comment
Hide comment
@brainwane

brainwane Aug 17, 2016

Contributor

@timabbott Good news: @kou can probably come to the office hour on August 25/26. So if you have some time to do some more thinking before then, you & kou can talk about it live next week. :)

(I've just reread the comments on this pull request. I think the main thing that needs to happen is that Tim needs to revisit his open questions from July 6th.)

Contributor

brainwane commented Aug 17, 2016

@timabbott Good news: @kou can probably come to the office hour on August 25/26. So if you have some time to do some more thinking before then, you & kou can talk about it live next week. :)

(I've just reread the comments on this pull request. I think the main thing that needs to happen is that Tim needs to revisit his open questions from July 6th.)

kou added some commits Apr 24, 2016

Support full text search for all languages.
GitHub: #615

It uses PGroonga instead of built-in PostgreSQL full text search
feature. Because built-in PostgreSQL full text search doesn't support
languages that don't put space between terms such as Japanese, Chinese
and so on. PGroonga supports all languages including Japanese and
Chinese.

This change doesn't enable PGroonga by default. It means that this
change doesn't change the current behavior.

You need to set True to USING_PGROONGA in zrpoject/settings.py to enable
PGroonga.
Extract PGroonga enable/disable code as separated scripts.
It's for enabling PGroonga based full-text search feature on already
installed Zulip instance.
Support deferred full-text search index update for PGroonga.
It reduces response time on sending message. It's good for users.
Remove needless "su".
"su" is used in these scripts. We need to root privilege to use "su".
Specify operator class for PGroonga v1 API explicitly
It will be useful to work with PGroonga 2.x without any changes.
Set False to USING_PGROONGA by default in configuration template
It's for simplifying enable/disable instructions.
Use CREATE INDEX CONCURRENTLY
It's for supporting sending messages while creating index.
Use pgroonga project to enable/disable PGroonga
You can enable PGroonga by the following command:

    ./manage.py migrate pgroonga

You can disable PGroonga by the following command:

    ./manage.py migrate pgroonga zero

There are some limitations:

  * You need to add SUPERUSER permission to "zulip" role temporary while
    you enable/disable PGroonga. Because CREATE/DROP EXTENSION require
    SUPERUSER permission.

  * You can't send new messages until the PGroonga enable migration is
    finished. Because CREATE INDEX CONCURRENTLY can't be used in
    transaction.
test: Add end-to-end test for PGroonga search
It fails for now because Zulip requires match locations in characters
instead of bytes in a90b470.

PGroonga provides only in bytes version. I need to implement in
characters version.
@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Aug 27, 2016

Member

Cool, the test looks good; I think we should just make a second copy of highlight_string that uses bytes for this code path (it's probably a bug that the existing full-text search system, tsearch_extras doesn't measure in bytes).

Member

timabbott commented Aug 27, 2016

Cool, the test looks good; I think we should just make a second copy of highlight_string that uses bytes for this code path (it's probably a bug that the existing full-text search system, tsearch_extras doesn't measure in bytes).

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Aug 27, 2016

Member

I did a bit of cleanups to the code, tweaked the tests a bit and squashed the commits, creating this: https://github.com/timabbott/zulip/tree/pgroonga-test, which I think is ready to merge (pending Travis CI passing).

@kou thanks so much for all your work on this PR (and also on building PGroonga itself)! I'm really excited about this feature. I'm going on vacation tomorrow but when I return, I plan to enable PGroonga on zulip.tabbott.net; with luck we can test this extensively enough that we can just remove tsearch-extras and use PGroonga as the only supported full-text search in Zulip (since given both options, I imagine everyone would want the every-language version).

Sorry again for all the delay in figuring out how to integrate this.

Member

timabbott commented Aug 27, 2016

I did a bit of cleanups to the code, tweaked the tests a bit and squashed the commits, creating this: https://github.com/timabbott/zulip/tree/pgroonga-test, which I think is ready to merge (pending Travis CI passing).

@kou thanks so much for all your work on this PR (and also on building PGroonga itself)! I'm really excited about this feature. I'm going on vacation tomorrow but when I return, I plan to enable PGroonga on zulip.tabbott.net; with luck we can test this extensively enough that we can just remove tsearch-extras and use PGroonga as the only supported full-text search in Zulip (since given both options, I imagine everyone would want the every-language version).

Sorry again for all the delay in figuring out how to integrate this.

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Aug 27, 2016

Member

Merged, thanks @kou!

Member

timabbott commented Aug 27, 2016

Merged, thanks @kou!

@timabbott timabbott closed this Aug 27, 2016

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou Aug 27, 2016

Contributor

Thanks!
I'll start using Zulip with PGroonga. Some problems will be happen but we'll be able to fix them. Please mention me when full-text search related issues are reported.

Contributor

kou commented Aug 27, 2016

Thanks!
I'll start using Zulip with PGroonga. Some problems will be happen but we'll be able to fix them. Please mention me when full-text search related issues are reported.

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Aug 27, 2016

Member

Will do!

Member

timabbott commented Aug 27, 2016

Will do!

@kou kou deleted the kou:full-text-search-support-all-languages branch Dec 1, 2016

@kou

This comment has been minimized.

Show comment
Hide comment
@kou

kou Dec 1, 2016

Contributor

FYI: I have published benchmark result with English Wikipedia data against PGroonga, textsearch and pg_trgm: http://pgroonga.github.io/reference/pgroonga-versus-textsearch-and-pg-trgm.html

Search performance: PGroonga and textsearch are similar. pg_trgm is slower.

Index creation performance: PGroonga is the fastest module. PGroong is about 2x faster than textsearch.

Contributor

kou commented Dec 1, 2016

FYI: I have published benchmark result with English Wikipedia data against PGroonga, textsearch and pg_trgm: http://pgroonga.github.io/reference/pgroonga-versus-textsearch-and-pg-trgm.html

Search performance: PGroonga and textsearch are similar. pg_trgm is slower.

Index creation performance: PGroonga is the fastest module. PGroong is about 2x faster than textsearch.

@timabbott

This comment has been minimized.

Show comment
Hide comment
@timabbott

timabbott Dec 2, 2016

Member

Awesome, thanks for following up on that @kou!

Member

timabbott commented Dec 2, 2016

Awesome, thanks for following up on that @kou!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment