Sphinx fulltext search engine integration #1259

Merged
merged 58 commits into from Nov 24, 2013

Conversation

Projects
None yet
5 participants
@klimov-paul
Member

klimov-paul commented Nov 20, 2013

Relates to #11

Depends on #1205 and #1249

Here is a Sphinx fulltext search engine integration.
It uses ability of the Sphinx searchd daemon to work via MySQL network protocol.

The code have much in common with “yii\db”, but I have to abandon the idea of direct inheritance (instead of Connection class), since the significant differences are present. Still some classes from “yii\db” are used directly, like “Transaction” and “Expression”.

“Relation” feature is still incomplete, while I am waiting for #1205 and #1249 resolution. Still “hasOne” relations work properly both to and from Sphinx Active Record.

Suggestions and critics are welcome.

klimov-paul added some commits Nov 1, 2013

"yii\sphinx\IndexSchema' created.
"yii\sphinx\Schema' reworked to drop inheritance from "yii\db".
"yii\sphinx\ActiveQuery" refactored.
"yii\sphinx\Schema" caching fixed.
@klimov-paul

This comment has been minimized.

Show comment
Hide comment
@klimov-paul

klimov-paul Nov 20, 2013

Member

You can get code from this PR using following GIT commands:

git checkout -b klimov-paul-sphinx master
git pull https://github.com/klimov-paul/yii2.git sphinx
Member

klimov-paul commented Nov 20, 2013

You can get code from this PR using following GIT commands:

git checkout -b klimov-paul-sphinx master
git pull https://github.com/klimov-paul/yii2.git sphinx
@coveralls

This comment has been minimized.

Show comment
Hide comment
@coveralls

coveralls Nov 20, 2013

Coverage Status

Coverage remained the same when pulling 7fa5724 on klimov-paul:sphinx into 070bd96 on yiisoft:master.

Coverage Status

Coverage remained the same when pulling 7fa5724 on klimov-paul:sphinx into 070bd96 on yiisoft:master.

@cebe

This comment has been minimized.

Show comment
Hide comment
@cebe

cebe Nov 20, 2013

Member

Great work, will review it in detail later. Why did you copy DataReader?

Member

cebe commented Nov 20, 2013

Great work, will review it in detail later. Why did you copy DataReader?

@qiangxue

This comment has been minimized.

Show comment
Hide comment
@qiangxue

qiangxue Nov 20, 2013

Member

With a quick glance, I noticed quite some code are duplicated, such as DataReader, Command, QueryBuilder, ActiveRecord. Since this is also using PDO, perhaps it's better it uses the same set of DAO classes (with possible enhancements or changes)? Otherwise the maintenance of the code would be a nightmare.

Member

qiangxue commented Nov 20, 2013

With a quick glance, I noticed quite some code are duplicated, such as DataReader, Command, QueryBuilder, ActiveRecord. Since this is also using PDO, perhaps it's better it uses the same set of DAO classes (with possible enhancements or changes)? Otherwise the maintenance of the code would be a nightmare.

@ghost ghost assigned klimov-paul Nov 20, 2013

@klimov-paul

This comment has been minimized.

Show comment
Hide comment
@klimov-paul

klimov-paul Nov 21, 2013

Member

I have started from direct inheritance of “yii\db\QueryBuilder” and “yii\db\Command”, but in the end I have dropped it.
SphinxQL is similar to regular SQL, but have some significant differences:
– Sphinx does not support index schema changes via query: no “create, drop, alter table (index)”, no indexing fields and so on
– Sphinx is very sensitive to the data type of the query parameters: I have to rework the “where” composition to avoid problems with it.
– Sphinx does not support HAVING and some other statements
– Sphinx uses MVA (multi value attributes), which requires rework of “where”, “insert” and “update” composition.

“DataReader” is redundant, that is no doubt. But constructor “yii\db\DataReader::__construct()” prevent me from direct usage of “yii\db\DataReader”:

public function __construct(Command $command, $config = []) // PHP Type restriction for $command
{
    …
}

If the constructor can be changed “yii\sphinx\DataReader” can be dropped.

Member

klimov-paul commented Nov 21, 2013

I have started from direct inheritance of “yii\db\QueryBuilder” and “yii\db\Command”, but in the end I have dropped it.
SphinxQL is similar to regular SQL, but have some significant differences:
– Sphinx does not support index schema changes via query: no “create, drop, alter table (index)”, no indexing fields and so on
– Sphinx is very sensitive to the data type of the query parameters: I have to rework the “where” composition to avoid problems with it.
– Sphinx does not support HAVING and some other statements
– Sphinx uses MVA (multi value attributes), which requires rework of “where”, “insert” and “update” composition.

“DataReader” is redundant, that is no doubt. But constructor “yii\db\DataReader::__construct()” prevent me from direct usage of “yii\db\DataReader”:

public function __construct(Command $command, $config = []) // PHP Type restriction for $command
{
    …
}

If the constructor can be changed “yii\sphinx\DataReader” can be dropped.

@coveralls

This comment has been minimized.

Show comment
Hide comment
@coveralls

coveralls Nov 21, 2013

Coverage Status

Coverage decreased (-0.38%) when pulling 18af698 on klimov-paul:sphinx into 070bd96 on yiisoft:master.

Coverage Status

Coverage decreased (-0.38%) when pulling 18af698 on klimov-paul:sphinx into 070bd96 on yiisoft:master.

@qiangxue

This comment has been minimized.

Show comment
Hide comment
@qiangxue

qiangxue Nov 21, 2013

Member

It seems most code in Command is the same as yii\db\Command. If the main difference is that some features are not supported by sphinx, it's totally fine we throw NotSupportedException in these methods. If the current design of the Command or other classes causes difficulty in customization, we may discuss to improve it. I think we should not allow so much code duplication in the core code.

Member

qiangxue commented Nov 21, 2013

It seems most code in Command is the same as yii\db\Command. If the main difference is that some features are not supported by sphinx, it's totally fine we throw NotSupportedException in these methods. If the current design of the Command or other classes causes difficulty in customization, we may discuss to improve it. I think we should not allow so much code duplication in the core code.

@klimov-paul

This comment has been minimized.

Show comment
Hide comment
@klimov-paul

klimov-paul Nov 22, 2013

Member

It seems most code in Command is the same as yii\db\Command. If the main difference is that some features are not supported by sphinx, it's totally fine we throw NotSupportedException in these methods.

Make sense – done.

Member

klimov-paul commented Nov 22, 2013

It seems most code in Command is the same as yii\db\Command. If the main difference is that some features are not supported by sphinx, it's totally fine we throw NotSupportedException in these methods.

Make sense – done.

@coveralls

This comment has been minimized.

Show comment
Hide comment
@coveralls

coveralls Nov 22, 2013

Coverage Status

Coverage remained the same when pulling 400b531 on klimov-paul:sphinx into 22439d3 on yiisoft:master.

Coverage Status

Coverage remained the same when pulling 400b531 on klimov-paul:sphinx into 22439d3 on yiisoft:master.

@cebe cebe referenced this pull request Nov 22, 2013

Closed

Implement NoSQL Storage for ActiveRecord #11

19 of 20 tasks complete
@qiangxue

This comment has been minimized.

Show comment
Hide comment
@qiangxue

qiangxue Nov 22, 2013

Member

It looks better now. Thanks.

A few more questions:

  • The name "index" is used instead of "table". I understand this is the term for sphnix, but I would still ask if it is possible to use the name "table". @cebe Do nosql DBs have similar naming issue?
  • Can Schema, ColumnSchema and IndexSchema extend from the existing base classes? Yes, there will be some unneeded properties/methods, but I think it's fine because these classes are usually not directly used by users, and it's better we minimize code duplication.
  • What about QueryBuilder? It has major difference from base class and it also has a lot of duplicated code. I can't think of the best way to handle this.
  • What about ActiveRecord? I suppose this needs to be refactored further after @cebe finished his work, right?
Member

qiangxue commented Nov 22, 2013

It looks better now. Thanks.

A few more questions:

  • The name "index" is used instead of "table". I understand this is the term for sphnix, but I would still ask if it is possible to use the name "table". @cebe Do nosql DBs have similar naming issue?
  • Can Schema, ColumnSchema and IndexSchema extend from the existing base classes? Yes, there will be some unneeded properties/methods, but I think it's fine because these classes are usually not directly used by users, and it's better we minimize code duplication.
  • What about QueryBuilder? It has major difference from base class and it also has a lot of duplicated code. I can't think of the best way to handle this.
  • What about ActiveRecord? I suppose this needs to be refactored further after @cebe finished his work, right?
@cebe

This comment has been minimized.

Show comment
Hide comment
@cebe

cebe Nov 22, 2013

Member

The name "index" is used instead of "table". I understand this is the term for sphnix, but I would still ask if it is possible to use the name "table". @cebe Do nosql DBs have similar naming issue?

We have the same issue with elastic search. We have two things there index and type.
Redis hasn't its own schema or terms and emulates tables, so I am reusing the name "table" there.

Member

cebe commented Nov 22, 2013

The name "index" is used instead of "table". I understand this is the term for sphnix, but I would still ask if it is possible to use the name "table". @cebe Do nosql DBs have similar naming issue?

We have the same issue with elastic search. We have two things there index and type.
Redis hasn't its own schema or terms and emulates tables, so I am reusing the name "table" there.

@klimov-paul

This comment has been minimized.

Show comment
Hide comment
@klimov-paul

klimov-paul Nov 22, 2013

Member

The name "index" is used instead of "table". I understand this is the term for sphnix, but I would still ask if it is possible to use the name "table". @cebe Do nosql DBs have similar naming issue?

“Table” is a term, which is used only for relational database or other similar storages.
Most NOSQL solutions use quite another terms. For example: MongoDb uses “collection” term.
I can accept using “table” instead of “index”, but this confusing, especially, while relational database uses own “index” term. But for MongoDb term “table” is completely inconsistent, because Mongo collection is a list of JSON documents, which can not be represented as table anyhow.

It would be better if we use some generic term instead of “table” in the base “db” classes, for example: “entity”, “collection” or something.

Can Schema, ColumnSchema and IndexSchema extend from the existing base classes? Yes, there will be some unneeded properties/methods, but I think it's fine because these classes are usually not directly used by users, and it's better we minimize code duplication.

It makes no sense. Original classes store too much not needed fields:

  • in “yii\db\TableSchema”: “schemaName”, “sequenceName”, “foreignKeys” + method “fixPrimaryKey()”
  • in “yii\db\ColumnSchema”: “allowNull”, “defaultValue”, “enumValues”, “size”, “precision”, “scale”, “autoIncrement”, “unsigned”, “comment” + “typecast()” method differs.

“yii\sphinx\Schema” is similar to “yii\db\Schema”, but has significant difference: index type tracking, filling index and column schemas and so on. Direct inheritance will not save much code, while make class difficult to understand and maintain.

What about QueryBuilder? It has major difference from base class and it also has a lot of duplicated code. I can't think of the best way to handle this.

Direct inheritance makes no sense, because almost all methods should be redeclared. Perhaps we can extract some traits for “where” and “order” composition – this would be ideal.

What about ActiveRecord? I suppose this needs to be refactored further after @cebe finished his work, right?

I am waiting for the interface or/and base class extraction. I assume at least relation handling can be extracted into a base class or trait.
I have to drop “updateCounters” feature, because Sphinx does not support field incrementing. I was close to drop “optimisticLock” as well, but kept it in the end.
Again it would be nice to have traits for such things.

Member

klimov-paul commented Nov 22, 2013

The name "index" is used instead of "table". I understand this is the term for sphnix, but I would still ask if it is possible to use the name "table". @cebe Do nosql DBs have similar naming issue?

“Table” is a term, which is used only for relational database or other similar storages.
Most NOSQL solutions use quite another terms. For example: MongoDb uses “collection” term.
I can accept using “table” instead of “index”, but this confusing, especially, while relational database uses own “index” term. But for MongoDb term “table” is completely inconsistent, because Mongo collection is a list of JSON documents, which can not be represented as table anyhow.

It would be better if we use some generic term instead of “table” in the base “db” classes, for example: “entity”, “collection” or something.

Can Schema, ColumnSchema and IndexSchema extend from the existing base classes? Yes, there will be some unneeded properties/methods, but I think it's fine because these classes are usually not directly used by users, and it's better we minimize code duplication.

It makes no sense. Original classes store too much not needed fields:

  • in “yii\db\TableSchema”: “schemaName”, “sequenceName”, “foreignKeys” + method “fixPrimaryKey()”
  • in “yii\db\ColumnSchema”: “allowNull”, “defaultValue”, “enumValues”, “size”, “precision”, “scale”, “autoIncrement”, “unsigned”, “comment” + “typecast()” method differs.

“yii\sphinx\Schema” is similar to “yii\db\Schema”, but has significant difference: index type tracking, filling index and column schemas and so on. Direct inheritance will not save much code, while make class difficult to understand and maintain.

What about QueryBuilder? It has major difference from base class and it also has a lot of duplicated code. I can't think of the best way to handle this.

Direct inheritance makes no sense, because almost all methods should be redeclared. Perhaps we can extract some traits for “where” and “order” composition – this would be ideal.

What about ActiveRecord? I suppose this needs to be refactored further after @cebe finished his work, right?

I am waiting for the interface or/and base class extraction. I assume at least relation handling can be extracted into a base class or trait.
I have to drop “updateCounters” feature, because Sphinx does not support field incrementing. I was close to drop “optimisticLock” as well, but kept it in the end.
Again it would be nice to have traits for such things.

@qiangxue

This comment has been minimized.

Show comment
Hide comment
@qiangxue

qiangxue Nov 23, 2013

Member

All fine with me. The name "index" vs. "table" may depend on the design of AR interface. Let's decide on that later. Other @yiisoft/core-developers, could you also review this PR? Will merge this before alpha.

Member

qiangxue commented Nov 23, 2013

All fine with me. The name "index" vs. "table" may depend on the design of AR interface. Let's decide on that later. Other @yiisoft/core-developers, could you also review this PR? Will merge this before alpha.

extensions/sphinx/ActiveQuery.php
+ * Sets the [[snippetCallback]] to [[fetchSnippetSourceFromModels]], which allows to
+ * fetch the snippet source strings from the Active Record models, using method
+ * [[ActiveRecord::getSnippetSource()]].
+ * Warning: this option should NOT be used with [[asArray]] at the same time!

This comment has been minimized.

@samdark

samdark Nov 23, 2013

Member

Should we check the fact instead of mentioning it in the phpdoc?

@samdark

samdark Nov 23, 2013

Member

Should we check the fact instead of mentioning it in the phpdoc?

This comment has been minimized.

@klimov-paul

klimov-paul Nov 23, 2013

Member

If this rule will be broken, it will PHP fatal error: "attempt to call method on non object"

@klimov-paul

klimov-paul Nov 23, 2013

Member

If this rule will be broken, it will PHP fatal error: "attempt to call method on non object"

This comment has been minimized.

@samdark

samdark Nov 23, 2013

Member

Which is kinda weird error to get...

@samdark

samdark Nov 23, 2013

Member

Which is kinda weird error to get...

This comment has been minimized.

@klimov-paul

klimov-paul Nov 23, 2013

Member

That is why I have placed a warning

@klimov-paul

klimov-paul Nov 23, 2013

Member

That is why I have placed a warning

This comment has been minimized.

@cebe

cebe Nov 23, 2013

Member

better check for asArray and throw exception then.

@cebe

cebe Nov 23, 2013

Member

better check for asArray and throw exception then.

This comment has been minimized.

@samdark

samdark Nov 23, 2013

Member

These methods could be called in any order so need 2 checks in this case.

@samdark

samdark Nov 23, 2013

Member

These methods could be called in any order so need 2 checks in this case.

extensions/sphinx/ActiveQuery.php
+ * Warning: this option should NOT be used with [[asArray]] at the same time!
+ * @return static the query object itself
+ */
+ public function snippetByModel()

This comment has been minimized.

@samdark

samdark Nov 23, 2013

Member

From the description and method name it is not clear what it actually does. Should it be named like withSnippet?

@samdark

samdark Nov 23, 2013

Member

From the description and method name it is not clear what it actually does. Should it be named like withSnippet?

This comment has been minimized.

@klimov-paul

klimov-paul Nov 23, 2013

Member

There is a "snippetCallback", which determines the way to retrieve snippet source.
This method usees AR internal method to compose suhc callback.
Perhaps it could be named otherwise, but "withSnippet" sounds as much confusing.

@klimov-paul

klimov-paul Nov 23, 2013

Member

There is a "snippetCallback", which determines the way to retrieve snippet source.
This method usees AR internal method to compose suhc callback.
Perhaps it could be named otherwise, but "withSnippet" sounds as much confusing.

extensions/sphinx/ActiveQuery.php
+ */
+ public function snippetByModel()
+ {
+ $this->snippetCallback(array($this, 'fetchSnippetSourceFromModels'));

This comment has been minimized.

@samdark

samdark Nov 23, 2013

Member

Short array syntax should be used.

@samdark

samdark Nov 23, 2013

Member

Short array syntax should be used.

extensions/sphinx/ActiveRecord.php
+ * (because another user has modified the data), a [[StaleObjectException]] exception will be thrown,
+ * and the update or deletion is skipped.
+ *
+ * Optimized locking is only supported by [[update()]] and [[delete()]].

This comment has been minimized.

@samdark

samdark Nov 23, 2013

Member

Optimized → Optimistic?

@samdark

samdark Nov 23, 2013

Member

Optimized → Optimistic?

extensions/sphinx/Schema.php
+ if (($value = $this->db->pdo->quote($str)) !== false) {
+ return $value;
+ } else { // the driver doesn't support quote (e.g. oci)
+ return "'" . addcslashes(str_replace("'", "''", $str), "\000\n\r\\\032") . "'";

This comment has been minimized.

@samdark

samdark Nov 23, 2013

Member

Why do we need it if Sphinx work with only a single driver that is pdo_mysql?

@samdark

samdark Nov 23, 2013

Member

Why do we need it if Sphinx work with only a single driver that is pdo_mysql?

@klimov-paul

This comment has been minimized.

Show comment
Hide comment
@klimov-paul

klimov-paul Nov 23, 2013

Member

Code style and docs fixed.

Member

klimov-paul commented Nov 23, 2013

Code style and docs fixed.

@samdark

This comment has been minimized.

Show comment
Hide comment
@samdark

samdark Nov 24, 2013

Member

I think it is ready enough to get it into alpha.

Member

samdark commented Nov 24, 2013

I think it is ready enough to get it into alpha.

qiangxue added a commit that referenced this pull request Nov 24, 2013

Merge pull request #1259 from klimov-paul/sphinx
Sphinx fulltext search engine integration

@qiangxue qiangxue merged commit 13f6a11 into yiisoft:master Nov 24, 2013

1 check failed

default The Travis CI build could not complete due to an error
Details
@qiangxue

This comment has been minimized.

Show comment
Hide comment
Member

qiangxue commented Nov 24, 2013

Thanks @klimov-paul !

@cebe

This comment has been minimized.

Show comment
Hide comment
@cebe

cebe Nov 24, 2013

Member

btw: how do we decide on which goes into extension and which goes into the core?
redis is in the core and I am working on elasticsearch to be there. why is sphinx an extension?

Member

cebe commented Nov 24, 2013

btw: how do we decide on which goes into extension and which goes into the core?
redis is in the core and I am working on elasticsearch to be there. why is sphinx an extension?

@samdark

This comment has been minimized.

Show comment
Hide comment
@samdark

samdark Nov 24, 2013

Member

I think elastic and maybe redis should be in extensions as well.

Member

samdark commented Nov 24, 2013

I think elastic and maybe redis should be in extensions as well.

@cebe

This comment has been minimized.

Show comment
Hide comment
@cebe

cebe Nov 24, 2013

Member

redis provides Connection class that is used by the RedisCache, not sure how to deal with this.
Connection and AR are in the same namespace. Would be a problem for the autoloader to load classes from 2 directories....

There is nothing that depends on elasticsearch so it can go into extensions easily. Also it may require http library like guzzle would be useful to have a composer.json for that and yii core does not need it.

Member

cebe commented Nov 24, 2013

redis provides Connection class that is used by the RedisCache, not sure how to deal with this.
Connection and AR are in the same namespace. Would be a problem for the autoloader to load classes from 2 directories....

There is nothing that depends on elasticsearch so it can go into extensions easily. Also it may require http library like guzzle would be useful to have a composer.json for that and yii core does not need it.

@cebe

This comment has been minimized.

Show comment
Hide comment
@cebe

cebe Nov 25, 2013

Member

moved redis to extensions.

Member

cebe commented Nov 25, 2013

moved redis to extensions.

@klimov-paul klimov-paul deleted the klimov-paul:sphinx branch Dec 26, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment