Feature/solrcloud #138

upayavira · 2014-11-27T10:24:20Z

An extension to pysolr to make it ZooKeeper/SolrCloud aware. This is cloned from the code in the SolrJ client.
The tests are limited to proving that this does not break existing functionality, although I have tested (manually) that it does correctly fail over between nodes when a node in a cluster fails.

Commit Checklist

Test coverage for ZooKeeper / SolrCloud error states
Add a Travis test matrix which runs without Kazoo installed to confirm that nothing breaks for traditional usage (the SolrCloud tests are supposed to be skipped)
Support SolrCloud 5 and have a Travis test matrix entry for both major versions
Add test that confirms that Pysolr fails-over correctly when one of the Solr nodes disappears (can be simulated with kill -STOP and kill -CONT)
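
A minimal sketch of the `kill -STOP` / `kill -CONT` idea from the last checklist item, assuming a POSIX system. The `suspended` helper and the sleeping child process are hypothetical stand-ins; a real test would pause an actual Solr node and issue queries through pysolr while it is frozen.

```python
import contextlib
import os
import signal
import subprocess
import sys


@contextlib.contextmanager
def suspended(pid):
    """Freeze process `pid` with SIGSTOP, then resume it with SIGCONT."""
    os.kill(pid, signal.SIGSTOP)  # same effect as `kill -STOP <pid>`
    try:
        yield
    finally:
        os.kill(pid, signal.SIGCONT)  # same effect as `kill -CONT <pid>`


def demo():
    # Hypothetical stand-in for a Solr node: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        with suspended(child.pid):
            # A real test would query Solr here and expect the client
            # to be served by another node.
            alive_while_stopped = child.poll() is None
        alive_after_resume = child.poll() is None
    finally:
        child.terminate()
        child.wait()
    return alive_while_stopped, alive_after_resume
```

The context manager guarantees the node is resumed even if the assertions inside the `with` block fail, so a failing test does not leave a frozen process behind.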

danizen · 2015-03-31T15:07:22Z

+1 - I need this in my environment.

upayavira · 2015-11-12T00:14:36Z

Any suggestions as to what is holding this patch back from a merge? Thx!

(I notice there are conflicts, which I will resolve)

mmroden · 2016-01-26T15:08:09Z

I would really like this feature to be merged in; what needs to be done to make that happen, apart from conflict resolution?

acdha · 2016-01-26T15:48:19Z

Beyond conflict cleanup & a quick style review, the main thing I'd want to see would be updating the test runner so it continues to test regular Solr along with running an instance with SolrCloud enabled.

The install packaging also might want a little review – this was rolled into the Tomcat target but seems like it should be a separate one, particularly since the Solr project has itself deprecated deployment on Tomcat.

acdha · 2016-01-26T15:48:40Z

That said, if you have time to work on this, I can review & ship a release.

upayavira · 2016-01-26T16:11:49Z

I can look into that, @acdha

acdha · 2016-01-26T18:19:48Z

@upayavira awesome, thanks!

upayavira · 2016-02-09T22:24:02Z

@acdha I've made some reasonably substantial changes to this PR. Basically, instead of starting Solr and getting on with it, we now need the tests to start/stop the correct type of Solr, so I improved your start-solr-test-server.sh script to prepare and start both non-cloud and cloud instances, and then extended the tests to make them able to start the right Solr in their setUpClass method.

By extending the SolrTestCase with a SolrCloudTestCase, I was able to have most of your existing tests run a second time against a solrcloud instance.
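
The inheritance pattern described here can be sketched roughly as follows. Only the two test-case class names come from the comment above; `FakeSolr`, `solr_mode`, and `run_suite` are hypothetical, and the real suite would launch an actual Solr instance in `setUpClass` rather than build a fake.

```python
import unittest


class FakeSolr:
    """Hypothetical stand-in for a running Solr instance."""

    def __init__(self, mode):
        self.mode = mode

    def ping(self):
        return "OK"


class SolrTestCase(unittest.TestCase):
    solr_mode = "standalone"

    @classmethod
    def setUpClass(cls):
        # The real suite would start the right kind of Solr here.
        cls.solr = FakeSolr(cls.solr_mode)

    def test_ping(self):
        self.assertEqual(self.solr.ping(), "OK")


class SolrCloudTestCase(SolrTestCase):
    # Inherits test_ping, so the same test runs again in cloud mode.
    solr_mode = "cloud"


def run_suite():
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(SolrTestCase))
    suite.addTests(loader.loadTestsFromTestCase(SolrCloudTestCase))
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the subclass only overrides a class attribute, every test defined on the base class is collected twice, once per backend.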

The tests seem to pass consistently, but they hang at the end for reasons I have yet to identify.

I am considering rewriting start-solr-test-server.sh in Python, but more importantly I want/need a test that hits Solr repeatedly whilst killing one of the two nodes that contains the collection being queried. Note that the script starts three Solr nodes: one for ZooKeeper and two to host collections. This means we can kill either of the collection nodes without disrupting ZooKeeper.

Please let me know what you make of this (rather substantial) PR. Comments/suggestions welcome, before I start writing the above, more complex, test.

mmroden · 2016-02-10T16:31:01Z

For what it's worth, the tests work on my machine (macOS 10.10.5) once I change to the feature/solrcloud branch and install kazoo. I get these messages, but I think they're in kazoo, not pysolr:

Waiting for simple-solr ----No handlers could be found for logger "kazoo.client"
No handlers could be found for logger "kazoo.client"

acdha · 2016-02-10T21:39:03Z

@upayavira I've made some changes in upayavira#1. I definitely agree that we're probably close to the point where we should just port the server launcher to Python since we have a bunch of daemons to manage now and will want things like timeouts for hard-kills on the various processes.

upayavira · 2016-02-15T16:14:37Z

I presume by 100% test coverage you mean that all tests pass? That should now be the case. I also raise a few more explicit SolrErrors in there too. Let me know what you think.

acdha · 2016-02-16T14:30:01Z

@upayavira I was thinking that we have 100% coverage of the new code by the test suite. It looked like most of the remaining gaps were in error-handling, which is definitely easy to miss in future changes.

acdha · 2016-02-16T15:43:54Z

master now has a few minor updates, the most notable being acdha@d23de58 to ensure that the byte-strings containing JSON are decoded first before passing them to json.loads
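
A minimal sketch of the decode-before-parse step mentioned above. The function name and the UTF-8 default are assumptions for illustration, not the code in the referenced commit.

```python
import json


def decode_json_response(raw, encoding="utf-8"):
    """Parse a JSON response body that may arrive as bytes or as text."""
    if isinstance(raw, bytes):
        # Older Python 3 releases reject bytes in json.loads,
        # so decode to text explicitly first.
        raw = raw.decode(encoding)
    return json.loads(raw)
```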

acdha · 2016-02-16T22:41:02Z

acdha@cbc07af adds an env: option to Travis CI so the tests are run with and without Kazoo installed, with the ~32 SolrCloud tests skipped in the latter case.
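
One way the skip-without-Kazoo behaviour can be arranged is sketched below. The `HAS_KAZOO` flag, the test class, and `run_cloud_tests` are hypothetical names; pysolr's own test modules may be organised differently.

```python
import unittest

try:
    import kazoo  # noqa: F401  (optional dependency)
    HAS_KAZOO = True
except ImportError:
    HAS_KAZOO = False


@unittest.skipUnless(HAS_KAZOO, "kazoo is not installed")
class CloudOnlyTest(unittest.TestCase):
    def test_kazoo_importable(self):
        # Only reached when the optional dependency is present.
        from kazoo.client import KazooClient
        self.assertTrue(callable(KazooClient))


def run_cloud_tests():
    suite = unittest.TestLoader().loadTestsFromTestCase(CloudOnlyTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With Kazoo missing, the run reports the test as skipped rather than failed, which is what lets the same suite pass in both Travis configurations.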

upayavira · 2016-02-26T11:27:07Z

I have some initial/prototype code that should work with both 4.x and 5.x for SolrCloud, and also implements some of the Collections API for creating/deleting collections/etc. I'll post it as a fresh PR once I've played with it a bit more. Really though, as with the above code, it will need some decent tests around it to prove it behaves properly in failover scenarios.

mmroden · 2016-03-07T19:59:13Z

Hi all,

Where do we stand on this? Is there a sense that pysolr 4.0 will be coming out soon-ish (i.e., within the next two weeks), or should I go ahead and use this branch as a one-off in a private PyPI?

Thanks!

vvolkman · 2016-03-08T20:45:58Z

+1 I really need the custom search_handler in a released version. Tired of pulling from the tip instead of the pip. thanks in advance!

acdha · 2016-03-08T21:01:49Z

I don't think we're ready for 4.0 yet unless someone has time to help add tests – in particular, it'd be good to start testing with Solr 5 as well as 4.x.

upayavira · 2016-03-08T21:18:56Z

I have a test for failover which seems to work (it passes). I need to add one in which both nodes are down and confirm that it fails.

This code WILL NOT work with Solr 5.x. I've got as far as working out exactly what we need to do (and have some hacked, unpublishable code), but just need to find the time to make it happen.

danizen · 2016-03-08T23:29:54Z

Upayavira, can you expand on where it won't work? I have at work pysolr working with Solr 5 in a non-cloud mode, and I'm anticipating a switch to cloud.

upayavira · 2016-03-08T23:33:11Z

When Solr 4.0 was released, SolrCloud stored its cluster information in clusterstate.json: every collection in a single file in ZooKeeper. This worked, but people started building clusters with 1000s of collections, leading to an unmanageable file. Therefore, somewhere around the 5.0 mark it switched to having a per-collection state.json file. This allows a ZooKeeper client to follow just the collections it is interested in.

I don't see any reason why the client cannot be coded to handle both setups. We would need a way to test against both, though, which might require two different Solr installs to run our tests against.

The consequence is that you will get "collection not found" errors, even though you know the collection exists (pysolr looks for clusterstate.json but its state is actually in /collections/$COLLECTION/state.json).
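
A rough sketch of a lookup that handles both layouts, under the assumption that each file is a JSON object keyed by collection name. `load_collection_state` and `FakeZk` are hypothetical; `zk` can be any object with Kazoo-style `exists(path)` and `get(path)` methods, where `get` returns a `(bytes, stat)` pair.

```python
import json


def load_collection_state(zk, collection):
    """Return the state dict for `collection`, or None if not found."""
    paths = (
        "/collections/%s/state.json" % collection,  # per-collection layout
        "/clusterstate.json",                       # single-file layout
    )
    for path in paths:
        if not zk.exists(path):
            continue
        data, _stat = zk.get(path)
        if not data:
            continue
        state = json.loads(data.decode("utf-8"))
        if collection in state:
            return state[collection]
    return None


class FakeZk:
    """In-memory stand-in for a ZooKeeper client, for illustration only."""

    def __init__(self, nodes):
        self.nodes = nodes

    def exists(self, path):
        return path in self.nodes

    def get(self, path):
        return self.nodes[path], None
```

Checking the per-collection path first and falling back to the shared file means the same client code works regardless of which layout the cluster uses.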

upayavira · 2016-03-10T22:34:04Z

See #187 for further work here. Should we close this PR now?

acdha · 2016-03-10T22:35:46Z

@upayavira We have 3 unchecked tests in the todo list – it's looking like they might be better as separate issues, since some of them involve a fair amount of work?

upayavira · 2016-03-10T22:48:23Z

Test coverage for ZooKeeper / SolrCloud error states

If you can clarify what you are looking at here, I can look into that.

Support SolrCloud 5 and have a Travis test matrix entry for both major versions
Solr 5.x support is in PR #187.

I plan to pull the start-solr-test-server.sh for 4.10, rename it start-solr-test-server-4.10.1.sh, and add a way that run-tests.py can use one or the other. That'll mean we can have a Travis run for each version, both using the latest (5.x compatible) code.

Add test that confirms that Pysolr fails-over correctly when one of the Solr nodes disappears (can be simulated with kill -STOP and kill -CONT)

This test is, again, in PR #187. I accept that I should have separated PR #187 into three PRs, but they kinda all happened at the same time. If you want them separated, let me know and I'll see how possible that is.

acdha · 2016-03-11T12:17:29Z

@upayavira The main thing I was thinking is confirming that we do something reasonable when ZooKeeper itself has either failed outright or is unresponsive. I think that'd just be a question of making sure our timeouts are used and that something useful is logged.

Since you have a test in #187 I checked the failover tests off the list here.

upayavira · 2016-03-11T13:32:12Z

I've been working all morning on the failover tests. Getting them right isn't easy. It seems that a kill -17 does not cause a change to the live_nodes entry in ZooKeeper, so we cannot use a kill -17.

Test scenarios:

Does it handle when one replica is down for a shard of a collection?
Does it fail gracefully when there are no nodes available for a shard/collection?
Does it fail gracefully if it cannot connect to ZK, and recover once ZK comes back?

These are non-trivial tests. The code under test is WAY simpler. I'm now mulling over how to implement them, but I'm open to suggestions.

acdha · 2016-03-11T14:33:06Z

One thing I've been wondering is whether it makes sense to merge the work-in-progress here and postpone some of the more involved tests or simply defer them entirely to Kazoo or the upstream ZK/Solr projects and instead just use something like mock.patch to confirm that our logging code, etc. works when the important KazooClient methods raise exceptions.
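
A sketch of what such a mock-based check could look like. `ClusterStateReader` is a hypothetical wrapper, not pysolr's actual class, and a plain `mock.Mock()` stands in for the ZooKeeper client so the example runs without Kazoo; a real test would more likely patch methods on `kazoo.client.KazooClient`.

```python
import logging
from unittest import mock

log = logging.getLogger("pysolr.example")


class ClusterStateReader:
    """Hypothetical wrapper around a ZooKeeper client."""

    def __init__(self, zk):
        self.zk = zk

    def read(self, path="/clusterstate.json"):
        try:
            data, _stat = self.zk.get(path)
            return data
        except Exception:
            # log.exception records the full traceback.
            log.exception("Unable to read %s from ZooKeeper", path)
            return None


def check_failure_is_logged():
    zk = mock.Mock()
    zk.get.side_effect = RuntimeError("ZooKeeper unavailable")
    with mock.patch.object(log, "exception") as log_exception:
        result = ClusterStateReader(zk).read()
    return result, log_exception.call_count
```

This only verifies that the error path logs and returns cleanly; it says nothing about real ZooKeeper behaviour, which is the trade-off being weighed in the comment above.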

upayavira · 2016-03-11T22:41:31Z

See the code in #187, just committed. I believe it does the job for the above three test scenarios. It turns out it was important to get the tests right, as they did show up one particular bug (not re-adding a watch when a watch is triggered).

All that is left is support for both 4.x and 5.x. And, by comparison, that is seriously trivial!

upayavira · 2016-03-14T14:15:41Z

I've now got code that works against both 4.x and 5.x. You can set an environment variable to tell it which startup script to use (which version of start-solr-test-server.sh), i.e. the 4.x or the 5.x one. Thus, it should be possible to keep a version of each running in Travis.

This also involved redoing the retry logic somewhat. Solr 4.10 wasn't updating its cluster state as fast as it should, so I made it, by default, retry every 0.02s, and retry 20 times. I saw maybe 4 or 5 retries needed in my tests. These can be overridden on the SolrCloud constructor. I also needed to call random.seed() as random.choice was giving me the same value every time.
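
The retry behaviour described here can be sketched as below. The defaults (20 retries, 0.02s apart) come from the comment above; the function name, the `get_hosts` callable, and the injectable `sleep` parameter are hypothetical and do not reflect the actual SolrCloud constructor arguments.

```python
import random
import time


def pick_live_host(get_hosts, retries=20, delay=0.02, sleep=time.sleep):
    """Return a random live host, polling `get_hosts` until one appears."""
    for attempt in range(retries + 1):
        hosts = get_hosts()
        if hosts:
            return random.choice(hosts)
        if attempt < retries:
            sleep(delay)  # give the cluster state time to catch up
    raise RuntimeError("no live hosts after %d retries" % retries)
```

Passing `sleep` as a parameter keeps the test suite fast, because tests can substitute a no-op instead of really waiting.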

I will commit this work against #187 as soon as I get a next moment.

acdha · 2016-03-14T14:46:57Z

@upayavira that's excellent news!

upayavira · 2016-03-14T20:19:51Z

Moments have been far apart, but #187 now has my latest changes. You can run the tests against 5.5 with:
./run-tests.py
or against 4.10.1 with:
PYSOLR_STARTER=./start-solr-test-server-4x.sh ./run-tests.py
Add both of these to Travis, and you should have good test coverage.
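
A sketch of how a test runner might honour the `PYSOLR_STARTER` variable. The default script path is an assumption based on the script name used earlier in this thread, and `starter_script` is a hypothetical helper, not necessarily what run-tests.py does.

```python
import os

# Assumed default: the launcher script named earlier in this thread.
DEFAULT_STARTER = "./start-solr-test-server.sh"


def starter_script(environ=None):
    """Return the launcher script path, honouring PYSOLR_STARTER if set."""
    environ = os.environ if environ is None else environ
    return environ.get("PYSOLR_STARTER", DEFAULT_STARTER)
```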

stale · 2018-06-05T14:16:00Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

mylanium added a commit to mylanium/pysolr that referenced this pull request Sep 3, 2015

Pulled django-haystack#138 Feature/solrcloud changes

0c8b5cf

upayavira force-pushed the feature/solrcloud branch from 55a81c3 to 0d62ba4 Compare February 2, 2016 12:52

upayavira force-pushed the feature/solrcloud branch from 0d62ba4 to 0dcc4a8 Compare February 9, 2016 22:15

Upayavira and others added 14 commits February 10, 2016 13:55

Initial pass at a SolrCloud aware pysolr

7329916

Update README

6c6940c

Add kazoo dependency to setup.py

d07266a

Get decent tests in place for SolrCloud as well as non-cloud case

00c1522

Make kazoo import optional

8d72133

s/Zookeeper/ZooKeeper/g

362ef2e

This follows the capitalization used on http://zookeeper.apache.org/

Remove duplicate SolrCloudTestCase

aba7c7b

Enable SolrCloud tests

d2455c6

PEP-8

a26945f

SolrCloud logging style

e96593b

* Don't force string interpolation for performance and compatibility with logging tools like Raven/Sentry
* Pass full exceptions to logging
* Remove unused variables

Update for Python 2.6 deprecation

a893ae5

* Remove backwards-compatibility imports
* Rename tests so unittest2 discovery works out of the box
* PEP-8 import sorting & whitespace

Have test_cloud.SolrCloudTestCase start the right Solr

7900b8f

test solr logging adjustments

0cb19fc

This avoids leaking file handles until the end of the test run

Change test_cloud to use the right ZooKeeper port

650e094

acdha added 3 commits February 10, 2016 16:50

Update Solr test launcher

8555143

* Remove unused BACKGROUND_SOLR option
* prepare stops possible stale instances on startup
* More detailed progress for Solr process management
* Consistent ZooKeeper name
* run-tests now includes output for major events
* More consistent indentation

ZooKeeper: use new-style classes on Python 2

b96053f

ZooKeeper: update watchClusterState function signature

93d1acd

The call signature changes from Kazoo 2.0 to 2.2 and since we don’t care either way we can future-proof it.

Log everything into the /logs directory

96d3b99

acdha added a commit to acdha/pysolr that referenced this pull request Feb 16, 2016

New: SolrCloud support (see django-haystack#138)

366f14d

This optionally adds support for SolrCloud using the Kazoo client library. Thanks to @upayavira

acdha added this to the v4.0.0 milestone Feb 16, 2016

acdha added the feature label Feb 16, 2016

acdha self-assigned this Feb 16, 2016

acdha mentioned this pull request May 27, 2016

Recent introduction of __del__ exposes a bug in Python 3.4 when requests library is used #193

Closed

stale bot added the stale label Jun 5, 2018

stale bot closed this Jul 5, 2018
