Feature/solrcloud #138
Conversation
+1 - I need this in my environment.
Any suggestions as to what is holding this patch back from a merge? Thanks! (I notice there are conflicts, which I will resolve.)
I would really like this feature to be merged in; what needs to be done to make that happen, apart from conflict resolution?
Beyond conflict cleanup and a quick style review, the main thing I'd want to see is the test runner updated so it continues to test regular Solr alongside an instance running with SolrCloud enabled. The install packaging might also warrant a little review – it was rolled into the Tomcat target but seems like it should be a separate one, particularly since the Solr project has itself deprecated deployment on Tomcat.
That said, if you have time to work on this, I can review and ship a release.
I can look into that, @acdha
@upayavira awesome, thanks!
@acdha I've made some reasonably substantial changes to this PR. Basically, instead of starting Solr once and getting on with it, the tests now need to start/stop the correct type of Solr. I improved your start-solr-test-server.sh script to prepare and start both non-cloud and cloud instances, and then extended the tests so they start the right Solr in their setUpClass method. By extending SolrTestCase with a SolrCloudTestCase, I was able to have most of your existing tests run a second time against a SolrCloud instance.

The tests seem to pass consistently, but for some reason hang at the end, for reasons I have yet to identify. I am considering rewriting start-solr-test-server.sh in Python, but more importantly I want/need a test that hits Solr repeatedly whilst killing one of the two nodes that contain the collection being queried. Note that it starts three Solr nodes: one for ZooKeeper and two to host collections. This means we can kill either of the collection nodes without disrupting ZooKeeper.

Please let me know what you make of this (rather substantial) PR. Comments/suggestions welcome before I start writing the above, more complex, test.
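A rough sketch of that subclassing approach – `start_solr`, the URLs, and the test body here are illustrative stand-ins for the PR's launcher script and real test methods, not its actual code:

```python
import unittest

def start_solr(mode):
    """Stub standing in for the PR's start-solr-test-server.sh launcher."""
    urls = {"standalone": "http://localhost:8983/solr/core0",
            "cloud": "http://localhost:8984/solr/collection1"}
    return urls[mode]

class SolrTestCase(unittest.TestCase):
    solr_mode = "standalone"  # which kind of Solr setUpClass should start

    @classmethod
    def setUpClass(cls):
        cls.solr_url = start_solr(cls.solr_mode)

    def test_configured(self):
        self.assertTrue(self.solr_url.startswith("http"))

class SolrCloudTestCase(SolrTestCase):
    # Inherits every test_* method, so the whole suite runs a
    # second time against the SolrCloud instance.
    solr_mode = "cloud"
```

Because the subclass overrides only the class attribute, unittest discovery picks up both classes and runs each inherited test twice, once per Solr flavour.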
For what it's worth, the tests work on my machine (macOS 10.10.5) once I switch to the feature/solrcloud branch and install kazoo. I have these messages, but I think they're from kazoo, not pysolr:
This follows the capitalization used on http://zookeeper.apache.org/
* Don't force string interpolation for performance and compatibility with logging tools like Raven/Sentry
* Pass full exceptions to logging
* Remove unused variables
* Remove backwards-compatibility imports
* Rename tests so unittest2 discovery works out of the box
* PEP-8 import sorting & whitespace
This avoids leaking file handles until the end of the test run
@upayavira I've made some changes in upayavira#1. I definitely agree that we're probably close to the point where we should just port the server launcher to Python, since we have a bunch of daemons to manage now and will want things like timeouts for hard-kills on the various processes.
* Remove unused BACKGROUND_SOLR option
* prepare stops possible stale instances on startup
* More detailed progress for Solr process management
* Consistent ZooKeeper name
* run-tests now includes output for major events
* More consistent indentation
The call signature changes from Kazoo 2.0 to 2.2, and since we don't care either way, we can future-proof it.
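One hedged way to future-proof a watch callback against that kind of signature drift (this is an illustrative pattern, not the actual commit):

```python
def on_cluster_event(*args, **kwargs):
    """ZooKeeper watch callback that tolerates either calling convention.

    Accepting *args/**kwargs means the handler keeps working whether the
    library passes a single positional event or richer keyword arguments
    in a later release.
    """
    event = args[0] if args else kwargs.get("event")
    return event
```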
I presume by 100% test coverage you mean that all tests pass? That should now be the case. I also raise a few more explicit SolrErrors in there. Let me know what you think.
@upayavira I was thinking that we'd have 100% coverage of the new code by the test suite. It looked like most of the remaining gaps were in error handling, which is definitely easy to miss in future changes.
This optionally adds support for SolrCloud using the Kazoo client library. Thanks to @upayavira
master now has a few minor updates, the most notable being acdha@d23de58 to ensure that the byte-strings containing JSON are decoded first before passing them to …
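The decode-before-parse issue amounts to something like the following sketch, assuming the consumer is the standard `json` module (on Python versions before 3.6, `json.loads` rejects bytes outright):

```python
import json

raw = b'{"responseHeader": {"status": 0}}'  # byte-string as received from Solr
parsed = json.loads(raw.decode("utf-8"))    # decode to str before parsing
```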
acdha@cbc07af adds an …
I have some initial/prototype code that should work with both 4.x and 5.x for SolrCloud, and also implements some of the Collections API for creating/deleting collections, etc. I'll post it as a fresh PR once I've played with it a bit more. Really, though, as with the above code, it will need some decent tests around it to prove it behaves properly in failover scenarios.
Hi all, where do we stand on this? Is there a sense that pysolr 4.0 will be coming out soon-ish (i.e., in the next two weeks), or should I go ahead and use this branch as a one-off in a private PyPI? Thanks!
+1 I really need the custom search_handler in a released version. Tired of pulling from the tip instead of the pip. Thanks in advance!
I don't think we're ready for 4.0 yet unless someone has time to help add tests – in particular, it'd be good to start testing with Solr 5 as well as 4.x.
I have a test for failover which seems to work (it passes). I need to add one in which both nodes are down and confirm that it fails. This code WILL NOT work with Solr 5.x. I've got as far as working out exactly what we need to do (and have some hacked, unpublishable code), but just need to find the time to make it happen.
Upayavira, can you expand on where it won't work? We have pysolr at work.

> I have a test for failover which seems to work (it passes). I need to add … This code WILL NOT work with Solr 5.x. I've got as far as working out —
When Solr 4.0 was released, SolrCloud stored its cluster information in clusterstate.json – every collection in a single file in ZooKeeper. This worked, but people started building clusters with thousands of collections, leading to an unmanageable file. Therefore, somewhere around the 5.0 mark it switched to a per-collection state.json file, which allows a ZooKeeper client to follow just the collections it is interested in.

I don't see any reason why the client cannot be coded to handle both setups. We would need a way to test against both, though, which might require two different Solr installs to run our tests against. The consequence is that you will get "collection not found" errors even though you know the collection exists (pysolr looks for clusterstate.json, but the state is actually in /collections/$COLLECTION/state.json).
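A minimal sketch of how a client could cope with both layouts – `FakeZK` is a tiny stand-in for `kazoo.client.KazooClient`, and `collection_state` is illustrative rather than pysolr's actual API:

```python
import json

class FakeZK:
    """Minimal stand-in for kazoo.client.KazooClient (exists/get only)."""
    def __init__(self, nodes):
        self.nodes = nodes  # maps ZooKeeper paths to byte payloads

    def exists(self, path):
        return path in self.nodes

    def get(self, path):
        return self.nodes[path], None  # Kazoo returns (data, ZnodeStat)

def collection_state(zk, collection):
    # Solr 5.x: per-collection state under /collections/<name>/state.json
    path = "/collections/%s/state.json" % collection
    if zk.exists(path):
        data, _ = zk.get(path)
        return json.loads(data.decode("utf-8"))[collection]
    # Solr 4.x: every collection lives in a single /clusterstate.json
    data, _ = zk.get("/clusterstate.json")
    return json.loads(data.decode("utf-8"))[collection]
```

Checking for the per-collection node first means newer clusters never touch the shared file, while 4.x clusters fall through to the old path.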
See #187 for further work here. Should we close this PR now?
@upayavira We have 3 unchecked tests in the todo list – looks like they might be better as separate issues, since some of them involve a fair amount of work?
If you can clarify what you are looking at here, I can look into that.
I plan to pull the start-solr-test-server.sh for 4.10, rename it start-solr-test-server-4.10.1.sh, and add a way for run-tests.py to use one or the other. That'll mean we can have a Travis run for each version, both using the latest (5.x-compatible) code.
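One way run-tests.py could pick the launcher – the `SOLR_VERSION` variable and `launcher_for` helper here are illustrative guesses, not the committed interface (only the two script names come from the comment above):

```python
import os

# Hypothetical mapping from Solr version to launcher script.
SCRIPTS = {
    "4.10": "start-solr-test-server-4.10.1.sh",
    "5.x": "start-solr-test-server.sh",
}

def launcher_for(version):
    """Return the start script for the requested Solr version."""
    return SCRIPTS.get(version, SCRIPTS["5.x"])

script = launcher_for(os.environ.get("SOLR_VERSION", "5.x"))
```

A Travis matrix could then set `SOLR_VERSION` per build job so each version gets its own run.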
This test is, again, in PR #187. I accept that I should have separated PR #187 into three PRs, but they kinda all happened at the same time. If you want them separated, let me know and I'll see how possible that is.
@upayavira The main thing I was thinking is confirming that we do something reasonable when ZooKeeper itself has either failed outright or is unresponsive. I think that'd just be a question of making sure our timeouts are used and that something useful is logged. Since you have a test in #187, I checked the failover tests off the list here.
I've been working all morning on the failover tests. Getting them right isn't easy. It seems that a kill -17 does not cause a change to the live_nodes entry in ZooKeeper, so we cannot use kill -17. Test scenarios:
These are non-trivial tests. The code under test is WAY simpler. I'm mulling over how to implement them, but I'm open to suggestions.
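The kill -17 observation is consistent with the signal being SIGSTOP (17 on macOS/BSD; on Linux, 17 is SIGCHLD, which is ignored by default): a stopped process keeps its ZooKeeper session alive until the session timeout lapses, so its ephemeral live_nodes entry survives. A small, self-contained illustration using a throwaway child process as a stand-in for a Solr node:

```python
import os
import signal
import subprocess
import time

# Throwaway child process standing in for a Solr node.
proc = subprocess.Popen(["sleep", "60"])

# SIGSTOP only suspends the process; the OS still reports it as alive, and
# its ZooKeeper session would only lapse once missed heartbeats exceed the
# session timeout -- which is why it is unsuitable for failover tests.
os.kill(proc.pid, signal.SIGSTOP)
time.sleep(0.1)
assert proc.poll() is None  # not dead, merely stopped

# SIGKILL terminates it outright; an ephemeral live_nodes entry would be
# dropped as soon as the ZooKeeper session closes.
os.kill(proc.pid, signal.SIGKILL)
proc.wait()
```

Using `signal.SIGSTOP`/`signal.SIGKILL` rather than raw numbers also sidesteps the macOS-versus-Linux numbering difference.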
One thing I've been wondering is whether it makes sense to merge the work-in-progress here and postpone some of the more involved tests, or simply defer them entirely to Kazoo or the upstream ZK/Solr projects and instead just use something like …
See the code in #187, just committed. I believe it does the job for the above three test scenarios. It turns out it was important to get the tests right, as they did show up one particular bug (not re-adding a watch after it is triggered). All that is left is support for both 4.x and 5.x – and, by comparison, that is seriously trivial!
I've now got code that works against both 4.x and 5.x. You can set an environment variable to tell it which startup script to use (which version of start-solr-test-server.sh), i.e. the 4.x or the 5.x one. Thus, it should be possible to keep a version of each running in Travis.

This also involved redoing the retry logic somewhat. Solr 4.10 wasn't updating its cluster state as fast as it should, so I made it retry every 0.02s, up to 20 times, by default. I saw maybe 4 or 5 retries needed in my tests. These defaults can be overridden on the SolrCloud constructor. I also needed to call random.seed(), as random.choice was giving me the same value every time. I will commit this work against #187 as soon as I get a free moment.
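The retry behaviour described above might look roughly like this – `choose_with_retry` and `get_live_replicas` are illustrative names, not pysolr's API; only the 0.02s/20-retry defaults and the `random.seed()` call come from the comment:

```python
import random
import time

random.seed()  # without this, random.choice reportedly repeated one value

def choose_with_retry(get_live_replicas, retries=20, interval=0.02):
    """Pick a random live replica, retrying while cluster state catches up."""
    for _ in range(retries):
        replicas = get_live_replicas()
        if replicas:
            return random.choice(replicas)
        time.sleep(interval)  # give Solr time to publish updated state
    raise RuntimeError("no live replicas after %d retries" % retries)
```

In the real code the retry count and interval would be the constructor arguments mentioned above, so callers can tune them per cluster.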
@upayavira that's excellent news!
Moments have been far apart, but #187 now has my latest changes. You can run the tests against 5.5 with: …
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
An extension to pysolr to make it ZooKeeper/SolrCloud aware. This is cloned from the code in the SolrJ client.
The tests are limited to proving that this does not break existing functionality, although I have tested manually that it does correctly fail over between nodes when a node in a cluster fails.
Commit Checklist