solr not working on integers? #9
Comments
Ok thanks, I'll take a look |
I think I found the solution to this. Take this with a grain of salt, I'm new to Lucandra/Solr. We had the same problem with long and used slong data type instead and that fixed the problem. So, try changing the popularity field data type to sint in your schema.xml file. |
Is this still happening? |
I'm having similar issues (with long and double instead of int, though). Range queries are not working. slong in Solr might be the workaround, as sdonelow commented, but I'd prefer not to use Solr in my project. |
Hey guys. I'm assuming you're still having this issue? I'm trying to sort it out, and it appears to be functionally impossible with the current implementation. Basically, the bits of the numeric value are right-shifted 4 bits at a time, and the first byte holds the number of bits shifted off. You can view the logic for creating the trie structures here: http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/search/NumericRangeQuery.html This allows for faster range scanning and makes seeks faster. However, according to the IndexReader spec, "If the given term does not exist, the enumeration is positioned at the first term greater than the supplied term." The current implementation does not do this: it scans over only 2 keys, so when the supplied term is missing it returns an empty result set instead of positioning at the next term. I'm looking into rectifying this problem. Correcting it may involve reading far more than 2 keys initially, so it will not be a very efficient operation. |
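To make the two ideas in the comment above concrete, here is a minimal Python sketch, not Lucene's or Lucandra's actual encoding. It assumes non-negative 32-bit values, a 4-bit precision step, and a simplified term layout of one shift byte followed by the shifted value as 4 big-endian bytes. The names `trie_terms` and `seek` are hypothetical. `seek` shows the positioning behaviour the IndexReader spec asks for: land on the term itself, or on the first term greater than it.

```python
from bisect import bisect_left
from typing import List, Optional

PRECISION_STEP = 4  # bits shifted off per level (assumed, matching the comment above)


def trie_terms(value: int, bits: int = 32, step: int = PRECISION_STEP) -> List[bytes]:
    """Return one term per precision level for a non-negative integer.

    Simplified layout (an assumption, not Lucene's real format):
    first byte = number of bits shifted off, then the shifted value
    as 4 big-endian bytes.
    """
    terms = []
    for shift in range(0, bits, step):
        terms.append(bytes([shift]) + (value >> shift).to_bytes(4, "big"))
    return terms


def seek(sorted_terms: List[bytes], target: bytes) -> Optional[bytes]:
    """Return the target term if present, else the first term greater than it.

    Returns None when the target sorts after every term in the index.
    """
    i = bisect_left(sorted_terms, target)
    return sorted_terms[i] if i < len(sorted_terms) else None


# Build a tiny sorted "index" from three values.
index = sorted({t for v in (3, 7, 42) for t in trie_terms(v)})

# 5 is not indexed, so seeking its full-precision term lands on the term for 7.
hit = seek(index, trie_terms(5)[0])
```

An implementation that only inspects a fixed number of keys around the target and gives up would return nothing for the missing value 5, which matches the empty result sets reported in this thread.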
I've created a simple unit test that mimics how keys are written with my own IndexReader and TermEnum here. http://github.com/tnine/Lucandra/blob/master/test/lucandra/BytesOrderingEnumTest.java It doesn't complete because I haven't implemented all the document scoring. However, it does correctly identify records to enumerate over when no prefix is present. If you uncomment my commented lines, you will see the byte comparator used no longer seeks to the correct index when index\docfield prefixes are used. Therefore, something isn't quite right with the prefix and the byte ordering. I just can't put my finger on it, since all the prefix bytes should be the same in common fields, and hence irrelevant in the byte comparison up to the first byte in the trie structure. |
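The reasoning that a shared prefix should not affect ordering can be checked in isolation. The sketch below is only an illustration of that expectation, not a diagnosis of the failing test: the prefix `b"index/field/"` and the sample terms are made up, and Python's `bytes` comparison is unsigned and lexicographic, which may differ from the comparator used in the linked test.

```python
from typing import List


def with_prefix(prefix: bytes, terms: List[bytes]) -> List[bytes]:
    """Prepend the same prefix to every term."""
    return [prefix + t for t in terms]


# Hypothetical terms: a shift byte followed by one value byte.
terms = [bytes([4, 0x01]), bytes([0, 0x80]), bytes([0, 0x7F])]
prefix = b"index/field/"  # made-up stand-in for the index/field prefix

# Sorting prefixed keys gives the same relative order as sorting bare terms.
sorted_prefixed = sorted(with_prefix(prefix, terms))
prefixed_sorted = with_prefix(prefix, sorted(terms))
same_order = sorted_prefixed == prefixed_sorted
```

If the order does change once prefixes are added, the cause has to be somewhere other than the comparison of the shared prefix bytes, for example in how the keys are encoded before they are stored.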
After much digging this appears to be an encoding issue with thrift and batch mutate itself. The issue and corresponding unit test is here. |
Is sorting on integer/float fields supported in solr-cassandra? I tried the below query on the index of example docs, but did not get results in the correct order. I have tried changing the field type to "sint" and "tint", with no success. Sorting on a string field type works, though. Any suggestions on how to fix the sorting issue for integer and float fields? |
See the underlying bug. We can't properly encode any numeric fields; as a result, you can't perform sorting on them. Until Cassandra fixes this issue, no numeric field searching/sorting will work. |
Shouldn't sorting work in such a case? I'm confused on how the cassandra bug impacts sorting while we can fetch the stored data correctly from the index. |
I could be wrong in how solr stores and retrieves indexes. However I know I'm accurate in stating that we currently can't store numeric values in Cassandra correctly/consistently. Run my test cases and you'll see exactly what I mean. You will occasionally get correct behavior as the encoding problem does not present itself with all values. It seems to depend on the byte value that is stored. This fix was bumped from 0.6.4 to 0.6.5, so it doesn't seem to be getting fixed anytime soon. Check out the Solr code, and see if it's using numeric values in the underlying fields. If it is, you can't use it until the Cassandra bug is fixed. |
Actually, the Cassandra guys decided to ditch String keys for byte[], which will fix the issue. I assume it's going into 0.6.5, but you can see it now in trunk. |
tjake, what does this mean "ditch String keys for byte[]"? |
Currently all keys in Cassandra are UTF8 strings. This has been removed in favor of using native bytes in the new version. This should eliminate the issues we see with shifting 7 bits of numeric types into the lower 7 bits of a UTF8 byte, and hence remove the limitation on numeric fields in Lucandra. Note that this will require a decent amount of rework of Lucandra, but I plan on doing that as soon as 0.7 is released, since we really need numeric functionality. |
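As a rough illustration of why string keys push numeric encodings toward 7-bit packing, the Python sketch below compares raw big-endian bytes with a 7-bits-per-byte packing. It is a sketch under stated assumptions, not a reproduction of the Cassandra bug or of Lucene's exact format: `is_valid_utf8` and `pack_7bit` are hypothetical helper names, and only non-negative 32-bit values are handled.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def pack_7bit(value: int, n_bytes: int = 5) -> bytes:
    """Split a non-negative 32-bit value into 7-bit groups, most significant first.

    Every output byte is below 0x80, so the result is plain ASCII and
    therefore always valid UTF-8. Five groups cover 35 bits.
    """
    return bytes((value >> (7 * i)) & 0x7F for i in reversed(range(n_bytes)))


raw = (200).to_bytes(4, "big")  # contains 0xC8, a byte above 0x7F
packed = pack_7bit(200)

raw_ok = is_valid_utf8(raw)        # raw bytes are not a valid UTF-8 string here
packed_ok = is_valid_utf8(packed)  # the 7-bit form always is
```

With byte[] keys the raw big-endian form can be stored directly, so the extra packing step, and anything that can go wrong while it passes through a string layer, is no longer needed.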
Just an fyi guys. This has been fixed in release 0.6.5 of Cassandra, so numeric fields should now work. |
fixed in cassandra 0.6.5 |
I posted the default Solr example XML files.
I tried to sort on the int field, and it's not working: "sort=popularity desc"
Range queries on the int field are not working either: popularity:[0 TO 10]