solr not working on integers? #9

eurospy · 2010-04-22T13:19:41Z

I posted the default Solr example XML files.
I tried to sort on the int field, but it's not working: "sort=popularity desc"
Range queries on the int field are not working either: popularity:[0 TO 10]

tjake · 2010-04-25T18:09:57Z

Ok thanks, I'll take a look

sdonelow · 2010-04-28T21:21:27Z

I think I found the solution to this. Take this with a grain of salt, I'm new to Lucandra/Solr. We had the same problem with long and used slong data type instead and that fixed the problem. So, try changing the popularity field data type to sint in your schema.xml file.

tjake · 2010-05-27T01:27:02Z

Is this still happening?

leoz-xx · 2010-06-01T09:36:15Z

I'm having similar issues (with long and double instead of int, though). Range queries are not working... slong in Solr might be the workaround, as sdonelow commented, but I prefer not using Solr in my project...

tnine · 2010-06-27T08:09:48Z

Hey guys. I'm assuming you're still having this issue? I'm trying to sort it out, and it appears to be functionally impossible with the current implementation. Basically, the bits of the numeric value are right-shifted 4 bits at a time. The first byte then holds the number of bits shifted off. You can view the logic for creating the trie structures here.

http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/search/NumericRangeQuery.html
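For readers who want to see the shape of those terms, here is a minimal Python sketch of the idea described above: one term per 4-bit shift, with the first byte recording how many bits were shifted off. The marker offset `SHIFT_START`, the 7-bits-per-byte packing, and the function names are assumptions made for illustration; this is not Lucene's or Lucandra's actual code.

```python
SHIFT_START = 0x60  # assumed marker offset for 32-bit ints (illustrative only)

def prefix_coded(value: int, shift: int) -> bytes:
    """Encode a signed 32-bit int with its lowest `shift` bits dropped."""
    # Flip the sign bit so that negative values sort before positive ones
    # when the encoded bytes are compared lexicographically.
    sortable = (value ^ 0x80000000) & 0xFFFFFFFF
    sortable >>= shift
    n_chars = (31 - shift) // 7 + 1      # 7 payload bits per byte
    out = bytearray(n_chars + 1)
    out[0] = SHIFT_START + shift         # first byte: number of bits shifted off
    for i in range(n_chars, 0, -1):      # fill from the least significant end
        out[i] = sortable & 0x7F
        sortable >>= 7
    return bytes(out)

def trie_terms(value: int, precision_step: int = 4) -> list:
    """All terms written for one value, from full precision to coarsest."""
    return [prefix_coded(value, s) for s in range(0, 32, precision_step)]

if __name__ == "__main__":
    for term in trie_terms(10):
        print(term.hex())
```

Under these assumptions, the byte order of the full-precision terms matches numeric order, and each coarser term covers a whole block of neighbouring values.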

This allows for faster range scanning and makes seeks faster. However, according to the IndexReader spec here,

http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/IndexReader.html#terms(org.apache.lucene.index.Term)

"If the given term does not exist, the enumeration is positioned at the first term greater than the supplied term."

The current implementation does not do this. It merely returns no results: it scans over 2 keys, gets back only those 2 key spaces, and returns an empty result set. I'm looking into rectifying this problem. Correcting it may involve reading far more than 2 keys initially, so it will not be a very efficient operation.
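The difference between the two behaviours can be sketched in a few lines of Python. This is an illustration over an in-memory sorted list, not Lucandra's Cassandra-backed enumeration; `seek` and `exact_only` are hypothetical names. When the target term is absent, "first term greater than or equal to" and "first term greater than" are the same position, so `bisect_left` models the quoted contract.

```python
from bisect import bisect_left

def seek(sorted_terms, target):
    """Contract from the quoted spec: start at the first term >= target."""
    return sorted_terms[bisect_left(sorted_terms, target):]

def exact_only(sorted_terms, target):
    """Behaviour described above: nothing comes back unless target exists."""
    i = bisect_left(sorted_terms, target)
    if i < len(sorted_terms) and sorted_terms[i] == target:
        return sorted_terms[i:]
    return []

if __name__ == "__main__":
    terms = [b"\x60\x02", b"\x60\x05", b"\x60\x09"]  # made-up encoded terms
    missing = b"\x60\x03"                            # not in the index
    print(seek(terms, missing))        # continues from the next larger term
    print(exact_only(terms, missing))  # empty, so a range scan finds nothing
```

A range query usually starts from a lower bound that is not itself an indexed term, which is why the second behaviour yields empty results.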

tnine · 2010-06-27T09:14:46Z

I've created a simple unit test that mimics how keys are written with my own IndexReader and TermEnum here.

http://github.com/tnine/Lucandra/blob/master/test/lucandra/BytesOrderingEnumTest.java

It doesn't complete because I haven't implemented all the document scoring. However, it does correctly identify records to enumerate over when no prefix is present. If you uncomment my commented lines, you will see the byte comparator used no longer seeks to the correct index when index\docfield prefixes are used. Therefore, something isn't quite right with the prefix and the byte ordering. I just can't put my finger on it, since all the prefix bytes should be the same in common fields, and hence irrelevant in the byte comparison up to the first byte in the trie structure.

tnine · 2010-06-28T03:44:16Z

After much digging, this appears to be an encoding issue with Thrift and batch mutate itself. The issue and corresponding unit test are here.

https://issues.apache.org/jira/browse/CASSANDRA-1235

tmahesh · 2010-07-24T06:18:47Z

Is sorting on integer/float fields supported in solr-cassandra?

I tried the query below on the index of example docs, but did not get results in the correct order:
http://localhost:8983/solr/select/?q=cat:electronics&sort=price%20asc

I have tried changing the field type to "sint" and "tint", but with no success. Sorting on a string field type works, though. Any suggestion on how to fix the sorting issue for integer and float?

tnine · 2010-07-26T11:07:33Z

See the underlying bug. We can't properly encode any numeric fields; as a result, you can't perform sorting on them. Until Cassandra fixes this issue, no numeric field searching/sorting will work.

tmahesh · 2010-07-26T12:01:49Z

We can store integer/float data and fetch it out correctly (i.e., the price field fetched from the index is as it was stored).
From what I understand, sorting of the result set happens inside the Solr IndexSearcher.

Shouldn't sorting work in such a case?

I'm confused about how the Cassandra bug impacts sorting when we can fetch the stored data correctly from the index.

tnine · 2010-07-27T19:28:18Z

I could be wrong in how solr stores and retrieves indexes. However I know I'm accurate in stating that we currently can't store numeric values in Cassandra correctly/consistently. Run my test cases and you'll see exactly what I mean. You will occasionally get correct behavior as the encoding problem does not present itself with all values. It seems to depend on the byte value that is stored. This fix was bumped from 0.6.4 to 0.6.5, so it doesn't seem to be getting fixed anytime soon. Check out the Solr code, and see if it's using numeric values in the underlying fields. If it is, you can't use it until the Cassandra bug is fixed.

tjake · 2010-07-27T23:24:53Z

Actually, the Cassandra guys decided to ditch String keys for byte[], and this will fix the issue. I assume it's going in 0.6.5, but you can see it now in trunk.

sdonelow · 2010-07-28T13:31:00Z

tjake, what does "ditch String keys for byte[]" mean?

tnine · 2010-07-29T22:28:08Z

Currently all keys in Cassandra are UTF8 strings. This has been removed in favor of using native bytes in the new version. This should eliminate the issues we see with shifting 7 bits of numeric types into the lower 7 bits of a UTF8 byte, hence removing the limitation on numeric fields in Lucandra. Note that this will require a decent amount of rework of Lucandra, but I plan on doing that as soon as 0.7 is released, since we really need numeric functionality.

tnine · 2010-09-19T20:42:45Z

Just an FYI, guys. This has been fixed in release 0.6.5 of Cassandra, so numeric fields should now work.

tjake · 2010-10-02T01:51:40Z

fixed in cassandra 0.6.5

This issue was closed.
