This repository has been archived by the owner on Nov 22, 2017. It is now read-only.

solr not working on integers? #9

Closed
eurospy opened this issue Apr 22, 2010 · 16 comments

Comments

@eurospy

eurospy commented Apr 22, 2010

I posted the default Solr example XML files.
I tried to sort on the int field, but it's not working: "sort=popularity desc".
Range queries on the int field are not working either: popularity:[0 TO 10]
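
For anyone trying to reproduce this, here is a minimal sketch of how the two failing requests could be combined into one URL. The class and method names are hypothetical, and the base URL is an assumption (it is the default example address that appears later in this thread); only the `q` and `sort` values come from the report above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper for building the request described above; not part of Solr or Lucandra. */
public class SolrQuerySketch {

    /** URL-encodes a single parameter value as UTF-8. */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Builds a select URL with a query and a sort clause. */
    static String selectUrl(String base, String query, String sort) {
        return base + "?q=" + enc(query) + "&sort=" + enc(sort);
    }

    public static void main(String[] args) {
        // Assumed base URL; adjust to your own Solr instance.
        String base = "http://localhost:8983/solr/select/";
        System.out.println(selectUrl(base, "popularity:[0 TO 10]", "popularity desc"));
    }
}
```

With a working numeric field, this request should return documents with popularity between 0 and 10, highest first.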

@tjake
Owner

tjake commented Apr 25, 2010

Ok thanks, I'll take a look

@sdonelow

I think I found the solution to this. Take this with a grain of salt, as I'm new to Lucandra/Solr. We had the same problem with long, and using the slong data type instead fixed it. So, try changing the popularity field's data type to sint in your schema.xml file.

@tjake
Owner

tjake commented May 27, 2010

Is this still happening?

@leoz-xx

leoz-xx commented Jun 1, 2010

I'm having similar issues (with long and double instead of int, though). Range queries are not working. Using slong in Solr might be the workaround, as sdonelow commented, but I'd prefer not to use Solr in my project.

@tnine

tnine commented Jun 27, 2010

Hey guys. I'm assuming you're still having this issue? I'm trying to sort it out, and it appears to be functionally impossible with the current implementation. Basically, the bits of the numeric value are right-shifted 4 bits at a time. The first byte then holds the number of bits shifted off. You can view the logic for creating the trie structures here.

http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/search/NumericRangeQuery.html
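
To make the shifting concrete, here is a simplified sketch of trie-style prefix coding with a 4-bit step. It is an illustration only, not Lucene's actual encoder: the class name, the method names, and the 5-byte layout (one shift byte followed by four value bytes) are assumptions chosen for readability, and the real term format differs in detail.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified illustration of trie-style prefix coding; NOT Lucene's real term format. */
public class TriePrefixSketch {

    /** Bits dropped per trie level, matching the 4-bit step described above. */
    static final int PRECISION_STEP = 4;

    /** Flips the sign bit so that signed ints order correctly when compared as unsigned. */
    static int toSortable(int value) {
        return value ^ 0x80000000;
    }

    /** Emits one term per level: byte 0 is the shift, bytes 1..4 are the remaining high bits. */
    static List<byte[]> prefixTerms(int value) {
        int sortable = toSortable(value);
        List<byte[]> terms = new ArrayList<>();
        for (int shift = 0; shift < 32; shift += PRECISION_STEP) {
            int prefix = sortable >>> shift; // drop the lowest `shift` bits
            terms.add(new byte[] {
                (byte) shift,
                (byte) (prefix >>> 24),
                (byte) (prefix >>> 16),
                (byte) (prefix >>> 8),
                (byte) prefix
            });
        }
        return terms;
    }

    public static void main(String[] args) {
        // Print each level's term as hex for the value 10.
        for (byte[] term : prefixTerms(10)) {
            StringBuilder line = new StringBuilder();
            for (byte b : term) {
                line.append(String.format("%02x ", b & 0xff));
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

Each value produces eight terms at decreasing precision, so a range query can match a few coarse terms instead of enumerating every exact value. That only works if the term store returns these byte sequences in a consistent sorted order.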

This allows for faster range scanning and makes seeks faster. However, according to the IndexReader spec here,

http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/IndexReader.html#terms(org.apache.lucene.index.Term)

"If the given term does not exist, the enumeration is positioned at the first term greater than the supplied term."
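
As a rough model of that contract, here is a sketch that uses a sorted in-memory set in place of a real term dictionary. The class and method names are hypothetical and nothing here touches Lucene or Cassandra; it only shows the expected positioning behavior.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical model of the seek contract quoted above, using a sorted set of terms. */
public class SeekSketch {

    /**
     * Returns the target term if it exists, otherwise the first term that sorts
     * after it, or null when the target is past the last term.
     */
    static String seek(TreeSet<String> terms, String target) {
        return terms.ceiling(target);
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList("apple", "cherry", "grape"));
        System.out.println(seek(terms, "banana")); // missing term: lands on the next one
        System.out.println(seek(terms, "cherry")); // existing term: lands on itself
        System.out.println(seek(terms, "zebra"));  // past the end: nothing to position on
    }
}
```

A range query relies on exactly this: it seeks to the lower bound, which usually does not exist as a term, and expects to land on the next term rather than get an empty result.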

The current implementation does not do this: it scans over only 2 keys and then returns an empty result set. I'm looking into rectifying this problem. Correcting it may involve reading far more than 2 keys initially, so it will not be a very efficient operation.

@tnine

tnine commented Jun 27, 2010

I've created a simple unit test that mimics how keys are written with my own IndexReader and TermEnum here.

http://github.com/tnine/Lucandra/blob/master/test/lucandra/BytesOrderingEnumTest.java

It doesn't complete because I haven't implemented all the document scoring. However, it does correctly identify records to enumerate over when no prefix is present. If you uncomment my commented lines, you will see the byte comparator used no longer seeks to the correct index when index\docfield prefixes are used. Therefore, something isn't quite right with the prefix and the byte ordering. I just can't put my finger on it, since all the prefix bytes should be the same in common fields, and hence irrelevant in the byte comparison up to the first byte in the trie structure.

@tnine

tnine commented Jun 28, 2010

After much digging, this appears to be an encoding issue with Thrift and batch mutate itself. The issue and corresponding unit test are here.

https://issues.apache.org/jira/browse/CASSANDRA-1235

@tmahesh

tmahesh commented Jul 24, 2010

Is sorting on integer/float fields supported in solr-cassandra?

I tried the query below on the index of example docs, but did not get results in the correct order:
http://localhost:8983/solr/select/?q=cat:electronics&sort=price%20asc

I have tried changing the field type to "sint" and "tint", but with no success. Sorting on a string field type works, though. Any suggestions on how to fix the sorting issue for integer and float?

@tnine

tnine commented Jul 26, 2010

See the underlying bug. We can't properly encode any numeric fields; as a result, you can't perform sorting on them. Until Cassandra fixes this issue, no numeric field searching/sorting will work.

@tmahesh

tmahesh commented Jul 26, 2010

  1. We can store integer/float data and fetch it out correctly (i.e., the price field fetched from the index is as it was stored)
  2. From what I understand, sorting of the result set happens inside the Solr index searcher

Shouldn't sorting work in such a case?

I'm confused about how the Cassandra bug impacts sorting when we can fetch the stored data correctly from the index.

@tnine

tnine commented Jul 27, 2010

I could be wrong about how Solr stores and retrieves indexes. However, I know I'm accurate in stating that we currently can't store numeric values in Cassandra correctly/consistently. Run my test cases and you'll see exactly what I mean. You will occasionally get correct behavior, as the encoding problem does not present itself with all values. It seems to depend on the byte value that is stored. This fix was bumped from 0.6.4 to 0.6.5, so it doesn't seem to be getting fixed anytime soon. Check out the Solr code, and see if it's using numeric values in the underlying fields. If it is, you can't use it until the Cassandra bug is fixed.

@tjake
Owner

tjake commented Jul 27, 2010

Actually, the Cassandra guys decided to ditch String keys for byte[]. This will fix the issue. I assume it's going into 0.6.5, but you can see it now in trunk.

@sdonelow

tjake, what does "ditch String keys for byte[]" mean?

@tnine

tnine commented Jul 29, 2010

Currently all keys in Cassandra are UTF8 strings. This has been removed in favor of using native bytes in the new version. This should eliminate the issues we see with shifting 7 bits of numeric types into the lower 7 bits of a UTF8 byte, and hence remove the limitation on numeric fields in Lucandra. Note that this will require a decent amount of rework of Lucandra, but I plan on doing that as soon as 0.7 is released, since we really need numeric functionality.
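
As a general illustration of why String keys are risky for binary data, here is a small sketch showing that some byte values do not survive a round trip through a UTF-8 string. It is not claimed to reproduce the exact failure in CASSANDRA-1235; the class and method names are hypothetical, and it only demonstrates the broader hazard of treating raw bytes as text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo: raw bytes do not always survive a trip through a UTF-8 String. */
public class Utf8RoundTripSketch {

    /** Returns true if decoding the bytes as UTF-8 and re-encoding yields the same bytes. */
    static boolean survivesUtf8(byte[] raw) {
        String asString = new String(raw, StandardCharsets.UTF_8);
        byte[] back = asString.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(raw, back);
    }

    public static void main(String[] args) {
        // A plain ASCII byte is valid UTF-8 and round-trips unchanged.
        System.out.println(survivesUtf8(new byte[] {0x0a}));
        // A lone 0x80 is not valid UTF-8; decoding substitutes a replacement character.
        System.out.println(survivesUtf8(new byte[] {(byte) 0x80}));
    }
}
```

Whether a key survives depends on the byte values it contains, which is the kind of value-dependent behavior that native byte[] keys avoid.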

@tnine

tnine commented Sep 19, 2010

Just an fyi guys. This has been fixed in release 0.6.5 of Cassandra, so numeric fields should now work.

@tjake
Owner

tjake commented Oct 2, 2010

fixed in cassandra 0.6.5

This issue was closed.