This repository has been archived by the owner on Nov 22, 2017. It is now read-only.

Numeric fields are not properly stored or indexed #40

Closed
tnine opened this issue Oct 4, 2010 · 4 comments

Comments

@tnine

tnine commented Oct 4, 2010

Hi Jake,
Take a look at my fork. I've added tests based on Uwe's numeric tests from the Lucene core, and only a handful of them pass. I'll be correcting this in my fork and will let you know when I'm done.

@tnine
Author

tnine commented Oct 6, 2010

Hey Jake,
I've investigated this further and found the issue: the LucandraTermEnum does not match the spec when enumerating numeric trie terms. I added debug output to Lucene 2.9.3, using the default RAMDirectory, and ran the TestNumericRangeQuery32 tests. This is the enumeration order I get when the "term()" method is invoked on Lucene's SegmentTermEnum class:

Returning term for field 'field8' hex value is : 60077f7e6814
Returning term for field 'field8' hex value is : 60077f7e6814
Returning term for field 'field8' hex value is : 60077f7e6814
Returning term for field 'field8' hex value is : 60077f7e6814
Returning term for field 'field8' hex value is : 68037f7f00
Returning term for field 'field8' hex value is : 68037f7f00
Returning term for field 'field8' hex value is : 68037f7f00
Returning term for field 'field8' hex value is : 68037f7f00
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f4e
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f4e
Returning term for field 'field8' hex value is : 68037f7f4e
Returning term for field 'field8' hex value is : 68037f7f68
Returning term for field 'field8' hex value is : 68037f7f4e
Returning term for field 'field8' hex value is : 68037f7f68
Returning term for field 'field8' hex value is : 68037f7f68
Returning term for field 'field8' hex value is : 6804000002
Returning term for field 'field8' hex value is : 68037f7f68
Returning term for field 'field8' hex value is : 70017f7f
Returning term for field 'field8' hex value is : 70017f7f
Returning term for field 'field8' hex value is : 70017f7f
Returning term for field 'field8' hex value is : 70017f7f
Returning term for field 'field8' hex value is : 78007f
Returning term for field 'field8' hex value is : 78007f
Returning term for field 'field8' hex value is : 78007f
Returning term for field 'field8' hex value is : 78007f
Returning term for field 'field8' hex value is : 780100
Returning term for field 'field8' hex value is : 780100
Returning term for field 'field8' hex value is : 780100
Returning term for field 'field8' hex value is : 780100
Returning term for field 'field8' hex value is : 780100
Returning term for field 'field8' hex value is : 780100
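
For readers unfamiliar with these hex dumps, the sketch below decodes one logged term back into its shift and integer value. It assumes the terms use the prefix-coded int layout of Lucene 2.9's NumericUtils as I understand it: a first character of 0x60 plus the shift, followed by the sign-flipped value in 7-bit chunks. The class and method names (PrefixCodedHex, shift, value) are hypothetical helpers written for this illustration, not part of Lucene or Lucandra.

```java
final class PrefixCodedHex {
    // Assumption: same value as NumericUtils.SHIFT_START_INT in Lucene 2.9.
    static final int SHIFT_START_INT = 0x60;

    /** Shift (number of low bits dropped) stored in the first byte of a hex-dumped term. */
    static int shift(String hex) {
        return Integer.parseInt(hex.substring(0, 2), 16) - SHIFT_START_INT;
    }

    /** Integer value encoded by a hex-dumped term; the low 'shift' bits come back as zero. */
    static int value(String hex) {
        int shift = shift(hex);
        int sortableBits = 0;
        // Every byte after the first carries 7 payload bits, most significant chunk first.
        for (int i = 2; i < hex.length(); i += 2) {
            int chunk = Integer.parseInt(hex.substring(i, i + 2), 16);
            sortableBits = (sortableBits << 7) | chunk;
        }
        // Restore the dropped low bits as zeros, then undo the sign-bit flip
        // that makes signed ints sort correctly as unsigned terms.
        return (sortableBits << shift) ^ 0x80000000;
    }

    public static void main(String[] args) {
        String[] logged = {"60077f7e6814", "68037f7f34", "70017f7f", "78007f", "780100"};
        for (String hex : logged) {
            System.out.println(hex + " -> shift=" + shift(hex) + ", value=" + value(hex));
        }
    }
}
```

Under that assumption, 60077f7e6814 is a full-precision term (shift 0) for the value -19436, and 68037f7f34 is the shift-8 term covering the same value with its low 8 bits cleared (-19456). The leading bytes 60, 68, 70 and 78 are then the four precision levels produced by a precision step of 8.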

These are the results with LucandraTermEnum:

Returning term for field 'field8' hex value is : 60077f7e6814
Returning term for field 'field8' hex value is : 600809433244
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f34
Returning term for field 'field8' hex value is : 68037f7f4e
Returning term for field 'field8' hex value is : 68037f7f4e
Returning term for field 'field8' hex value is : 68037f7f68
Returning term for field 'field8' hex value is : 68037f7f68
Returning term for field 'field8' hex value is : 6804000002
Returning term for field 'field8' hex value is : 6804046008
Returning term for field 'field8' hex value is : 6804046008

As you can see, the results are not enumerated properly. Given that you're using a Tree for the cached terms, they should be ordered correctly after insert. This may be an issue with the way loadTerms is invoked.
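
One way to compare the two logs is to reduce each to its distinct terms and diff the sets. The sketch below does that with the hex values copied from the output above. TermEnumDiff and its members are hypothetical names for this illustration only. Note that the logs record term() calls made during a query rather than the full term dictionary, so a term missing from one log is a hint about where the enumerators diverge, not proof that the term is absent from the index.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class TermEnumDiff {
    // Distinct hex values from the SegmentTermEnum log above.
    static final List<String> LUCENE_LOG = Arrays.asList(
        "60077f7e6814", "68037f7f00", "68037f7f34", "68037f7f4e", "68037f7f68",
        "6804000002", "70017f7f", "78007f", "780100");

    // Distinct hex values from the LucandraTermEnum log above.
    static final List<String> LUCANDRA_LOG = Arrays.asList(
        "60077f7e6814", "600809433244", "68037f7f34", "68037f7f4e", "68037f7f68",
        "6804000002", "6804046008");

    /** Terms seen in 'left' but never in 'right', in sorted order. */
    static SortedSet<String> onlyIn(List<String> left, List<String> right) {
        // Lowercase hex strings sort in the same order as the bytes they encode,
        // so a TreeSet keeps the terms in ascending term order.
        SortedSet<String> result = new TreeSet<>(left);
        result.removeAll(new TreeSet<>(right));
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Only in Lucene log:   " + onlyIn(LUCENE_LOG, LUCANDRA_LOG));
        System.out.println("Only in Lucandra log: " + onlyIn(LUCANDRA_LOG, LUCENE_LOG));
    }
}
```

For these logs the Lucandra enumeration never returns 68037f7f00, 70017f7f, 78007f or 780100, and it returns 600809433244 and 6804046008, which the Lucene enumeration does not. Both logs are ascending once duplicates are dropped, so the difference is in which terms are visited rather than in the sort order of the cached terms.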

@tnine
Author

tnine commented Oct 7, 2010

Hi Jake,
I've been digging into this one all day. After searching a bit more, I found an issue in my local copy of the TermEnum, which I have corrected. This resolves the enumeration issue I described above. However, the documents are not returned in "default" order, i.e. the order in which they were added to the index, as the test expects. I'm assuming this is a bug in the LucandraTermDocs, but I'm having a hard time locating it. Thoughts?

@tnine
Author

tnine commented Oct 7, 2010

I've updated the test case on my fork to show the issue.

http://github.com/tnine/Lucandra/blob/master/test/lucandra/NumericRangeTests.java

It appears to still be term enum related. The calls to IndexReader.addDocument are occurring in a different order than the insertion.

@tjake
Owner

tjake commented Jan 27, 2011

fixed.

This issue was closed.