Invalid Source Data ROWID: ''2.1'' #37

Closed
okennedy opened this issue Jul 30, 2015 · 5 comments

Comments

@okennedy
Member

Follow the steps in the Demo and then run the following query:

SELECT * FROM FINALDATA where (rating > 4)
@okennedy okennedy modified the milestone: Phase 2 Demo Jul 30, 2015
@okennedy
Member Author

The issue seems to have nothing to do with the selection predicate itself. Rather, the FINALDATA lens seems to have trouble when it is asked to classify rows that do not appear in the final result set. Concretely, MISSING_VALUE runs data-harvesting queries of the form:

PROJECT[ROWID <= JOIN_ROWIDS(__LHS_ROWID, __RHS_ROWID), ID <= PRODUCT_ID, CATEGORY <= PRODUCT_CATEGORY, NAME <= PRODUCT_NAME, RATING <= {{ TYPEDRATINGS1_1[__LHS_ROWID] }}, PID <= {{ TYPEDRATINGS1_0[__LHS_ROWID] }}, REVIEW_CT <= {{ TYPEDRATINGS1_2[__LHS_ROWID] }}, BRAND <= PRODUCT_BRAND, __MIMIR_CONDITION <=  ( ({{ TYPEDRATINGS1_0[__LHS_ROWID] }}=PRODUCT_ID)  AND  (JOIN_ROWIDS(__LHS_ROWID, __RHS_ROWID)='2.1') ) ](
  JOIN(
    PROJECT[RATINGS1_PID <= RATINGS1_PID, RATINGS1_RATING <= RATINGS1_RATING, RATINGS1_REVIEW_CT <= RATINGS1_REVIEW_CT, __LHS_ROWID <= ROWID](
      RATINGS1(RATINGS1_PID:string, RATINGS1_RATING:string, RATINGS1_REVIEW_CT:string // ROWID:rowid)
    ),
    PROJECT[PRODUCT_ID <= PRODUCT_ID, PRODUCT_NAME <= PRODUCT_NAME, PRODUCT_BRAND <= PRODUCT_BRAND, PRODUCT_CATEGORY <= PRODUCT_CATEGORY, __RHS_ROWID <= ROWID](
      PRODUCT(PRODUCT_ID:string, PRODUCT_NAME:string, PRODUCT_BRAND:string, PRODUCT_CATEGORY:string // ROWID:rowid)
    )
  )
)

Note the condition:

( ({{ TYPEDRATINGS1_0[__LHS_ROWID] }}=PRODUCT_ID)  AND  (JOIN_ROWIDS(__LHS_ROWID, __RHS_ROWID)='2.1') )

2.1 is the rowid of the row that the MISSING_VALUE lens is being asked to classify: specifically, the combination of the 2nd row of ratings1 and the 1st row of product. Looking at the data, these two rows do not join, so ({{ TYPEDRATINGS1_0[__LHS_ROWID] }}=PRODUCT_ID) is false.

What seems to be happening is that rating>4 triggers a premature evaluation of classify() for a row that is not in the result set at all.
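
To make the condition above concrete, here is a minimal Python sketch of how it filters joined rows. Everything in it is an assumption for illustration: the sample data is made up, and `join_rowids` is a hypothetical stand-in that simply concatenates the two rowids with a dot, which matches the "2nd row of ratings1, 1st row of product" reading of 2.1 but is not taken from the Mimir source.

```python
def join_rowids(lhs_rowid, rhs_rowid):
    """Hypothetical stand-in for JOIN_ROWIDS: combine two rowids as 'lhs.rhs'."""
    return f"{lhs_rowid}.{rhs_rowid}"

# Made-up sample data keyed by ROWID (not the demo data).
ratings1 = {"1": {"pid": "P100"}, "2": {"pid": "P200"}}
product = {"1": {"id": "P100"}, "2": {"id": "P200"}}

def mimir_condition(lhs_rowid, rhs_rowid, target_rowid):
    """Mirror of the condition: join predicate AND rowid equality."""
    join_ok = ratings1[lhs_rowid]["pid"] == product[rhs_rowid]["id"]
    rowid_ok = join_rowids(lhs_rowid, rhs_rowid) == target_rowid
    return join_ok and rowid_ok

def harvest(target_rowid):
    """Return every (lhs, rhs) rowid pair that satisfies the condition."""
    return [
        (lhs, rhs)
        for lhs in ratings1
        for rhs in product
        if mimir_condition(lhs, rhs, target_rowid)
    ]

no_match = harvest("2.1")  # rows 2 and 1 do not join, so nothing qualifies
one_match = harvest("2.2")  # rows 2 and 2 join, so exactly one pair qualifies
```

With data like this, a harvesting query for 2.1 has no row to return, which is consistent with the "Invalid Source Data ROWID" error being raised for a rowid that should never have been requested.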

@okennedy
Member Author

For reference, here's the full query:

--- Optimized Query ---
PROJECT[NAME <= PRODUCT_NAME, BRAND <= PRODUCT_BRAND, CATEGORY <= PRODUCT_CATEGORY, REVIEW_CT <= {{ TYPEDRATINGS1_2[__LHS_ROWID] }}, PID <= {{ TYPEDRATINGS1_0[__LHS_ROWID] }}, ID <= PRODUCT_ID, RATING <= CASE WHEN {{ TYPEDRATINGS1_1[__LHS_ROWID] }} IS NULL THEN {{ FINALDATA_3[JOIN_ROWIDS(__LHS_ROWID, __RHS_ROWID)] }} ELSE {{ TYPEDRATINGS1_1[__LHS_ROWID] }} END, __MIMIR_CONDITION <=  ( ({{ TYPEDRATINGS1_0[__LHS_ROWID] }}=PRODUCT_ID)  AND  (CASE WHEN {{ TYPEDRATINGS1_1[__LHS_ROWID] }} IS NULL THEN {{ FINALDATA_3[JOIN_ROWIDS(__LHS_ROWID, __RHS_ROWID)] }} ELSE {{ TYPEDRATINGS1_1[__LHS_ROWID] }} END>4) ) ](
  JOIN(
    PROJECT[RATINGS1_PID <= RATINGS1_PID, RATINGS1_RATING <= RATINGS1_RATING, RATINGS1_REVIEW_CT <= RATINGS1_REVIEW_CT, __LHS_ROWID <= ROWID](
      RATINGS1(RATINGS1_PID:string, RATINGS1_RATING:string, RATINGS1_REVIEW_CT:string // ROWID:rowid)
    ),
    PROJECT[PRODUCT_ID <= PRODUCT_ID, PRODUCT_NAME <= PRODUCT_NAME, PRODUCT_BRAND <= PRODUCT_BRAND, PRODUCT_CATEGORY <= PRODUCT_CATEGORY, __RHS_ROWID <= ROWID](
      PRODUCT(PRODUCT_ID:string, PRODUCT_NAME:string, PRODUCT_BRAND:string, PRODUCT_CATEGORY:string // ROWID:rowid)
    )
  )
)
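
As a rough illustration of the "premature evaluation" hypothesis, the Python sketch below models the CASE expression and the two-part condition from the optimized query. The function names, the sample values, and the fixed estimate of 4.5 are all hypothetical, and the sketch says nothing about how Mimir actually orders evaluation. It only shows that an evaluator which computes the rating before checking the join predicate will consult the model for a non-joining row such as 2.1, while a short-circuiting one will not.

```python
calls = []  # records every rowid the (hypothetical) model is asked about

def model_estimate(rowid):
    """Stand-in for the FINALDATA_3 model lookup; the value is made up."""
    calls.append(rowid)
    return 4.5

def rating_expr(typed_rating, rowid):
    """CASE WHEN typed IS NULL THEN model estimate ELSE typed END."""
    return model_estimate(rowid) if typed_rating is None else typed_rating

def condition_lazy(pid, product_id, typed_rating, rowid):
    """Join predicate first; Python's `and` skips the rating if it fails."""
    return pid == product_id and rating_expr(typed_rating, rowid) > 4

def condition_eager(pid, product_id, typed_rating, rowid):
    """Rating computed up front, so the model is consulted unconditionally."""
    rating = rating_expr(typed_rating, rowid)
    return pid == product_id and rating > 4

# A non-joining pair with a missing rating, labelled with rowid 2.1.
lazy_result = condition_lazy("P200", "P100", None, "2.1")
lazy_calls = list(calls)

calls.clear()
eager_result = condition_eager("P200", "P100", None, "2.1")
eager_calls = list(calls)
```

Both variants reject the row, but only the eager one asks the model about 2.1.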

@Legacy25
Collaborator

I think this is also fixed by commit fe0ae23.

[Image: screenshot from 2015-08-22 16 46 47]

@okennedy
Member Author

I'd like to test things a bit more before closing the issue outright, since I still don't know why this broke in the first place. Do you know why the fix fixed things?

@Legacy25 Legacy25 reopened this Aug 23, 2015
@Legacy25
Collaborator

The missing value lens was breaking for multiple columns because every missing value model it created was using the same iterator to get its results. So when there was a mix of multiple missing value models and no-op models, only one of the missing value models got the actual data, since the iterator interface currently has no reset().

Now each model gets its own iterator, which I think is why this issue is resolved.
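
For anyone reading later, the failure mode described above can be reproduced in a few lines of Python. This is only a sketch: `ColumnModel` and the sample rows are hypothetical and are not the Mimir classes, and a plain Python iterator stands in for the result iterator (like the interface described, it has no reset).

```python
# Made-up rows: (pid, rating, review_ct), with None marking a missing value.
rows = [("P100", None, 10), ("P200", 4.0, None), ("P300", 5.0, 7)]

class ColumnModel:
    """Hypothetical per-column model that drains its source while training."""

    def __init__(self, column, source):
        self.column = column
        # Consumes the iterator completely; nothing is left for the next user.
        self.seen = [row[column] for row in source]

# Broken: both models share one iterator, so the second one sees no data.
shared = iter(rows)
broken = [ColumnModel(1, shared), ColumnModel(2, shared)]

# Fixed: each model gets its own fresh iterator over the same rows.
fixed = [ColumnModel(1, iter(rows)), ColumnModel(2, iter(rows))]

broken_counts = [len(m.seen) for m in broken]
fixed_counts = [len(m.seen) for m in fixed]
```

In the shared case the first model reads all three rows and the second reads none, which matches the "only one of the missing value models was getting the actual data" symptom.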
