Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to join cursors by specific columns(PK/UK) #453

Closed
jgebal opened this issue Aug 10, 2017 · 2 comments · Fixed by #666
Closed

Add ability to join cursors by specific columns(PK/UK) #453

jgebal opened this issue Aug 10, 2017 · 2 comments · Fixed by #666
Assignees
Milestone

Comments

@jgebal
Copy link
Member

jgebal commented Aug 10, 2017

Ability to join cursors by specific columns(PK/UK)


This feature could/should be implemented in a similar fashion as excluding columns with a_exclude parameter.

We would have additional parameter for the ut_equal matcher if the matcher is used with refcursor.

If the parameter is present, we can transform it into XPATH to extract PK/UK/join specific column values from the data and store them in a separate VARCHAR2 column of the temp table for cursor data (with limit of 4000 bytes).
There are few ways of extracting column data from XML - XSL
https://stackoverflow.com/questions/17314062/how-to-convert-xml-to-csv-using-xsl
or
Extract with XPath:

select extract(dbms_xmlgen.getxmltype('
       select object_name, object_type
       from user_objects
       '),'//OBJECT_NAME/text()|//OBJECT_TYPE/text()') xml
from dual;

The temp table keys column could be indexed with a unique index, so that we can actually validate that the join columns are unique.

If columns are not unique, the expectation should fail with meaningful message informing that the cursor data contains duplicates.
We need to be able to inform the user which of the compared cursors doesn't have uniqueness which is again tricky as the data_value are not aware of where they were created from (actual or expected).

@jgebal jgebal added this to To Do in expectations Oct 25, 2017
@jgebal
Copy link
Member Author

jgebal commented Apr 19, 2018

Currently the cursor comparison is done as comparison of list data (order of rows matters).
We should implement an option to do a set-based comparison, were order is not relevant.
Here is how it could be done (it is still a PoC).

Implement new methods:

  • ut_equal.join_using(a_columns varchar2), ut_equal.join_using(a_columns ut_varchar2_list) - those methods would be building list of join columns to use for comparison
  • ut_equal.unordered() - this method would to a set - based comparison by performing "join" using whole row content

The main difference between the two is that when using unordered (whole row join), we can only identify that rows were joined or not, so in fact we can find rows that:

  • are missing in actual
  • are unexpected in actual

When joining with join_using, we can clearly identify rows that:

  • were joined but the content is different
  • are missing in actual
  • are unexpected in actual

Snippets for PoC

Existing solution - joining by row_no and comparing whole row - simplified example (without dbms_xml)

with expected_cursor as (select * from user_objects order by 1 desc),
     actual_cursor as (select * from user_objects union all select * from user_objects where rownum = 1 order by 1 asc)
select nvl(exp.row_no, act.row_no) diff_rows
  from (select column_value row_data, rownum as row_no
          from table( xmlsequence( extract( XMLTYPE(cursor( select * from expected_cursor) ),'ROWSET/*') ) ) ucd
        ) exp
        full outer join 
        (select column_value row_data, rownum as row_no
          from table( xmlsequence( extract( XMLTYPE(cursor( select * from actual_cursor) ),'ROWSET/*') ) ) ucd
        ) act
        on (exp.row_no = act.row_no)
where nvl(dbms_lob.compare(xmlserialize( content exp.row_data no indent), xmlserialize( content act.row_data no indent)),1) != 0;

New style - joining by whole row data - simplified example (without dbms_xml)

with expected_cursor as (select * from user_objects order by 1 desc),
     actual_cursor as (select * from user_objects union all select * from user_objects where rownum = 1 order by 1 asc)
select coalesce(exp.row_hash,act.row_hash) row_hash,
       coalesce(exp.duplicate_no, act.duplicate_no) duplicate_no,
       case when exp.row_hash is null then 'extra row' else 'missing row' end diff_type
  from (select ucd.*, row_number() over(partition by row_hash order by row_hash) duplicate_no
          from (select ucd.column_value row_data,
                       dbms_crypto.hash( value(ucd).getclobval(),3/*HASH_SH1*/) row_hash
                  from table( xmlsequence( extract( XMLTYPE(cursor( select * from expected_cursor) ),'ROWSET/*') ) ) ucd
                ) ucd
        )  exp
full outer join
       (select ucd.*, row_number() over(partition by row_hash order by row_hash) duplicate_no
          from (select ucd.column_value row_data,
                       dbms_crypto.hash( value(ucd).getclobval(),3/*HASH_SH1*/) row_hash
                  from table( xmlsequence( extract( XMLTYPE(cursor( select * from actual_cursor) ),'ROWSET/*') ) ) ucd
                ) ucd
       )  act
    on  exp.row_hash = act.row_hash
    and exp.duplicate_no = act.duplicate_no
where exp.row_hash is null or act.row_hash is null;

New style query comparing with PK - simplified example (without dbms_xml)

with expected_cursor as 
        (select 4 id, 'd' value from dual union all
         select 2 id, 'b' value from dual union all
         select 3 id, 'b' value from dual union all
         select 3 id, 'c' value from dual union all
         select 1 id, 'a' value from dual 
         ),
     actual_cursor as 
        (select 1 id, 'a' value from dual union all
         select 2 id, 'b' value from dual union all
         select 3 id, 'c' value from dual union all
         select 5 id, 'x' value from dual union all
         select 4 id, 'w' value from dual 
         )
select coalesce(exp.row_hash,act.row_hash) row_hash,
       coalesce(exp.pk_duplicate_no, act.pk_duplicate_no) pk_duplicate_no,
       case
         when act.row_hash is null then 'missing row'
         when exp.row_hash is null then 'extra row' 
         else 'diff row' 
       end diff_type,
       exp.row_data.getClobVal() expected_data,
       act.row_data.getClobVal() actual_data
  from (select ucd.*, row_number() over(partition by pk_hash order by row_hash) pk_duplicate_no
          from (select ucd.column_value row_data, 
                       dbms_crypto.hash( value(ucd).getclobval(),3/*HASH_SH1*/) row_hash,
                       dbms_crypto.hash( extract(value(ucd),'ROW/ID').getClobVal(),3/*HASH_SH1*/) pk_hash
                  from table( xmlsequence( extract( XMLTYPE(cursor( select * from expected_cursor) ),'ROWSET/*') ) ) ucd
                ) ucd
        )  exp
full outer join
       (select ucd.*, row_number() over(partition by pk_hash order by row_hash) pk_duplicate_no
          from (select ucd.column_value row_data, 
                       dbms_crypto.hash( value(ucd).getclobval(),3/*HASH_SH1*/) row_hash,
                       dbms_crypto.hash( extract(value(ucd),'ROW/ID').getClobVal(),3/*HASH_SH1*/) pk_hash
                  from table( xmlsequence( extract( XMLTYPE(cursor( select * from actual_cursor) ),'ROWSET/*') ) ) ucd
                ) ucd
       )  act
    on  exp.pk_hash = act.pk_hash
    and exp.pk_duplicate_no = act.pk_duplicate_no
where exp.row_hash is null or act.row_hash is null or exp.row_hash != act.row_hash;

@lwasylow lwasylow self-assigned this Apr 20, 2018
@lwasylow
Copy link
Member

lwasylow commented May 3, 2018

Work in Progress

@lwasylow lwasylow added this to the 3.1.2 milestone May 13, 2018
@jgebal jgebal moved this from To Do to In Progress in expectations Jul 13, 2019
@jgebal jgebal moved this from In Progress to Done in expectations Jul 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging a pull request may close this issue.

2 participants