Code coverage report no longer ends on huge PL/SQL code base #882

Closed
maxpsq opened this issue Mar 28, 2019 · 25 comments

@maxpsq commented Mar 28, 2019
After upgrading from version 3.1.3 to 3.1.6, the coverage report on a huge code base (4.7M lines) seems to never end. I shut the execution down after more than 7 hours, whereas on version 3.1.3 it took less than 10 minutes to generate the report.

The script was run via SQL*Plus:

SET define off
SET serveroutput on size unlimited
SET linesize 250
SET echo off feedback off termout off

spool full_schema_coverage.html
BEGIN
  ut.run(ut_coverage_html_reporter()); 
END;
/
spool off
set serveroutput off

Information about utPLSQL and database version:

Oracle XE 12.1
DB version 12.1.0.1.0
DB compatibility 12.1.0.0.0


UT_VERSION                                                                        
--------------------------------------------------------------------------------
v3.1.6.2735

BANNER                                                                               
--------------------------------------------------------------------------------
Personal Oracle Database 12c Release 12.1.0.1.0 - 64bit Production           
PL/SQL Release 12.1.0.1.0 - Production                                        
CORE 12.1.0.1.0	Production                                                     
TNS for 64-bit Windows: Version 12.1.0.1.0 - Production                     
NLSRTL Version 12.1.0.1.0 - Production                                          

PARAMETER                      VALUE                                             
------------------------------ --------------------------------------------------
NLS_LANGUAGE                   ITALIAN                                           
NLS_TERRITORY                  ITALY                                             
NLS_CURRENCY                   €                                                 
NLS_ISO_CURRENCY               ITALY                                             
NLS_NUMERIC_CHARACTERS         ,.                                                
NLS_CALENDAR                   GREGORIAN                                         
NLS_DATE_FORMAT                DD-MON-RR                                         
NLS_DATE_LANGUAGE              ITALIAN                                           
NLS_SORT                       WEST_EUROPEAN                                     
NLS_TIME_FORMAT                HH24:MI:SSXFF                                     
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH24:MI:SSXFF                           
NLS_TIME_TZ_FORMAT             HH24:MI:SSXFF TZR                                 
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH24:MI:SSXFF TZR                       
NLS_DUAL_CURRENCY              €                                                 
NLS_COMP                       BINARY                                            
NLS_LENGTH_SEMANTICS           BYTE                                              
NLS_NCHAR_CONV_EXCP            FALSE

PORT_STRING                                                                           
-------------------------------------------------------------------------------------
IBMPC/WIN_NT64-9.1.0

According to Oracle Enterprise Manager, the SQL statement running for hours is

SELECT TEXT, ITEM_TYPE FROM TABLE((:B1 ).GET_LINES(:B2 , :B3 ))

@maxpsq maxpsq changed the title Code coverage report no longer ends on huge PL/SQL report Code coverage report no longer ends on huge PL/SQL codebase Mar 28, 2019

@maxpsq maxpsq changed the title Code coverage report no longer ends on huge PL/SQL codebase Code coverage report no longer ends on huge PL/SQL code base Mar 28, 2019

@jgebal (Member) commented Mar 29, 2019

Hi @maxpsq,
Thanks for reporting this issue.

Three questions:

  1. Is this issue consistently reproducible?
  2. Can you try running with:
  • utPLSQL-cli
  • select * from ut.run(ut_coverage_html_reporter()); in the same tool
  • SQL*Plus (not SQLcl/SQL Developer)
  and see if you get the same problem?
  3. Can you check which wait events the session is mostly waiting on, after 10 minutes of running? (A query sketch follows below.)
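
For the wait-event check, something along these lines should do (just a sketch; it assumes SELECT privileges on V$SESSION and that you know the SID of the session running the report):

-- Sketch: show what the long-running session is currently waiting on.
-- :sid is the session to inspect.
select sid, status, event, wait_class, state, seconds_in_wait
  from v$session
 where sid = :sid;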
@jgebal (Member) commented Mar 29, 2019

Can you also check whether going back to VARCHAR2 in the table definition for the output buffer helps?

To do so:

  • Make sure no other session is using the table.
  • Connect as ut3 or another user that has privileges to modify the table, and execute the script below.
-- trying to go back to old-style table structure
truncate table ut3.ut_output_buffer_tmp$;
alter table ut3.ut_output_buffer_tmp$ drop constraint ut_output_buffer_tmp_ck;
alter table ut3.ut_output_buffer_tmp$ add text_new varchar2(4000);
alter table ut3.ut_output_buffer_tmp$ rename column text to text_old;
alter table ut3.ut_output_buffer_tmp$ drop column text_old;
alter table ut3.ut_output_buffer_tmp$ rename column text_new to text;
alter table ut3.ut_output_buffer_tmp$ add constraint ut_output_buffer_tmp_ck check(
         is_finished = 0 and (text is not null or item_type is not null )
      or is_finished = 1 and text is null and item_type is null );

Try running your tests with coverage again and see if it helps with performance.

Please provide feedback on this so we can think about a proper fix, as the column type is currently expected to be CLOB by some of the new reporters.

Here is a script to revert the change.

-- revert temporary table structure change
truncate table ut3.ut_output_buffer_tmp$;
alter table ut3.ut_output_buffer_tmp$ drop constraint ut_output_buffer_tmp_ck;
alter table ut3.ut_output_buffer_tmp$ add text_old clob;
alter table ut3.ut_output_buffer_tmp$ rename column text to text_new;
alter table ut3.ut_output_buffer_tmp$ drop column text_new;
alter table ut3.ut_output_buffer_tmp$ rename column text_old to text;
alter table ut3.ut_output_buffer_tmp$ add constraint ut_output_buffer_tmp_ck check(
         is_finished = 0 and (text is not null or item_type is not null )
      or is_finished = 1 and text is null and item_type is null );
@maxpsq (Author) commented Mar 30, 2019

Hi Jacek,

I haven't tried what you suggested yet; I'll do it soon. By the way, I have some interesting details:

I tried to run the report one more time, and this time it finished after 70 minutes (first success in 4 attempts).

I monitored the content of the table UT_OUTPUT_BUFFER_TMP and noticed that all the other OUTPUT_IDs in the table (related to previous runs that never ended) share the same problem: a single row whose content is exactly the same for every output_id:

message_id  ut_output_data_row  is_finished
----------  ------------------  -----------
14767460    (; initialize)      0

These records prevented each of the previous runs from exiting the loop in the routine UT_OUTPUT_TABLE_BUFFER.GET_LINES, because of TEXT=NULL and IS_FINISHED=0. We need to investigate what set these values in the records.
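
The orphans can be spotted with a query along these lines (a sketch; it assumes the buffer table lives in the ut3 schema):

-- Sketch: leftover rows per OUTPUT_ID in the output buffer table.
select output_id,
       count(*) as leftover_rows,
       count(case when is_finished = 1 then 1 end) as finish_markers
  from ut3.ut_output_buffer_tmp$
 group by output_id
 order by leftover_rows desc;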

The only thing that has changed is the addition of a suitepath annotation to the only test package I have in my repository. I have no idea whether that may have allowed the last run to finish.

In any case, the report generation time has grown significantly since version 3.1.3 (by almost an order of magnitude), possibly related to the introduction of LOBs. It would be nice to make it faster.

Here you can find a comparison between v3.1.3 and v3.1.6:

[SQL Monitor screenshot: v3.1.3]

[SQL Monitor screenshot: v3.1.6]

Almost everything is slower in v3.1.6, but the statement that performs the worst is

SELECT TEXT, ITEM_TYPE FROM TABLE((:B1 ).GET_LINES(:B2 , :B3 ))

I also noticed that the deletion from UT_OUTPUT_BUFFER_TMP may, in some circumstances, take a poorly performing access path. I'll send you a pull request with a simple semantic change to the statement that should lead the CBO to the right access path.
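
Just to illustrate the kind of change I mean (a hypothetical example, not the actual pull request): phrasing the delete so the predicate maps directly onto the (output_id, message_id) primary key, e.g.

-- Hypothetical illustration only: adding a semantically neutral predicate
-- on the second primary-key column can nudge the CBO toward the PK access path.
delete from ut3.ut_output_buffer_tmp$ a
 where a.output_id  = :output_id
   and a.message_id >= 0;  -- message_id is NOT NULL, so no rows are excluded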

@jgebal (Member) commented Mar 30, 2019

The output buffer read process always waits for the main process to finish writing data.
The finish message is sent by utPLSQL as a finalize message.
The buffer is purged of orphaned data daily (on the next run), so you might see some orphans in the output buffer for a while if your runs don't finish successfully.

I would suggest trying the experiment I suggested first. If you want to invest more time, try enabling trace on the long-running sessions to see what takes the time. (A sketch follows.)
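
For the tracing, something like this should work (a sketch; it assumes execute privilege on DBMS_MONITOR and the SID/serial# of the long-running session):

-- Sketch: enable extended SQL trace (including wait events) for another session.
begin
  dbms_monitor.session_trace_enable(
    session_id => :sid,
    serial_num => :serial,
    waits      => true,
    binds      => false
  );
end;
/
-- ...reproduce the slow run, then disable and inspect the trace file:
begin
  dbms_monitor.session_trace_disable(session_id => :sid, serial_num => :serial);
end;
/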

This is odd, as you are running in a single session, so the run must be finished before the output is read.

@maxpsq (Author) commented Mar 30, 2019

Here are my findings:

VARCHAR2 vs CLOB

As expected, the VARCHAR2 version runs faster:

  1. CLOB version runs in 70 minutes
  2. VARCHAR2 version runs in 41.5 minutes

Wait events

There are no significant wait events.

Wait Class     Waits    Total Wait Time (sec)  Avg Wait (ms)  % DB time  Avg Active Sessions
DB CPU                  2,437                                 89.3       0.7
System I/O     110,157  179                    2              6.6        0.0
Configuration  620      177                    286            6.5        0.0
User I/O       196,734  115                    1              4.2        0.0
Other          15,009   30                     2              1.1        0.0
Application    15       0                      8              .0         0.0
Concurrency    13       0                      2              .0         0.0
Commit         69       0                      0              .0         0.0
Network        285      0                      0              .0         0.0

Bulk size

I got better performance by increasing lc_bulk_limit. Here are some figures using the original CLOB version:

Size  Time (m)
----  --------
3000  20
1000  22
500   29
300   36
100   70

What about allowing the bulk size to be passed as a parameter somewhere, so that one can tune the extraction to the size of the code base? (A sketch of what I mean follows.)
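
Purely as a sketch (ut_buffer_config is an invented name, not part of utPLSQL):

-- Hypothetical sketch: expose the fetch size as a tunable package variable.
create or replace package ut_buffer_config as
  g_bulk_limit pls_integer := 100;  -- default, matching the current constant
  procedure set_bulk_limit(a_limit in pls_integer);
end ut_buffer_config;
/
create or replace package body ut_buffer_config as
  procedure set_bulk_limit(a_limit in pls_integer) is
  begin
    g_bulk_limit := greatest(1, a_limit);  -- guard against nonsensical values
  end set_bulk_limit;
end ut_buffer_config;
/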

@jgebal (Member) commented Mar 30, 2019

If deletes are slow, the most efficient way to do it would be:

  • Change the table to a heap table (not an index-organized table)
  • Keep the index as-is
  • In the select statement, return rowids
  • Delete by rowids

That should give a significant boost.
@jgebal (Member) commented Mar 30, 2019

create table ut_output_buffer_tmp$(
  output_id      raw(32) not null,
  message_id     number(38,0) not null,
  text           clob,
  item_type      varchar2(1000),
  is_finished    number(1,0) default 0 not null,
  constraint ut_output_buffer_tmp_pk primary key(output_id, message_id),
  constraint ut_output_buffer_tmp_ck check(
         is_finished = 0 and (text is not null or item_type is not null )
      or is_finished = 1 and text is null and item_type is null ),
  constraint ut_output_buffer_fk1 foreign key (output_id) references ut_output_buffer_info_tmp$(output_id)
) lob(text) store as securefile ut_output_text(retention none);

Working example:

declare
  self_output_id       raw(32) := '853D05E7BECB0205E053020011AC5386';
  a_initial_timeout    natural := null;
  a_timeout_sec        natural := null;
  l_buffer_data        ut_output_data_rows;
  type t_rowid_tab     is table of urowid;
  l_message_rowids     t_rowid_tab;
  l_finished_flags     ut_integer_list;
  l_already_waited_for number(10,2) := 0;
  l_finished           boolean := false;
  lc_init_wait_sec     constant naturaln := coalesce(a_initial_timeout, 5 ); -- 5 seconds
  lc_max_wait_sec      constant naturaln := coalesce(a_timeout_sec, 5); -- 5 seconds
  l_wait_for           integer := lc_init_wait_sec;
  lc_short_sleep_time  constant number(1,1) := 0.1; --sleep for 100 ms between checks
  lc_long_sleep_time   constant number(1) := 1;     --sleep for 1 s when waiting long
  lc_long_wait_time    constant number(1) := 1;     --waiting more than 1 sec
  l_sleep_time         number(2,1) := lc_short_sleep_time;
  lc_bulk_limit        constant integer := 100;

  procedure remove_read_data(a_message_rowids t_rowid_tab) is
    pragma autonomous_transaction;
  begin
    forall i in 1 .. a_message_rowids.count
    delete from ut_output_buffer_tmp a
     where rowid = a_message_rowids(i);
    commit;
  end;

  procedure remove_buffer_info is
    pragma autonomous_transaction;
  begin
    delete from ut_output_buffer_info_tmp a
     where a.output_id = self_output_id;
    commit;
  end;

begin
  while not l_finished loop
  
    with ordered_buffer as (
      select a.rowid, ut_output_data_row(a.text, a.item_type), is_finished
        from ut_output_buffer_tmp a
       where a.output_id = self_output_id
       order by a.message_id
    )
    select b.*
      bulk collect into l_message_rowids, l_buffer_data, l_finished_flags
      from ordered_buffer b
     where rownum <= lc_bulk_limit;

    --nothing fetched from output, wait and try again
    if l_buffer_data.count = 0 then
      $if dbms_db_version.version >= 18 $then
        dbms_session.sleep(l_sleep_time);
      $else
        dbms_lock.sleep(l_sleep_time);
      $end
      l_already_waited_for := l_already_waited_for + l_sleep_time;
      if l_already_waited_for > lc_long_wait_time then
        l_sleep_time := lc_long_sleep_time;
      end if;
    else
      --reset wait time
      -- we wait lc_max_wait_sec for new message
      l_wait_for := lc_max_wait_sec;
      l_already_waited_for := 0;
      l_sleep_time := lc_short_sleep_time;
      for i in 1 .. l_buffer_data.count loop
        if l_buffer_data(i).text is not null then
--          pipe row( l_buffer_data(i) );
            dbms_output.put_line(l_buffer_data(i).text);
            dbms_output.put_line(l_buffer_data(i).item_type);
        elsif l_finished_flags(i) = 1 then
          l_finished := true;
          exit;
        end if;
      end loop;
      remove_read_data(l_message_rowids);
    end if;
    if l_finished or l_already_waited_for >= l_wait_for then
      remove_buffer_info();
      if l_already_waited_for > 0 and l_already_waited_for >= l_wait_for then
        raise_application_error(
          ut_utils.gc_out_buffer_timeout,
          'Timeout occurred while waiting for output data. Waited for: '||l_already_waited_for||' seconds.'
        );
      end if;
    end if;
  end loop;
end;
/
@jgebal (Member) commented Mar 30, 2019

If you could adjust the output buffer to use rowids and see how it performs, that would be a great help.
We should be able to increase the bulk size to 1000 quite safely.

@jgebal (Member) commented Mar 30, 2019

In 3.1.3 we were using delete from (select ... order by ...) ... returning.
This is, however, not supported for CLOB columns, and with 3.1.4 and the new reporting capabilities we need CLOB.

There are two ways around it:

  • create two separate output buffers for separate use cases
  • tune the existing output buffer to be more efficient

Let me know where you get with the tuning.

Could you try:

  • delete by rowid with the CLOB table
  • delete by rowid with the VARCHAR2 table

If we get good-enough results with ROWID deletes, we can continue to use the shared buffer code.
If not, a new buffer type will need to be created.
We would then use the new buffer type only for the reporters that need to produce more than 4K of data in one row.

@jgebal (Member) commented Mar 30, 2019

[comment body not shown]
@maxpsq (Author) commented Mar 30, 2019

[comment body not shown]
@jgebal (Member) commented Mar 31, 2019

15 minutes is still much slower than "less than 10 minutes".
Maybe we can do better.

Two dedicated output buffers (varchar2/clob) will do the trick.

@maxpsq (Author) commented Mar 31, 2019

Well, in that case I suppose we should also focus on other things... the SQL Monitor graphs I posted reveal there are other statements that perform worse than before. I'll take a look at those before playing the "double buffer" card. I'm extracting a new SQL Monitor graph right now so we can make a comparison.

@maxpsq (Author) commented Mar 31, 2019

Here you can find what has changed; of course, these are only the statements considered slow by Oracle Enterprise Manager. The insert into the same table, for example, is 4 times slower than before.

For the record, this last run took 21.3 minutes (slower than yesterday).

[SQL Monitor screenshot: v3.1.3]

[SQL Monitor screenshot: out_buffer_performance]

A possible further improvement concerns reading the buffer table: currently, the bulk limit requires the whole dataset to be ordered before fetching. What about calculating the message_id at which to stop, so we can use a WHERE condition to limit both the access and the fetch? This should make the access time linear, whereas now each iteration only gets faster because the deletions shrink the table. (A sketch follows.)
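
A sketch of the idea (variable names invented; it assumes message_ids are written sequentially and read/deleted in order, so the remaining ids stay contiguous):

-- Sketch: compute a cut-off message_id, then fetch with a range predicate
-- so only one batch is accessed and sorted per iteration.
declare
  l_output_id       raw(32) := '853D05E7BECB0205E053020011AC5386';
  lc_bulk_limit     constant integer := 1000;
  l_last_message_id number;
  l_buffer_data     ut_output_data_rows;
begin
  select min(message_id) + lc_bulk_limit - 1
    into l_last_message_id
    from ut3.ut_output_buffer_tmp$
   where output_id = l_output_id;

  select ut_output_data_row(text, item_type)
    bulk collect into l_buffer_data
    from ut3.ut_output_buffer_tmp$
   where output_id  = l_output_id
     and message_id <= l_last_message_id
   order by message_id;

  dbms_output.put_line('fetched ' || l_buffer_data.count || ' rows');
end;
/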

@jgebal jgebal self-assigned this Mar 31, 2019

@jgebal jgebal added the bug label Mar 31, 2019

@jgebal jgebal added this to the v3.1.7 milestone Mar 31, 2019

@jgebal (Member) commented Mar 31, 2019

That is actually a really good idea, @maxpsq.
I've added this functionality to the PR.
Let me know if you can check the performance on your schema now.

@maxpsq (Author) commented Mar 31, 2019

Here are the results:

  1. INSERT is faster because there's no longer any SEQUENCE-related overhead: 8.2 m => 5.7 m
  2. SELECT is faster: 4.9 m => 3.8 m
  3. Total running time: 19.0 m

Approved!
@jgebal (Member) commented Mar 31, 2019

Can you see if an index-organized table performs any better?
Change:

create table ut_output_buffer_tmp$(
  output_id      raw(32) not null,
  message_id     number(38,0) not null,
  text           clob,
  item_type      varchar2(1000),
  is_finished    number(1,0) default 0 not null,
  constraint ut_output_buffer_tmp_pk primary key(output_id, message_id),
  constraint ut_output_buffer_tmp_ck check(
         is_finished = 0 and (text is not null or item_type is not null )
      or is_finished = 1 and text is null and item_type is null ),
  constraint ut_output_buffer_fk1 foreign key (output_id) references ut_output_buffer_info_tmp$(output_id)
) nologging initrans 100
lob(text) store as securefile ut_output_text(retention none enable storage in row)

to:

create table ut_output_buffer_tmp$(
  output_id      raw(32) not null,
  message_id     number(38,0) not null,
  text           clob,
  item_type      varchar2(1000),
  is_finished    number(1,0) default 0 not null,
  constraint ut_output_buffer_tmp_pk primary key(output_id, message_id),
  constraint ut_output_buffer_tmp_ck check(
         is_finished = 0 and (text is not null or item_type is not null )
      or is_finished = 1 and text is null and item_type is null ),
  constraint ut_output_buffer_fk1 foreign key (output_id) references ut_output_buffer_info_tmp$(output_id)
) organization index
    nologging initrans 100
  overflow nologging initrans 100
lob(text) store as securefile ut_output_text(retention none enable storage in row)

Thanks

@jgebal (Member) commented Mar 31, 2019

I wonder whether there is any significant difference.

@maxpsq (Author) commented Mar 31, 2019

Yes, there is, but the IOT is slower:

  1. INSERT has not changed (5.7 m => 5.7 m)
  2. SELECT runs at a third of the speed (118 s => 368 s)

TOTAL: 19.0 m => 22.7 m
@jgebal (Member) commented Mar 31, 2019

Thanks a million @maxpsq.
You've helped me a lot with this issue.
One last thing to check would be the thing I asked for initially:
how much better would it be if we used VARCHAR2 instead of CLOB in the buffer?

If you could provide info on how much better (if at all) it performs, that would be really helpful.
Thanks,
Jacek

@jgebal (Member) commented Mar 31, 2019

I hope we can actually get to similar or better performance than in 3.1.3.

@maxpsq (Author) commented Mar 31, 2019

Here are the CLOB vs VARCHAR2 timings: TOTAL: 19.0 m => 12.1 m

I suspect the higher time may also be related to redo log activity: there are now more transactions within the loop than in v3.1.3. From what I see, I have:

  1. Log file switch (checkpoint incomplete): 15% of total time
  2. Log file switch completion: 2% of total time

Of course, this is also related to the redo log file configuration (Oracle defaults on my laptop). Log switch frequency can be checked as in the sketch below.
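
A sketch, assuming access to V$LOG_HISTORY:

-- Sketch: redo log switches per hour; frequent switches suggest undersized logs.
select trunc(first_time, 'HH24') as switch_hour, count(*) as log_switches
  from v$log_history
 group by trunc(first_time, 'HH24')
 order by switch_hour desc;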

@jgebal (Member) commented Mar 31, 2019

You can try the latest version of the feature branch; it should be even better.

@maxpsq (Author) commented Mar 31, 2019

Here are three different runs of the same version, because I see that wait events related to redo-log activity may vary:

Run 1: 13.1 m / 25% Log file switch
Run 2: 12.7 m / 22% Log file switch
Run 3: 12.6 m / 24% Log file switch

In all cases, wait events on the redo log have grown compared to the previous version, and this is possibly hiding the benefits your changes may have brought. Notice that the redo-log activity is for the most part related to the INSERT statement (77%~79% of insertion time).
What if the buffer tables were transformed into Global Temporary Tables? In theory, less redo is generated, but this may have a significant impact on the design, unless you use the ON COMMIT PRESERVE ROWS clause. Just an idea, though (rough sketch below)...

Redo log activity for INSERT in previous versions was 29%~31% of the insert time.
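
Roughly, the GTT idea would look like this (just a sketch of the VARCHAR2 variant, with the column list copied from the current buffer table):

-- Sketch of the GTT idea: session-private storage generates far less redo.
create global temporary table ut_output_buffer_gtt$(
  output_id   raw(32)        not null,
  message_id  number(38,0)   not null,
  text        varchar2(4000),
  item_type   varchar2(1000),
  is_finished number(1,0)    default 0 not null,
  constraint ut_output_buffer_gtt_pk primary key(output_id, message_id)
) on commit preserve rows;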

@jgebal (Member) commented Mar 31, 2019

Unfortunately this is not applicable, as we use multiple sessions for real-time reporting.
The main session writes the reporter(s) data to the buffer, and other session(s) read the buffer.
This enables real-time reporting with utPLSQL-cli.
That's why we need a physical table for the buffer in the first place.
