MERGE or UPSERT?

I’ve been recently involved in an Extract, Transform and Load (ETL) project, where I had to run some complex queries in an OTLP database, process the data and save the results in a intermediate table, in order to loading them in a Data Warehouse. First of all, I reviewed previous PL/SQL code existing in the database, they employed the typical UPSERT structure to save the facts and measures, inserting them or just updating the measures, depending on the occurrence of the same facts in the table, a block like this:

DECLARE
  CURSOR cur IS ...;
BEGIN
  FOR reg IN cur
  LOOP
    BEGIN
      INSERT INTO ...;
    EXCEPTION
      WHEN dup_val_on_index THEN
        UPDATE ... SET ...;
    END;
  END LOOP;
END;

On the other hand, I knew the MERGE statement, available from Oracle 9i, but I didn’t know how to use it within this case, where I should query the data, perform a complex process, and load them register by register. Finally, after a search in several forums, I got the solution: select the register from the dual table, a structure like this:

DECLARE
  CURSOR cur IS ...;
BEGIN
  FOR reg IN cur
  LOOP
    MERGE INTO ... 
    USING (SELECT ... register VALUES ... FROM dual) r ON (...)
    WHEN matched THEN
      UPDATE SET ...;
    WHEN NOT matched THEN
      INSERT ...;
  END LOOP;
END;

At this point I wondered, what about performance? To answer this question, I downloaded the advanced variation on Tom Kyte’s runstats utility created by Adrian Billington. Later on, I created a test table:

CREATE TABLE merge_tests AS
SELECT object_id,
  owner,
  object_type,
  object_name,
  1 counter
FROM all_objects
WHERE object_type = 'TABLE';
ALTER TABLE merge_tests ADD CONSTRAINT merge_tests_pk PRIMARY KEY (object_id);

And prepare two loading PL/SQL blocks, UPSERT:

DECLARE
  TYPE object_types_array IS VARRAY(4) OF VARCHAR2(15);
  object_types object_types_array := object_types_array('TABLE', 'INDEX', 'VIEW', 'TRIGGER');
BEGIN
  FOR i IN 1..4
  LOOP
    FOR j IN 1..i
    LOOP
      FOR reg IN
      (SELECT object_id, owner, object_type, object_name
       FROM all_objects
       WHERE object_type = object_types(j))
      LOOP
        BEGIN
          -- Some complex processes here
          INSERT INTO merge_tests 
          (object_id, owner, object_type, object_name, counter)
          VALUES (reg.object_id, reg.owner, reg.object_type, reg.object_name, 1);
        EXCEPTION
        WHEN dup_val_on_index THEN
          UPDATE merge_tests SET counter = counter + 1 WHERE object_id = reg.object_id;
        END;
      END LOOP;
    END LOOP;
  END LOOP;
  COMMIT;
END;

And MERGE:

DECLARE  
  TYPE object_types_array IS VARRAY(4) OF VARCHAR2(15);
  object_types object_types_array := object_types_array('TABLE', 'INDEX', 'VIEW', 'TRIGGER');
BEGIN
  FOR i IN 1..4
  LOOP
    FOR j IN 1..i
    LOOP
      FOR reg IN
      (SELECT object_id, owner, object_type, object_name
       FROM all_objects
       WHERE object_type = object_types(j))
      LOOP
        -- Some complex processes here
        MERGE INTO merge_tests 
        USING (SELECT reg.object_id object_id, reg.owner owner, 
                      reg.object_type object_type, reg.object_name object_name
               FROM dual) r 
        ON (merge_tests.object_id = r.object_id)
        WHEN matched THEN
          UPDATE SET merge_tests.counter = merge_tests.counter + 1 
        WHEN NOT matched THEN
          INSERT (object_id, owner, object_type, object_name, counter) 
          VALUES (r.object_id, r.owner, r.object_type, r.object_name, 1);
      END LOOP;
    END LOOP;
  END LOOP;
  COMMIT;
END;

Finally, I compared the statistics results of executing both blocks (you can download the complete test script following this link):

================================================================================
Runstats report : 19-MAY-2011 20:17:02
================================================================================

--------------------------------------------------------------------------------
1. Summary timings
--------------------------------------------------------------------------------
Run1 ran in 2825 hsecs
Run2 ran in 2005 hsecs
Run2 was 29% quicker than Run1

--------------------------------------------------------------------------------
2. Statistics report
--------------------------------------------------------------------------------

Type  Name                                        Run1         Run2         Diff
----- ----------------------------------- ------------ ------------ ------------
STAT  IMU undo allocation size                   1,176           52       -1,124
LATCH object queue header operation              2,056          791       -1,265
STAT  RowCR hits                                 9,389        6,315       -3,074
STAT  RowCR attempts                             9,389        6,262       -3,127
LATCH cache buffer handles                       5,962        2,682       -3,280
STAT  sorts (memory)                            15,097        9,061       -6,036
STAT  workarea executions - optimal             15,136        9,084       -6,052
STAT  shared hash latch upgrades - no wait      26,509       17,978       -8,531
STAT  Heap Segment Array Inserts                     0       10,428       10,428
STAT  rows fetched via callback                 27,799       14,655      -13,144
STAT  file io wait time                         14,323          554      -13,769
STAT  session uga memory                       196,560      181,380      -15,180
LATCH simulator hash latch                      51,697       30,412      -21,285
STAT  table scan blocks gotten                  70,881       44,692      -26,189
STAT  rollback changes - undo records appl      27,289            0      -27,289
STAT  parse count (total)                       27,328           37      -27,291
LATCH session allocation                        27,312           16      -27,296
LATCH session idle bit                          27,339           36      -27,303
STAT  cluster key scan block gets               27,863          346      -27,517
STAT  cluster key scans                         27,863          346      -27,517
STAT  buffer is pinned count                    73,158       43,386      -29,772
STAT  calls to get snapshot scn: kcmgss         55,757       25,517      -30,240
STAT  HSC Heap Segment Block Changes            65,007       24,778      -40,229
LATCH shared pool                               65,448       24,984      -40,464
STAT  cell physical IO interconnect bytes       98,304       57,344      -40,960
STAT  physical read bytes                       98,304       57,344      -40,960
STAT  physical read total bytes                 98,304       57,344      -40,960
LATCH In memory undo latch                      55,002           12      -54,990
STAT  session cursor cache hits                -37,661       25,456       63,117
STAT  table fetch by rowid                     118,041       52,432      -65,609
STAT  redo entries                             103,653       36,125      -67,528
STAT  execute count                             93,431       25,475      -67,956
STAT  opened cursors cumulative                 93,440       25,483      -67,957
LATCH corrupted undo seg latch                  81,872            0      -81,872
STAT  index fetch by key                       145,893       63,195      -82,698
STAT  non-idle wait count                      262,196      157,319     -104,877
STAT  db block gets from cache (fastpath)      120,257       10,845     -109,412
STAT  sorts (rows)                             316,897      190,141     -126,756
STAT  consistent gets - examination            188,150       60,158     -127,992
STAT  db block changes                         207,750       72,103     -135,647
STAT  buffer is not pinned count               344,148      113,138     -231,010
STAT  db block gets                            314,112       47,875     -266,237
STAT  db block gets from cache                 314,112       47,875     -266,237
STAT  recursive calls                          350,808       51,579     -299,229
LATCH SQL memory manager workarea list lat     764,457      458,682     -305,775
STAT  consistent gets from cache (fastpath     692,627      369,149     -323,478
STAT  no work - consistent read gets           702,273      376,315     -325,958
STAT  index scans kdiixs1                    1,062,372      621,059     -441,313
STAT  consistent gets                          908,133      447,972     -460,161
STAT  consistent gets from cache               908,133      447,972     -460,161
STAT  session logical reads                  1,222,245      495,847     -726,398
LATCH row cache objects                      2,389,490    1,384,217   -1,005,273
STAT  session pga memory                    -1,441,792     -131,072    1,310,720
STAT  table scan rows gotten                 4,169,138    2,691,711   -1,477,427
LATCH cache buffers chains                   2,640,231    1,050,454   -1,589,777
STAT  undo change vector size                5,725,328    2,985,268   -2,740,060
STAT  redo size                             24,358,572    9,067,896  -15,290,676

--------------------------------------------------------------------------------
3. Latching report
--------------------------------------------------------------------------------
Run1 used 6,117,424 latches
Run2 used 2,955,389 latches
Run2 used 51,7% fewer latches than Run1

================================================================================
End of report
================================================================================

As you can read, I got considerable savings using the MERGE structure, so I used it and even proposed changing the previous code. A final tip: make a complete use of aliases to avoid ORA-06512 exceptions.


This test script was written using Oracle SQL Developer and was run in an Oracle 11gR2 (11.2.0.2.0) Database.


Advertisements


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s