SQL Optimization – Cost Displayed by EXPLAIN for INSERT

execution-planoptimization

I wonder why there is a difference in execution plan cost for the 2 queries :

create table test_insert(id int not null primary key, value varchar2(10));

--1 
insert into test_insert(id,value)
values (2,'111');

--2 
insert into test_insert(id,value)
select 3,'111' from dual where not exists (select null from test_insert where id =3);

Cost for query 2 is always higher (how much depends on number of rows in the table).
In my understanding not exits ... should not add any overhead – in order to enforce PK constraint the engine must check the corresponding unique index anyway, so subquery adds extra parsing time, but not execution time or extra step to the plan.

The example is for Oracle, but I checked Postgres as well, results are similar.

For example,

--1
| Id  | Operation                | Name        | Cost  |
--------------------------------------------------------
|   0 | INSERT STATEMENT         |             |     1 |
|   1 |  LOAD TABLE CONVENTIONAL | TEST_INSERT |       |
--------------------------------------------------------
--2 
| Id  | Operation                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |              |       |       |     3 (100)|          |
|   1 |  LOAD TABLE CONVENTIONAL | TEST_INSERT  |       |       |            |          |
|*  2 |   FILTER                 |              |       |       |            |          |
|   3 |    FAST DUAL             |              |     1 |       |     2   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN     | SYS_C0012345 |     1 |    13 |     1   (0)| 00:00:01 |

Best Answer

I'm not exactly sure what you're trying to demonstrate, in the bottom you have a NOT EXISTS. That's entails either an

  • anti-join or
  • as PostgreSQL and Oracle plan it, a correlated index only scan

PostgreSQL plan,

test=# explain analyze insert into test_insert(id,value)
select 3,'111' where not exists (select 1 from test_insert where id =3);
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on test_insert  (cost=8.17..8.19 rows=1 width=4) (actual time=0.190..0.190 rows=0 loops=1)
   ->  Subquery Scan on "*SELECT*"  (cost=8.17..8.19 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1)
         ->  Result  (cost=8.17..8.18 rows=1 width=0) (actual time=0.009..0.009 rows=1 loops=1)
               One-Time Filter: (NOT $0)
               InitPlan 1 (returns $0)
                 ->  Index Only Scan using test_insert_pkey on test_insert test_insert_1  (cost=0.15..8.17 rows=1 width=0) (actual time=0.006..0.006 rows=0 loops=1)
                       Index Cond: (id = 3)
                       Heap Fetches: 0
 Planning time: 0.294 ms
 Execution time: 0.241 ms
(10 rows)

So of course it's slower. It's checking the index before it inserts, and then the btree-insertion for the PRIMARY KEY is checking again (as it always must).

If you want to roll that into a try-or-skip, use ON CONFLICT DO NOTHING

EXPLAIN ANALYZE
INSERT INT test_insert(id,value)
VALUES (3,111)
ON CONFLICT DO NOTHING;

---------------------------------------------------------------------------------------------------
 Insert on test_insert  (cost=0.00..0.01 rows=1 width=0) (actual time=0.034..0.034 rows=0 loops=1)
   Conflict Resolution: NOTHING
   Tuples Inserted: 0
   Conflicting Tuples: 1
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1)
 Planning time: 0.054 ms
 Execution time: 0.065 ms
(7 rows)

Here is an example of your problem.. Sample schema,

CREATE TABLE f (id PRIMARY KEY);

Sample transactions,

TXN1                                TXN2

INSERT INTO f(id)
SELECT x FROM (VALUES (1))
  AS g(x)
WHERE pg_sleep(10) IS NOT NULL
AND NOT EXISTS (
  SELECT 1
  FROM f
  WHERE g.x = f.id
);

                                     INSERT INTO f(id) VALUES (1);

What happens in the above?

  • The EXISTS tests operating in the default READ COMMITTED ISOLATION level gets a snapshot. And then it holds up and goes to sleep.
  • The second transaction inserts a value.
  • The first transaction wakes up, and tests its snapshot for exists (or ran the test prior to going to sleep)
  • The first transaction tries to update the btree which is implementing the unique constraint.
  • The first transaction throws an error,

    ERROR:  duplicate key value violates unique constraint "f_pkey"
    DETAIL:  Key (id)=(1) already exists.
    

Switch the order of those transactions though and the first transaction sees that the key is there and never tries to insert into the btree.

Hopefully that helps you understand what's happening.