PostgreSQL – How to Find Leafs Using PostgreSQL’s ltree

performancepostgresqlpostgresql-performance

I have a tree structure based on accounting purposes like

300
300.01
300.02
300.02.01
300.03
.....

I want to assign is_leaf_level column to true for only leaf levels. kod is the accounting number having type ltree. Here is the query:

UPDATE dim_accounting_codes d1
SET is_leaf_level = (
  SELECT count(*)=0
  FROM dim_accounting_codes d2
  WHERE d1.kurum_hesap_plani_id = d2.kurum_hesap_plani_id
    AND d2.kod ~ lquery_in(
      (ltree2text(d1.kod) || '.*{1,}')::cstring
    )
  )

The query plan is as follows:

     Update on dim_kurum_hesap_kodu d1  (cost=0.00..8529064185.14 rows=1468583 width=322)
  ->  Seq Scan on dim_kurum_hesap_kodu d1  (cost=0.00..8529064185.14 rows=1468583 width=322)
        SubPlan 1
          ->  Aggregate  (cost=5807.56..5807.57 rows=1 width=1)
                ->  Bitmap Heap Scan on dim_kurum_hesap_kodu d2  (cost=347.44..5807.53 rows=9 width=0)
                      Recheck Cond: (kod ~ lquery_in(((ltree2text(d1.kod) || '.*{1,}'::text))::cstring))
                      Filter: (d1.kurum_hesap_plani_id = kurum_hesap_plani_id)
                      ->  Bitmap Index Scan on idx_kod_gist  (cost=0.00..347.44 rows=1469 width=0)
                            Index Cond: (kod ~ lquery_in(((ltree2text(d1.kod) || '.*{1,}'::text))::cstring))

Query is running too slowly. (about 40 minutes passed and still working) There are approximately 2000000 records in the table. The index on kod is not used in the plan. Is there any advices on how to mark the leaf level records on ltree structure?

Updated:

result of \d dim_accounting_codes

                                                          Table "public.dim_accounting_codes"
            Column             |              Type              |                                       Modifiers
-------------------------------+--------------------------------+----------------------------------------------------------------------------------------
 dim_kurum_hesap_kodu_id       | integer                        | not null default nextval('dim_kurum_hesap_kodu_dim_kurum_hesap_kodu_id_seq'::regclass)
 version                       | integer                        |
 kurum_hesap_kodu_id           | integer                        |
 kurum_hesap_kodu_adi          | character varying(256)         |
 hesap_tipi                    | character varying(30)          |
 kurum_hesap_plani_id          | integer                        |
 kurum_hesap_plani_ad          | character varying(256)         |
 pasiflestirme_tarihi          | timestamp(0) without time zone |
 aktiflestirme_tarihi          | timestamp(0) without time zone |
 kod                           | ltree                          |
 hesap_kodu_durum              | character varying(8)           |
 ana_hesap_kodu                | character varying(10)          |
 yardimci_ekonomik_kod_i       | character varying(10)          |
 yardimci_ekonomik_kod_ii      | character varying(10)          |
 yardimci_ekonomik_kod_iii     | character varying(10)          |
 yardimci_ekonomik_kod_iv      | character varying(10)          |
 ana_hesap_kodu_adi            | character varying(500)         |
 yardimci_ekonomik_kod_i_adi   | character varying(500)         |
 yardimci_ekonomik_kod_ii_adi  | character varying(500)         |
 yardimci_ekonomik_kod_iii_adi | character varying(500)         |
 yardimci_ekonomik_kod_iv_adi  | character varying(500)         |
 yardimci_ekonomik_kod_level   | smallint                       |
 is_leaf_level              | boolean                        | default false
Indexes:
    "idx_dim_kurum_hesap_kodu_lookup" btree (kurum_hesap_kodu_id)
    "idx_dim_kurum_hesap_kodu_tk" btree (dim_kurum_hesap_kodu_id)
    "idx_kod_gist" gist (kod)

Best Answer

Using ltree2text sounds like a horrible idea, and cstring is an internal thing -- you should never be using that. If an ltree has no descendants it's a leaf, so let's test for that using @>

CREATE TABLE foo(ltree)
AS
  VALUES
    ( 'Top.Science.Astronomy'::ltree ),
    ( 'Top.Science.Astronomy.Astrophysics' ),
    ( 'Top.Science.Astronomy.Cosmology' ),
    ( 'Top.Collections.Pictures.Astronomy' ),
    ( 'Top.Collections.Pictures.Astronomy.Stars' ),
    ( 'Top.Collections.Pictures.Astronomy.Galaxies' ),
    ( 'Top.Collections.Pictures.Astronomy.Astronauts' );

CREATE INDEX ON foo USING gist(ltree);

SELECT *
FROM foo AS f1
WHERE NOT EXISTS (
  SELECT 1
  FROM foo AS f2
  WHERE f1.ltree <> f2.ltree
    AND f1.ltree @> f2.ltree
);

This uses the GIST index on the ltree. See the docs for more information on using @> with ltree.

I suggest not storing this on the table. First, it's maintenance nightmare. After every insert and update, you would have to potentially rewrite the parent nodes to update the leaf column. Moreover, if simply querying for this isn't fast enough, I would instead use a MATERIALIZED VIEW. At least then you know when the view is stale. Keeping this in sync with your data sounds like no-fun to me.