Answer
The error occurs here:
CASE tmp_code
WHEN COALESCE(tmp_code,0)=0 THEN
Works like this:
CASE WHEN COALESCE(tmp_code,0)=0 THEN
You mixed two different syntax variants of PL/pgSQL CASE
("simple" vs. "searched") in an incompatible way.
There is another error:
update netcen.test set
test=myobj.test,
testname=myobj.testname WHERE testkey=myobj.testkey;
Did you mean:
UPDATE test
SET tes = myobj.tes
, testname = myobj.testname
WHERE testkey = myobj.testkey;
There is also no need for CREATE TYPE netcen.testobj ...
. You can use the table name netcen.test
as type name.
You really want to "UPSERT"
In Postgres 9.5 or later use the new UPSERT (INSERT ... ON CONFLICT DO UPDATE
). Like:
For older versions you can emulate an UPSERT
. The simple form in plpgsql, without concurrency:
CREATE OR REPLACE FUNCTION fun_test(myobj testobj)
RETURNS boolean
LANGUAGE plpgsql AS
$func$
BEGIN
UPDATE test
SET tes = myobj.tes
, testname = myobj.testname
WHERE testkey = myobj.testkey;
IF FOUND THEN
RETURN FALSE;
ELSE
INSERT INTO test SELECT (myobj).*;
RETURN TRUE;
END IF;
END
$func$;
Could just be plain SQL with a data-modifying CTE:
WITH my_row(testkey, tes, testname) AS (
SELECT 1::smallint, 'khaendra@me.net', 'khaendra'
)
, u AS (
UPDATE test t
SET tes = m.tes
, testname = m.testname
FROM my_row m
WHERE t.testkey = m.testkey
RETURNING t.testkey
)
INSERT INTO test (testkey, tes, testname)
SELECT * FROM my_row
WHERE NOT EXISTS (SELECT FROM u);
The time window for a possible race condition is extremely tiny with this form (single combined statement). If concurrency is still an issue (heavy concurrent write load), then ...
Your function reviewed
If the function returns, the row has been inserted or updated. The only other way is an EXCEPTION
of a different kind. The returned value true
is just noise. (Might be more interesting to return true
for INSERT
and false
for UPDATE
.) So I simplified:
CREATE OR REPLACE FUNCTION netcen.fun_test_modified(myobj netcen.test)
RETURNS boolean
LANGUAGE plpgsql AS
$func$
BEGIN
UPDATE netcen.test
SET tes = myobj.tes
, testname = myobj.testname
WHERE testkey = myobj.testkey;
IF FOUND THEN
RETURN true;
END IF;
BEGIN
INSERT INTO netcen.test
SELECT (myobj).*; -- simpler form, parenthesis needed.
EXCEPTION WHEN unique_violation THEN -- cleaner
UPDATE netcen.test
SET tes = myobj.tes
, testname = myobj.testname
WHERE testkey = myobj.testkey;
END;
RETURN true;
END
$func$;
Related answers on SO:
Blog post on UPSERT by Depesz.
Core feature is the window function lag()
.
Also pay special attention to avoid deadlocks and race conditions with concurrent deletes and inserts (which can affect which rows to delete!):
CREATE OR REPLACE FUNCTION remove_vendor_price_dupes(_vendor int)
RETURNS integer AS
$func$
DECLARE
del_ct int;
BEGIN
-- this may or may not be necessary:
-- lock rows to avoid race conditions with concurrent deletes
PERFORM 1
FROM vendor_prices
WHERE vendor = _vendor
ORDER BY sku, effective_date, id -- guarantee row locks in consistent order
FOR UPDATE;
-- delete redundant prices
DELETE FROM vendor_prices v
USING (
SELECT id
, price = lag(price) OVER w -- same as last row
AND (lead(id) OVER w) IS NOT NULL AS del -- not last row
FROM vendor_prices
WHERE vendor = _vendor
WINDOW w AS (PARTITION BY sku ORDER BY effective_date, id)
) d
WHERE v.id = d.id
AND d.del;
GET DIAGNOSTICS del_ct = ROW_COUNT; -- optional:
RETURN del_ct; -- return number of deleted rows
END
$func$ LANGUAGE plpgsql;
Call:
SELECT remove_vendor_price_dupes(1);
Notes
The current version of the 9.3 major release is 9.3.6. The project recommends that ...
all users run the latest available minor release for whatever major version is in use.
A multicolumn index on (vendor, sku, effective_date, id)
would be perfect for this - in this particular order. But Postgres can combine indexes rather efficiently, too.
It might pay to add the otherwise irrelevant price
as last item ot the index to get index-only scans out of this. You'll have to test.
Since you have concurrent deletes it may be a good idea to run a separate delete per vendor to reduce the potential for race conditions and deadlocks. Since there are only a few vendors, this seems like a reasonable partitioning. (Many tiny calls would be comparatively slow.)
I am running a separate SELECT
(PERFORM
in plpgsql, since we do not use the result) because the row locking clause FOR UPDATE
cannot be used together with window functions. Don't let the keyword mislead you, this is not just for updates. I am locking all rows for the given vendor, since the result depends on all rows. Concurrent reads are not impaired, only concurrent writes have to wait until we are done. That's another reason why deleting rows for one vendor at a time in a separate transaction should be best.
sku
is unique per product, so we can PARTITION BY
it.
ORDER BY effective_date, id
: your first version of the question included code for duplicate rows, so I added id to ORDER BY
as additional tie breaker. This way it works for duplicates on (sku, effective_date)
as well.
To preserve the last row for each set: AND (lead(id) OVER w) IS NOT NULL
. Reusing the same window for lead()
is cheap - independent of the added explicit WINDOW
clause - that's just syntax shorthand for convenience.
I am locking rows in the same order: ORDER BY sku, effective_date, id
. Make sure that concurrent DELETEs operate in the same order to avoid deadlocks. If all other transactions delete no more than a single row within the same transaction, there cannot be deadlocks and you don't need the row locking at all.
If concurrent INSERTs could lead to a different result (make different rows obsolete), you have to lock the whole table in EXCLUSIVE mode instead to avoid race conditions:
LOCK TABLE vendor_prices IN EXCLUSIVE MODE;
Do that only if it's necessary. It blocks all concurrent write access.
I am returning the number of rows deleted, but that's totally optional. You might as well return nothing and declare the function as RETURNS void
.
Best Answer
To return a TABLE add the table definition:
db<>fiddle here
Have a look at this tutorial about it.