The syntax is just slightly different:
DELETE q
FROM dbo.questions_tags AS q
WHERE EXISTS
(
SELECT 1
FROM dbo.questions_tags AS q1
WHERE q1.ctid < q.ctid
AND q.question_id = q1.question_id
AND q.tag_id = q1.tag_id
);
Personally, I prefer to use a CTE. Then I can easily swap in a SELECT
to validate what I am about to delete, and easily change the where clause to validate what I am going to keep:
;WITH q AS
(
SELECT question_id, tag_id, ctid,
rn = ROW_NUMBER() OVER (PARTITION BY question_id, tag_id ORDER BY ctid)
FROM dbo.question_tags
)
--DELETE q
SELECT * FROM q
WHERE rn > 1; -- to show keepers, change to = 1
I believe these semantics match yours, but please test.
Then, of course, add a proper key constraint before you let anybody insert any new nonsense into this table.
The basic problem can be solved with various simple queries. Considering all columns:
CREATE TABLE tbl3 AS
TABLE tbl1
UNION TABLE tbl2;
Given this additional information:
All columns except the id
column should be considered for the unique check.
And:
I don't need to preserve the ID column.
Just drop the id
column, then you can proceed with the simple query above.
I would import to temporary tables (much faster, less overhead) and only write the final result (tbl3
) to a regular table - in one session because temporary tables are dropped automatically at the end of the session.
CREATE TEMP TABLE tbl1 ( <columns from above, without id> );
COPY tbl1 FROM '/path/to/file1';
CREATE TEMP TABLE tbl2 ( <columns from above, without id> );
COPY tbl2 FROM '/path/to/file2';
Alternatively, to preserve the input tables across sessions, you could use unlogged tables.
For best performance create and fill the target with CREATE TABLE AS
and add the PK constraint in the same transaction:
BEGIN;
CREATE SEQUENCE tbl3_tbl3_id_seq;
CREATE TABLE tbl3 AS
SELECT nextval('tbl3_tbl3_id_seq'::regclass)::int AS tbl3_id, *
FROM (TABLE tbl1 UNION TABLE tbl2 ) sub;
ALTER TABLE tbl3
ADD CONSTRAINT tbl3_pkey PRIMARY KEY(tbl3_id)
, ALTER COLUMN tbl3_id SET DEFAULT nextval('tbl3_tbl3_id_seq'::regclass);
ALTER SEQUENCE tbl3_tbl3_id_seq OWNED BY tbl3.tbl3_id;
COMMIT;
Replace all occurrences of "tbl3" with our desired table name.
Detailed explanation in this related answer:
I added a serial
column (tbl3_id
) as surrogate PK to the target table. Adding the actual PK constraint at the end (of the same session) is the fastest way.
Related:
Before you do it, test whether double precision
is the best data type for all those columns. Chances are, some of them could be integer
(cheaper for whole numbers) or must really be numeric
(loss-less). If so, adapt your temp tables to begin with.
Best Answer
This is what's known as an aggregate query,
Using
GROUP BY
like this returns one record for every unique occurrence of (cart, barcode) and the totals of counted_in and counted_out for each.