If you have an index on LastLoggedIn, deleting a few tens of thousands of records should be fast without needing to partition. I ran a small test on MS SQL:
CREATE TABLE Temp.Temp
(
    UserID INT NOT NULL PRIMARY KEY,
    -- an inline index declaration needs a name
    LastLoggedIn DATETIME2(0) NOT NULL INDEX IX_Temp_LastLoggedIn,
    Dummy1 VARCHAR(30),
    Dummy2 INT
)
I inserted ~57 M dummy records, then selected a LastLoggedIn date such that 35,000 records were at least that old. Deleting those records took less than a second. This was on a four-core machine with 128 GB of memory, under light load; your mileage may vary.
Of course, if you run this purge only every few weeks, you may have millions of records to delete rather than thousands (300 K records took seven seconds for me; 1.3 M records took 30 seconds). But in that scenario, it's part of your monthly maintenance window rather than daily upkeep, and a brief delay should be acceptable.
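In that multi-million-row case, deleting in batches keeps each transaction, and therefore each lock and each log hit, small. A sketch against the test table above; the cutoff and batch size are illustrative, not tuned:

```sql
DECLARE @cutoff DATETIME2(0) = DATEADD(MONTH, -6, SYSUTCDATETIME());

WHILE 1 = 1
BEGIN
    -- Each iteration is its own transaction, so locks are held only briefly
    DELETE TOP (50000)
    FROM Temp.Temp
    WHERE LastLoggedIn < @cutoff;

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to purge
END
```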
If you get poor performance with just an index, then sure, consider partitioning. But as others have pointed out, partitioning carries a significant cost on UPDATE operations, a greater overhead than maintaining an index. You will have many more updates than purges, and purges can run during relatively idle periods, so I would worry more about the updates.
If you need always-on uptime, and the delete locks the table for an unacceptable length of time, that might justify partitioning, but even then I'd look for alternatives. You could SELECT a list of old users and then delete them one at a time (confirming their LastLoggedIn dates first, of course) with a cursor, for example.
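A sketch of that cursor approach against the test table above (the cutoff is again illustrative); re-checking LastLoggedIn inside the loop guards against users who logged in after the list was built:

```sql
DECLARE @cutoff DATETIME2(0) = DATEADD(MONTH, -6, SYSUTCDATETIME());
DECLARE @UserID INT;

DECLARE purge_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT UserID FROM Temp.Temp WHERE LastLoggedIn < @cutoff;

OPEN purge_cur;
FETCH NEXT FROM purge_cur INTO @UserID;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Confirm the date again; each single-row DELETE is tiny,
    -- so blocking stays minimal
    DELETE FROM Temp.Temp
    WHERE UserID = @UserID
      AND LastLoggedIn < @cutoff;

    FETCH NEXT FROM purge_cur INTO @UserID;
END
CLOSE purge_cur;
DEALLOCATE purge_cur;
```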
I'm an MS SQL guy, so my apologies if any of this is incorrect for PostgreSQL.
--=======================================================
-- delete the duplicate records from table @t
-- keeping a single unit of each
-- marcelo miorelli 24-nov-2014
--=======================================================
--=======================================================
--create a table variable and insert records in it
-- just for this example
--=======================================================
declare @t table ([num] int, com varbinary(512) )
insert into @t values
      (1, convert(varbinary(512), '6778981'))
    , (1, convert(varbinary(512), '6778981'))
    , (1, convert(varbinary(512), '6778982'))
    , (2, convert(varbinary(512), '6778982'))
    , (2, convert(varbinary(512), '6778982'))
    , (3, convert(varbinary(512), '6778982'))
    , (4, convert(varbinary(512), '6778982'))
    , (4, convert(varbinary(512), '6778982'))
    , (4, convert(varbinary(512), '6778982'))
    , (4, convert(varbinary(512), '6778982'))
    , (4, convert(varbinary(512), '6778983'))
    , (5, convert(varbinary(512), '6778983'))
    , (5, convert(varbinary(512), '6778983'))
    , (5, convert(varbinary(512), '6778983'))
    , (5, convert(varbinary(512), '6778983'))
    , (5, convert(varbinary(512), '6778984'))
    , (6, convert(varbinary(512), '6778984'))
    , (6, convert(varbinary(512), '6778984'))
    , (6, convert(varbinary(512), '6778985'))
    , (7, convert(varbinary(512), '6778985'))
    , (7, convert(varbinary(512), '6778985'))
    , (8, convert(varbinary(512), '6778985'))
    , (8, convert(varbinary(512), '6778985'))
    , (9, convert(varbinary(512), '6778985'))
    , (9, convert(varbinary(512), '6778986'))
    , (9, convert(varbinary(512), '6778986'))
    , (9, convert(varbinary(512), '6778986'))
    , (9, convert(varbinary(512), '6778986'))
--=======================================================
-- this is going to show the table before the deletes
--=======================================================
select * from @t
--=======================================================
-- this deletes the duplicates - leaving only 1 of each
--=======================================================
; with cte as (
    select *
         -- ordering by the partition columns themselves makes the numbering
         -- arbitrary within each group, which is fine here: the duplicates
         -- are identical, so any one of them can be the survivor
         , row_number() over (partition by [num], com order by [num], com) as Picker
    from @t
)
delete cte
where Picker > 1
--=======================================================
-- this shows the table after the deletion - without the deletes
--=======================================================
select * from @t
Best Answer
First off, partition by A,B,C order by A,B,C makes no sense. Since you intend to keep the "last" row (meaning the one with the greatest ID), you probably meant partition by A,B,C order by ID desc.
Still, the syntax is invalid on top of this. The subquery expression (select max(t.rn) from t) contains a reference to a column of the outer query: the scope of the subquery does not include columns from the outer query, so rn is not visible in there. Only columns of the table t are. You could use a CTE to allow the reference and make the syntax valid:
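Something along these lines would be syntactically valid (sketched against a hypothetical table t(a, b, c, id); shown only to illustrate the scoping fix, not for use, for the reason given next):

```sql
WITH x AS (
   SELECT id,
          row_number() OVER (PARTITION BY a, b, c ORDER BY id) AS rn
   FROM   t
)
DELETE FROM t
WHERE  id IN (SELECT id
              FROM   x
              WHERE  x.rn < (SELECT max(rn) FROM x));  -- max over the WHOLE table!
```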
Still, the query is dangerous nonsense. Do not use this!
Comparing each row's number to the greatest row number in the whole table is logical nonsense, since each group of peers can have a different number of dupes. It would delete a lot more than it should.
Simpler, and correct:
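A sketch of that simpler form, again against a hypothetical t(a, b, c, id) where (a, b, c) defines a duplicate and id is the unique key (names are illustrative, not from the question): number the rows per duplicate group, greatest id first, and delete everything past the first.

```sql
-- Keep the row with the greatest id per (a, b, c) group, delete the rest
DELETE FROM t
WHERE  id IN (
   SELECT id
   FROM  (SELECT id,
                 row_number() OVER (PARTITION BY a, b, c
                                    ORDER BY id DESC) AS rn
          FROM   t) sub
   WHERE  sub.rn > 1
);
```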
Which, in turn, can be had more cheaply (assuming all columns are NOT NULL!) as: "... where a dupe exists with a greater ID".
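A sketch of that cheaper form, under the same assumed hypothetical table t(a, b, c, id):

```sql
-- Delete every row for which a duplicate with a greater id exists,
-- keeping only the greatest-id row in each group.
-- Requires a, b and c to be NOT NULL: a NULL = NULL comparison never
-- matches, so rows containing NULLs would never be deleted.
DELETE FROM t
WHERE  EXISTS (
   SELECT 1
   FROM   t t1
   WHERE  t1.a = t.a
   AND    t1.b = t.b
   AND    t1.c = t.c
   AND    t1.id > t.id
);
```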