Do not use a list of ids. You will regret it. Use junction tables.
DDL:
create table organization (
organization_id bigserial primary key,
name text not null
);
create table individual (
individual_id bigserial primary key,
email_address varchar(255) not null
);
create table organization_member (
organization_id bigint references organization(organization_id),
individual_id bigint references individual(individual_id),
primary key (organization_id, individual_id)
);
create table email_message (
email_message_id bigserial primary key,
send_at datetime not null,
subject varchar(80),
body text
);
create table email_message_organization (
email_message_id bigint references email_message(email_message_id),
organization_id bigint references organization(organization_id),
primary key (email_message_id, organization_id)
);
DML:
-- add an org:
insert into organization (name) values ('Acme, Inc');
-- add an individual:
insert into individual (email_address) values ('user@example.com');
-- associate individual w org:
insert into organization_member values (1,1);
-- add an email message:
insert into email_message (send_at, subject) values ('2013-07-31 12:12:12', 'test');
-- associate an email message with an org:
insert into email_message_organization values (1, 1);
-- when it's time to send the email, get the right people:
select
e.subject,
e.body,
i.email_address
from individual i
join organization_member om using (individual_id)
join email_message_organization emo using (organization_id)
join email_message e using (email_message_id);
-- send the emails, record that they were sent in another table
You should use the Party model for representing individuals and organizations, which is not what I used, but you should still look it up.
If you have an index on LastLoggedIn
, deleting a few tens of thousands of records should be fast without needing to partition. I ran a small test on MS SQL:
CREATE TABLE Temp.Temp
(
UserID INT NOT NULL PRIMARY KEY,
LastLoggedIn DATETIME2(0) NOT NULL INDEX,
Dummy1 VARCHAR(30),
Dummy2 INT
)
I insert ~57 M dummy records, then selected a LastLoggedIn
date such that 35,000 records were at least that old. Deleting these records took less than a second. This is on a four-core machine with 128 GB of memory, under light load; your mileage may vary.
Of course, if you run this purge only every few weeks, you may have millions of records to delete rather than thousands (300 K records took seven seconds for me; 1.3 M records took 30 seconds). But in that scenario, it's part of your monthly maintenance window rather than daily upkeep, and a brief delay should be acceptable.
If you get poor performance with just an index, then sure, consider partitioning. As others have pointed out, there will be a significant cost to UPDATE
operations, a greater overhead than maintaining an index. You will have many more updates than purges, and the latter can happen during relatively idle periods, so I would worry about the updates more.
If you need always-on uptime, and the delete locks the table for an unacceptable length of time, that might justify using partitioning, but even there I'd look for alternatives. You could SELECT
a list of old users and then delete them one at a time (confirming their LastLoggedIn
dates first of course) with a cursor, for example.
I'm an MS SQL guy, so my apologies if any of this is incorrect for PostgreSQL.
Best Answer
This looks good, except I would combine the Ladder_Users table with the UserLadderStats table. There's no reason to have them separate and it just introduces the chance for data integrity problems. Just move all the stats onto the Ladder_Users, and then you can drop UserLadderStats.
Also, note that the WinnerUserID and LoserUserID in LadderMatch are foreign keys, even though they aren't explicitly marked as such.