Postgresql – Joining based on date (or date range)

database-designjoin;postgresql

I'm not sure if I've completely fudged the design of my small hobby database (I'm not a DBA by any means), but I have a table like this (primary key is (staffid, effectivefrom)):

 staffid | target | effectivefrom
 --------|--------|---------------
  1      |  6.0   | 2012-01-01
  2      |  6.0   | 2012-01-01
  3      |  6.0   | 2012-01-01
  1      |  7.0   | 2012-03-01

So basically, three staff all start out with a target of 6.0, but on March, staff with ID 1 has a new target of 7.0. I want to maintain historical targets because it is relevant to other data in other tables.

I would like to have a user-defined function that takes a date as a parameter, and this function needs to join the above table with another table based on the date. Say the function is called with 1st of February as the date, I would like the result of the join to include the target column showing 6.0 for all staff.

Something like this (I think this won't work because there could be multiple rows before dateParameter):

 SELECT othertable.*, targets.target
     FROM othertable
     JOIN targets ON
         othertable.staffid = targets.staffid AND
         targets.effectivedate <= dateParameter;

Please let me know if I have done an absolute DBA 'no-no' or whether I just need some caffeine.

Best Answer

That is going to be horribly messy, since you'll need to find the single (staffid, effectivefrom) pair that's the newest prior to the dateParameter.

What I'd do is add another field effectiveTo to go with effectiveFrom, and then when you add a new target for a staff member, before you add it check and see if there are any current targets (effectiveTo IS NULL) and if there are then set their end date to the new target start date (optionally: throw an error if there's more than one current target).

So, the table turns out looking like:

 staffID | target | effectiveFrom | effectiveTo
 --------|--------|---------------|-------------
  1      |  6.0   | 2012-01-01    | 2012-03-01
  2      |  6.0   | 2012-01-01    | NULL
  3      |  6.0   | 2012-01-01    | NULL
  1      |  7.0   | 2012-03-01    | NULL

That makes your search query look like this:

SELECT  othertable.*, targets.target
FROM    othertable
            INNER JOIN 
        targets ON othertable.staffid = targets.staffid
WHERE dateParameter > effectiveFrom AND (dateParameter <= effectiveTo OR effectiveTo IS NULL)

The potential downside of that is data inconsistency (i.e. staff members having overlapping targets) - you can fairly easily write a script to check that though. I'd recommend making an addTarget(staffID, target, fromDate) stored procedure and wrapping all the checking/updating logic in that, also.

(A couple of style notes - I rewrote your query slightly to seperate the join condition from the where clause. Also, I know it's only an example, but SELECT table.* is a bad habit to get into - you land up transferring far more data than you need.)