Postgresql – Count where previous record exists in either of two tables

countpostgresql

I'm trying to COUNT records on one table where both the counted records and their "previous" records exist either in the same table or on another.

The table that I'm sampling has a composite PRIMARY KEY consisting of a bigint and character. The other table has the same structure.

A "previous" record has the same character and a bigint - 1 as the counted record either on the same table or on the other table.

In pseudo:

counting_table
   character_column
   bigint_column

other_table
   character_column
   bigint_column

counting_table will never have a parallel record on other_table because the parallel record on counting_table is deleted when its primary key equivalent is inserted into other_table.

I thought I could do this with subqueries, but I can't seem to figure out how to access the outer query's data from inside the subquery.

How can my intent be implemented optimally?

Best Answer

One way would be to LEFT JOIN both tables (counting_table itself as well as other_table), and count a COALESCE expression, which will be NULL (and therefore not counted) if neither table has a "previous" record:

SELECT count(COALESCE(o.bigint_col, c1.bigint_col)) AS ct
FROM   counting_table c
LEFT   JOIN other_table o ON o.bigint_col = c.bigint_col - 1
                         AND o.char_col   = c.char_col
LEFT   JOIN counting_table c1 ON c1.bigint_col = c.bigint_col - 1
                             AND c1.char_col   = c.char_col;

The other option would be with two OR'ed EXISTS semi-joins:

SELECT count(*) AS ct
FROM   counting_table c
WHERE EXISTS (
   SELECT 1 FROM other_table o
   WHERE  o.bigint_col = c.bigint_col - 1
   AND    o.char_col   = c.char_col
   )
OR    EXISTS (
   SELECT 1 FROM counting_table c1
   WHERE  c1.bigint_col = c.bigint_col - 1
   AND    c1.char_col   = c.char_col
   );

I prefer the second variant as it also works with multiple "previous" rows matching in one of both related tables - which does not seem to apply to your case, but it's still the safer bet. Also typically a bit faster.

Related Solutions

Ms-access – “No Current Record” when using COUNT() in Access 2010

Okay, well, since nobody else chimed in and I needed to get it done, I took one last shot at it. Not quite sure on the why of it all, but I used basically the entire query as a subquery in the FROM clause, and then ran the report against that, and that seemed to do it...

SELECT
  [VendorCusts].[VendorPaid],
  [VendorCusts].[DE?],
  [VendorCusts].[IC?],
  [VendorCusts].[AG?],
  [VendorCusts].[GB?],
  COUNT(VendorCustID) AS [CustCount]
FROM
  (
    SELECT DISTINCT
      [Plan Revenue Expense].[Check To] AS [VendorPaid],
      [Support Provider].[DE?],
      [Support Provider].[IC?],
      [Support Provider].[Agency?] AS [AG?],
      [Support Provider].[GeneralBus?] AS [GB?],
      Customer.CustID AS [VendorCustID],
      Customer.LName AS [VendorCustLName],
      Customer.FName AS [VendorCustFName]
    FROM
      (((Customer
      INNER JOIN Plan ON Customer.CustID=Plan.CustID)
      INNER JOIN [Plan Revenue] ON [Plan].[Plan ID]=[Plan Revenue].[PlanID])
      INNER JOIN [Plan Revenue Expense] ON [Plan Revenue].[Rev ID]=[Plan Revenue Expense].[RevID])
      LEFT JOIN [Support Provider] ON [Plan Revenue Expense].[SP]=[Support Provider].[ID]
    WHERE
      (
        (
          ([Plan Revenue Expense].[First Day]>=[Expense Start "MM/DD/YYYY"])
          AND
          ([Plan Revenue Expense].[First Day]<=[Expense End "MM/DD/YYYY"])
        )
        OR
        (
          ([Plan Revenue Expense].[Last Day]>=[Expense Start "MM/DD/YYYY"])
          AND
          ([Plan Revenue Expense].[Last Day]<=[Expense End "MM/DD/YYYY"])
        )
      )
      AND NOT
      (
        [Plan Revenue].[Service]='111' OR
        [Plan Revenue].[Service]='222' OR
        [Plan Revenue].[Service] LIKE '333*'
      )
      AND NOT Customer.[Inactive?]=TRUE
    ORDER BY
      [Plan Revenue Expense].[Check To],
      Customer.LName,
      Customer.FName
  ) AS [VendorCusts]
GROUP BY
  [VendorCusts].[VendorPaid],
  [VendorCusts].[DE?],
  [VendorCusts].[IC?],
  [VendorCusts].[AG?],
  [VendorCusts].[GB?]
HAVING
  NOT ([VendorCusts].[DE?]=TRUE OR [VendorCusts].[IC?]=TRUE)
ORDER BY
  [VendorCusts].[DE?] ASC,
  [VendorCusts].[IC?] ASC,
  [VendorCusts].[AG?] ASC,
  [VendorCusts].[GB?] ASC,
  [VendorCusts].[VendorPaid] ASC;

This is more or less what I finally came up with and it appears to get the job done. Checked a few data points and they seemed to agree with the slightly more verbose version w/o a Count() function. So, I think the results are good.

Then I abstracted out the start and end of the date range so I could run a few different versions. And added a few other parameters I needed...

Seems like it's working as intended, if a bit kludge-y. Sometimes it's better to be 'right' than 'pretty.' ;)

Sql-server – Non-sequential Primary Key performance in PostgreSQL 9.3

I have 3 solutions for you:

Direct reference of sequence and using concat

One possible solution is to reference the seqence in insert statement directly and prepend your node-id. A similar question including answer you can find here: https://stackoverflow.com/a/17925601/4206293

Using a UUID

Another possible solution is, if you don't need you node-id in the primary-key field, you can use the uuid-ossp extension which provides the type uuid and the functions to generate uuids: http://www.postgresql.org/docs/9.4/static/uuid-ossp.html

Use a trigger for these solutions

For both solutions: you can use a trigger to set the primary key.

Example:

-- table
CREATE TABLE test(
   id character varying(10) NOT NULL,
  "name" character varying,
   CONSTRAINT idx_pk PRIMARY KEY (id)
);

-- seqence
CREATE SEQUENCE test_seq
  INCREMENT 1
  MINVALUE 1
  MAXVALUE 9223372036854775807
  START 1
  CACHE 1;

-- Create Function
CREATE OR REPLACE FUNCTION insert_trigger() 
RETURNS TRIGGER    
AS $$
BEGIN
   NEW.id := '100' || nextval('test_seq')::TEXT;
   RETURN NEW;
END;
$$ LANGUAGE 'plpgsql';

-- add Trigger
CREATE TRIGGER insert_table_trigger
BEFORE INSERT ON pkTable
FOR EACH ROW EXECUTE PROCEDURE insert_trigger();

If you now insert a value in the table

INSERT INTO test (name) VALUES ('test text');

you get your primary key '1001'

Using MINVALUE and MAXVALUE

Another way is you use the MINVALUE and MAXVALUE of SEQUENCE to define an numeric space:

CREATE SEQUENCE node100_seq
  INCREMENT 1
  MINVALUE 100000000000
  MAXVALUE 100999999999
  START 100000000000
  CACHE 1;

CREATE SEQUENCE node200_seq
  INCREMENT 1
  MINVALUE 200000000000
  MAXVALUE 200999999999
  START 200000000000
  CACHE 1;