Postgresql – Determine percentage from count() without cast issues

Tags: cast, count, datatypes, postgresql

I'm trying to run the following query to get the percentage of rows in my patients table that have a value in the refinst column. I keep getting a result of 0.

select (count (refinst) / (select count(*) from patients) * 100) as "Formula" 
from patients;

The table has 15556 rows, and select count(refinst) from patients tells me that 1446 of those have a value in the refinst column. The result I'd like to get from the query is 9.30 (1446 / 15556 * 100 = 9.2954..., rounded to two decimals).

I'm pretty sure it has something to do with the data type of the count results (integers, I assume). If I divide an integer by an integer and the result is less than 1, it is truncated to 0, correct? If so, can someone show me how to cast the results of the counts to a number with 2 decimal places, so that the result is rounded to 2 decimal places as well?
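To confirm the suspicion, here is a minimal Python illustration (not PostgreSQL) of the query's integer arithmetic. Python's // floors while PostgreSQL's integer / truncates, but for non-negative counts like these the two agree. The function name integer_percentage is hypothetical.

```python
# Mimics count(refinst) / (select count(*) ...) * 100 when every operand is an integer.
matching_rows = 1446   # count(refinst) from the question
total_rows = 15556     # count(*) from the question

def integer_percentage(part: int, whole: int) -> int:
    """Evaluate part / whole * 100 left to right in integer arithmetic."""
    return part // whole * 100  # part // whole is already 0 whenever part < whole

print(integer_percentage(matching_rows, total_rows))  # 0
```

The division happens before the multiplication, so the fraction is discarded first and multiplying by 100 cannot recover it.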

I'm sure there's a better way to write this code than multiple count statements. I am looking for a more processor-efficient way to write this query in particular.

Best Answer

SELECT (count(refinst) * 100)::numeric / NULLIF(count(*), 0) AS refinst_pct
    -- count(refinst) * 100.0 / NULLIF(count(*), 0) AS refinst_pct  -- simpler
FROM   patients;

Don't use a subselect: both aggregates can be computed in the same query, which is cheaper (one scan of the table instead of two).
Also, this is not a case for window functions, since you want to compute a single result, not one result per row.
Cast to any numeric type that supports fractional digits, as @a_horse already explained.
Since you want to round() to two fractional digits, I suggest numeric (which is the same as decimal in Postgres). round() with a second argument for the number of digits is defined for numeric, not for double precision.
It's enough to cast one value involved in a calculation, preferably the first. Postgres then picks the operator for the type that does not lose information and implicitly casts the other integer operand to numeric.
Or, simpler yet: since we multiply anyway, use a constant with a decimal point (100.0), which is typed numeric automatically.
It's generally a good idea to multiply before you divide. This typically minimizes rounding errors and is cheaper.
In this case, the first multiplication (count(refinst) * 100) can be computed with cheap and exact integer arithmetic. Only then do we cast to numeric and divide by the second integer (which needs no additional cast).
NULLIF(count(*), 0) prevents division by zero, which would raise an exception. If there are no rows at all, we get NULL as the (unknown) percentage.
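The points above can be mirrored in a small Python sketch, assuming None stands in for SQL NULL and Decimal for numeric. The function refinst_pct and its list-of-values input are hypothetical. It counts both aggregates in one pass, multiplies exactly in integers, converts, then divides, and returns None when there are no rows. Decimal's default 28-digit precision differs from the scale Postgres chooses for numeric division, so trailing digits may not match.

```python
from decimal import Decimal
from typing import Iterable, Optional

def refinst_pct(refinst_values: Iterable[Optional[str]]) -> Optional[Decimal]:
    """Sketch of count(refinst) * 100 / NULLIF(count(*), 0) computed in a single pass."""
    non_null = 0
    total = 0
    for value in refinst_values:   # one pass over the rows, like one aggregate query
        total += 1                 # count(*) counts every row
        if value is not None:
            non_null += 1          # count(refinst) skips NULLs
    if total == 0:
        return None                # NULLIF(count(*), 0) turns the result into NULL
    scaled = non_null * 100        # multiply first: exact integer arithmetic
    return Decimal(scaled) / Decimal(total)  # only then convert and divide

rows = ["x"] * 1446 + [None] * (15556 - 1446)  # the question's counts
print(refinst_pct(rows))
```

Swapping the order (dividing first in integers, then multiplying) would reproduce the 0 from the question.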

Rounded to two fractional digits:

SELECT round((count(refinst) * 100)::numeric / NULLIF(count(*), 0), 2) AS refinst_pct
FROM   patients;
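For comparison, a Python sketch of the rounding step, assuming that ROUND_HALF_UP (ties away from zero) matches how round(numeric, int) treats ties in Postgres. round_pct is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_pct(value: Decimal, places: int = 2) -> Decimal:
    """Round to a fixed number of fractional digits, ties going away from zero."""
    quantum = Decimal(1).scaleb(-places)  # 1E-2 for places=2
    return value.quantize(quantum, rounding=ROUND_HALF_UP)

pct = Decimal(1446 * 100) / Decimal(15556)
print(round_pct(pct))  # 9.30
```

quantize keeps the trailing zero, so the result reads 9.30 rather than 9.3, much like a numeric value rounded to two digits.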

Related Solutions

Postgresql – Understanding time format of the EXPLAIN command – Postgres

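actual time=8163.890..8163.893 means:

Startup time: it took 8163.890 ms until this step returned its first row
Total time: it took 8163.893 ms until this step returned all of its rows

So in this case nearly all of the work was done before the first row was produced. Both values are in milliseconds and, when the node shows loops greater than 1, they are averages per loop.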
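To make the two numbers concrete, here is a small Python sketch that models one node's actual-time figures. The class ActualTiming is hypothetical. It assumes, as the PostgreSQL documentation describes for EXPLAIN ANALYZE, that the times are per-loop averages, so multiplying by loops approximates the node's total time.

```python
from dataclasses import dataclass

@dataclass
class ActualTiming:
    startup_ms: float  # time until the node returned its first row
    total_ms: float    # time until the node returned its last row
    loops: int = 1     # how many times the node was executed

    def after_first_row_ms(self) -> float:
        """Per-loop time spent after the first row was produced."""
        return self.total_ms - self.startup_ms

    def total_across_loops_ms(self) -> float:
        """Reported times are per-loop averages; multiply to approximate the node's total."""
        return self.total_ms * self.loops

node = ActualTiming(startup_ms=8163.890, total_ms=8163.893)
print(round(node.after_first_row_ms(), 3))  # 0.003
```

With the values from this plan, only about 0.003 ms was spent after the first row, which is why nearly all of the work counts as startup.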

Edit:
The same logic applies to the cost information:

cost=2928781.21..2929243.02 means:

The estimated startup cost (before the first row can be returned) is 2928781.21
The estimated total cost (to return all rows) is 2929243.02

(Note that cost has no real unit. It is an arbitrary value, conventionally measured relative to the cost of a sequential page fetch, seq_page_cost = 1.0.)
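A hypothetical Python helper that pulls both pairs of numbers out of a plan line shows how the two ranges sit side by side in EXPLAIN ANALYZE output. The sample line is made up: only the cost and actual time values come from the text above, while the node name, rows, width and loops are illustrative.

```python
import re
from typing import Dict, Optional

_NUM = r"\d+(?:\.\d+)?"
COST_RE = re.compile(rf"cost=(?P<startup>{_NUM})\.\.(?P<total>{_NUM})")
ACTUAL_RE = re.compile(rf"actual time=(?P<startup>{_NUM})\.\.(?P<total>{_NUM})")

def parse_plan_line(line: str) -> Dict[str, Optional[float]]:
    """Extract estimated costs and actual times (None if absent) from one plan line."""
    cost = COST_RE.search(line)
    actual = ACTUAL_RE.search(line)
    return {
        "est_startup_cost": float(cost.group("startup")) if cost else None,
        "est_total_cost": float(cost.group("total")) if cost else None,
        "actual_startup_ms": float(actual.group("startup")) if actual else None,
        "actual_total_ms": float(actual.group("total")) if actual else None,
    }

sample = ("Aggregate  (cost=2928781.21..2929243.02 rows=1 width=8) "
          "(actual time=8163.890..8163.893 rows=1 loops=1)")
print(parse_plan_line(sample))
```

Plain EXPLAIN (without ANALYZE) prints only the cost part, so the actual-time fields come back as None in that case.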

This is also explained here: http://www.postgresql.org/docs/current/static/using-explain.html

Postgresql – Looking for a simpler alternative to a recursive query

This was hard! I don't know whether this is simpler, but at least it doesn't use a window function or produce rows that have to be filtered out.

with recursive r(k, n) as (
    with t(k) as (values (1),(2),(3),(4),(5))   -- the data we want to filter
    -- with t(k) as (values (1),(5),(7),(10),(11),(12),(13))
    -- with t(k) as (values (6),(8),(11),(16),(20),(23))
    -- with t(k) as (values (6),(8),(12),(16),(20),(23))
         ,t2(k,n) AS (select k, (select min(k) from t tt where k >= t.k+5) from t) -- precalculate what's next
    select * from (select * from t2 order by k limit 1) x   -- start at the smallest k; a bare LIMIT fails directly in a union!
    UNION ALL
    select t2.* from r, t2 where t2.k = r.n      -- on each iteration, keep only the value that matches the previous precalculated next one
)
select k from r
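The same "precalculate what's next, then follow the chain" idea can be checked against the sample data sets in a short Python sketch. filter_with_gap is a hypothetical name; it assumes distinct values (it deduplicates, whereas duplicate rows in SQL would multiply the join) and a gap of 5 as in the query.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Sequence

def filter_with_gap(values: Sequence[int], gap: int = 5) -> List[int]:
    """Keep the smallest value, then repeatedly the smallest value >= last kept + gap."""
    keys = sorted(set(values))
    if not keys:
        return []
    # Precalculate "what's next" for every key, like the t2 CTE does.
    nxt: Dict[int, Optional[int]] = {}
    for k in keys:
        i = bisect_left(keys, k + gap)
        nxt[k] = keys[i] if i < len(keys) else None
    # Follow the chain, like the recursive term joining t2.k = r.n.
    result = [keys[0]]
    while nxt[result[-1]] is not None:
        result.append(nxt[result[-1]])
    return result

print(filter_with_gap([6, 8, 11, 16, 20, 23]))  # [6, 11, 16, 23]
```

Each key is looked up once and the chain visits each kept value once, which is consistent with the roughly linear behavior reported below.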

Testing

This alternative seems to be less efficient for very small sets, but its runtime grows roughly linearly with the size of the set, whereas the original seems to slow down exponentially.

drop table if exists t;
create temp table t(k) AS
with recursive r(n) as (
  select floor(random()*10)::int + 1
  UNION ALL
  select n + floor(random()*10)::int + 1
  from r
  where n < 100000)        -- change to increase or reduce set
select * from r;           -- surprisingly fast! Go PG!
create index on t(k);
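To reproduce test data of the same shape outside the database, here is a Python sketch of the generator above. make_test_keys is a hypothetical name: it starts at a random value in 1..10 and keeps adding a random step of 1..10 while the previous value is below the limit, mirroring floor(random()*10)::int + 1 and the WHERE n < 100000 condition.

```python
import random
from typing import List, Optional

def make_test_keys(limit: int = 100_000, seed: Optional[int] = None) -> List[int]:
    """Strictly increasing keys with random gaps of 1..10, generated while n < limit."""
    rng = random.Random(seed)
    n = rng.randint(1, 10)      # floor(random()*10)::int + 1 yields 1..10
    keys = [n]
    while n < limit:            # the recursive term only fires while the previous n < limit
        n += rng.randint(1, 10)
        keys.append(n)
    return keys

keys = make_test_keys(seed=42)
print(len(keys), keys[-1])
```

As in the SQL version, the last value can overshoot the limit by up to 9, because the check is applied to the previous row.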

with recursive r(n, pri) as (
    select min(k), 1::bigint from t
    UNION
    select k, (rank() over(order by k)) rr
    from r, t 
    where t.k >= r.n+5 and r.pri = 1
)
select count(*) from r where pri = 1; -- I aborted it after waiting for a minute

with recursive r(k, n) as (
    with t2(k,n) AS (select k, (select min(k) from t tt where k >= t.k+5) from t)
    select * from (select * from t2 order by k limit 1) x
    UNION ALL
    select t2.* from r, t2 where t2.k = r.n
)
select count(*) from r;  -- took 26 s on my server
