Postgresql – Get A Count From Table

postgresql

I want to see all fsid limit 25 from my table — I tried this query

Select * 
from btmt 
WHERE Count(fsid) > 1
limit 24

But this throws an error of

Aggregates are not allowed in WHERE clause.

What do I need to alter in order to run this query?

Best Answer

you can use something like...

SELECT id, Count(fsid) As cnt FROM btmt GROUP BY 1

...as a subquery (substitute name of your id field for "id") and wrap it up like this...

SELECT b.* FROM btmt b,
(SELECT id, Count(fsid) As cnt FROM btmt GROUP BY 1) As z
WHERE b.id = z.id AND z.cnt > 1
LIMIT 24;

Hard to tell exactly what you're looking for, but that will get you 24 unordered results from the btmt table with fsid value of greater than 1, although it might not be the prettiest way to do so, computationally speaking.

Please clarify if that's not what you're after--hope it helps if so.

EDIT: if you're looking for fsid values 1-25 it would be more like...

SELECT b.*, z.cnt FROM btmt b,
(SELECT id, Count(fsid) As cnt FROM btmt GROUP BY 1) As z
WHERE b.id = z.id AND z.cnt BETWEEN 1 AND 25
ORDER BY z.cnt; --through that in just for the fun of it :-)

Related Solutions

Postgresql – How to create an Sql table from SELECT with a multiple row DISTINCT constraint

distinct is NOT a function. It always operates on all columns of the result.

The difference between select distinct (foo), bar and select distinct foo, bar is the same as the difference between select (foo), bar and select foo, bar. The parenthesis is just "noise".

When you write select (foo,bar) you are actually creating an anonymous record type in Postgres which results in a single column that has two attributes - which is not what you actually want.

As you are using Postgres, you can use the (proprietary) operator DISTINCT ON which - in contrast to the standard DISTINCT - does operate on a sub-set of the columns.

You have to specify an ORDER BY in that case to define which of the rows to take if there is more than one with the same combination of (field1, field2).

CREATE TABLE new_name 
AS 
SELECT DISTINCT ON (table.field1, table.field2), 
       table.field1, 
       table.field2, 
       .....
FROM ...
WHERE ...
ORDER BY ..

If you want to stick to ANSI SQL you will need a window function for this:

create table new_name
as
select column1, column2, column3, column4, column5, column6
from (
   select column1, column2, column3, column4, column5, column6, 
          row_number() over (partition by column1, column2 order by ...) as rn 
   from the_table
   where ...
) t
where rn = 1;

For a large table, DISTINCT ON is probably faster than the solution with the window function.

PostgreSQL Count – Count Approximation with Where Clause

How many unique values do you have for crawled_name? How feasible it is that you'll hit an apartment_id value greater than 2 billion (in the next 5-10 years)?

Note that COUNT(id) is the same as COUNT(*) (because id is not nullable), and that COUNT(apartment_id) > 10000 is effectively COUNT(*) > 10000 along with WHERE apartment_id IS NOT NULL. What you're doing not only complicates understanding the query, but may also cause PostgreSQL to select a sub-optimal strategy. By using COUNT(*) instead of COUNT(id), PostgreSQL can compute the whole thing on an index that does not include id, and without the need for reading the table at all, so it should process much faster.

Let's assume you currently have apartment_id values in the low millions, and it is unlikely you'll hit the value 2 billion in the near future. You could speed it up by creating an index and adjusting your query as follows:

CREATE INDEX ON facility (crawled_name, (apartment_id)::int)
  WHERE NULLIF(facility_characteristic, '') IS NULL
        AND apartment_id IS NOT NULL AND crawled_name IS NOT NULL;

SELECT crawled_name, COUNT(*)
FROM facility
WHERE NULLIF(facility_characteristic, '') IS NULL
      AND apartment_id IS NOT NULL AND crawled_name IS NOT NULL
GROUP BY crawled_name, apartment_id::int
HAVING COUNT(*) > 10000
LIMIT 10;

You could significantly speed it up further if you have a fixed list of crawled_name values that range in the dozens (or low hundreds), by turning the type of the column into an enum. You'll have to drop indexes that use this column prior:

DO $do$ DECLARE _l text[]; BEGIN
  _l := ARRAY(SELECT DISTINCT crawled_name FROM facility WHERE crawled_name > '' ORDER BY 1);
  EXECUTE $$CREATE TYPE crawled_name_enum AS enum ('$$ ||
            array_to_string(_l, $$','$$) || $$');$$;

  ALTER TABLE facility ALTER COLUMN crawled_name
    TYPE crawled_name_enum USING (crawled_name::crawled_name_enum);
END; $do$;

Best Answer

Related Solutions

Postgresql – How to create an Sql table from SELECT with a multiple row DISTINCT constraint

PostgreSQL Count – Count Approximation with Where Clause

Related Question