Postgresql – Postgres partition tables 1 million rows

postgresql

I have a 1 million rows in one of my table and expect it grow to 3-4M in next 1 or 2 years.

There is lot of averages and sum are run on the date column for this row.

I'm planning to partition the tables to make the query run faster..

Questions:

Is it ok to partition table that just has 1 million rows.
I have 10 columns in the table but there are 2 columns that are text which stores around 2000 bytes of data.
How many rows Postgres can handle without partitions.

Best Answer

Your database is small and doesn't require partitioning.

A quick guesstimate gives storage requirements: two 2,000 byte columns times 1M rows is 4GB; times 3-4M rows is 12-16GB. A proper calculation would include a fudge factor for the other columns, indices, and other overheads, but it's still obviously an amount that fits in RAM on anything but the most crusty of servers.

So to answer your questions:

You can partition a 1M row table, but it's not worth the effort with PostgreSQL. (PostgreSQL is not MySQL.)
This is not a question!
PostgreSQL's "about" page states there is no limit on the number of rows, but a table cannot exceed 32TB.

Related Solutions

Postgresql – Postgres not using the index even when rows returned is 5% of the table

From your query plans, it looks like you're comparing ints to ints in the first query plan, and int to numeric in the second plan.

Your first compare:

Index Cond: (("timestamp" >= 1431100800) AND ("timestamp" <= 1431108000))

and

timestamp >= 1431100800 and timestamp <= 1431108000

In the second query, it's numeric values:

Filter: ((numvalues[1] IS NOT NULL) AND (("timestamp")::numeric >= 1431100800.00) AND (("timestamp")::numeric <= 1431108000.00))

and

timestamp >= 1431093600.00 and timestamp <= 1431100800.00

Casting to numeric causes the index to be ignored in favor of a sequential scan.

You can see this with a very simple example, set up below:

CREATE TABLE t2 (a int);
CREATE INDEX t2_a_idx ON t2(a);
INSERT INTO t2 (a) SELECT i FROM generate_series(1,1000000) AS i;
VACUUM ANALYZE VERBOSE t2;

My first query plan looks like this:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE a > 750000;

Index Only Scan using t2_a_idx on t2 (cost=0.42..7134.65 rows=250413 width=4) 
(actual time=0.019..29.926 rows=250000 loops=1)
Index Cond: (a > 750000)
Heap Fetches: 0
Planning time: 0.137 ms
Execution time: 39.114 ms
(5 rows)
Time: 39.540 ms

While a second query using numerics looks like this:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE a > 750000.00;

Seq Scan on t2  (cost=0.00..19425.00 rows=333333 width=4) (actual time=122.803..175.326 rows=250000 loops=1)
Filter: ((a)::numeric > 750000.00)
Rows Removed by Filter: 750000
Planning time: 0.058 ms
Execution time: 184.194 ms
(5 rows)
Time: 184.487 ms

In the second instance here, the index is ignored in favor of a sequential scan because of the cast to a numeric value, which looks like exactly what's happening in your two examples.

One last aside, you might be able to speed your query up via a SET query before executing it as well:

SET work_mem = 2GB;

If your server can handle it, because your sorts are spilling to disk, as noted in this line from your query plan:

Sort Method: external merge  Disk: 1387704kB

Hope this helps. =)

Postgresql – 300 inserts per minute, 6 million rows per month

There are basically two ways to answer your question: speculation and simulation.

For the former you still have not provided much useful information, such as:

What is your current workload and how does your server behave today?
What are your current data volumes?
What are you planning to do with your 6 million new rows a month? Keep them indefinitely? Archive them after 3 months?

As for simulation, it would likely take you only a day or two to clone your existing database, creating a test system; download and install Apache JMeter; write a simple test plan generating your 300 inserts/second plus your queries; and run it to see how the system actually behaves under your projected workload. This will let you (or a temporary consultant, as suggested by others) to identify performance bottlenecks and determine whether they can be resolved by throwing additional hardware at them.

Best Answer

Related Solutions

Postgresql – Postgres not using the index even when rows returned is 5% of the table

Postgresql – 300 inserts per minute, 6 million rows per month

Related Question