I have a very large (>10M rows) table in a Postgres 9.5.4 database that collects readings from sensors. The table handles many inserts (~2k/minute) but isn't queried often: only during a cache miss or when rebuilding the cache. It is essentially append-only; there are no updates. However, the queries I do run are often fairly complex, and I care about optimizing read performance, as a cache miss can be very slow for a user.
In the relevant business context, sensors collect energy and power-quality data from buildings; a building can have multiple sensors, and a common query aggregates facility-level data. The table schema looks like so:
                              Table "public.measurement"
    Column     |            Type             |                        Modifiers
---------------+-----------------------------+----------------------------------------------------------
 id            | integer                     | not null default nextval('measurement_id_seq'::regclass)
 meter_id      | integer                     |
 timestamp     | timestamp without time zone |
 value         | double precision            |
 _reading_type | smallint                    |
 facility_id   | integer                     |
Indexes:
    "measurement_pkey" PRIMARY KEY, btree (id)
    "ix_measurement_timestamp_meter_id__reading_type" btree ("timestamp", meter_id, _reading_type)
    "ix_measurement_timestamp_facility_id__reading_type" btree ("timestamp", facility_id, _reading_type)
Foreign-key constraints:
    "measurement_meter_id_fkey" FOREIGN KEY (meter_id) REFERENCES meter(id)
    "measurement_facility_id_fkey" FOREIGN KEY (facility_id) REFERENCES facility(id)
I am considering dropping the id column, which is a meaningless 'surrogate key', as I never, ever query it, but I'm curious what the performance implications would be.
I essentially run two different types of queries on this table:
- "Give me all of the measurements between time start and time end of _reading_type 0 for meter_id X", or
- "Give me all of the measurements between time start and time end of _reading_type 0 for facility_id X" (I believe the facility_id column denormalizes the table, but it saves what was an expensive join).
Questions
- What would happen if I do not declare a primary key?
- The combination of ("timestamp", meter_id, _reading_type) is guaranteed to be unique, so: what would be the implication of dropping the multicolumn index on these columns and defining a composite primary key instead?
Best Answer
You don't need a surrogate key, but you can still use a PRIMARY KEY. It sounds to me like the primary key here is the natural key you already identified: ("timestamp", meter_id, _reading_type). Also, with that primary key declared you would have two of the same keys, so drop one of them (the now-redundant multicolumn index on those columns).
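Assuming ("timestamp", meter_id, _reading_type) really is unique and non-null, the change might look something like this (a sketch using the identifiers from the question; test on a copy first):

```sql
BEGIN;

-- Promote the natural key to the primary key
ALTER TABLE measurement
    DROP CONSTRAINT measurement_pkey,   -- drops the surrogate-key PK
    ADD  CONSTRAINT measurement_pkey
         PRIMARY KEY ("timestamp", meter_id, _reading_type);

-- The PK's underlying unique index makes this one redundant
DROP INDEX ix_measurement_timestamp_meter_id__reading_type;

-- Optionally drop the now-unused surrogate key column
ALTER TABLE measurement DROP COLUMN id;

COMMIT;
```

Note that ADD PRIMARY KEY will fail if any of the three columns contains a NULL, and building the new index takes a full table scan, so schedule this during a quiet period.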
You're also not likely doing 2,000 INSERTs a second: that would be 7.2 million rows an hour, which would blow well past 10 million rows unless the table were truncated every ~1.5 hours, so the ~2k/minute figure in the question must be the right one. Either way, keeping the surrogate key is just slowing down your inserts. At 10 million rows, you may also want to look at declarative partitioning in the next major release (PostgreSQL 10; there is no 9.7), or inheritance-based partitioning if you can't upgrade.
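On 9.5, partitioning means table inheritance plus a routing trigger. A minimal monthly sketch (partition names, date boundaries, and the month count are placeholders; declarative partitioning only arrived in PostgreSQL 10):

```sql
-- Child tables carry CHECK constraints so the planner can
-- skip irrelevant partitions (constraint exclusion)
CREATE TABLE measurement_y2016m09 (
    CHECK ("timestamp" >= '2016-09-01' AND "timestamp" < '2016-10-01')
) INHERITS (measurement);

CREATE TABLE measurement_y2016m10 (
    CHECK ("timestamp" >= '2016-10-01' AND "timestamp" < '2016-11-01')
) INHERITS (measurement);

-- Route inserts on the parent to the correct child
CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS trigger AS $$
BEGIN
    IF NEW."timestamp" >= '2016-09-01' AND NEW."timestamp" < '2016-10-01' THEN
        INSERT INTO measurement_y2016m09 VALUES (NEW.*);
    ELSIF NEW."timestamp" >= '2016-10-01' AND NEW."timestamp" < '2016-11-01' THEN
        INSERT INTO measurement_y2016m10 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'No partition for timestamp %', NEW."timestamp";
    END IF;
    RETURN NULL;  -- the row goes to the child, not the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_partition
BEFORE INSERT ON measurement
FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();
```

Indexes are not inherited, so each child table needs its own copy of the indexes you keep.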