Do tables with only fixed width values perform read queries better
than those with varying widths?
Basically no. There are very minor costs when accessing columns, but you won't be able to measure any difference. Details:
In particular, the use of varchar(255)
in a table definition typically indicates a lack of familiarity with the Postgres type system. The architect behind it is most probably not a Postgres native - or the layout has been carried over from another RDBMS like SQL Server, where this used to matter. In Postgres it makes no difference for performance.
- Your most expensive query
SELECT COUNT(*) FROM articles
does not consider row data at all; only the total size of the table matters, indirectly. Counting all rows is costly in Postgres due to its MVCC model. Maybe an estimate, which can be had very cheaply, is good enough?
- Fast way to discover the row count of a table
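A minimal sketch of such a cheap estimate, reading the planner's row count from the system catalog (assuming the table from the question is named articles and statistics are reasonably current):

```sql
-- Fast estimate instead of an exact count.
-- reltuples is maintained by VACUUM / ANALYZE / autovacuum, so it can lag
-- behind the true count, but it's read in a single catalog lookup.
SELECT reltuples::bigint AS estimate
FROM   pg_class
WHERE  oid = 'public.articles'::regclass;
```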
(Pretend disk space isn't an issue.)
Disk space is always an issue, even if you have plenty. The size on disk (number of data pages that have to be read / processed / written) is one of the most important factors for performance.
Where can I learn more about the internals of the Postgres DB engine?
The info page for the tag postgres has the most important links to more information, including books, the Postgres Wiki and the excellent manual. The latter is my personal favorite.
Your third query has issues
SELECT * FROM articles WHERE user_id = $1 ORDER BY published_date DESC LIMIT 1;
It sorts with ORDER BY published_date DESC
, but published_date
can be NULL (there is no NOT NULL
constraint). In descending order, NULL values sort first by default, so with LIMIT 1 you'd get a row with a NULL published_date ahead of the latest actual published_date
. That's a loaded foot-gun.
- Either add a NOT NULL
constraint. Always do that for columns that can't be NULL.
- Or make it ORDER BY published_date DESC
NULLS LAST
and adapt the index accordingly:
"articles_user_id_published_date_idx" btree (user_id, published_date DESC NULLS LAST)
Details in this recent, related answer.
Convert published_date
to an actual date
Since published_date
is always rounded to the full day, it's effectively just a date
, which occupies 4 bytes instead of 8 for a timestamp
. Best move it up in the table definition to come before the two timestamp
columns, so you don't lose the saved 4 bytes to alignment padding:
...
body | text
published_date | date -- <---- here
created_at | timestamp without time zone
updated_at | timestamp without time zone
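The type change itself could be done like this (a sketch, assuming all stored values are already at midnight; reordering the columns, by contrast, requires recreating the table):

```sql
-- Convert timestamp to date; USING defines the cast for existing rows.
ALTER TABLE articles
    ALTER COLUMN published_date TYPE date
    USING published_date::date;
```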
Smaller on-disk storage does make a difference for performance.
More importantly, your index on (user_id, published_date)
would now just occupy 32 bytes per index entry instead of 40, because 2x4 bytes do not incur extra padding. And that would make a noticeable difference for performance.
Aside: this index is not relevant to the demonstrated queries. Delete it unless it's used elsewhere:
"index_articles_on_published_date" btree (published_date)
You can use the +
operator to combine the date and time columns into a timestamp.
SELECT pk,ev_date,ev FROM events;
pk | ev_date | ev
----+------------+----------
1 | 2016-02-19 | 01:00:00
2 | 2016-02-19 | 02:00:00
3 | 2016-02-19 | 05:00:00
4 | 2016-02-19 | 12:00:00
5 | 2016-02-19 | 18:00:00
6 | 2016-02-19 | 23:00:00
7 | 2016-02-20 | 01:00:00
8 | 2016-02-20 | 05:00:00
9 | 2016-02-20 | 12:00:00
10 | 2016-02-20 | 18:00:00
(10 rows)
SELECT pk, ev_date, ev
FROM events
WHERE (ev_date + ev)
BETWEEN ('2016-02-19 04:00:00')
AND ('2016-02-20 02:00:00');
pk | ev_date | ev
----+------------+----------
3 | 2016-02-19 | 05:00:00
4 | 2016-02-19 | 12:00:00
5 | 2016-02-19 | 18:00:00
6 | 2016-02-19 | 23:00:00
7 | 2016-02-20 | 01:00:00
(5 rows)
Don't forget to create the index below:
CREATE INDEX events_ts_idx ON events ((ev_date + ev));
ANALYZE events;
I've inserted many dummy rows, so I'll show the result of EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT pk, ev_date, ev FROM events WHERE (ev_date + ev)
BETWEEN ('2016-02-19 23:50:00')
AND ('2016-02-20 00:01:00');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using events_ts_idx on events (cost=0.29..8.52 rows=8 width=16) (actual time=0.014..0.029 rows=42 loops=1)
Index Cond: (((ev_date + ev) >= '2016-02-19 23:50:00'::timestamp without time zone) AND ((ev_date + ev) <= '2016-02-20 00:01:00'::timestamp without time zone))
Planning time: 0.082 ms
Execution time: 0.053 ms
(4 rows)
For comparison, I've created another index and tried another form of the query:
CREATE INDEX events_ts2_idx ON events (ev_date,ev);
ANALYZE events;
EXPLAIN ANALYZE SELECT pk, ev_date, ev FROM events WHERE (ev_date,ev)
BETWEEN ('2016-02-19','23:50:00')
AND ('2016-02-20','0:01:00');
QUERY PLAN
--------------------------------------------------------------------------
Bitmap Heap Scan on events (cost=189.50..511.36 rows=7143 width=16) (actual time=0.027..0.042 rows=42 loops=1)
Recheck Cond: ((ROW(ev_date, ev) >= ROW('2016-02-19'::date, '23:50:00'::time without time zone)) AND (ROW(ev_date, ev) <= ROW('2016-02-20'::date, '00:01:00'::time without time zone)))
Heap Blocks: exact=7
-> Bitmap Index Scan on events_ts2_idx (cost=0.00..187.72 rows=7143 width=0) (actual time=0.019..0.019 rows=42 loops=1)
Index Cond: ((ROW(ev_date, ev) >= ROW('2016-02-19'::date, '23:50:00'::time without time zone)) AND (ROW(ev_date, ev) <= ROW('2016-02-20'::date, '00:01:00'::time without time zone)))
Planning time: 0.079 ms
Execution time: 0.071 ms
(7 rows)
According to my investigation, my way (using the +
operator) is better. I recommend comparing both ways on your machine.
Best Answer
To construct the interval, multiply the number by the interval 1 second
:
duration * interval '1 second'
The Postgres docs have a page about Date/Time Functions and Operators with similar examples.
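For instance (a minimal sketch; the column name duration is taken from the question and replaced here by a literal):

```sql
-- A plain number of seconds becomes a proper interval:
SELECT 90 * interval '1 second';  -- → '00:01:30'
```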