Is it a good approach to design tables according to the queries I’ll perform on them?

database-design performance

I'm watching this video and am pretty new to DBMSs.
The speaker explains that in a row-oriented DB, rows are read in blocks.
So, my understanding is that if my rows have fewer fields, more rows can fit into a single block, and querying the table should take fewer I/O operations, resulting in better performance. Am I right?

Can I extract the rule that I shouldn't design tables according to the entity they represent but, instead, according to how frequently I'll read or update those fields?

For example:
table employers:

  • ID
  • Name (frequently used)
  • Badge number (frequently used)
  • Birth date (rarely used)
  • Birthplace (rarely used)

Should I split the table into two?

  • tbl1: ID | Name | Badge number
  • tbl2: ID | Birth date | Birthplace

Best Answer

In most database management systems, data is stored as pages, not blocks. Pages are normally 4 or 8 KB, depending on the database and how it's been configured.

All else being equal, a smaller row size means better reuse of cached pages and fewer page reads on queries that touch a large number of rows, so less I/O and faster read performance.
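To put rough numbers on this, here is a back-of-the-envelope sketch. The 8 KB page size, the row sizes (200 bytes with every column, 60 bytes with only the frequently used ones) and the row count are all made-up assumptions, and page and row headers are ignored, so treat the result as an order of magnitude only.

```python
import math

PAGE_SIZE = 8 * 1024  # bytes; assumed 8 KB page, page/row headers ignored


def rows_per_page(row_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Number of whole rows of the given size that fit on one page."""
    return page_size // row_bytes


def pages_to_scan(row_count: int, row_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages that must be read to scan row_count rows."""
    return math.ceil(row_count / rows_per_page(row_bytes, page_size))


# Hypothetical sizes: 200-byte rows (all columns) vs 60-byte rows (hot columns only).
wide = pages_to_scan(1_000_000, 200)
narrow = pages_to_scan(1_000_000, 60)
print(wide, narrow)  # 25000 7353
```

Under these assumptions a full scan of the narrow table reads roughly a third as many pages, which is where the benefit for large reads comes from.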

However

If you vertically partition the table (as you have in your example), there will be a slight increase in overall storage (roughly the primary key length times the number of rows, plus the second table's B-tree), and insert performance will be slightly slower as you'll need to maintain a PK-FK relationship between the two tables.

Furthermore, if most of your queries are for single-record lookups, you're still going to be reading a single page. There's a greater chance that page will be cached, but reading 4 or 8 KB off a modern disk is really not an expensive operation.

Splitting the table would require 2 page reads (and navigating two B-trees) when you require BirthDate/BirthPlace. Again, not really a big deal on modern hardware.
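As an illustration of what the split looks like in practice, here is a minimal sketch using Python's built-in sqlite3 module. The table names employee_core and employee_birth and the sample row are hypothetical, and SQLite is only a convenient stand-in for whatever DBMS you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Frequently used columns
    CREATE TABLE employee_core (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        badge_number TEXT NOT NULL
    );
    -- Rarely used columns, sharing the same key (the PK-FK relationship)
    CREATE TABLE employee_birth (
        id         INTEGER PRIMARY KEY REFERENCES employee_core(id),
        birth_date TEXT,
        birthplace TEXT
    );
""")
conn.execute("INSERT INTO employee_core VALUES (1, 'Ada', 'B-001')")
conn.execute("INSERT INTO employee_birth VALUES (1, '1815-12-10', 'London')")

# Common query: only the narrow table is touched.
hot = conn.execute(
    "SELECT name, badge_number FROM employee_core WHERE id = ?", (1,)
).fetchone()

# Rare query: both tables (and both B-trees) are needed.
full = conn.execute(
    """
    SELECT c.name, b.birth_date, b.birthplace
    FROM employee_core AS c
    JOIN employee_birth AS b ON b.id = c.id
    WHERE c.id = ?
    """,
    (1,),
).fetchone()

print(hot)
print(full)
```

Note that every insert now means two statements, and any query that wants the birth data pays for the join.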

The only time I would vertically partition a table would be in certain data warehouse situations, or if BirthDate/BirthPlace were nullable and infrequently populated.

Other Considerations

If badge number is relatively small in size (say, under 20-30 bytes), the best thing you can do to increase performance would be to drop the unneeded ID column and make your primary key BadgeNumber since:

  1. You shouldn't have duplicates in that column
  2. Most likely you will primarily look up on that column, so using BadgeNumber:
     • Saves you a column, making your table more compact
     • Removes the need for an index (and associated overhead) on BadgeNumber
     • Eliminates the need to join to your table to get the BadgeNumber when the table has a PK-FK relationship to another table.
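A rough sketch of that design, again using sqlite3 as a stand-in. The door_access table and the sample values are invented for illustration. WITHOUT ROWID is SQLite-specific and is used here so the table is actually stored by BadgeNumber; other systems get the same effect through their own clustered or primary key storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- BadgeNumber is the primary key: no surrogate ID, no extra index.
    CREATE TABLE employee (
        badge_number TEXT PRIMARY KEY,
        name         TEXT NOT NULL,
        birth_date   TEXT,
        birthplace   TEXT
    ) WITHOUT ROWID;

    -- Hypothetical child table that references employees by badge number.
    CREATE TABLE door_access (
        badge_number TEXT NOT NULL REFERENCES employee(badge_number),
        door         TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO employee VALUES ('B-001', 'Ada', '1815-12-10', 'London')")
conn.execute("INSERT INTO door_access VALUES ('B-001', 'Lobby')")

# The badge number is already in the child table, so no join is needed to get it.
rows = conn.execute("SELECT badge_number, door FROM door_access").fetchall()
print(rows)

# Duplicates are rejected by the primary key itself.
try:
    conn.execute("INSERT INTO employee VALUES ('B-001', 'Grace', NULL, NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```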

There are other ways to reduce I/O and improve read performance. Most commercial DBMSs support some form of data compression. This can fit more rows on a single page without any changes to the structure of the table, at the expense of some CPU overhead to compress/decompress the data as it is written/read. CPU time is usually cheaper than disk I/O, so compression is usually a net benefit.
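To get a feel for the trade-off, the sketch below compresses a page's worth of made-up employee rows with Python's zlib. This is only an analogy: real DBMS row or page compression uses its own schemes, and the row layout here is invented.

```python
import zlib

# Roughly one 8 KB page of hypothetical, fairly repetitive employee rows.
rows = [
    f"{i:06d}|Employee {i}|B-{i:05d}|1980-01-01|Springfield\n"
    for i in range(150)
]
page = "".join(rows).encode("utf-8")

compressed = zlib.compress(page)          # CPU cost paid on write
restored = zlib.decompress(compressed)    # CPU cost paid on read

print(len(page), len(compressed))
print(restored == page)
```

The compressed form is much smaller for data like this, so more rows fit in the same amount of I/O, at the cost of the compress and decompress calls.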