SQL Server – Bad practice to return more columns than required

I unfortunately have quite a broad question regarding queries that return more columns than they need.

Simply put, is this bad practice, and should it be avoided at all costs? We continually have debates about this in our development team. Some argue that all queries should only return the exact data they need as this would reduce I/O etc. Others argue that it is OK to have a query that returns a few extra columns such as int or nvarchar(50) as this reuses execution plans and makes development faster. We are not using any SELECT * statements anywhere, so that is not the issue; this is purely about cases where a few extra columns are explicitly specified.

I know this may trigger a debate rather than an answer but any insight would be helpful.

Best Answer

"Some argue that all queries should only return the exact data they need as this would reduce I/O etc..."

And they are correct!

"...others argue that it is OK to have a query that returns a few extra columns such as int or nvarchar(50) as this reuses execution plans and makes development faster"

These folks are also correct!

There are always tradeoffs with these things.

Imagine a web page where users can edit their profile, populated by "query 1" here:

SELECT Id, FullName, UserName, Phone 
FROM dbo.Users 
WHERE Id = @Id;

And then there's an admin page that uses "query 2" here:

SELECT Id, FullName, UserName, Phone, IsActive, AdminComments 
FROM dbo.Users 
WHERE Id = @Id;

Performance

When we use both queries, the pros and cons include (non-exhaustive list):

  • Con: two different plans taking up space in the plan cache
  • Con: developers had to write two different queries / stored procedures
  • Pro: the first query will only be doing the work it needs to (no extra I/O)
  • Pro: the two queries can be optimized differently
  • Pro: the two queries can be indexed for separately
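
One way to check the "no extra I/O" point for yourself is to run both queries with I/O statistics switched on. This is only a sketch: it assumes dbo.Users exists with the columns shown above and that Id is an int, neither of which the original states, and the value assigned to @Id is made up.

```sql
-- Sketch only: assumes dbo.Users has the columns used in the two queries
-- above and that Id is an int (the type is an assumption).
DECLARE @Id int = 1;  -- hypothetical user id

-- Report logical/physical/LOB reads per statement in the Messages output.
SET STATISTICS IO ON;

-- Query 1: profile page
SELECT Id, FullName, UserName, Phone
FROM dbo.Users
WHERE Id = @Id;

-- Query 2: admin page
SELECT Id, FullName, UserName, Phone, IsActive, AdminComments
FROM dbo.Users
WHERE Id = @Id;

SET STATISTICS IO OFF;
```

For a single-row lookup on a narrow table the two read counts may well be identical. A difference is more likely to show up when the extra columns are large (for example, values stored off-row) or when a narrower index can satisfy query 1 but not query 2.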

If we just have query 2, the situation is basically inverted: only one plan in the cache and only one query / proc to write, but the profile page's query does more I/O than it needs, and so on.
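
To see how much plan cache the queries actually occupy, something like the following could be run against the plan cache DMVs. It is a rough sketch: filtering on the text 'dbo.Users' is just a convenient assumption for this example, and the DMVs typically require the VIEW SERVER STATE permission.

```sql
-- Lists cached plans whose statement text mentions dbo.Users,
-- with how often each plan was reused and how much memory it takes.
SELECT cp.objtype,
       cp.usecounts,
       cp.size_in_bytes,
       st.text AS query_text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%dbo.Users%'
ORDER BY cp.size_in_bytes DESC;
```

Note that this diagnostic query also mentions dbo.Users, so it can appear in its own results.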

Security

There's also an aspect of security: you might not want those "admin only" columns getting queried for non-admin pages, even if they are never presented to the user.
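
If that separation matters, column-level permissions are one way to enforce it in the database rather than relying on the page to hide the data. A minimal sketch, assuming the non-admin page connects as a member of a role (the role name ProfilePageRole is hypothetical):

```sql
-- Hypothetical role for the non-admin profile page.
CREATE ROLE ProfilePageRole;

-- Column-level SELECT: members can read only the four columns query 1 uses.
GRANT SELECT ON OBJECT::dbo.Users (Id, FullName, UserName, Phone)
    TO ProfilePageRole;
```

With only this grant in place, running query 2 as a member of that role fails with a permission error on IsActive and AdminComments. This applies to queries sent directly against the table; if access goes through stored procedures owned by the same owner as the table, ownership chaining means the table permissions are not checked.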

Maintainability

Sharing the same query for multiple different pages that don't need all that data can also lead to maintainability problems down the road. What if query 2 suddenly needs to join to other tables, in order to pull in additional information about user activity that's not available to regular users?

Now the normal user page is being impacted by the admin query. And developers at that time might not realize they need to go back and check on the regular user page.

What if some enterprising developer removes columns from query 2, because it's decided admins don't care about them? Now the regular user page is actually broken.

Takeaway

Re-using queries can be a valid time-saving strategy, but it requires good change management and testing later to make sure that you don't break anything when these re-used queries are modified. It's also possible that various factors (performance, feature changes) will cause the queries to be split up later. You'll have to keep that in mind to determine whether the up-front time savings are worth the cost later.

Generally speaking, I think it's better to have queries that are specific to each use-case. You'll notice that this results in lots of stored procedures / queries, which has its own maintenance overhead. There are some things you can do to help with that though (keep the procedures small where possible, include a comment in the procedure that indicates what app function / page uses that query, name the procedures very clearly).
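
As an illustration of those three suggestions, query 1 might be wrapped like this. The procedure name, the page named in the comment, and the int type for @Id are assumptions made for the example; CREATE OR ALTER needs SQL Server 2016 SP1 or later.

```sql
-- Used by: user profile edit page (hypothetical page name).
-- Returns only the columns that page displays.
CREATE OR ALTER PROCEDURE dbo.Users_GetProfileForEdit
    @Id int  -- assumed type; the original does not state it
AS
BEGIN
    SET NOCOUNT ON;

    SELECT Id, FullName, UserName, Phone
    FROM dbo.Users
    WHERE Id = @Id;
END;
```

The admin page would get its own equally small procedure built around query 2, so that a change to one page's needs does not silently affect the other.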