Sql-server – Splitting a table with large columns for performance

sql server

I am working on a system that stores images in its database (SQL Server 2014) using the varbinary datatype. The table looks like this (extra columns removed for brevity):

| ImageDataId | DateUploaded |
|   1         | 1-1-2001     |
|   2         | 1-1-2001     |
|   3         | 1-1-2001     |

There is also a seperate table for the image itself.

| ImageId | ImageFile    |
|   1     | [byte array] |    
|   2     | [byte array] |
|   3     | [byte array] |

The database was made with Entity Framework Code First, so the ImageId key in the second table is a foreign key reference to the Id in the first table.

I was told these tables were designed to improve performance. If I only need ImageData, but no image, querying the first table without any imagedata in it would be fast. I'm not an experienced DBA, so I may be missing something, but wouldn't a query be just as fast if I merge both tables:

| ImageDataId | DateUploaded | ImageFile   |
|   1         | 1-1-2001     | [byteArray] |
|   2         | 1-1-2001     | [byteArray] |
|   3         | 1-1-2001     | [byteArray] |

And then do a SELECT without the ImageFile column? Or is this query:

SELECT DateUploaded 
FROM Images

Somehow faster with split tables?

Best Answer

Generally this can make sense, BUT: you have first to make sure the data is in the table.

If you use large table fields (varbinary(max)) then the data is actually not stored in the table but in separate pages anyway. If the original table as as thin as you indicate (id, date upload, not a lot more fields) Then this is a fallacy, with one exception: EF will gladly load all the data on object access - which makes acess slow thanks to EF inefficiency.

You can avoid that by not using EF for accessing the file field. That does make a TON of sense because you may never want to load the whole file into memory anyway. You want to be able to access the file in a loop reading max x bytes - so you can stream them off.

Welcome to the limitations of EF and the even worse limitations of code first.