Sql-server – the LOB storage space

sql server

I have lot's of large objects ((n)test, varchar, image types) in my SQL-Server tables. I read the following article about sp_tableoption "In-Row Data" and "large value types out of row" but did not understand those tons of text.

It's explained that large object data types can depending of the settings be stored in the table or outside anywhere. It's called the "LOB storage space". Where is this space? What is the benefit of those settings? Is this suitable to raise read-performance?

Best Answer

In most cases, leave the default option (0). Assuming you have default options, layman's version of the article you referenced:

  • Remember that SQL Server has a fixed page size of ~8K which means you cannot physically store more than ~8k per column in-row (it's possible to have bigger than 8K total rowsize but that's a different topic)
  • BLOB types support up to 2GB but actual usage is usually much lower, a recent investigation into found over 90% of databases we probed (using scripts) to have BLOB cells smaller than 2K.
  • If your total rowsize including BLOB data is less than the 8k max, SQL Server will try to store the whole thing (BLOB and other data) in the same row as the rest of your data for that row (I think this was introduced in 2008 or perhaps 2005, can't remember for sure)
  • If your total rowsize including BLOB data is greater than 8K, SQL Server will store the BLOB data in a separate location from the other data in your row. This other location is generically referred to as LOB storage space. This is highly simplified but just grok the concept for now.
  • If the BLOB is less than ~40K and you have space in the row for a 72b pointer string, SQL Server will store the string of pointers for the BLOB (sitting somewhere else) in the same row as the rest of the row data. This saves a bunch of things during insert and future access.
  • If the BLOB is more than 40K, SQL Server needs to store more pointers for the BLOB so instead of consuming lots of space in the row, it stores a shorter (24b) pointer string that points to a new page. That new pages stores as many pointers as needed for the BLOB. Don't know exact reason for picking 5 pointers specifically (for the 5x ~8K pages= ~40K) but the goal is to balance optimizing perf for the 80-90% of BLOB sizes and row space usage so there's a "reasonable" number of rows per page.
  • When accessing data, less movement is good, fewer operations is good, opposites are bad. SQL Server optimizes for this. For example:
    • if your BLOBs are small, you can read the entire row or multiple rows from a single request (however many rows fit into an 8K page, and however many pages you need in a 64K extent)
    • If your BLOB is bigger than the space available in-row but less than ~40K and the row has room for the pointer string, SQL Server can get all the pointers for the BLOB directly from the row and retrieve the BLOB with them.
    • If your BLOB is greater than ~40K or the row does not have room for the pointer string then SQL Server retrieves a pointer to the root node of a pointer tree on a different page, read that, then retrieve the actual BLOB chunks.

If that made sense then check out a really nice write-up on BLOB storage http://sqlmag.com/t-sql/varbinarymax-tames-blob