I see the term LOB used often when describing data types in columns of tables. What is it an acronym of, and why is this data object important?
SQL Server – What is a LOB (Large Object)?
Tags: oracle, sql-server
Related Solutions
In most cases, leave the default option (0). Assuming you have default options, here is a layman's version of the article you referenced:
- Remember that SQL Server has a fixed page size of ~8K which means you cannot physically store more than ~8k per column in-row (it's possible to have bigger than 8K total rowsize but that's a different topic)
- BLOB types support up to 2GB, but actual usage is usually much lower; a recent investigation found that over 90% of the databases we probed (using scripts) had BLOB cells smaller than 2K.
- If your total row size including BLOB data is less than the 8K max, SQL Server will try to store the whole thing (BLOB and other data) in the same row as the rest of your data for that row (I think this was introduced in 2008, or perhaps 2005; I can't remember for sure)
- If your total rowsize including BLOB data is greater than 8K, SQL Server will store the BLOB data in a separate location from the other data in your row. This other location is generically referred to as LOB storage space. This is highly simplified but just grok the concept for now.
- If the BLOB is less than ~40K and there is room in the row for a 72-byte pointer string, SQL Server will store the string of pointers to the BLOB (which sits somewhere else) in the same row as the rest of the row data. This saves work both during inserts and on future access.
- If the BLOB is more than ~40K, SQL Server needs to store more pointers for the BLOB, so instead of consuming lots of space in the row, it stores a shorter (24-byte) pointer that points to a new page. That new page stores as many pointers as the BLOB needs. I don't know the exact reason for picking 5 pointers specifically (for the 5 × ~8K pages = ~40K), but the goal is to balance performance for the 80-90% of BLOB sizes against row space usage, so there is a "reasonable" number of rows per page.
- When accessing data, less movement and fewer operations are good; the opposites are bad. SQL Server optimizes for this. For example:
- if your BLOBs are small, you can read the entire row, or multiple rows, from a single request (however many rows fit into an 8K page, and however many pages fit into a 64K extent)
- If your BLOB is bigger than the space available in-row but less than ~40K and the row has room for the pointer string, SQL Server can get all the pointers for the BLOB directly from the row and retrieve the BLOB with them.
- If your BLOB is greater than ~40K, or the row does not have room for the pointer string, then SQL Server retrieves a pointer to the root node of a pointer tree on a different page, reads that, and then retrieves the actual BLOB chunks.
If that made sense, then check out a really nice write-up on BLOB storage: http://sqlmag.com/t-sql/varbinarymax-tames-blob
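You can see the in-row/off-row split for yourself. The following sketch assumes a hypothetical table `dbo.Documents` with a `Payload` column; it uses the documented `sp_tableoption` setting and the `sys.allocation_units` catalog view:

```sql
-- Illustrative table; the names dbo.Documents and Payload are assumptions.
CREATE TABLE dbo.Documents
(
    DocumentId int IDENTITY(1,1) PRIMARY KEY,
    Payload    varbinary(max) NULL
);

-- Optional: force all (max) values off-row instead of letting
-- small values be stored in-row (documented table option):
EXEC sys.sp_tableoption
    @TableNamePattern = N'dbo.Documents',
    @OptionName       = 'large value types out of row',
    @OptionValue      = 'ON';

-- Show how the table's pages split between in-row, row-overflow,
-- and LOB allocation units:
SELECT au.type_desc, au.total_pages, au.used_pages
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
    ON au.container_id = CASE au.type
                             WHEN 2 THEN p.partition_id  -- LOB_DATA
                             ELSE p.hobt_id              -- IN_ROW / ROW_OVERFLOW
                         END
WHERE p.[object_id] = OBJECT_ID(N'dbo.Documents');
```

With the option ON, even small `Payload` values land in the `LOB_DATA` allocation unit; with the default (OFF), values that fit stay in `IN_ROW_DATA`.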
...once they are no longer in cache, both selecting and deleting the same 12 thousand rows takes ~40 seconds.
This seems to indicate that the storage subsystem is inadequate. If this is the cause, SQL Server will probably be waiting with one of the PAGEIOLATCH_XX wait types.
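A quick way to check this is to look at the cumulative wait statistics since the last restart; a storage bottleneck typically shows up as high PAGEIOLATCH_* wait times:

```sql
-- Cumulative waits since instance restart; high wait_time_ms on
-- PAGEIOLATCH_* waits points at slow or overloaded storage.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGEIOLATCH%'
ORDER BY wait_time_ms DESC;
```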
It definitely appears that the LOB data must be loaded into memory on the server in order to be deleted.
As Paul Randal is reported to have said (update 2, in the question you linked to), SQL Server must traverse the LOB tree in order to locate the pages that must be removed. To do this, SQL Server needs to bring these LOB tree pages into memory. There really is no way around this; the information is just not available in any other way.
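If you want to see the LOB tree pages that must be traversed, the (undocumented but widely used, SQL Server 2012+) `sys.dm_db_database_page_allocations` function will list them; `dbo.Documents` here is an assumed table name:

```sql
-- List the table's pages; TEXT_MIX_PAGE and TEXT_TREE_PAGE entries
-- are the off-row LOB data and pointer-tree pages respectively.
SELECT allocated_page_file_id,
       allocated_page_page_id,
       page_type_desc
FROM sys.dm_db_database_page_allocations
     (DB_ID(), OBJECT_ID(N'dbo.Documents'), NULL, NULL, 'DETAILED')
WHERE page_type_desc LIKE 'TEXT%';
```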
It seems like this deallocation should happen in a clean-up task, and not up-front.
The way this works is not very well documented.
In my tests, SQL Server deletes the LOB data immediately if there is no special reason to keep the row around. This seems to be an optimization: after all, if we are already touching the LOB data, we might as well delete it while we are there. Deferring it to ghost cleanup (GC) would only add overhead, since GC would also need to traverse the chain. In addition, logging is reduced, since SQL Server only has to record the deallocation, not the complete LOB content.
The LOB delete operation is deferred to GC if (for example) there is a trigger on the table, or if a row-versioning isolation level is enabled. Both these features use row versions, which must be maintained on the LOB rows. In these cases, the LOB delete is fully logged. When GC runs later on, it performs the deallocation of the LOB pages. GC can be temporarily disabled with (documented) global trace flag 661, to see these ghosted LOB records if you want to test it yourself.
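A rough test sketch, assuming a hypothetical `dbo.Documents` table, looks like this (the trace flag is documented, but only run this on a test instance, since it is global):

```sql
-- CAUTION: global trace flag; use on a test instance only.
DBCC TRACEON (661, -1);    -- temporarily disable ghost record cleanup

DELETE dbo.Documents       -- dbo.Documents is an assumed table name
WHERE DocumentId <= 1000;

-- Ghosted records (including version ghosts from row versioning)
-- stay visible until cleanup is re-enabled and runs:
SELECT index_id, ghost_record_count, version_ghost_record_count
FROM sys.dm_db_index_physical_stats
     (DB_ID(), OBJECT_ID(N'dbo.Documents'), NULL, NULL, 'DETAILED');

DBCC TRACEOFF (661, -1);   -- re-enable ghost cleanup
```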
Given all that, it is hard to see how fully logging the delete and deferring much of the same work to GC would help you. It is likely faster to log less and remove the LOB data during the delete.
There's not enough in the question to speculate much about the precise cause of your poor delete performance, beyond the possibility of an overwhelmed/underspecified storage system, but I can say that in general it is best to:
- Avoid accessing LOB data under read uncommitted isolation
- Avoid having triggers on tables with LOB data when performing deletes
- Be aware of the side-effects of row versioning
- Ensure the storage subsystem is adequate for logging and general I/O
- Run the latest build of SQL Server (currently 11.00.6523 for SQL Server 2012)
For more information about point 1, see Performance bug: NOLOCK scans involving off-row LOB data by Paul Randal. For more on points 2 and 3, see Deletes that Split Pages and Forwarded Ghosts (by me).
It could equally be that the simplified example misses a crucial detail, or the table variables produce an inefficient execution plan, or your table has billions of ghost LOB records already. A detailed analysis would probably require access to the system itself.
Best Answer
“LOB” is short for “large object”.
In relational databases, data are typically stored in blocks of a fixed size to allow efficient interaction with storage.
Now if a value comes close to the size of a block or exceeds it, there is an obvious problem storing it in a table column. Such values are typically called LOBs, and different techniques have been proposed and used to store them, which typically involve splitting up the large value and storing it in more than one block.
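In SQL Server specifically, the LOB-capable column types include `nvarchar(max)`, `varbinary(max)`, and `xml`. A minimal illustration (the table name is made up):

```sql
-- Each of these columns can hold values far larger than an 8K page,
-- so SQL Server may store them off-row in LOB allocation units:
CREATE TABLE dbo.LobExamples
(
    Id        int IDENTITY(1,1) PRIMARY KEY,
    BigText   nvarchar(max),   -- up to 2GB of Unicode text
    BigBinary varbinary(max),  -- up to 2GB of binary data
    XmlDoc    xml              -- XML documents, also stored as LOB
);
```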