Not sure that this is simpler than a cursor solution, but here is what I would do:
Before we begin we need a few tables in a SQL Fiddle:
MS SQL Server 2012 Schema Setup
CREATE TABLE dbo.tbl1(c1 INT, c2 INT, c3 INT);
CREATE TABLE dbo.tbl2(c4 INT, c5 INT, c6 INT);
CREATE TABLE dbo.tbl3(c7 INT, c8 INT, c9 INT);
First we need the list of all user tables. For that we can use the sys.tables
catalog view. The OBJECT_SCHEMA_NAME()
function gets us the schema name, the OBJECT_NAME()
function the table name. The QUOTENAME()
function quotes the names correctly, in case some of those contain special characters, key words or spaces. (In this example the use of OBJECT_NAME()
is not strictly necessary as sys.tables
has a name column, but I left it in as you can use this pattern with any catalog view that has an object_id
column.)
Query 1:
SELECT
QUOTENAME(OBJECT_SCHEMA_NAME(T.object_id))+'.'+
QUOTENAME(OBJECT_NAME(T.object_id)) AS quoted_table_name,
T.object_id
FROM sys.tables AS T;
Results:
| QUOTED_TABLE_NAME | OBJECT_ID |
|-------------------|-----------|
| [dbo].[tbl1] | 245575913 |
| [dbo].[tbl2] | 261575970 |
| [dbo].[tbl3] | 277576027 |
The next step is to get the list of column names, again quoted. We can use the sys.columns
catalog view for that.
Query 2:
SELECT C.name,C.column_id
FROM sys.columns AS C
WHERE C.object_id = OBJECT_ID('dbo.tbl1');
Results:
| NAME | COLUMN_ID |
|------|-----------|
| c1 | 1 |
| c2 | 2 |
| c3 | 3 |
The next hurdle is to get those columns into a comma-separated list. There is no built-in string concatenation aggregate function, so we have to use a trick:
Query 3:
SELECT STUFF((
SELECT ','+QUOTENAME(name)
FROM sys.columns AS C
WHERE C.object_id = OBJECT_ID('dbo.tbl1')
ORDER BY C.column_id
FOR XML PATH(''),TYPE
).value('.','NVARCHAR(MAX)'),1,1,'') AS column_list;
Results:
| COLUMN_LIST    |
|----------------|
| [c1],[c2],[c3] |
With that, all the pieces are in place and we just have to put them together:
Query 4:
SELECT 'SELECT ' +
CL.column_list +
' FROM ' +
QUOTENAME(OBJECT_SCHEMA_NAME(T.object_id)) + '.' +
QUOTENAME(OBJECT_NAME(T.object_id)) +
';' AS select_statement
FROM sys.tables AS T
CROSS APPLY (
SELECT STUFF((
SELECT ','+QUOTENAME(name)
FROM sys.columns AS C
WHERE C.object_id = T.object_id
ORDER BY C.column_id
FOR XML PATH(''),TYPE
).value('.','NVARCHAR(MAX)'),1,1,'') AS column_list
)CL;
Results:
| SELECT_STATEMENT |
|------------------------------------------|
| SELECT [c1],[c2],[c3] FROM [dbo].[tbl1]; |
| SELECT [c4],[c5],[c6] FROM [dbo].[tbl2]; |
| SELECT [c7],[c8],[c9] FROM [dbo].[tbl3]; |
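On SQL Server 2017 and later, the STRING_AGG aggregate makes the FOR XML PATH trick unnecessary. A sketch of Query 4 rewritten with it (same example tables as above):

```sql
-- SQL Server 2017+ variant: STRING_AGG replaces the FOR XML PATH trick
SELECT 'SELECT ' +
       STRING_AGG(QUOTENAME(C.name), ',')
           WITHIN GROUP (ORDER BY C.column_id) +
       ' FROM ' +
       QUOTENAME(OBJECT_SCHEMA_NAME(T.object_id)) + '.' +
       QUOTENAME(OBJECT_NAME(T.object_id)) +
       ';' AS select_statement
FROM sys.tables AS T
JOIN sys.columns AS C
  ON C.object_id = T.object_id
GROUP BY T.object_id;
```

The WITHIN GROUP clause preserves the original column order, which the ORDER BY inside the FOR XML PATH subquery handled in the older version.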
This example works on SQL Server 2005 and later, assuming you are on the latest service pack.
There is no clean solution to achieve this with SQL Server 2000. In that case you need to go back to your cursor solution.
What might cause the big discrepancy in disk space?
Typically this is caused by rows that have been deleted. Heaps do not reclaim space for data pages emptied by DELETE
operations unless a table lock is taken at the time of the deletion. Even then, other factors like the possibility of row-overflow data, or an enabled row-versioning isolation level can prevent space being reclaimed.
Another factor that usually has a much smaller impact on space usage is that rows with variable-length columns that expand may not fit on the original heap page, in which case the row is forwarded to another page. Forwarded rows are more usually associated with scan performance issues than pure space usage, however.
Overall, tables that experience significant deletions (without the space being reused quickly by new rows) and/or variable column updates that do not fit in place are often better configured with a clustered index. Clustered tables usually deallocate empty pages very quickly.
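If converting to a clustered table is an option, or if you just want to reclaim the space once, both approaches can be sketched as follows (table, column, and index names here are illustrative, not from the question):

```sql
-- Option 1 (SQL Server 2008+): rebuild the heap in place,
-- which releases empty pages and removes forwarded records
ALTER TABLE dbo.YourHeap REBUILD;

-- Option 2: convert the heap to a clustered table so that
-- empty pages are deallocated promptly in the future
CREATE CLUSTERED INDEX CIX_YourHeap ON dbo.YourHeap (id);
```

The rebuild is a one-time fix; a heap that keeps experiencing large deletes will accumulate empty pages again.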
You can use the sys.dm_db_index_physical_stats
DMV to see physical details for a heap or index:
SELECT
DDIPS.index_id,
DDIPS.partition_number,
DDIPS.index_type_desc,
DDIPS.alloc_unit_type_desc,
DDIPS.avg_fragmentation_in_percent,
DDIPS.fragment_count,
DDIPS.avg_fragment_size_in_pages,
DDIPS.page_count,
DDIPS.avg_page_space_used_in_percent,
DDIPS.record_count,
DDIPS.avg_record_size_in_bytes,
DDIPS.forwarded_record_count
FROM sys.dm_db_index_physical_stats
(
DB_ID(),
OBJECT_ID(N'dbo.t1', N'U'),
0,
NULL,
'DETAILED'
) AS DDIPS;
The following (trivial) example shows a heap DELETE
not releasing any empty pages:
SET NOCOUNT ON;
CREATE TABLE t1 (c1 char(8000) DEFAULT 'a');
GO
INSERT INTO t1 DEFAULT VALUES;
GO 1000
-- 1000 pages allocated
SELECT
DDIPS.index_id,
DDIPS.partition_number,
DDIPS.page_count
FROM sys.dm_db_index_physical_stats
(
DB_ID(),
OBJECT_ID(N'dbo.t1', N'U'),
0,
NULL,
'DETAILED'
) AS DDIPS;
-- Delete all the data from the heap
DELETE FROM t1;
-- Still 1000 pages allocated
-- Even though the table is empty
SELECT
DDIPS.index_id,
DDIPS.partition_number,
DDIPS.page_count
FROM sys.dm_db_index_physical_stats
(
DB_ID(),
OBJECT_ID(N'dbo.t1', N'U'),
0,
NULL,
'DETAILED'
) AS DDIPS;
GO
DROP TABLE dbo.t1;
If you try the example again, but with a table lock (DELETE FROM t1 WITH (TABLOCK)), the deletion frees all the empty pages (assuming the database does not have the READ_COMMITTED_SNAPSHOT option set to ON, among other conditions).
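For reference, the table-locked variant of the delete from the example above is:

```sql
-- Taking a table lock allows the heap DELETE to deallocate its empty pages
DELETE FROM t1 WITH (TABLOCK);
```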
Best Answer
Partitioning does not sound like a good fit for your scenario. Partitioning isn't something to be taken lightly: it requires a lot of planning, and you'd have to rewrite 5 billion rows to convert the data. Instead, you should consider a UNION ALL view. Depending on your requirements, you could even make it a partitioned view. A view gives you an easy way to query your data without any data movement, and it's by far the simplest solution.
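As a sketch (the table and column names are assumptions, not from the question), a UNION ALL view over per-year tables could look like this:

```sql
-- Hypothetical per-year tables combined into one view; with CHECK
-- constraints on SaleDate in each base table, this can qualify
-- as a partitioned view
CREATE VIEW dbo.SalesAll
AS
SELECT SaleDate, Amount FROM dbo.Sales2022
UNION ALL
SELECT SaleDate, Amount FROM dbo.Sales2023
UNION ALL
SELECT SaleDate, Amount FROM dbo.Sales2024;
```

Queries against dbo.SalesAll then see all the data, while the underlying tables stay exactly where they are.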
In general, partitioning doesn't improve query performance. However, with columnstore tables it can be helpful if it gives you better rowgroup elimination on the partitioning column. That can be accomplished without partitioning, depending on how you build your columnstore tables. If you don't modify the old data, it should be relatively straightforward to build your columnstore in a way that maintains order.
Columnstore tables do support partitioning, if you do have a legitimate need for it at some point. It's easy to experiment with it. Simply try to create an empty, partitioned columnstore table.
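A minimal sketch of such an experiment, using SQL Server 2016+ syntax (the partition boundaries, names, and columns are illustrative):

```sql
-- Illustrative partition function and scheme on a date column
CREATE PARTITION FUNCTION pfYear (date)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01');

CREATE PARTITION SCHEME psYear
AS PARTITION pfYear ALL TO ([PRIMARY]);

-- Empty partitioned table with a clustered columnstore index
CREATE TABLE dbo.FactSales
(
    SaleDate date  NOT NULL,
    Amount   money NOT NULL,
    INDEX ccsi_FactSales CLUSTERED COLUMNSTORE
)
ON psYear (SaleDate);
```

If this creates without error, partitioned columnstore is available on your edition and you can test loading a small sample into it.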