I need all the help that I can get to get this index rebuild successful. Especially an expert advise on transaction log management.
Background
The target database is a DW database hosting 3 big tables in clustered columnstore index (CCIX). The biggest table called Analog. It holds ~37 billion rows and ~600 GB of high compressed data. Based on our initial estimate using smaller table, a 150 GB CCIX table could expand to 5 TB, so we provisioned additional 29 TB for this rebuild.
- SQL Server Enterprise 2016
- Recovery mode = Simple
Why do we have to do this?
In the first version of our ETL service, the data are not consolidated and sorted first before they are loaded to CCIX. This results to non-optimal use of CCIX as many row segments are not fully filled before they are compressed. An optimal segment must have compressed 1,0485,76 rows and must be sorted in Time so time-based queries can get the most row-group elimination in sql server and less segments to process. We’re learn more CCIX as we go and as more documentatons are available.
Excerpt of my alignment is here
https://drive.google.com/file/d/0BzGLNskaj70UQUtZYW9CZF9iUUk/view?usp=sharing
The new ETL guarantees that data are consolidated and sorted in time before they are loaded to CCIX. So future loads will be sorted, we are concern here with existing data loaded during First Load.
For full case description, please read here.
https://drive.google.com/open?id=0BzGLNskaj70URmZURlVDWVNYd2M
This is our second attempt and I can see the log file is still piling up despite the new script having partition-based index rebuild. We need to keep this in check.
My questions are:
-
I already executed the 2nd attempt (in progress), is it possible to check which partitions are already processed and sorted? I have OFFLINE = ON in the rebuild script.
-
How can we manage the size of transaction log while a partion-based index rebuild is in progress? We need to keep this in check and regularly truncated every partition if possible.
-
The index rebuild failed initially due to insufficient space in log file. We added a new log file and use a separate disk. But how we make make sure this wont happen again in this second run? Will the partition-based rebuild sufficient to guarantee we can rebuild successfully?
Appreciate your help.
Rebuild scripts by partition (2nd attempt) – IN PROGRESS
Elapsed: 17 hrs, 34 mins
--create a new clustered row index (CRI)
CREATE CLUSTERED INDEX [Analog_ColumnStoreIndex]
ON [dbo].[Analog]
(
[LogTime] ASC,
[CTDIID] ASC,
[WindFarmId] ASC,
[StationId] ASC
)
WITH (
DROP_EXISTING = ON,
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON,
ONLINE = OFF
)
ON [AnalogMonthlyPScheme]([LogTime])
GO
Best Answer
I would process this one partition at a time. At a high level:
SWITCH
partition 1 from the source table to the working tableDROP_EXISTING
CHECK
constraint to the working table that covers the range of values in partition 1SWITCH
the working table into partition 1 of the target tableThis makes maximum use of minimally-logged operations, minimizes workspace, and allows the transaction log to be cleared between partitions if necessary.
Demo
Partitioning
Numbers table setup (if required)
Source table and data
Target table and work table
Partition 1
Partition 2
...and so on. This is fairly straightforward to script.