Sql-server – Creating indexes after data load Vs Before data load in a large table

indexsql serverssis

I have a huge sql server table of size 430 GB on one server A.

I needed to copy the table as it is to a new server B, so I created an empty table on the server B and moved all data from A to B using SSIS package/job – this took me almost 30+ hours.

Now the data copy is finished and the big task is to build the indexes.

Was I correct in taking this approach, or should I have loaded the data in already indexed table on server B ?

—————————–TABLE—————

CREATE TABLE [CCTSwapForward].[SWAP_DATA](
    [SwapData_ID] [int] NOT NULL,
    [Contract_ID] [varchar](20) NOT NULL,
    [DataSourceName] [varchar](10) NULL,
    [Direction] [varchar](5) NULL,
    [StatusDescription] [varchar](50) NULL,
    [TargetCurrencyCode] [varchar](5) NULL,
    [SettlementCurrencyCode] [varchar](5) NULL,
    [TargetAmount] [float] NULL,
    [SettlementAmount] [float] NULL,
    [ConfirmationNo] [varchar](20) NULL,
    [ItemNo] [int] NULL,
    [ExpiryDate] [datetime] NULL,
    [Original_TargetAmount] [float] NULL,
    [Original_SettlementAmount] [float] NULL,
    [PartyID] [varchar](20) NULL,
    [PrimaryAssetClass] [varchar](50) NULL,
    [SecondaryAssetClass] [varchar](5) NULL,
    [EffectiveDate] [datetime] NULL,
    [PriceNotationType] [varchar](5) NULL,
    [PriceNotationValue] [varchar](5) NULL,
    [Addn_PriceNotationType] [varchar](5) NULL,
    [Addn_PriceNotationValue] [varchar](5) NULL,
    [ProductIDPrefix] [varchar](5) NULL,
    [ProductIDValue] [varchar](50) NULL,
    [AllocationIndicator] [varchar](20) NULL,
    [ExecutionTS] [datetime] NULL,
    [VerificationType] [varchar](50) NULL,
    [ExecutionVenuePrefix] [varchar](10) NULL,
    [ExecutionVenue] [varchar](50) NULL,
    [ClearingDCOValue] [varchar](50) NULL,
    [Collateralized] [varchar](25) NULL,
    [LastReporteddt] [datetime] NULL,
    [Confirmationdt] [datetime] NULL,
    [ConfirmationType] [varchar](25) NULL,
    [Valuationdt] [datetime] NULL,
    [MTMValue] [float] NULL,
    [MTMCurrency] [varchar](3) NULL,
    [ReportingJurisdiction] [varchar](15) NULL,
    [ValueDate] [datetime] NULL,
    [ExchangeRate] [float] NULL,
    [TradePartyRole] [varchar](50) NULL,
    [TradePartyPrefix] [varchar](50) NULL,
    [TradePartyValue] [varchar](20) NULL,
    [ReportingObligation] [varchar](15) NULL,
    [USPersonIndicator] [bit] NULL,
    [FinEntityIndicator] [bit] NULL,
    [ClientOrder_ID] [int] NULL,
    [OrderDetail_ID] [int] NULL,
    [Batch_ID] [int] NOT NULL,
    [Status_ID] [int] NULL,
    [initdt] [datetime] NULL,
    [initid] [int] NULL,
    [upddt] [datetime] NULL,
    [updid] [int] NULL,
    [ReportingPartyLEI] [varchar](20) NULL,
    [ProcessCenter] [varchar](50) NULL,
    [TargetAmount_NDec] [int] NULL,
    [SettlementAmount_NDec] [int] NULL,
    [ExchangeRate_NDec] [int] NULL,
    [MTMRate] [float] NULL,
    [LastUpdatedTS] [datetime] NULL,
    [Client_ID] [int] NULL,
    [Office_ID] [int] NULL,
    [IsInterafiliate] [bit] NULL,
    [TreasuryToBranchRate] [float] NULL,
    [ActualValueDate] [datetime] NULL,
    [SpreadRevenueInSettlementCurrency] [float] NULL,
    [SettlementToUSDRate] [float] NULL,
    [ReportingCurrencyToUSDRate] [float] NULL,
    [TradeCurrencyToUSDRate] [float] NULL,
    [BranchCurrency] [varchar](3) NULL,
    [FirstTradeDate] [datetime] NULL,
    [Action] [varchar](10) NULL,
    [TransactionType] [varchar](25) NULL,
    [LifeCycleEvent] [varchar](25) NULL,
    [TPDomicile] [varchar](200) NULL,
    [TP1Branch] [varchar](50) NULL,
    [TP2FinancialJurisdiction] [varchar](10) NULL,
    [TP2NonFinancialJurisdiction] [varchar](10) NULL,
    [MasterAgreementDate] [datetime] NULL,
    [ReportingDelegation_ID] [int] NULL,
    [RelatedClientOrder_ID] [int] NULL,
    [RelatedOrderDetail_ID] [int] NULL,
    [ReportingDelegationModel] [nvarchar](100) NULL,
    [ExtractDatetimeUTC] [datetime] NOT NULL,
    [IsNDF] [bit] NULL,
    [FixingDate] [datetime] NULL,
    [SettlementExchangeBasis] [varchar](7) NULL
) ON [PRIMARY]

--------------------INDEX---------------------------



USE [ODS]
GO

CREATE UNIQUE NONCLUSTERED INDEX [idx_Batch_ID_SwapData_ID] ON [CCTSwapForward].[SWAP_DATA]
(
    [Batch_ID] ASC,
    [SwapData_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


USE [ODS]
GO

CREATE UNIQUE NONCLUSTERED INDEX [idx_Contract_ID_SwapData_ID] ON [CCTSwapForward].[SWAP_DATA]
(
    [Contract_ID] ASC,
    [SwapData_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


USE [ODS]
GO

CREATE UNIQUE CLUSTERED INDEX [idx_SWAPDATA_ID] ON [CCTSwapForward].[SWAP_DATA]
(
    [SwapData_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Best Answer

Given that your table is very wide, and the indexes very narrow, creating non-clustered indexes on the table following the load should be preferred.

In this instance I would have:

  1. Create the new table with the Clustered Index in place - this is because the process of converting a heap into a clustered index is computationally expensive.
  2. Load the data into the table, in the order of the clustered index SwapData_ID
  3. Using BULK INSERT (ensuring the operation is minimally logged), load into the table
  4. Create the non-clustered indexes

The above approach should be optimal given your scenario.

There is of course then other questions around:

Data drift (will the source data change during your load process? Do these changes need to be taken across)

DR (is log shipping enabled? In this case the recovery model may need to be changed to bulk-logged)

Log file sizing (You'll need to ensure your log file is big enough to accommodate for the non-clustered index creations)

Presizing the database (ensuring it doesn't auto-grow during the load)

but these all seem to be slightly outside the context of what you're asking.