I want to update one large table that has 83,423,460 rows and growing.
The below query takes 8 minutes to execute successfully:
UPDATE FPP_Invoice_Revenue
SET Till_Prev_Inv_Amt = Till_Prev_Inv_Amt_In_USD / 0.0285714286,
Cur_Inv_Amt = Cur_Inv_Amt_In_USD / 0.0285714286,
YTD_Inv_Amt = YTD_Inv_Amt_In_USD / 0.0285714286
WHERE SOW_Number = '20014378'
There exists one clustered index
. I thought of disabling that index before updating and again will rebuild after update but that also did not work since rebuilding is taking a lot of time.
I have read somewhere this can be achieved by dividing into small parts, but how can I divide the above query?
DDL:
CREATE TABLE [dbo].[FPP_Invoice_Revenue](
[Project_Code] [varchar](10) NOT NULL,
[Project_Desc] [varchar](50) NULL,
[SOW_Number] [varchar](10) NOT NULL,
[SOW_Desc] [varchar](50) NULL,
[Invoice_No] [varchar](50) NOT NULL,
[Inv_Month] [int] NOT NULL,
[Inv_Year] [int] NOT NULL,
[Billing_Date] [smalldatetime] NULL,
[Doc_Currency] [varchar](10) NULL,
[Vertical] [varchar](255) NULL,
[Till_Prev_Inv_Amt] [numeric](24, 10) NULL,
[Cur_Inv_Amt] [numeric](24, 10) NULL,
[YTD_Inv_Amt] [numeric](24, 10) NULL,
[Till_Prev_Inv_Amt_In_USD] [numeric](24, 10) NULL,
[Cur_Inv_Amt_In_USD] [numeric](24, 10) NULL,
[YTD_Inv_Amt_In_USD] [numeric](24, 10) NULL,
CONSTRAINT [PK_FPP_Invoice_Revenue] PRIMARY KEY CLUSTERED
(
[Project_Code] ASC,
[SOW_Number] ASC,
[Invoice_No] ASC,
[Inv_Month] ASC,
[Inv_Year] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
Execution Plan:
Best Answer
Adding an index on
SOW_Number
will allow SQL Server to very quickly identify the rows that need to be updated, without requiring a scan of the entire table (assuming a relatively small number of rows match theWHERE
clause, making the index highly selective).Even though you have
SOW_Number
defined as part of the clustered index, it is not the leading column of the index, which means SQL Server must scan the entire index looking for rows that fulfill theWHERE
clause. Adding the three columns listed in theINCLUDE
clause will allow SQL Server to not perform a lookup into the clustered index for those values, which it would otherwise need to do to as they are used in the calculations inside theSET
clause.I wouldn't normally suggest adding an index like this without evaluating the workload carefully; however I'm assuming you don't have any indexes other than the clustered index you identify in your script. Adding a single index in this case will most likely make a very big difference to the performance of your update, without dramatically affecting other portions of your workload.
I setup a test-bed on my laptop so I could test the performance of adding the suggested index vs having no index on
SOW_Number
.First, I created a new database
TestDB
for this test (I'm using SQL Server on Linux RC3):Next, I inserted 10,000,000 rows to have a sufficiently large table. I know, this is a bit smaller than your table, but it provides enough data to make reasonable extropolations:
Next, I create the non-clustered index I've suggested above:
This took about 1:20 to create on my laptop.
Now, the query:
The test data is evenly distributed on 10
SOW_Number
values;WHERE SOW_Number = '6'
indicates we'll update 1,000,000 rows:The actual execution plan for the update query:
The execution statistics for this:
As you can see from the above, the query only took around 14 seconds to update 1,000,000 rows. If
SOW_Number
is more selective, the time required will be proportionally less.As you've identified in your question, dropping the clustered index does not help increase performance of the update query. Understand that all data in a table that has a clustered index is in fact stored in the clustered index. Dropping and re-creating the clustered index will result in all 83 million rows being written to the log while data is moved in and out of the clustered index. This is a lot of extra I/O that is not necessary for the update to succeed.
You have
SOW_Number
defined as avarchar(10)
; if the content of that column is always actually 10 characters long you might consider modifying it to be achar(10)
instead ofvarchar(10)
, since that is actually more efficient. Consider this for all thevarchar
columns you have. Also, doesVertical
actually need to be 255 characters long? Presumably,Inv_Month
andInv_Year
are actually not required to store up to 2,147,483,647? You could likely convert those columns tosmallint
and save 4 bytes per row. Keeping in mind the columns in the clustered index key are used (duplicated) in every non-clustered index, the space savings from right-sizing the data might be substantial.