Sql-server – Speed up INSERT procedure

insertperformancequery-performancesql serversql-server-2016

I am creating a stored procedure in SQL Server 2016, in which I have the following query:

DECLARE 
    @FechaMovimiento DATETIME

SELECT 
   @FechaMovimiento = ISNULL(MAX(FechaMovimiento), '31/12/2014') 
FROM  
   DB0.[dbo].[TR_TAB_OLAP_FechasCierre]
WHERE 
   FechaCierre <= GETDATE() -1


INSERT INTO DB1.[dbo].[HVentas_1]
   (
   IDTienda, Id, Numero, NumeroFactura, idhora, IdLineaDocumento, 
   IdDocumento, Fecha, UniqueID, Posicion, CodigoDeBarras, 
   CodigoNIT_Productor, CodigoAmarreProductor,
   CodigoMarca, CodigoUnidad, CodigoUnidadDeEmpaque, CodigoImpuesto, CodigoAmarre_Articulo_Referencia,
   CodigoFamilia, CodigoLinea, CodigoCategoria, CodigoSubcategoria,
   DescripcionCortadeProducto, Descripcion, DescripcionTienda, BarraTienda, Unidades, PrecioDefecto, Precio, 
   Descuento, IDImpuesto, Importe, ImporteIVA, ImporteDtoTotal, 
   ImporteDtoTotalIva, ImporteDtoLinea, ImporteDtoLineaIva, Comentario1, Comentario2,
   VolumenTotal, PrecioPorPesoVolumen, Tiendas_ActivasPorBarra
   )
SELECT DISTINCT TOP 5 
   v.IDTienda, v.Id, v.Numero, v.NumeroFactura, v.idhora, v.IdLineaDocumento, 
   v.IdDocumento, v.Fecha, v.UniqueID, v.Posicion, v.CodigoDeBarras, 
   v.CodigoNIT_Productor, v.CodigoAmarreProductor,
   v.CodigoMarca, v.CodigoUnidad, v.CodigoUnidadDeEmpaque, v.CodigoImpuesto, v.CodigoAmarre_Articulo_Referencia,
   v.CodigoFamilia, v.CodigoLinea, v.CodigoCategoria, v.CodigoSubcategoria,
   v.DescripcionCortadeProducto, v.Descripcion, v.DescripcionTienda, v.BarraTienda, v.Unidades, v.PrecioDefecto, v.Precio, 
   v.Descuento, v.IDImpuesto, v.Importe, v.ImporteIVA, v.ImporteDtoTotal, 
   v.ImporteDtoTotalIva, v.ImporteDtoLinea, v.ImporteDtoLineaIva, v.Comentario1, v.Comentario2,
   v.VolumenTotal, v.PrecioPorPesoVolumen, 0
FROM 
   DB1.dbo.TR_TAB_OLAP_HVentas v LEFT JOIN  DB2.[dbo].[HVentas_1] v1 WITH (NOLOCK)
ON
   v.fecha = v1.fecha AND v.[UniqueID] = v1.[UniqueID] AND v.Posicion = v1.Posicion 
WHERE
   v1.fecha IS NULL
AND
   v1.Uniqueid IS NULL
AND
   v1.Posicion IS NULL
AND
   v.fecha > @FechaMovimiento

This is the execution plan:

https://www.brentozar.com/pastetheplan/?id=Sy4qks9SW

It is taking too much time (around 10 minutes), even though I am taking a small quantity of records. I would like to know If there's any way to speed up this procedure.

Query timings:

  • @FechaMovimiento assignment : less than 1 second.
  • Select distinct: less than 1 second
  • Insert From Select Distinct: almost 10 minutes.

I have a non-clustered index on HVentas on Fecha, UniqueId and Posicion.
And a columnstore index in HVentas_1.

Best Answer

First let's do a quick summary of the query plan for that insert query. SQL Server scans all rows from HVentas_1 to create the hash build and finds 0 relevant rows. It then scans through TR_TAB_OLAP_HVentas and stops if/when it finds 5 distinct rows such that v.fecha > @FechaMovimiento and that don't match a row in HVentas_1. You have an index defined on the table but SQL Server doesn't use it in the query plan that you uploaded. The worst case for performance should be when @FechaMovimiento is a value that doesn't match any rows in the table. In that case SQL Server will scan all 70 million rows from the table only to insert 0 rows. The number of rows inserted isn't relevant here. What's important is the amount of work that SQL Server did to find the rows to insert.

For the actual plan that you uploaded, SQL Server had to scan 70037142 rows from the heap before it found five distinct rows. It seems as if you had a very selective value for the variable, but you're using a local variable in a stored procedure without a RECOMPILE hint. You could be running into an issue with parameter sniffing. In any case, try it again with a RECOMPILE hint. That will allow SQL Server to build a query plan that accounts for the value of the local variable. This should be fine unless you're running this stored procedure hundreds of times per second. With better information, the query optimizer can make more informed decisions about when to use an index. Using an index here is likely to lead to better performance.

As a side node, you have a CCI that only has ten rows. Will that table continue to be that small? If so that may not be a good candidate for columnstore.