SQL Server – How to Delete Multiple Rows Based on the First Instance of the Value

Tags: sql-server, sql-server-2005

I have a table with the structure below. For each ColA value, I would like to delete all of its records when its first record (OrderByCol = 1) has an even-numbered ColC. (In this example, all of the AA records would be deleted.)

Each ColA/ColB combination is unique, and the rows are sorted in alphabetical order. OrderByCol comes from the query that builds this table: ROW_NUMBER() OVER(PARTITION BY ColA ORDER BY ColB)

ColA    ColB    ColC    OrderByCol
AA      A       8       1
AA      B       3       2
AA      F       5       3
BB      B       7       1
BB      D       9       2
CC      A       1       1
CC      Q       5       2

I can do this with an IN subquery on ColA, but that seems like a lot of work and it is really slow.

Query:

DELETE T FROM Table1 AS T
WHERE T.ColA IN (
    SELECT DISTINCT ColA FROM Table1
    WHERE EXISTS (SELECT * FROM Table1 WHERE ColC % 2 = 0)
    AND OrderByCol = 1
)

So the query gives me a list of all unique ColA values whose first item has an even ColC, but it is really slow. I'm sure there is a faster way to do this.

EDIT: If someone knows a better way to do what I am asking, that would be awesome! I always like learning new ways to do things in SQL. However, I found out why my query was slow: I didn't have an index on the table, because I thought that table variables (@table1) couldn't have indexes. Once I put an index on my table, the run time dropped from 1 min 20 sec to 2 sec.

Thank you to everyone who has helped out with this.
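
For readers wondering how a table variable gets an index on SQL Server 2005: as far as I know, the only way is through a PRIMARY KEY or UNIQUE constraint in the declaration. The sketch below is an assumption about what @table1 might look like; the column types and the choice of key are hypothetical, not taken from the original post.

```sql
-- Hypothetical declaration of the table variable from the question.
-- Column types and the key columns are assumptions for illustration.
DECLARE @table1 TABLE
(
    ColA       VARCHAR(10) NOT NULL,
    ColB       VARCHAR(10) NOT NULL,
    ColC       INT         NOT NULL,
    OrderByCol INT         NOT NULL,
    -- The PRIMARY KEY constraint is backed by a clustered index on
    -- (ColA, OrderByCol), which the IN subquery can seek on.
    PRIMARY KEY CLUSTERED (ColA, OrderByCol)
);
```

Keying on (ColA, OrderByCol) is one plausible choice because the DELETE filters on both columns; (ColA, ColB) would also satisfy the uniqueness described above.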

Best Answer

Your query is more complicated than it needs to be. DISTINCT is redundant there, and the inner subquery is not needed. You could rewrite it as:

DELETE T FROM Table1 AS T
WHERE T.ColA IN (
    SELECT ColA 
    FROM Table1
    WHERE ColC % 2 = 0 
      AND OrderByCol = 1
) ;
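
To check the rewritten DELETE against the sample data from the question, a minimal sketch follows. It uses a table variable named @t as a stand-in for Table1; that name and the column types are assumptions.

```sql
-- Stand-in for Table1 (name and types are assumptions).
DECLARE @t TABLE
(
    ColA       VARCHAR(10),
    ColB       VARCHAR(10),
    ColC       INT,
    OrderByCol INT
);

-- Sample rows copied from the question.
INSERT @t SELECT 'AA', 'A', 8, 1
UNION ALL SELECT 'AA', 'B', 3, 2
UNION ALL SELECT 'AA', 'F', 5, 3
UNION ALL SELECT 'BB', 'B', 7, 1
UNION ALL SELECT 'BB', 'D', 9, 2
UNION ALL SELECT 'CC', 'A', 1, 1
UNION ALL SELECT 'CC', 'Q', 5, 2;

-- Delete every row of a ColA group whose first row has an even ColC.
DELETE T FROM @t AS T
WHERE T.ColA IN (
    SELECT ColA
    FROM @t
    WHERE ColC % 2 = 0
      AND OrderByCol = 1
);

-- Only the BB and CC rows remain: AA's first row has ColC = 8 (even),
-- while BB and CC start with 7 and 1 (odd).
SELECT ColA, ColB, ColC, OrderByCol
FROM @t
ORDER BY ColA, OrderByCol;
```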

Related Solutions

Combine Columns from Multiple Rows into Single Row in SQL Server

This is relatively trivial to do with a correlated subquery. You can't use the COALESCE method highlighted in the blog post you mention unless you extract that to a user-defined function (or unless you only want to return one row at a time). Here is how I typically do this:

DECLARE @x TABLE 
(
  id INT, 
  row_num INT, 
  customer_code VARCHAR(32), 
  comments VARCHAR(32)
);

INSERT @x SELECT 1,1,'Dilbert','Hard'
UNION ALL SELECT 1,2,'Dilbert','Worker'
UNION ALL SELECT 2,1,'Wally','Lazy';

SELECT id, customer_code, comments = STUFF((SELECT ' ' + comments 
    FROM @x AS x2 WHERE id = x.id
     ORDER BY row_num
     FOR XML PATH('')), 1, 1, '')
FROM @x AS x
GROUP BY id, customer_code
ORDER BY id;

If you have a case where the data in comments could contain unsafe-for-XML characters (>, <, &), you should change this:

     FOR XML PATH('')), 1, 1, '')

To this more elaborate approach:

     FOR XML PATH(''), TYPE).value(N'(./text())[1]', N'varchar(max)'), 1, 1, '')

(Be sure to use the right destination data type, varchar or nvarchar, and the right length, and prefix all string literals with N if using nvarchar.)
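
Put together, the full query with the XML-safe variant might look like the sketch below. The sample comments are altered here to include & and < > characters (that data is made up for illustration); the rest follows the query above.

```sql
DECLARE @x TABLE
(
  id INT,
  row_num INT,
  customer_code VARCHAR(32),
  comments VARCHAR(32)
);

-- Illustrative data containing characters that FOR XML would otherwise entitize.
INSERT @x SELECT 1,1,'Dilbert','Hard &'
UNION ALL SELECT 1,2,'Dilbert','Worker'
UNION ALL SELECT 2,1,'Wally','<Lazy>';

-- TYPE returns the concatenation as XML; .value() reads it back as plain
-- text, so &amp; / &lt; / &gt; are decoded to &, <, >.
SELECT id, customer_code, comments = STUFF((SELECT ' ' + comments
    FROM @x AS x2 WHERE id = x.id
     ORDER BY row_num
     FOR XML PATH(''), TYPE).value(N'(./text())[1]', N'varchar(max)'), 1, 1, '')
FROM @x AS x
GROUP BY id, customer_code
ORDER BY id;
-- id 1 -> 'Hard & Worker', id 2 -> '<Lazy>'
```

Without TYPE and .value(), the same data would come back as 'Hard &amp; Worker' and '&lt;Lazy&gt;'.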

SQL Server – Selecting Multiple Instances of a Record Based on Record Lifespan over Years

This won't be fantastic depending on the indexes on the Items table, but should be much more efficient than the loop you were thinking about. In almost all cases, a set-based query will perform much better than iteration of any kind - there are a few exceptions, but you should only end up with a loop if it's actually necessary or proves to perform better than a set approach, never as a first reaction. IMHO.

This procedure takes advantage of a catalog view to build a set of numbers on the fly that represents the largest number of replacements that could be possible, given the input start/end year, if the smallest lifespan is one year. You could reduce this if the smallest span is 2 years, etc., but it won't really change the performance profile. Then it uses those numbers to find replacement years, based on modulo, like your approach would have - but it uses a set instead. There is probably a way I could have finagled the UNION into the JOIN but it seemed easier to call this part of the query out separately.

CREATE PROCEDURE dbo.GetReplacementsWithinYearRange
    @startyear int, 
    @endyear   int
AS
BEGIN
    SET NOCOUNT ON;

    ;WITH n(n) AS 
    (
      SELECT TOP (@endyear - @startyear + 1) ROW_NUMBER() OVER 
      (ORDER BY [object_id]) FROM sys.all_columns
    )
    SELECT ID, ItemName, ReplaceYear FROM
    (
     SELECT i.ID, i.ItemName, ReplaceYear = n.n + i.InstallYear
     FROM n INNER JOIN dbo.Items AS i
     ON (n.n - 1) % i.UsefullLife = i.UsefullLife - 1
     AND n.n + i.InstallYear > @startyear
     AND n.n + i.InstallYear <= @endyear
    ) AS x
    UNION
      SELECT ID, ItemName, InstallYear
       FROM dbo.Items
       WHERE InstallYear BETWEEN @startyear AND @endyear
    ORDER BY ID, ReplaceYear;
END
GO

Here's a sqlfiddle that demonstrates: http://sqlfiddle.com/#!3/09824f/2
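
To see what the modulo join in the procedure does on its own, here is a small sketch with a single made-up item. The @Items table variable, the 'Pump' row, and the year values are all hypothetical stand-ins for dbo.Items and the procedure parameters.

```sql
DECLARE @startyear INT, @endyear INT;
SET @startyear = 2000;
SET @endyear   = 2012;

-- Hypothetical stand-in for dbo.Items with one item that lasts 5 years.
DECLARE @Items TABLE
(
    ID INT,
    ItemName VARCHAR(32),
    InstallYear INT,
    UsefullLife INT
);
INSERT @Items SELECT 1, 'Pump', 2000, 5;

-- Numbers 1..13 (one per year in the range), built from a catalog view.
WITH n(n) AS
(
  SELECT TOP (@endyear - @startyear + 1) ROW_NUMBER() OVER
  (ORDER BY [object_id]) FROM sys.all_columns
)
SELECT i.ID, i.ItemName, n.n, ReplaceYear = n.n + i.InstallYear
FROM n INNER JOIN @Items AS i
  ON (n.n - 1) % i.UsefullLife = i.UsefullLife - 1  -- true when n is a multiple of UsefullLife
ORDER BY n.n;
-- Matches n = 5 and n = 10, giving ReplaceYear 2005 and 2010.
```

The condition (n - 1) % UsefullLife = UsefullLife - 1 is just another way of saying that n is a multiple of UsefullLife, so each matching n marks one replacement cycle after the install year.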
