Sql-server – Getting duplicate values from SELECT ... IN (SELECT ...)

duplication, select, sql-server

I need help with a SQL query.

I have a select inside a select:

select column1 from table where column2 in (
-- here I have a select command that returns 609 rows, including duplicate values
)

and I need the outer select to return 609 rows, duplicates included. But I get only 12 rows, because IN ignores the duplicate values returned by the inner select.

I need all of the values (609 rows), not just a list of the values that occur two or more times.
I will join this result to another result.

My query looks like this:

This is the first select I was talking about:

select serialnumber from ffPackage where ID in (

and this is the select from which I get 609 rows:

select parentID from ffPackage where SerialNumber in (
select ffPackage.SerialNumber from ffEmployee, ffHistory, ffSerialNumber, ffPackage, ffUnitDetail
where ffHistory.UnitID in (
select UnitID from ffUnitDetail where OutmostPackageID in (
select ID from ffPackage where SerialNumber = 'xxxxxxxx'))
and ffHistory.EmployeeID = ffEmployee.ID
and ffHistory.UnitID = ffSerialNumber.UnitID
and ffHistory.UnitStateID = 'xxxx'
and ffUnitDetail.UnitID = ffHistory.UnitID
and ffUnitDetail.InmostPackageID = ffPackage.ID
))

Best Answer

The query that returns 609 rows apparently has only 12 distinct parentID values. If you use it as an IN subquery matched against a unique column (which I am assuming ffPackage.ID is), you can only get 12 rows as the result.
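To make the behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (IN behaves the same way in both). The tables package and child and their contents are hypothetical, chosen only to mimic a unique ID column matched against a subquery that returns repeated values.

```python
import sqlite3

# Hypothetical miniature schema: package.id is unique, child.parent_id repeats.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE package (id INTEGER PRIMARY KEY, serialnumber TEXT);
    CREATE TABLE child (parent_id INTEGER);
    INSERT INTO package VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO child VALUES (1), (1), (1), (2), (2);
""")

# The subquery on its own returns 5 rows, with duplicates.
subquery_rows = conn.execute("SELECT parent_id FROM child").fetchall()

# IN only tests membership, so each package row appears at most once.
in_rows = conn.execute(
    "SELECT serialnumber FROM package "
    "WHERE id IN (SELECT parent_id FROM child) ORDER BY id"
).fetchall()

print(len(subquery_rows))  # 5
print(in_rows)             # [('A',), ('B',)]
```

The subquery yields five rows but only two distinct values, so the outer query returns two rows, just as 609 rows with 12 distinct parentID values yield 12 rows in the question.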

If you want to get as many rows as the subquery returns, one way is to use the subquery as a derived table and join it to ffPackage:

SELECT
  serialnumber
FROM
  dbo.ffPackage AS p
  INNER JOIN
  (
    -- the 609 row subquery
    select parentID from ffPackage where SerialNumber in (
    select ffPackage.SerialNumber from ffEmployee, ffHistory, ffSerialNumber, ffPackage, ffUnitDetail
    where ffHistory.UnitID in (
    select UnitID from ffUnitDetail where OutmostPackageID in (
    select ID from ffPackage where SerialNumber = 'xxxxxxxx'))
    and ffHistory.EmployeeID = ffEmployee.ID
    and ffHistory.UnitID = ffSerialNumber.UnitID
    and ffHistory.UnitStateID = 'xxxx'
    and ffUnitDetail.UnitID = ffHistory.UnitID
    and ffUnitDetail.InmostPackageID = ffPackage.ID
    ))
  ) AS sub ON p.ID = sub.ParentID
;
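As a rough illustration of why the join returns one row per subquery row, the following sketch again uses Python's sqlite3 module in place of SQL Server. The tables package and child are hypothetical stand-ins for ffPackage and the 609-row subquery.

```python
import sqlite3

# Hypothetical miniature schema: package.id is unique, child.parent_id repeats.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE package (id INTEGER PRIMARY KEY, serialnumber TEXT);
    CREATE TABLE child (parent_id INTEGER);
    INSERT INTO package VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO child VALUES (1), (1), (1), (2), (2);
""")

# Joining to the subquery as a derived table produces one output row
# for every matching subquery row, so duplicates are preserved.
join_rows = conn.execute(
    "SELECT p.serialnumber "
    "FROM package AS p "
    "INNER JOIN (SELECT parent_id FROM child) AS sub ON p.id = sub.parent_id "
    "ORDER BY p.id"
).fetchall()

print(join_rows)  # [('A',), ('A',), ('A',), ('B',), ('B',)]
```

Five subquery rows produce five result rows here, whereas the equivalent IN query would return only 'A' and 'B' once each.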

Incidentally, I would like to suggest that you rewrite the comma joins in the subquery as explicit joins – for consistency and maintainability's sake:

SELECT
  SerialNumber
FROM
  dbo.ffPackage AS p
  INNER JOIN
  (
    -- the 609 row subquery
    SELECT
      ParentID
    FROM
      dbo.ffPackage
    WHERE
      SerialNumber IN
      (
        SELECT
          p.SerialNumber
        FROM
          dbo.ffEmployee AS e
          INNER JOIN dbo.ffHistory AS h ON h.EmployeeID = e.ID
          INNER JOIN dbo.ffSerialNumber AS sn ON h.UnitID = sn.UnitID
          INNER JOIN dbo.ffUnitDetail AS ud ON ud.UnitID = h.UnitID
          INNER JOIN dbo.ffPackage AS p ON ud.InmostPackageID = p.ID
        WHERE
          h.UnitID IN
          (
            SELECT
              UnitID
            FROM
              dbo.ffUnitDetail
            WHERE
              OutmostPackageID IN
              (
                SELECT ID FROM dbo.ffPackage WHERE SerialNumber = 'xxxxxxxx'
              )
          )
          AND h.UnitStateID = 'xxxx'
      )
  ) AS sub ON p.ID = sub.ParentID
;

I would also argue that better formatting and short (but meaningful) table aliases contribute to the maintainability of your queries. You may also want to get into the habit of always qualifying table names with the schema name and of ending every statement with a terminator (;).

Related Solutions

Sql-server – Selecting only one duplicate

You can use a CTE for this if you want the returned row to be a complete, intact row rather than a set of aggregates over the other columns. You can change the ORDER BY to prefer rows by any of the columns (the PARTITION BY lists the columns you think should be unique).

;WITH x AS
(
  SELECT col1, col2, col3, 
    rn = ROW_NUMBER() OVER 
    (
        PARTITION BY unique_columns 
        ORDER BY unique_columns, tie_breaker_if_you_care
    )
  FROM dbo.source_table
)
SELECT col1, col2, col3 FROM x WHERE rn = 1;
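Here is a small runnable sketch of the same pattern, using Python's sqlite3 module instead of SQL Server (this assumes SQLite 3.25 or later, which added window functions). The sample rows are made up, and treating (col1, col2) as the columns that should be unique, with col3 as the tie-breaker, is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_table (col1 TEXT, col2 TEXT, col3 INTEGER);
    INSERT INTO source_table VALUES
        ('a', 'x', 3), ('a', 'x', 1), ('b', 'y', 2), ('b', 'y', 5);
""")

# Number the rows within each (col1, col2) group, lowest col3 first,
# then keep only the first row of every group.
kept = conn.execute("""
    WITH x AS (
        SELECT col1, col2, col3,
               ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3) AS rn
        FROM source_table
    )
    SELECT col1, col2, col3 FROM x WHERE rn = 1 ORDER BY col1
""").fetchall()

print(kept)  # [('a', 'x', 1), ('b', 'y', 2)]
```

Changing the ORDER BY inside the OVER clause (for example to col3 DESC) changes which row of each group is kept.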

Updating duplicate records with different data

Using the ranking ROW_NUMBER() function will work. First give row numbers to all rows in both tables, then join using these row numbers, then update:

with 
  oldt as
  ( select fileNo , folder, fileType,
           row_number() over (partition by fileNo, folder
                              order by fileType)
             as rn
    from oldtable
  ),

  newt as
  ( select fileNo , folder, fileType,
           row_number() over (partition by fileNo, folder
                              order by fileType)
             as rn
    from newtable
  ),

  upd as
  ( select 
        n.fileType,
        o.fileType as old_fileType
    from newt n
      join oldt o
      on  n.fileNo = o.fileNo
      and n.folder = o.folder
      and n.rn     = o.rn 
  ) 

update
    upd
set 
    fileType = old_fileType ;

SQLfiddle seems to be giving an error for Oracle, so this has been tested in SQL Server only: SQLfiddle-test (but this syntax should be valid for Oracle, too).

Tested in Oracle, the above sadly doesn't work, I think because statements that start with WITH can only be SELECT statements. Even with the query rearranged (I tried several rewrites), Oracle throws various errors. The only way I managed to get it working was to add another column to newtable with a unique constraint on it. Then the following works (nid is the added primary key column).

Tested in Oracle's Live SQL site:

update 
  ( with 
      oldt as
      ( select fileNo , folder, fileType,
               row_number() over (partition by fileNo, folder
                                  order by fileType)
                 as rn
        from oldtable
      ),
    newt as
      ( select fileNo , folder, nid,
               row_number() over (partition by fileNo, folder  
                                  order by fileType)
                 as rn
        from newtable
      ),
    upd as
      ( select 
            n.nid,
            o.fileType as old_fileType
        from newt n
          join oldt o
          on  n.fileNo = o.fileNo
          and n.folder = o.folder
          and n.rn     = o.rn 
      ) 
    select 
        up.fileType,
        ( select upd.old_fileType
          from upd 
          where upd.nid = up.nid 
        ) as old_fileType
    from newtable up
  ) x
set fileType = old_fileType ;
