Sql-server – Query to remove duplicate calculated values

sql-server sql-server-2008-r2

I have a client running SQL Server 2008 R2 SP1, and I can't seem to write a query that has distinct calculated values. I could change the schema to store the value being calculated, but I was curious if there was a way to do this from within a query without modifying schema (for scenarios where client schema modification is not possible)?

City Table:

+-------+----------+--------------------------+----------+
|  id   |   name   |       displayName        | countyID |
+-------+----------+--------------------------+----------+
| ...   | ...      | ...                      | ...      |
| 833   | Portland | Portland, Ashley, AR     | 3323     |
| 21388 | Portland | Portland, Clackamas, OR  | 5439     |
| 21655 | Portland | Portland, Multnomah, OR  | 5462     |
| 21726 | Portland | Portland, Washington, OR | 5470     |
+-------+----------+--------------------------+----------+

County Table

+------+------------+----------------+---------+
|  id  |    name    |  displayName   | stateID |
+------+------------+----------------+---------+
| ...  | ...        | ...            | ...     |
| 5439 | Clackamas  | Clackamas, OR  | 38      |
| 5462 | Multnomah  | Multnomah, OR  | 38      |
| 5470 | Washington | Washington, OR | 38      |
+------+------------+----------------+---------+

State Table

+-----+-------+---------------+
| id  | State | StateFullName |
+-----+-------+---------------+
| ... | ...   | ...           |
| 38  | OR    | Oregon        |
+-----+-------+---------------+

Query:

select top 100 City.id as cityID
    ,City.name + ', ' + [State].[State] as label
    ,[State].[State] as stateAbbrev
    ,[State].StateFullName as stateName
from [City]
left join [Country] on (City.countyID = Country.id)
left join [State] on (Country.stateID = [State].id)
where City.name + ', ' + [State].[State] like 'Portland, OR'

Results:

+--------+--------------+-------------+-----------+
| cityID |    label     | stateAbbrev | stateName |
+--------+--------------+-------------+-----------+
|  21388 | Portland, OR | OR          | Oregon    |
|  21655 | Portland, OR | OR          | Oregon    |
|  21726 | Portland, OR | OR          | Oregon    |
+--------+--------------+-------------+-----------+

The desired query would return only one row for distinct values of the calculated column "label". As I mentioned earlier, I'm curious if there's a solution that doesn't involve schema modification since that may not always be possible.

Best Answer

Converting my comment to an answer (SQLFiddle):

;with cte (cityID, label, rowNum, stateAbbrev, stateName)
as (
    SELECT TOP 100
        City.id AS cityID,
        City.name + ', ' + [State].[State] AS label,
        -- Number the rows within each label. "desc" keeps the highest cityID;
        -- remove "desc" to keep the lowest instead.
        row_number() over (partition by cast(City.name + ', ' + [State].[State] as varchar(max)) order by City.id desc) AS rowNum,
        [State].[State] AS stateAbbrev,
        [State].StateFullName AS stateName
    FROM [City]
    LEFT JOIN [Country] ON (City.countyID = Country.id)
    LEFT JOIN [State] ON (Country.stateID = [State].id)
    WHERE City.name + ', ' + [State].[State] LIKE 'Por%'
    ORDER BY City.id  -- ORDER BY added so that TOP 100 picks rows in a defined order
)
select cityID, label, stateAbbrev, stateName
from cte
where rowNum = 1  -- keep only the first row of each label

The above returns one row for each distinct value of label.

Note: As Aaron commented, you should remove cityID from the select list if it is not required in the output.
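Following up on that note: if cityID is not needed, or any one id per label will do, a plain GROUP BY on the calculated expression may be enough. The following is an untested sketch that reuses the table and column names exactly as written in the question. It assumes the name columns are varchar/nvarchar (so the cast to varchar(max) is left out), and MIN(City.id) is only an assumption about which id to keep.

```sql
-- Sketch: one row per distinct label, without a CTE or row_number().
SELECT
    City.name + ', ' + [State].[State] AS label,
    [State].[State]                    AS stateAbbrev,
    [State].StateFullName              AS stateName,
    MIN(City.id)                       AS cityID  -- assumption: lowest id per label; drop this line if not needed
FROM [City]
LEFT JOIN [Country] ON (City.countyID = Country.id)
LEFT JOIN [State]   ON (Country.stateID = [State].id)
WHERE City.name + ', ' + [State].[State] LIKE 'Por%'
GROUP BY
    City.name + ', ' + [State].[State],  -- must match the expression in the select list
    [State].[State],
    [State].StateFullName;
```

The row_number() version is still the better fit when columns from the specific surviving row are needed, since GROUP BY can only return grouped columns and aggregates.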

Related Solutions

Sql-server – sql server Group by with calculated values

I think your CASE statements may be flawed as you're always going to have validity_name and validity_surname equal to 0, because the name cannot be 3 different things at once.

When do you want the validity_name to equal 1?

Regardless of that fact, I like the APPLY VALUES method of grouping on a computed column.

Using AdventureWorks as an example, this works like so:

USE AdventureWorks;
GO

SELECT
  FirstName,
  ValidName
FROM Person.Person AS p
CROSS APPLY (VALUES(CASE WHEN FirstName = 'Kim' THEN 1 ELSE 0 END)) AS a(ValidName)
GROUP BY FirstName, ValidName

When applied to your query, it should be something like the following (untested, use the example above to build your query):

select
       p.id,
       p.name,
       p.surname,
       p.adress,
       a.validity_name,
       b.validity_surname,
       sum(r.value) as value  -- "r" is the invoice alias defined below
from   [dbo].[_data_CRM_COSTOMER] as p inner join
       [dbo].[_data_CRM_INVOICE] as r on (r.partner_id=p.id)
       CROSS APPLY (VALUES(case when(p.name not like ('%!!!%') or p.name not like ('%XXX%') or p.name not like ('%???%')) then 0 else 1 END)) AS a(validity_name)
       CROSS APPLY (VALUES(CASE WHEN p.surname not like ('%!!!%') or p.surname not like ('%*XXX%') or p.surname not like ('%???%') then 0 else 1 END)) as b(validity_surname)
-- every non-aggregated column in the select list is grouped, qualified by its alias
group by p.id, p.name, a.validity_name, p.surname, b.validity_surname, p.adress

Sql-server – Finding rows with duplicate values

This should tag the records that need attention. I put the tagging in the SELECT, but you could easily turn it into a second CTE and simply select out the payments to clean up.

-- 
-- find all accounts with more than one payment and mark payments to cancel
--
WITH cte_DuplicatePayments AS
(
SELECT COUNT(*) OVER(PARTITION BY accountID) AS numberOfPaymentsPerAccountID
, COUNT(*) OVER(partition BY accountID, amount) AS numberOfPaymentsPerAccountIDAndAmount
, ROW_NUMBER() OVER(partition BY accountID ORDER BY amount asc) AS PaymentsNumberPerAccountID
, *
FROM ScheduledPayment
)
SELECT CASE 
    WHEN numberOfPaymentsPerAccountID != numberOfPaymentsPerAccountIDAndAmount THEN 'MARK AS CANCELLED: Duplicate Payments with amount mismatch' 
    WHEN PaymentsNumberPerAccountID > 1 THEN 'MARK AS CANCELLED: Duplicate Payments with matching amount' 
    ELSE ''
   END AS PaymentAuditAction
, ScheduledPaymentID, accountID, amount
FROM cte_DuplicatePayments
WHERE numberOfPaymentsPerAccountID > 1
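The "second CTE" variant mentioned above could look roughly like this. It is an untested sketch: cte_Tagged is a made-up name, and the column list assumes ScheduledPayment has the ScheduledPaymentID, accountID and amount columns used in the query above.

```sql
-- Sketch: move the tagging into a second CTE, then select only the payments to clean up.
WITH cte_DuplicatePayments AS
(
    SELECT COUNT(*) OVER (PARTITION BY accountID) AS numberOfPaymentsPerAccountID
         , COUNT(*) OVER (PARTITION BY accountID, amount) AS numberOfPaymentsPerAccountIDAndAmount
         , ROW_NUMBER() OVER (PARTITION BY accountID ORDER BY amount ASC) AS PaymentsNumberPerAccountID
         , ScheduledPaymentID, accountID, amount
    FROM ScheduledPayment
),
cte_Tagged AS  -- hypothetical name for the tagging step
(
    SELECT CASE
               WHEN numberOfPaymentsPerAccountID != numberOfPaymentsPerAccountIDAndAmount
                   THEN 'MARK AS CANCELLED: Duplicate Payments with amount mismatch'
               WHEN PaymentsNumberPerAccountID > 1
                   THEN 'MARK AS CANCELLED: Duplicate Payments with matching amount'
               ELSE ''
           END AS PaymentAuditAction
         , ScheduledPaymentID, accountID, amount
    FROM cte_DuplicatePayments
    WHERE numberOfPaymentsPerAccountID > 1
)
SELECT ScheduledPaymentID, accountID, amount, PaymentAuditAction
FROM cte_Tagged
WHERE PaymentAuditAction <> '';  -- untagged rows are the payments to keep
```

Running the final SELECT first and reviewing its output before turning it into an UPDATE is probably the safer order of operations.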
