Sql-server – SELECT DISTINCT in extra space data from a single column ( no duplicates )

distinctduplicationsql server

Hello i am having problems with this column data

cities
Charlotte
Cincinnati
Cleves
Columbus
Fairfield
Fresno
Los  Angeles -- HERE IS THE PROBLEM
Los Angeles
Madison
Manhatttan  Beach
New York
Orange
Palo Alto
Phoenix
Sacramento
San Francisco
Takoma  Park -- maybe this city must have problems also
Tarrytown
Valencia
Washington

How i can filter that value, what is the easy way with trim? i did my research but i find long sql statements that i dont understand if that could help is a simple error.

This is the query:

-- 3. Write a query that will list all the cities that have customers with a heading of Cities. Only
-- list each city once (no duplicates) and sort in descending alphabetical order.

select distinct customer_city as cities
FROM customers
ORDER BY customer_city ASC

-- left Los  Angeles

Best Answer

As Erik said in the comments, you should fix bad data rather than trying to query around it, but if you absolutely cannot fix the data, the below query will get the distinct list of cities by replacing double spaces with a single space:

SELECT DISTINCT REPLACE(customer_city, '  ', ' ') as cities
FROM customers
ORDER BY customer_city ASC

This is a really basic example, however, if the input data is not being validated, this may not be the only type of whitespace you're encountering causing duplication.

Prior to SQL 2017, you need to daisy chain multiple REPLACE statements to replace multiple characters. For example, this code replaces double spaces and tab characters with a single space:

SELECT DISTINCT REPLACE(REPLACE(customer_city, '  ', ' '), CHAR(9), ' ') as cities
FROM customers
ORDER BY customer_city ASC

In 2017, you can use the TRANSLATE function to swap all of the characters you're searching for with a single character, then replace that character with nothing to ensure you're finding all exact duplicates:

SELECT DISTINCT REPLACE(TRANSLATE(customer_city, CHAR(9) + CHAR(10) + CHAR(13) + CHAR(32), '####'), '#', '') as cities
FROM customers
ORDER BY customer_city ASC

This means you don't have to repeat REPLACE for every character you want to strip, you just add the character code (+ CHAR(?)) to the TRANSLATE function and add another replacement character (#). As you can see, the TRANSLATE example replaces 4 characters for basically the same amount of code as the two-character replacement in prior versions.

Related Solutions

Sql-server – Change SELECT DISTINCT to UPDATE DISTINCT

To just update an arbitrary one from each distinct group you could use

WITH T
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY [Finance_Project_Number] 
                                       ORDER BY (SELECT 0)) AS RN,
                [Processing_Result_Text],
                [Processing_Result]
         FROM   [InterfaceInfor].[dbo].[ProjectMaster]
         WHERE  NOT EXISTS (SELECT *
                            FROM   [IMS].[dbo].[THEOPTION]
                            WHERE  [InterfaceInfor].[dbo].[ProjectMaster].[Finance_Project_Number] = [IMS].[dbo].[THEOPTION].[NAME]))
UPDATE T
SET    [Processing_Result_Text] = 'UNIQUE',
       [Processing_Result] = 0
WHERE  RN = 1;

If you decide there is some criteria by which the row to be updated should be selected after all simply change the ORDER BY (SELECT 0) accordingly so that the desired target row is ordered first - e.g. ORDER BY DateInserted desc would update the latest one as ordered by a column called DateInserted if such a column exists.

This uses a common table expression (CTE) because it is not permitted to reference ranking functions such as ROW_NUMBER directly in the WHERE clause. It is allowed to update data via common table expressions in the same circumstances as for updatable views (basically the data being updated must be able to be mapped back straight forwardly to specific items in a single base table).

If you are not yet familiar with ranking functions you may well find SELECT-ing from the CTE first to be beneficial.

CREATE TABLE #TheOption(NAME VARCHAR(50));

CREATE TABLE #ProjectMaster
(
Finance_Project_Number VARCHAR(10) NOT NULL,
Processing_Result_Text VARCHAR(50) NULL,
Processing_Result INT NULL
);

INSERT INTO #ProjectMaster (Finance_Project_Number, Processing_Result_Text)
VALUES ('A00001', 'A'), 
       ('A00001', 'B'), 
       ('A00001', 'C'), 
       ('B99999', 'D'), 
       ('B99999', 'E'), 
       ('C47474', 'F'), 
       ('C47474', 'G');

INSERT INTO #TheOption (NAME) VALUES('C47474');


WITH T
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY Finance_Project_Number 
                                       ORDER BY (SELECT 0)) AS RN,
                Finance_Project_Number,
                Processing_Result_Text,
                Processing_Result
         FROM   #ProjectMaster pm
         WHERE  NOT EXISTS (SELECT *
                            FROM   #TheOption opt
                            WHERE  pm.Finance_Project_Number =opt.NAME))
SELECT *
FROM T
ORDER BY Finance_Project_Number, RN;

Example results are below

The C47474 rows are filtered out as they do exist in the other table so don't meet the NOT EXISTS, the remaining rows are grouped by Finance_Project_Number and assigned a sequential number within each group.

In this case the yellow rows would meet the RN = 1 condition and be updated. However there is no guarantee exactly how these numbers will be assigned within each group unless you use a ORDER BY clause on an expression guaranteed to be unique. Without this it could potentially change even between successive executions of the same statement.

How to select distinct record

This query with a DISTINCT:

SELECT DISTINCT Name, MyID
FROM data;

Returns distinct Name and MyID:

Name     |  MyID
Alan     |  2
John     |  3

This query with a GROUP BY:

SELECT MIN(ID) AS ID, Name, MyID
FROM data
GROUP BY Name, MyID;

Returns Name and MyId with the smallest ID like your sample:

ID   |  Name     |  MyID
1    |  Alan     |  2
3    |  John     |  3

Both query are in this SQL Fiddle and will work with most RDBMS. (Sample uses SQL Server)

Best Answer

Related Solutions

Sql-server – Change SELECT DISTINCT to UPDATE DISTINCT

How to select distinct record

Related Question