Sql-server – SELECT DISTINCT in extra space data from a single column ( no duplicates )

distinctduplicationsql server

Hello i am having problems with this column data

cities
Charlotte
Cincinnati
Cleves
Columbus
Fairfield
Fresno
Los  Angeles -- HERE IS THE PROBLEM
Los Angeles
Madison
Manhatttan  Beach
New York
Orange
Palo Alto
Phoenix
Sacramento
San Francisco
Takoma  Park -- maybe this city must have problems also
Tarrytown
Valencia
Washington

How i can filter that value, what is the easy way with trim? i did my research but i find long sql statements that i dont understand if that could help is a simple error.

This is the query:

-- 3. Write a query that will list all the cities that have customers with a heading of Cities. Only
-- list each city once (no duplicates) and sort in descending alphabetical order.

select distinct customer_city as cities
FROM customers
ORDER BY customer_city ASC

-- left Los  Angeles

Best Answer

As Erik said in the comments, you should fix bad data rather than trying to query around it, but if you absolutely cannot fix the data, the below query will get the distinct list of cities by replacing double spaces with a single space:

SELECT DISTINCT REPLACE(customer_city, '  ', ' ') as cities
FROM customers
ORDER BY customer_city ASC

This is a really basic example, however, if the input data is not being validated, this may not be the only type of whitespace you're encountering causing duplication.

Prior to SQL 2017, you need to daisy chain multiple REPLACE statements to replace multiple characters. For example, this code replaces double spaces and tab characters with a single space:

SELECT DISTINCT REPLACE(REPLACE(customer_city, '  ', ' '), CHAR(9), ' ') as cities
FROM customers
ORDER BY customer_city ASC

In 2017, you can use the TRANSLATE function to swap all of the characters you're searching for with a single character, then replace that character with nothing to ensure you're finding all exact duplicates:

SELECT DISTINCT REPLACE(TRANSLATE(customer_city, CHAR(9) + CHAR(10) + CHAR(13) + CHAR(32), '####'), '#', '') as cities
FROM customers
ORDER BY customer_city ASC

This means you don't have to repeat REPLACE for every character you want to strip, you just add the character code (+ CHAR(?)) to the TRANSLATE function and add another replacement character (#). As you can see, the TRANSLATE example replaces 4 characters for basically the same amount of code as the two-character replacement in prior versions.