Sql-server – Returning only duplicated data

sql serversql-server-2008

I'm searching all over the internet and stack Overflow but every example I find is with only one row, or it just does not work.

I have this table:

TBL_RELATORIOS_TAMANHOS

table_catalog
table_name
column_name
data_type
character_maximum_length
numeric_precision
numeric_scale

Inside it I have all columns from all tables. I need to find only those columns that has a different precision, size or etc.

With row_number I got this:

;WITH CTE AS
(
SELECT DISTINCT table_name 'Tabela', 
       column_name 'Coluna',
       character_maximum_length as 'Tamanho', 
       numeric_precision as 'Precisao_Numerica', 
       numeric_scale 'Escala_Numerica'
FROM TBL_RELATORIO_TAMANHOS
)
SELECT *,
    ROW_NUMBER() over ( partition by Tabela,Coluna order by Tabela,Coluna)
FROM CTE
ORDER BY 1,2

and with this, I can see columns that has a 2 on it, so this one is duplicated ( same column, different size, precision or etc ).

| table_Name              | Column_name      | character_maximum_length | numeric_precision | numeric_scale | Row_number |
|-------------------------|------------------|--------------------------|-------------------|---------------|------------|
| ACOES                   | ID_BMF           | 20                       | NULL              | NULL          | 1          |
| ACOES                   | NO_ACAO          | 20                       | NULL              | NULL          | 1          |
| AJUSTE_CODIGO_MITIGADOR | CODIGO_MITIGADOR | 40                       | NULL              | NULL          | 1          |
| AJUSTE_CODIGO_MITIGADOR | CONTA_DLO        | 10                       | NULL              | NULL          | 1          |
| AJUSTE_CODIGO_MITIGADOR | CONTA_DLO        | 25                       | NULL              | NULL          | 2          |
| AJUSTE_CODIGO_MITIGADOR | VALOR_PADRAO     | 1                        | NULL              | NULL          | 1          |
| AJUSTE_POPR             | CD_ELEMENTO      | 10                       | NULL              | NULL          | 1          |

As you can see, CONTA_DLO appears 2 times. So, how the hell can I list only those columns with more then 1 row_number, but returning all of them ( all duplicates ). in this case. it should return CONTA_DLO with row_number 1 and 2.

Best Answer

So, you want to show only rows that there is more than one with same table and column name.

We can use COUNT(*) OVER (PARTITION BY table_name, column_name) to count how many rows have the same table and column name and then WHERE ... >= 2. Also:

  • I don't think you need the DISTINCT.
  • If you do need the DISTINCT there, it can be with either one more CTE to apply it or use GROUP BY as below.
  • Please don't use single quotes for table and column aliases. If you need to, use square brackets or double quotes.

The query:

  ;
WITH cte AS
(
    SELECT   
        table_name               AS Tabela, 
        column_name              AS Coluna,
        character_maximum_length AS Tamanho, 
        numeric_precision        AS Precisao_Numerica, 
        numeric_scale            AS Escala_Numerica,
        COUNT(*) OVER (PARTITION BY table_name, column_name) 
                                 AS Count_Duplicates
    FROM TBL_RELATORIO_TAMANHOS
    GROUP BY 
        table_name, column_name,  character_maximum_length,
        numeric_precision, numeric_scale
)
SELECT 
    cte.*,
    ROW_NUMBER() OVER (PARTITION BY Tabela, Coluna) AS Rn
FROM cte
WHERE Count_Duplicates >= 2
ORDER BY Tabela, Coluna, Rn 
  ;