Rank vs. Distinct

distinctoraclerank

Just for curiosity, if I have to use rank to tag distinct values for some columns only, and ignore whatever different values occurred on other columns, why not use distinct and just remove the other columns? Any real life examples?

SELECT * FROM (
SELECT
 col1,
 col2,
 col3,
 col4,
 row_number () over (partition by col1, col2 ORDER BY col3) RN
FROM table) tmp_table
WHERE rn = 1

vs.

SELECT DISTINCT
 col1,
 col2
FROM table

When ordered by col3, there is a chance you can get different values of col3 compared to rank 2, right? Then if that col3 is not important because it is not consistent, why not use distinct instead for only col1 and col2?

Best Answer

When ordered by col3, there is a chance you can get different values of col3 compared to rank 2, right?

Yes, if I understand correctly what you are asking.

Then if that col3 is not important because it is not consistent, why not use distinct instead for only col1 and col2?

The difference between query 1 and 2 and is that query 1 returns more information than 2. It includes columns col3 and col4 in the output.

While query-2 returns all the distinct values of col1 and col2, query-1 returns all the distinct values of col1 and col2 plus the values of col3 and col4 from the first row (ordered by col3) of each group (ie. each col1 + col2 combination).

Just for curiosity, if I have to use rank to tag distinct values for some columns only, and ignore whatever different values occurred on other columns, why not use distinct and just remove the other columns?

Yes, if you don't need the rest of the columns, but only col1 and col2, there is no difference.