Sql-server – How to assign different random values to each row in a SELECT statement

sql-server sql-server-2005

Please look at this code:

create table #t1(
  id int identity (1,1),
  val varchar(10)
);


insert into #t1 values ('a');
insert into #t1 values ('b');
insert into #t1 values ('c');
insert into #t1 values ('d');

Now, whenever you execute this

select *, 
    ( select top 1 val from #t1 order by NEWID()) rnd 
from #t1 order by 1;

you will get a result where all rows have the same random value, e.g.

id          val        rnd
----------- ---------- ----------
1           a          b
2           b          b
3           c          b
4           d          b

I know a way to loop through the rows with a cursor and get different random values, but that does not perform well.

A clever solution to this is

select t1.id, t1.val, t2.val
from #t1 t1
    join (select *, ROW_NUMBER() over( order by NEWID()) lfd from #t1) as t2 on  t1.id = t2.lfd

But I simplified the query. The real query looks more like

select *, 
    ( select top 1 val from t2 where t2.x <> t1.y order by NEWID()) rnd 
from t1 order by 1;

and the simple solution doesn't fit. I'm looking for a way to force repeated evaluation of

( select top 1 val from #t1 order by NEWID()) rnd

without the use of cursors.

Edit:
Wanted output:

perhaps one call

id          val        rnd
----------- ---------- ----------
1           a          c
2           b          c
3           c          b
4           d          a

and a second call

id          val        rnd
----------- ---------- ----------
1           a          a
2           b          d
3           c          d
4           d          b

The value for each row should just be a random value, independent of the other rows.

Here is the cursor version of the code:

CREATE TABLE #res ( id INT, val VARCHAR(10), rnd VARCHAR(10));

DECLARE @id INT
DECLARE @val VARCHAR(10)
DECLARE c CURSOR FOR
SELECT id, val
FROM #t1
OPEN c
FETCH NEXT FROM c INTO @id, @val
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO #res
    SELECT @id, @val, ( SELECT TOP 1 val FROM #t1 ORDER BY NEWID()) rnd 
    FETCH NEXT FROM c INTO @id, @val
END
CLOSE c
DEALLOCATE c

SELECT * FROM #res

Best Answer

A subquery is evaluated once if possible. I can't recall what the "feature" is called (folding?) sorry.

The same applies to the GETDATE and RAND functions. NEWID is evaluated row by row because it is intrinsically a random value and should never generate the same value twice.
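
To see the difference side by side, here is a minimal sketch. The table #demo and the column aliases are made up for illustration; only RAND, GETDATE and NEWID come from the text above.

```sql
-- Hypothetical demo table, separate from the question's #t1.
CREATE TABLE #demo (id INT IDENTITY(1,1), val VARCHAR(10));

INSERT INTO #demo VALUES ('a');
INSERT INTO #demo VALUES ('b');
INSERT INTO #demo VALUES ('c');

SELECT
   id, val,
   RAND()    AS RandOnce,      -- unseeded RAND: one value for the whole statement
   GETDATE() AS DateOnce,      -- also evaluated once per statement
   NEWID()   AS GuidPerRow     -- a new value for every row
FROM
   #demo;

DROP TABLE #demo;
```

RandOnce and DateOnce should repeat on every row of the result, while GuidPerRow should differ from row to row.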

The usual techniques are to use NEWID as input to CHECKSUM or as a seed to RAND.

For random values per row:

SELECT
   col1, col2,
   ABS(CHECKSUM(NEWID())) AS Random1,
   RAND(CHECKSUM(NEWID())) AS Random2
FROM
   MyTable
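
If the random value needs to fall in a range, the same building blocks can be combined with a modulo. This is only a sketch: the table #demo, the 1 to 10 range and the column aliases are assumptions for illustration. Taking the modulo before ABS also sidesteps the rare overflow when CHECKSUM returns the smallest INT value.

```sql
-- Hypothetical demo table standing in for MyTable.
CREATE TABLE #demo (id INT IDENTITY(1,1), val VARCHAR(10));

INSERT INTO #demo VALUES ('a');
INSERT INTO #demo VALUES ('b');
INSERT INTO #demo VALUES ('c');

SELECT
   id, val,
   ABS(CHECKSUM(NEWID()) % 10) + 1 AS RandomInt1To10,  -- integer from 1 to 10
   RAND(CHECKSUM(NEWID()))         AS RandomFloat      -- float from 0 up to 1, reseeded per row
FROM
   #demo;

DROP TABLE #demo;
```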

If you want random order:

SELECT
   col1, col2
FROM
   MyTable
ORDER BY
   NEWID()

If you want random order and the original row order too, add a row number. The ActualOrder column here preserves the original order regardless of the order of the result set.

SELECT
   id, val,
   ROW_NUMBER() OVER (ORDER BY id) AS ActualOrder
FROM
   #t1
ORDER BY
   NEWID()

Edit:

In this case, we can state the requirement as:

return any random value from the set for each row in the set

the random value will be different from the actual value in any row

This is different to what I offered above which simply re-orders rows in various ways

So, I'd consider CROSS APPLY. The WHERE clause forces row-by-row evaluation, which avoids the "folding" issue and ensures that val and rnd are always different. CROSS APPLY can scale quite well too.

SELECT
   id, val, R.rnd
FROM
   #t1 t1
   CROSS APPLY
   (SELECT TOP 1 val as rnd FROM #t1 t2 WHERE t1.val <> t2.val ORDER BY NEWID()) R
ORDER BY
   id
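
The same pattern should carry over to the "real" query from the question, which correlates t2.x with t1.y. The following is only a sketch: #q1 and #q2 are hypothetical stand-ins for the question's t1 and t2, and the data is made up.

```sql
-- Hypothetical stand-ins for the question's t1 (column y) and t2 (columns x, val).
CREATE TABLE #q1 (id INT IDENTITY(1,1), y INT);
CREATE TABLE #q2 (x INT, val VARCHAR(10));

INSERT INTO #q1 (y) VALUES (1);
INSERT INTO #q1 (y) VALUES (2);
INSERT INTO #q1 (y) VALUES (3);

INSERT INTO #q2 (x, val) VALUES (1, 'a');
INSERT INTO #q2 (x, val) VALUES (2, 'b');
INSERT INTO #q2 (x, val) VALUES (3, 'c');

-- The correlated predicate (t2.x <> t1.y) makes the inner query depend on the outer row,
-- so it is re-evaluated, and NEWID re-drawn, for each row of #q1.
SELECT
   t1.id, t1.y, R.rnd
FROM
   #q1 t1
   CROSS APPLY
   (SELECT TOP 1 t2.val AS rnd FROM #q2 t2 WHERE t2.x <> t1.y ORDER BY NEWID()) R
ORDER BY
   t1.id;

DROP TABLE #q1;
DROP TABLE #q2;
```

One difference from the scalar subquery in the question: CROSS APPLY drops an outer row when the inner query returns nothing, whereas the subquery would return NULL for it. OUTER APPLY keeps such rows with a NULL rnd.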

Related Solutions

Sql-server – Oracle GoldenGate add trandata errors

I found out what the problem is: it seems that GoldenGate doesn't work with SQL Express. The server I was connecting to is SQL Express, so I'll need to use the Enterprise Edition.

Sql-server – Static Cursor and Where current of

The main difference seems to be how each approach finds the row to be updated. The STATIC Cursor copies the full result set to a hidden temporary table first (hence why it is read-only), so it would seem to be less efficient to then have to re-query the main table for each UPDATE. However, the Positioned Update seems to have quite a bit more in Logical Reads and operations. One advantage of the Positioned Update, however, is noted in the MSDN page for UPDATE:

CURRENT OF

Specifies that the update is performed at the current position of the specified cursor.

A positioned update using a WHERE CURRENT OF clause updates the single row at the current position of the cursor. This can be more accurate than a searched update that uses a WHERE clause to qualify the rows to be updated. A searched update modifies multiple rows when the search condition does not uniquely identify a single row.

Test Setup

SET NOCOUNT ON;
-- DROP TABLE ##CursorTest;
CREATE TABLE ##CursorTest ([ID] INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
                           [Val] INT NOT NULL);
INSERT INTO ##CursorTest ([Val]) VALUES (1), (1), (1), (1);

Updateable CURSOR and WHERE CURRENT OF

UPDATE ##CursorTest SET [Val] = 1;
SELECT * FROM ##CursorTest;

SET STATISTICS IO ON;
DECLARE curTest CURSOR TYPE_WARNING
  LOCAL
  FORWARD_ONLY
  KEYSET -- removing only reduces logical reads by 4
  SCROLL_LOCKS
  --OPTIMISTIC 
FOR
  SELECT [ID] FROM ##CursorTest WHERE [Val] < 5
  FOR UPDATE OF [Val];

DECLARE @ID INT;
OPEN curTest;

FETCH NEXT
FROM  curTest
INTO  @ID;

WHILE (@@FETCH_STATUS = 0)
BEGIN
  UPDATE tmp
  SET    tmp.[Val] = tmp.[Val] + 2
  FROM   ##CursorTest tmp
  WHERE CURRENT OF curTest;

  FETCH NEXT
  FROM  curTest
  INTO  @ID;
END;

CLOSE curTest;
DEALLOCATE curTest;
SET STATISTICS IO OFF;

SELECT * FROM ##CursorTest;

Results:

Table 'Worktable'. Scan count 0, logical reads 8
Table '##CursorTest'. Scan count 1, logical reads 2
Table '##CursorTest'. Scan count 1, logical reads 2
Table 'Worktable'. Scan count 1, logical reads 2
Table '##CursorTest'. Scan count 0, logical reads 2
Table 'Worktable'. Scan count 0, logical reads 2
Table '##CursorTest'. Scan count 1, logical reads 2
Table 'Worktable'. Scan count 1, logical reads 2
Table '##CursorTest'. Scan count 0, logical reads 2
Table 'Worktable'. Scan count 0, logical reads 2
Table '##CursorTest'. Scan count 1, logical reads 2
Table 'Worktable'. Scan count 1, logical reads 2
Table '##CursorTest'. Scan count 0, logical reads 2
Table 'Worktable'. Scan count 0, logical reads 2
Table '##CursorTest'. Scan count 1, logical reads 2
Table 'Worktable'. Scan count 1, logical reads 2
Table '##CursorTest'. Scan count 0, logical reads 2
Table 'Worktable'. Scan count 0, logical reads 2
Table '##CursorTest'. Scan count 0, logical reads 0
Table 'Worktable'. Scan count 1, logical reads 2

Removing the KEYSET option did reduce the logical reads by 4 (I believe), but that might not be a savings on a more complicated query, possibly with JOINs.

Also, switching SCROLL_LOCKS to be OPTIMISTIC increased the Logical Reads.

STATIC Cursor and standard UPDATE

UPDATE ##CursorTest SET [Val] = 1;
SELECT * FROM ##CursorTest;

SET STATISTICS IO ON;
DECLARE curTest CURSOR TYPE_WARNING
  LOCAL
  FORWARD_ONLY
  STATIC
  OPTIMISTIC 
FOR
  SELECT [ID] FROM ##CursorTest WHERE [Val] < 5;

DECLARE @ID INT;

OPEN curTest;

FETCH NEXT
FROM  curTest
INTO  @ID;

WHILE (@@FETCH_STATUS = 0)
BEGIN
  UPDATE tmp
  SET    tmp.[Val] = tmp.[Val] + 2
  FROM   ##CursorTest tmp
  WHERE  tmp.[ID] = @ID;

  FETCH NEXT
  FROM  curTest
  INTO  @ID;
END;

CLOSE curTest;
DEALLOCATE curTest;
SET STATISTICS IO OFF;

SELECT * FROM ##CursorTest;

Results:

Table 'Worktable'. Scan count 0, logical reads 8
Table '##CursorTest'. Scan count 1, logical reads 2
Table 'Worktable'. Scan count 0, logical reads 2
Table '##CursorTest'. Scan count 0, logical reads 2
Table 'Worktable'. Scan count 0, logical reads 2
Table '##CursorTest'. Scan count 0, logical reads 2
Table 'Worktable'. Scan count 0, logical reads 2
Table '##CursorTest'. Scan count 0, logical reads 2
Table 'Worktable'. Scan count 0, logical reads 2
Table '##CursorTest'. Scan count 0, logical reads 2
Table 'Worktable'. Scan count 0, logical reads 2

These simple tests seem to show the STATIC Cursor and regular UPDATE being the better option, and a more complicated query for the Cursor might be an even bigger difference (assuming you are able to update based on the Clustered Key of the target table).

But if you have a situation where you can't narrow down to an individual row, or have no Key value to use, then the Positioned Update would be quite handy.
