Sql-server – Using TOP in INSERT INTO statement

order-bysql servertop

I am reading a tutorial about TOP keyword in T-SQL. But I don't really understand the examples given in the tutorial.

Example 1:

INSERT TOP n INTO table_name1 (Col1, Col2)
SELECT Col3, Col4 FROM table_name2;

The tutorial mentions that the TOP here doesn't really achieve anything since the database doesn't have a concept of order, which I kind of get.

And is it correct to assume that the n rows that are chosen from table_name2 are completely random records?

Example 2:

The tutorial gives another example that improves the first statement.

INSERT INTO table_name1 (Col1, Col2)
  SELECT TOP n  Col3, Col4
  FROM table_name2
Order By Col1;

Here is the confusing part. Putting the TOP keyword inside the SELECT statement is supposed to give us a better control over what rows get chosen and inserted through the ORDER BY clause. So I would assume the ORDER BY clause is part of the SELECT statement in the second example. But Col1 isn't part of table_name2.

This confused the hell out of me.
Thanks for any insight.

Best Answer

And is it correct to assume that the n rows that are chosen from table_name2 are completely random records?

When an order isn't specified, it will usually use the sort fields and directions from the design of the primary key of table_name2, but is not guaranteed to. As Aaron Bertrand said

It doesn't really matter what the clustered index is. SQL Server may still return the order based on some other index based on a wide variety of factors. Creating a primary key on some column DOES NOT guarantee that selects with no order by will suddenly come back ordered by that column always. Might you observe that most of the time? Sure. But that is not the same as a guarantee. I've never seen a polar bear on my street but there is no polar bear forcefield that prevents it from happening. – Aaron Bertrand♦ Sep 14 '11 at 20:41

It certainly wouldn't hurt to add your order clause in, even if you want them to return in the index order of the PK.

From the tutorial example

INSERT INTO table_name1 (Col1, Col2)
  SELECT TOP n Col3, Col4
  FROM table_name2 
Order By Col1;

Gaaaah. That's horrible coding. That will order by Col1 ASC of table_name2, even though your SELECT clause isn't explicitly returning a Col1 from the second table. It is NOT ordered by Col3 of the 2nd table aka Col1 of the 1st table. The MS docs for INSERT INTO do not list an ORDER BY clause - the ORDER BY is part of the inner SELECT clause.

It's always better to explicitly specify which table you are inserting into and selecting from. E.g. this would be a slightly better example.

INSERT INTO
  table_name1 (Col1, Col2)
  SELECT TOP 3
    t2.Col3
    ,t2.Col4
  FROM 
    table_name2 AS t2
  ORDER BY
    t2.Col1 ASC, t2.Col3 ASC, t2.Col4 ASC

After this alteration there's no confusion about what Col1 means (nor the sort order of it). I'd still recommend throwing in a sort order of Col3 and Col4 if Col1 isn't unique for all entries in table_name2, as demonstrated above. You cannot alias the table that records are inserted into (e.g. table_name1 AS t1 (t1.Col1, t1.Col2) as that is not legal syntax for existing versions of SQL Server.

Opinion-based recommendation: avoid the tutorial website that you found.