SQL Server 2005 – Generate and Bulk Insert 3 million rows across 2 tables

bulk-insert, insert, openrowset, sql-server-2005

I have a task to programmatically generate 3 million different rows of data and insert them into a database table for test purposes.

I have written a VB.NET program that generates the row data, which can then either be INSERTed one row at a time straight into my database or written as comma-delimited rows to a .csv file.

My issue is that some of the rows require a joining row in another table, and the join is made on an IDENTITY column value from my first table, so I won't know that value until after the row has been inserted.

So I have been using the VB.NET program to insert rows one by one, adding the joining row in the other table if required. This was working at a reasonable speed until the main table reached around 2 million rows, at which point it started taking much longer to INSERT a new row.

What is the best practice for achieving my goal? I have looked a little at BULK INSERT and OPENROWSET, but cannot figure out how to use them to populate two tables with different numbers of rows and a joining column whose value I don't know in advance.

Best Answer

I assume your table structure looks something like this:

Table1
id (identity)
data1
data2
data3

Table2
id (identity)
t1id (fk to table1)
data1
data2
...
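
In DDL terms, a minimal sketch of that assumed structure (column names and types are placeholders):

CREATE TABLE table1 (
    id    INT IDENTITY(1,1) PRIMARY KEY,
    data1 VARCHAR(50),
    data2 VARCHAR(50),
    data3 VARCHAR(50)
);

CREATE TABLE table2 (
    id    INT IDENTITY(1,1) PRIMARY KEY,
    t1id  INT NOT NULL REFERENCES table1 (id),
    data1 VARCHAR(50),
    data2 VARCHAR(50)
);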

Row by row, the pattern is:

-- Main table insert
INSERT INTO table1 (data1, data2, data3)
VALUES ('dummy1', 'dummy2', 'dummy3');

-- Capture the identity value generated by the insert above
DECLARE @t1id INT;
SET @t1id = SCOPE_IDENTITY();

-- Dependent insert into table2, reusing the captured id
INSERT INTO table2 (t1id, data1, data2 /* , ... */)
VALUES (@t1id, 'dummy1', 'dummy2' /* , ... */);
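
At 3 million rows, though, doing this one row at a time is the slow path. SQL Server 2005 also supports the OUTPUT clause, so you can capture every generated identity in a single set-based insert. The sketch below is one way to do it, assuming (hypothetically) that the generated data sits in two staging tables, #stage1 and #stage2, linked by a client-generated batch_key column, and that a batch_key column is added to table1 for the duration of the load:

DECLARE @map TABLE (id INT, batch_key INT);

-- Set-based insert into table1; OUTPUT captures each generated id
-- together with the client-supplied batch_key it belongs to.
INSERT INTO table1 (batch_key, data1, data2, data3)
OUTPUT inserted.id, inserted.batch_key INTO @map (id, batch_key)
SELECT batch_key, data1, data2, data3
FROM #stage1;

-- Insert all dependent rows in one statement,
-- joining the captured ids back to the table2 staging data.
INSERT INTO table2 (t1id, data1, data2)
SELECT m.id, s.data1, s.data2
FROM #stage2 AS s
JOIN @map AS m ON m.batch_key = s.batch_key;

For millions of rows a temp table (#map) may behave better than a table variable, since the optimizer gets statistics on it.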

If you return the id from each newly inserted row in Table1, you can use it to insert the dependent row into Table2 without querying Table1 for every row. If the SELECT for the newly generated id is your bottleneck, this should help. And if you are scanning the main table for every lookup, indexing the id column, if that isn't already done, will also cut the time each lookup takes.
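
To get the generated data into the staging tables in the first place, BULK INSERT from the .csv files your VB.NET program already writes is far faster than row-by-row inserts. A minimal sketch, assuming the #stage1 layout above and a hypothetical local file path:

CREATE TABLE #stage1 (
    batch_key INT,
    data1     VARCHAR(50),
    data2     VARCHAR(50),
    data3     VARCHAR(50)
);

BULK INSERT #stage1
FROM 'C:\load\table1_rows.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK   -- allows a minimally logged load under simple/bulk-logged recovery
);

Loading in batches (say 100,000 rows at a time) and dropping nonclustered indexes on the target tables until the load finishes will also help once the tables reach millions of rows.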