I have a task to programmatically generate 3 million different rows of data and insert them into a database table for test purposes.
I have written a VB.NET program that generates the row data, which can then either be INSERTed one row at a time straight into my database or written as comma-delimited rows to a .csv file.
My issue is that some of the rows require a joining row in another table, and the join is done on an IDENTITY column value from my first table, so I won't know that value until after the row has been inserted.
So I have been using the VB.NET program to insert rows one by one, adding the joining row in the other table where required. This worked at a reasonable speed until the main table reached around 2 million rows, at which point each new INSERT started taking much longer.
What is the best practice for achieving my goal? I have looked briefly at BULK INSERT and OPENROWSET, but I cannot figure out how to use them to populate two tables with different numbers of rows and a joining column whose value I don't know in advance.
Best Answer
I assume your table structure is a main table with an IDENTITY primary key and a second table holding a foreign key back to it.
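A minimal sketch of that assumed schema (all table and column names here are hypothetical, since the question doesn't give them):

```sql
-- Hypothetical schema matching the description: a main table with an
-- IDENTITY key and a child table that references it.
CREATE TABLE Table1 (
    id   INT IDENTITY(1,1) PRIMARY KEY,
    data VARCHAR(100)
);

CREATE TABLE Table2 (
    id        INT IDENTITY(1,1) PRIMARY KEY,
    table1_id INT NOT NULL REFERENCES Table1 (id),
    extra     VARCHAR(100)
);
```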
If you return the id from each newly inserted row in Table1, you can use it to insert the dependent row into Table2 without querying Table1 at all. If the SELECT for the newly generated id is your bottleneck, this should help. If instead you are full-scanning your main table for every lookup, indexing the id column (if it isn't already indexed) would also cut the lookup time.
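One way to return the generated ids without a per-row round trip is SQL Server's OUTPUT clause. A sketch of the idea, assuming the hypothetical Table1/Table2 schema above and a hypothetical Staging table (e.g. loaded in bulk from the .csv file):

```sql
-- Capture the IDENTITY values generated by a batch insert.
-- Table and column names are assumptions, not from the question.
DECLARE @new_ids TABLE (id INT, data VARCHAR(100));

INSERT INTO Table1 (data)
OUTPUT inserted.id, inserted.data INTO @new_ids (id, data)
SELECT data
FROM   Staging;          -- bulk-loaded generated rows

-- Use the captured ids to create all the joining rows in one
-- statement, instead of one SCOPE_IDENTITY() call per row.
INSERT INTO Table2 (table1_id, extra)
SELECT n.id, n.data
FROM   @new_ids AS n;
```

This lets you bulk-load the generated data into a staging table first (BULK INSERT works fine there) and then move it into the two real tables set-based, which should scale far better than 3 million single-row inserts.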