SQL Server – SSIS – Best way to insert a large number of rows

c#, sql-server, ssis, xml

So here's the scenario: I have an XML file that is around 500 GB in size and holds roughly 600 million rows of data (once loaded into a database table). I'm using SSIS for the operation, and since an SSIS component (i.e. the XML Source) consumes a REALLY large amount of memory, it might cause a timeout (correct me if I'm wrong, but as far as I know, the SSIS components load the content of the XML into memory, and with a file that big it will surely cause errors). My approach then is:

  • Use a Script Task to parse the XML data with XmlReader (XmlReader is
    by far the best fit here, since it parses the XML in a forward-only,
    non-cached manner)
  • Add the parsed rows to a DataTable
  • Every 500,000 rows in the DataTable, insert its contents into the
    database using SqlBulkCopy, then clear the DataTable (see the sketch
    after this list)

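Roughly, the Script Task logic looks like the sketch below; the file path, connection string, destination table, element names, and columns are all placeholders for my actual schema:

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Xml;

    class BulkXmlLoader
    {
        // Placeholder path, connection string and batch size; the real
        // values come from package variables.
        const string XmlPath = @"C:\data\huge.xml";
        const string ConnStr = "Server=.;Database=Target;Integrated Security=true;";
        const int BatchSize = 500000;

        static void Main()
        {
            // Buffer table with placeholder columns; the real schema is wider.
            var table = new DataTable();
            table.Columns.Add("Id", typeof(long));
            table.Columns.Add("Name", typeof(string));

            using (var bulk = new SqlBulkCopy(ConnStr))
            {
                bulk.DestinationTableName = "dbo.TargetTable";
                bulk.BulkCopyTimeout = 0;   // long-running load, no command timeout

                // Forward-only, non-cached reader keeps memory usage flat
                // no matter how big the file is.
                using (var reader = XmlReader.Create(XmlPath))
                {
                    while (reader.ReadToFollowing("Row"))   // placeholder element name
                    {
                        var row = table.NewRow();
                        reader.ReadToFollowing("Id");
                        row["Id"] = reader.ReadElementContentAsLong();
                        reader.ReadToFollowing("Name");
                        row["Name"] = reader.ReadElementContentAsString();
                        table.Rows.Add(row);

                        // Flush every 500,000 rows, then reuse the DataTable.
                        if (table.Rows.Count >= BatchSize)
                        {
                            bulk.WriteToServer(table);
                            table.Clear();
                        }
                    }
                }

                if (table.Rows.Count > 0)   // flush the final partial batch
                    bulk.WriteToServer(table);
            }
        }
    }
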
My problem is that when I tried this approach on another file of around 200 GB, it processed roughly 13.5 million rows per hour, and I don't know whether that run time is acceptable. It does solve my problem, but it isn't very elegant; I mean, there should be other ways.

I'm looking at other approaches, such as:

  • Dividing the large XML file into smaller CSV files (around 20 GB
    each), then loading them with an SSIS Data Flow task
  • Using an INSERT statement for each new row

Can you help me decide which is best, or suggest any other solutions?

Every answer will be very much appreciated.

EDIT

I forgot to mention that my approach needs to be dynamic: there are many tables to populate, each from a large XML file. So using a Script Component as a source might not be so useful, since I would still need to define the output columns for each table. But I'll still give it a try.

Best Answer

Have you considered using a Data Flow Task with a Script Component acting as a source? You could just do an AddRow for each row you consume out of the file, and let SSIS handle the batches.
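To sketch the idea (assuming a single output named Output0 with Id and Name columns already defined in the component's metadata, and a placeholder file path and element names; the code goes in the ScriptMain class that SSIS generates for a source-style Script Component):

    using System.Xml;   // at the top of the generated ScriptMain.cs

    public override void CreateNewOutputRows()
    {
        // Placeholder path; in practice you'd read it from a package variable.
        using (XmlReader reader = XmlReader.Create(@"C:\data\huge.xml"))
        {
            while (reader.ReadToFollowing("Row"))      // placeholder element name
            {
                Output0Buffer.AddRow();                // SSIS buffers and batches these rows
                reader.ReadToFollowing("Id");
                Output0Buffer.Id = reader.ReadElementContentAsLong();
                reader.ReadToFollowing("Name");
                Output0Buffer.Name = reader.ReadElementContentAsString();
            }
        }
    }

You still get the forward-only XmlReader streaming, but the destination and batching are handled by the data flow engine instead of your own DataTable/SqlBulkCopy loop.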
