Sql-server – Best Practice for SQL Server data validation

data validationsql serversql server 2014ssisssis-2014

I have CSV files that I would like to import into SQL Server 2014. Each record in the files has around 20 columns, each of them should be validated for data type compliance as well as some other checks per column (min/max ranges, null/not null etc.).

I'm looking for advice regarding a model architecture on doing this in a efficient way. Using SSIS and some error tables seems to me a bit inefficient since it looks as if I only can get one column checked at a time and the error handling (especially the column lockup in SSIS is clumsy at best).

I would prefer to have each record processed and right away have an overview of all errors that are in that record, so that I can provide a feedback of all issues at once to the people providing the data for import.

Best Answer

SSIS is, indeed, clunky when it comes to importing data for all the processing prowess that it seems to tout. One of the commenters, Kin I believe, had it right with creating a temporary table. Here's the code we use to import data directly from Excel into a temporary table to be parsed for compliance. As you can see in the example below, using T-SQL, OPENROWSET reads the specific tab on the Excel Spreadsheet called SQL Server Instance Details and imports the data into the SQL Table TempServerMap.

After, the data is imported into the temporary table you have a world of options. You can insert the data into another table that has all the SQL business rules you need on each of the columns--kicking out the data that doesn't qualify--I say this not as a necessarily "best way" but for using the least amount of code in the beginning fine tuning your process of finding errors. You'd need to control the error handling here to catch the bad apples (rows) so you can review them later. Alternately, you could create functions, or procedures, or triggers or all of the above that check the data in advance to make sure the do comply with business requirements.

And if you're looking to communicate those errors to others, just create an SSRS (Reporting Services) report that queries your new error table and give them the proper permissions to view the report. It's web based and easily accessible in an intranet environment. If you have external associates you can have SSRS e-mail the report to them just as easily.

The main point here is: get your data away from Excel and into SQL Server so it can be manipulated effectively.

Some DBAs might be offended at the replace/exec of the code below. If that's the case, just hardcode the select statement with specific excel file names.

declare  @cmdstring varchar(8000) ='select * into SQLSunSet.dbo.TempServerMap from OPENROWSET (''Microsoft.Ace.OLEDB.12.0'', ''Excel 12.0;Database=<MapsExcelFile_SQLServerUsageTracker>'', ''select * from [SQL Server Instance Details$]'')'
set @cmdstring = replace(@cmdstring,'<MapsExcelFile_SQLServerUsageTracker>',@MapsExcelFile_SQLServerUsageTracker)

if exists (select * from sysobjects where type ='u' and name ='TempServerMap') drop table TempServerMap

        --print (@cmdstring)
     exec (@cmdstring)

Many people seem to turn to SSIS in hopes of not coding T-SQL. But I usually find that SSIS is like faulty four-wheel drive. It just get's you stuck more than you would have been a lot farther down the road.

P.S. to use OPENROWSET effectively you might need to install the Microsoft Access Database Engine 2010 Redistributable (Really--a Microsoft Office Engine--not just for MS Access). This can be downloaded here:

https://www.microsoft.com/en-us/download/details.aspx?id=13255

Additionally, you might need to configure SQL Server to run OPENROWSET and also use the Office engine within T-SQL using the following code after it's installed:

exec sp_configure 'Show Advanced Options', 1;
RECONFIGURE;
GO

exec sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
GO

EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1 
GO 

EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'DynamicParameters', 1 
GO 
Related Question