SQL Server – How to Collect NTFS File Properties and Insert into Table

Tags: powershell, sql-server, sql-server-2012

I have to provide reports on file system usage.

I'm collecting statistics on file server usage down to individual file level so we can see who is using what files/folders, how much storage they're using, how many files they have, when they were created and last used.

To do this I have two PowerShell scripts.

The first reads through the file system and captures the attributes I want and saves them to a file.

dir -rec G:\ | Select LastWriteTime, Directory, Name, Extension, Length, @{Name="Owner";Expression={get-acl $_.FullName| select Owner}} | export-csv FileInfo.csv

The second script reads the CSV file and inserts the data into a table.

Once the data is in SQL Server I can parse the text, split it into various columns, and then produce a variety of reports and analyse the data in different ways. My approach works, but it's cumbersome.

Is there a better way to collect NTFS information and save it into SQL Server? What are the alternatives? SSIS?

Edit: Could this all be combined to operate together in a single process?

Best Answer

SSIS is well equipped to handle CSV files and load them into SQL Server.

You can have a very simple package using the Flat File Source.

The setup is a familiar Windows wizard-style process, and most of it is automated. What you need to check is that it has correctly guessed the column lengths and data types for your file. You can either adjust the settings in the connection manager or change data types later with SSIS transformations. Note that if you have, say, 10,000 rows of integers followed by rows containing characters, the Flat File Source may assign an integer data type to that column and then fail when it encounters the characters. With large files that may not be well structured, you therefore have to pay more attention to these settings. The Suggest Types... button allows you to increase the number of inspected rows, but I have found that even this can still recommend the wrong data types.
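To illustrate why sampling only the first rows can suggest the wrong type, here is a minimal sketch in Python. It is not how SSIS implements type suggestion; the `infer_type` helper, the 200-row sample size, and the sample column are all assumptions made up for this illustration.

```python
def infer_type(values):
    """Return 'int' if every value parses as an integer, otherwise 'str'."""
    for value in values:
        try:
            int(value)
        except ValueError:
            return "str"
    return "int"


# Hypothetical column: 10,000 numeric values followed by a single text value.
column = [str(i) for i in range(10_000)] + ["N/A"]

# Guess made from a sample of the first 200 rows only (assumed sample size).
sampled_guess = infer_type(column[:200])

# Guess made after inspecting every row.
full_guess = infer_type(column)

print(sampled_guess)
print(full_guess)
```

The sampled guess only sees numeric values, so a load that trusts it would fail on the last row. When a file is not well structured, declaring such columns as strings and converting them after the load is the safer choice.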

SSIS is a huge tool, and you can perform data clean-up tasks or even split data from the single CSV into different tables. If you are loading different tables, use transformations like Multicast or Conditional Split. You may also find that Data Conversion and Derived Column help you efficiently produce the data you need as it moves through your package.

I wouldn't do much more than clean, split, modify, and load the data into SQL Server with SSIS, though. SQL Server is highly optimized to produce aggregates, sorts, etc., while SSIS is less capable for such tasks. Transformations like Aggregate are blocking: they must read all of their input rows before they can emit any output, so they can stall your SSIS package and consume a lot of memory.

As an example, a simple SSIS data flow can perform the following tasks:

Reads a CSV file
Creates derived columns which are just trimmed versions of the originals
Performs a look-up to see if the record already exists in the destination
If the record was not found then it is inserted in the destination
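
The logic of the steps above (read, trim, look up, insert if missing) can be sketched outside SSIS as well. The Python below is only an illustration of that flow, not the SSIS package itself: it uses an in-memory SQLite database as a stand-in for the SQL Server destination, and the `file_info` table, its columns, and the `load_new_rows` function are hypothetical names chosen for this sketch.

```python
import csv
import io
import sqlite3


def load_new_rows(csv_text, conn):
    """Insert rows whose (directory, name) key is not yet in the destination."""
    inserted = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Derived columns: trimmed versions of the originals.
        directory = row["Directory"].strip()
        name = row["Name"].strip()
        length = int(row["Length"].strip())

        # Lookup: does the record already exist in the destination?
        found = conn.execute(
            "SELECT 1 FROM file_info WHERE directory = ? AND name = ?",
            (directory, name),
        ).fetchone()

        # Only rows that were not found are inserted.
        if found is None:
            conn.execute(
                "INSERT INTO file_info (directory, name, length) VALUES (?, ?, ?)",
                (directory, name, length),
            )
            inserted += 1
    conn.commit()
    return inserted


# SQLite stands in for SQL Server here; the destination already holds one row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_info (directory TEXT, name TEXT, length INTEGER)")
conn.execute("INSERT INTO file_info VALUES (?, ?, ?)", (r"G:\docs", "a.txt", 10))

# Made-up CSV content with stray whitespace around some values.
sample = "\n".join([
    "Directory,Name,Length",
    r" G:\docs ,a.txt,10",
    r"G:\docs, b.txt ,25",
])

inserted_count = load_new_rows(sample, conn)
print(inserted_count)
```

The first CSV row matches the existing record once trimmed, so only the second row is inserted. In SSIS the same roles are played by the Flat File Source, Derived Column, Lookup, and destination components.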

Related Solutions

SQL Server – Import CSV data stored in a BLOB column

Is it possible to use a stored procedure to read the file inside the blob column, loop on each line and insert the data in a table?

Definitely.

However, I would consider rethinking this plan.

A SQL Server database generally isn't a great place to be storing BLOBs, particularly given that you're just going to turn around and process them into row data later. It's a lot of extra disk activity and (presumably more expensive) storage that you just don't need to use. Also, the kind of processing being proposed will almost certainly perform worse than the many direct-processing alternatives. Generally speaking, the less processing you need to do (and the simpler the process itself), the better performance you'll get out of the system. And it will probably be more reliable as well.

Why not turn the files into row data immediately? Are you concerned about blocking or latency of the client application? If that's the only concern, consider setting up an asynchronous queuing system, possibly by using Service Broker. You can use BULK INSERT to turn the CSV files into row data directly from the file system without first loading the files as BLOBs. If this is going to blast the CPU during load when you need to run other things on the same server, consider using Resource Governor if you're on Enterprise Edition.

If you have to process the files in batch at night due to other constraints, it may be better to simply direct the raw files into a named (YYYYMMDD) folder on a network share during the day, and then once a day use an SSIS package with a Foreach Loop container (using the Foreach File enumerator) to process the files. I suppose this could also work in a job-based scenario where you just fire it up every 15 minutes or so to process and remove files that landed in the folder in the last period. SSIS may also be a good solution if you need some kind of transformation process to happen between the raw files and the row data.
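
The "process and remove" pattern for a dated folder can be sketched as follows. This is a rough Python illustration of the idea, not an SSIS package: the function names, the `*.csv` pattern, and the `handler` callback (a placeholder for whatever actually loads a file, such as a BULK INSERT call) are assumptions for this example.

```python
import datetime
import pathlib
import tempfile


def pending_files(share_root, day):
    """Return the CSV files in the YYYYMMDD folder for the given day, sorted."""
    folder = pathlib.Path(share_root) / day.strftime("%Y%m%d")
    if not folder.is_dir():
        return []
    return sorted(folder.glob("*.csv"))


def process_and_remove(files, handler):
    """Call handler on each file, then delete it so the next run skips it."""
    for path in files:
        handler(path)
        path.unlink()


# Demonstration against a temporary directory instead of a real network share.
with tempfile.TemporaryDirectory() as root:
    day = datetime.date(2024, 1, 31)
    folder = pathlib.Path(root) / "20240131"
    folder.mkdir()
    (folder / "b.csv").write_text("x\n")
    (folder / "a.csv").write_text("y\n")
    (folder / "notes.txt").write_text("ignored\n")

    processed = []
    process_and_remove(pending_files(root, day), lambda p: processed.append(p.name))
    remaining = sorted(p.name for p in folder.iterdir())

print(processed)
print(remaining)
```

Because each file is deleted only after its handler returns, a failure partway through leaves the unprocessed files in place for the next scheduled run.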

There are lots of different possibilities here depending on your exact requirements, but I think I've given enough of the more common elements that you can piece together a solution that will work best for your situation.

SQLCMD – Create Output File Names Based on Date or Day of the Week

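The script below will help. Save it as a .bat file. Note that the %date% substring offsets assume the system date ends in MM/DD/YYYY; other regional date formats need different offsets.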

@echo off
setlocal
set timehour=%time:~0,2%
sqlcmd -S SQL-CLUST1 -E -Q "SELECT * FROM TABLE" -o report-%date:~-4,4%%date:~-10,2%%date:~-7,2%-%timehour: =0%%time:~3,2%.txt

If you have xp_cmdshell enabled, it is much easier:

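-- xp_cmdshell accepts at most VARCHAR(8000), so avoid VARCHAR(max) here
DECLARE       @sqlCommand   VARCHAR(8000)
DECLARE       @filePath     VARCHAR(100)
DECLARE       @fileName     VARCHAR(100)

SET    @filePath = 'C:\Temp\'

-- Builds e.g. Output_YYYYMMDD_HH_MI.txt; hour and minute are zero-padded
SET    @fileName = 'Output_' +
         CONVERT(VARCHAR(8), GETDATE(), 112) + '_' +
         RIGHT('0' + CAST(DATEPART(HOUR, GETDATE()) AS VARCHAR(2)), 2) + '_' +
         RIGHT('0' + CAST(DATEPART(MINUTE, GETDATE()) AS VARCHAR(2)), 2) + '.txt'

-- -Q runs the query and exits; lowercase -q would leave sqlcmd open
SET    @sqlCommand =
       'SQLCMD -S server_name -E -d master -Q "select @@servername" -o "' +
       @filePath + @fileName +
       '" -h-1'

-- Review the generated command first
PRINT       @sqlCommand

-- Uncomment to actually run it
--EXEC   master..xp_cmdshell @sqlCommand
GO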
