Sql-server – Bulk insert from a complex text file into a table

bulk-insertsql servert-sql

I want to use bulk insert to load a text file into a table, but my coding is only able to load half of the text file because my text file has two parts:

  • (ID, CompanyName and Date) as one part; and
  • (Name Dept, Status, Age and Gender)

The (ID, CompanyName and Date) section appears only once in the file. I want it repeated in the rows when inserted to the table.

My coding is only able to load with one format text file.

Below is my text file example:

A1242
CompanyName
Date 2019-11-28 18:58:04
Name Department Status Age Gender
John R&D Active 23 Male
James R&D Active 21 Male

How can I use bulk insert to load the above text file into a table looking like the below example?

A1242 CompanyName 2019-11-28 18:58:04 John R&D Active 23 Male
A1242 CompanyName 2019-11-28 18:58:04 James R&D Active 21 Male

Below is my coding:

-- Procedure Load Text File
ALTER procedure [dbo].[ImportFiles]
@FilePath       varchar(1000) = 'c:\Transfer\' ,
@ArchivePath        varchar(1000) = 'c:\Transfer\Archive\' ,
@FileNameMask       varchar(1000) = 'bcp*.txt' ,
@MergeProc      varchar(128) = 'MergeBCPData'

AS

set nocount on

declare @ImportDate datetime

select @ImportDate = getdate()

declare @FileName       varchar(1000) ,
@File           varchar(1000),
declare @cmd varchar(2000)

create table ##Import (s varchar(8000), sd varchar(8000))
create table #Dir (s varchar(8000), sd varchar(8000))

/*****************************************************************/
-- Import file
/*****************************************************************/
select  @cmd = 'dir /B ' + @FilePath + @FileNameMask
delete #Dir
insert #Dir exec master..xp_cmdshell @cmd

delete #Dir where s is null or s like '%not found%'

while exists (select * from #Dir)
begin
select  @FileName = min(s) from #Dir
select  @File = @FilePath + @FileName

select  @cmd =      'bulk insert'
select  @cmd = @cmd +   ' ##Import' 
select  @cmd = @cmd +   ' from'
select  @cmd = @cmd +   ' ''' + replace(@File,'"','') + ''''
select  @cmd = @cmd +   ' with (FIELDTERMINATOR=''\t'''
select  @cmd = @cmd +   ',FIRSTROW = 5 '
select  @cmd = @cmd +   ',ROWTERMINATOR = ''\n'')'


truncate table ##Import

-- import the data
exec (@cmd)

-- remove filename just imported
delete  #Dir where s = @FileName

exec @MergeProc


-- Archive the file
select @cmd = 'move ' + @FilePath + @FileName + ' ' + @ArchivePath + 
@FileName
exec master..xp_cmdshell @cmd
end

drop table ##Import
drop table #Dir


--Procedure Insert
/****** Object:  StoredProcedure [dbo].[MergeBCPData]    Script Date: 
11/28/2019 
8:32:13 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER procedure [dbo].[MergeBCPData]

AS
set nocount on

-- insert data to production table
insert  BCPData
(
fld1 ,
fld2 ,
fld3 ,
fld5
--fld4
)
select  
fld1    = substring(s,1,12) , --testing
fld2    = substring(s,15,4) , --testing
fld3    = s ,
fld5    = s
--fld4  = convert(datetime,substring(s,26,19))
from    ##Import


--Exetuce the stored procedur
exec ImportFiles 'D:\Piccolo\' , 'D:\Piccolo\Archive\' , 'bcp*.txt', 
'MergeBCPData'

I cannot change the file format and external application is my backup plan.

Best Answer

SQL Server BULK INSERT is very limited in the types of files it can import and your type of file is way outside of what it can do.

The best approach is to have the file format changed.

Second best is to have an external application do the parsing of the file and upload data to table.

If none of the above is possible then you should use SQL Server solutions below:

  1. Assuming the "header" section appears only once in a file, load this file in two parts. Load the first part into a separate one column staging table (use LASTROW parameter). The second part would be loaded using your current approach. You will then join the data using tSQL.
  2. Create a staging table that would have an ID (will act as row number) and a single VARCHAR( MAX) column. Load the file into this table as is, without column splitting. Then, using SQL queries, manipulate the data into the format you need. The code will be rather ugly.

Some useful info: Splitting delimited strings

Update

SQL Server implementation option 1 would follow the steps below. Note: this is pseudo-code to avoid overly long code, that I have no way to test.

create table ##ImportHeader (RowID INT IDENTITY, RowData VARCHAR( 3000 ))
-- Load the first 3 rows into a separate table.
bulk insert ##ImportHeader from [YourFile] with( LASTROW = 3, FIELDTERMINATOR='\t' ROWTERMINATOR = '\n' )

-- Load the rest of the data
-- Your import table needs to have the same number of columns as the data you are loading
create table ##Import (RowID INT IDENTITY, Name VACRHAR( 200 ), Department VARCHAR( 200 ), Status VARCHAR( 10 ), Age INT, Gender VARCHAR( 10 ))
bulk insert ##Import from [YourFile] with( FIRSTROW = 5, FIELDTERMINATOR='\t' ROWTERMINATOR = '\n' )

-- Join data
SELECT CompanyID.RowData, CompanyName.RowData, CAST( SUBSTRING( CompanyDate.RowDate, 5, 50 ) AS DATETIME ),
   D.*
FROM ##Import AS D
    INNER JOIN ##ImportHeader AS CompanyID ON CompanyID.RowID = 1
    INNER JOIN ##ImportHeader AS CompanyName ON CompanyName.RowID = 2
    INNER JOIN ##ImportHeader AS CompanyDate ON CompanyDate.RowID = 3

What you are doing is loading the same file twice. First time you are loading first 3 rows and second time you are loading the rest of the structured data.