SQL Server – Using Non-Printable Characters in BCP XML Format File as TERMINATOR

bcpencodingsql serverxml

I have a flat-text data file containing records which fields are separated by the non printable character "File Separator" (0x1c). I am trying to use SQL Server's bcp utility to load this data into my database. Yet, when using the hex encoded value of the File Separator as TERMINATOR, I get a syntax error.

I have tried using

  • the hex encoded value : "0x1c"
  • the XML encoded value as hex: ""
  • the XML encoded value as decimal: ""

None of these work, yet when using the same encoding for a printable character, like tab, this does work: 	, 	 (0x9 doesn't. Not unsurprisingly, since this is an XML file.)

The conclusion seems to be that non-printable characters are not supported. Is this the case? That would be ironical, since the non-printable separator characters are created exactly for this purpose…

Below you can find all code to reproduce this issue:

XML format file: test.xml

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="&#x1c;" MAX_LENGTH="10" COLLATION="Latin1_General_CS_AS_WS"/>
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="41"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="COL1" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="2" NAME="COL2" xsi:type="SQLNUMERIC" PRECISION="4" SCALE="0"/>
 </ROW>
</BCPFORMAT>

Data file: test.txt
This is just one row, as a test case. StackExchange doesn't show the separator in the row below, yet when you click "Edit" for this post, the separator is included, and you should be able to copy-paste this.

1111111112008

Commandline

bcp TEST_DB.dbo.UL_TEST in "test.txt" -T -f "test.xml"

Best Answer

It seems, from the documentation about BCP - Specify Field and Row Terminators (SQL Server), that non-printable characters are not supported:

Characters Supported As Terminators

The bcp command, BULK INSERT statement, and the OPENROWSET bulk rowset provider support a variety of characters as field or row terminators and always look for the first instance of each terminator. The following table lists the supported characters for terminators.

enter image description here

As an option, you could use POWERSHELL to read in the source file and search\replace the 0x1C characters with some other character like a tilda (~) and output to a different file. Then use that character as your terminator.

(Get-Content c:\test\test.txt) | ForEach-Object {$_ -replace [char]0x1C,'~'} | Set-Content c:\test\testout.txt