Sql-server – Saving into a VARCHAR(MAX) throwing error in some scenarios

encodingnative-clientsql serverwindows

We are noticing an error like this:

InterfaceError: ('HY104', 0, '[Microsoft][SQL Server Native Client 11.0]
Invalid precision value', 10226)

It happens when we insert a certain length of characters along with Unicode content like \u3000 (Ideographic Space). This happens only we are using the Windows MSSQL Driver and the query is parametrized.

The column is a VARCHAR(MAX). So it should support 2GB.

  • What could be the reason for this?
  • Why is it that length matters?

If I just insert this Unicode character alone, it won't cause an issue!

I was trying to see this from a theoretical point of view. Switching to NVARCHAR is not an option for us currently, but removing Unicode is.

Here are some more details as per the questions asked by Solomon Rutzky:

  1. Column Collation is SQL_Latin1_General_CP1_CI_AS
  2. String length where I start seeing the error is 4019.
  3. sqlalchemy.connectors.mxodbc, line 86, in error_handler is where I get the initial error. This is from the traceback.
  4. It works fine with single Unicode character.
  5. A 20000 character string without Unicode works fine.
  6. It makes it ASCII from when I try this from this software (Web App). I tested in Dbeaver by exporting the data.

Best Answer

It would greatly help to know:

  1. the current Collation of the VARCHAR(MAX) column
  2. the exact string length (in bytes) at which you start seeing the error
  3. What software is reporting this error?
  4. Does the error occur if only a single Unicode character is present?
  5. Does a 10,000 character string with no Unicode characters get inserted correctly?
  6. For a short-enough string that gets inserted without error yet contains one of these Unicode characters, what value exactly is in the table? You can check by converting the column to VARBINARY(MAX). Did the Unicode character get translated or something else?

It is possible that the string that errors is just over a 4000 or 8000 character limit that then requires using a MAX type and the driver is initially guessing (or being told) that it is one size and then finding out that it is another. The error message says "invalid precision value" which has to be referring to how the column is being configured (i.e. similar to setting up a SqlParameter in .NET and declaring the max size, such as new SqlParameter("@name", SqlDbType.VarChar, 8000)). So this type of error should only really happen if the software is attempting to set up the column as VARCHAR of over 8000 or NVARCHAR of over 4000, as either of those conditions would be an invalid "precision". Now, in order to get such a number I suspect that some code in there is counting the number of bytes (i.e. DATALENGTH ) of the string, which will be larger than the number of characters. Of course, this would be true of any character in a .NET string or C++ wchar, so I am suspecting that the difference with having some Unicode characters vs not having any is that without any Unicode characters it can convert the string to an 8-bit encoding (i.e. single byte for use with VARCHAR), but somehow having a Unicode character prevents that. It's a long-shot (until there is more info provided), but it is pretty clear where the error is occurring.

If I just insert the character alone it won't cause an issue.

How are you inserting the character? Through the same software that is generating the error, or in SSMS? If you are doing this manually via an INSERT statement, then that isn't a good test as SQL Server converts U+3000 into a regular space, U+0020 (one byte), instead of two questions marks (still 2 bytes, as would happen if there was no equivalent character):

SELECT N'-' + NCHAR(0x3000) + N'-',
       CONVERT(VARCHAR(MAX), N'-' + NCHAR(0x3000) + N'-'),
       CONVERT(VARBINARY(MAX), CONVERT(VARCHAR(MAX), N'-' + NCHAR(0x3000) + N'-'));

returns:

- -    - -    0x2D202D

If possible, trying passing in both of the strings generated by the following code via the software, not via SSMS:

DECLARE @String4k NVARCHAR(MAX);
SET @String4k = REPLICATE(N'a', 3999) + NCHAR(0x3000);
SELECT LEN(@String4k);

DECLARE @String8k NVARCHAR(MAX);
SET @String8k = REPLICATE(CONVERT(NVARCHAR(MAX), N'a'), 7999);
SET @String8k += NCHAR(0x3000);
SELECT LEN(@String8k);

The @String4k string is 3999 characters that can convert cleanly into an 8-bit encoding (i.e. VARCHAR) plus the U+3000 character that will likely remain as 2 bytes. So maybe this comes across, through that software, as 4001 characters. I doubt that this is the issue, but can't hurt to test.

The @String8k string is 7999 characters that can convert cleanly into an 8-bit encoding (i.e. VARCHAR) plus the U+3000 character that will likely remain as 2 bytes. So maybe this comes across, through that software, as 8001 characters.