Well, since you have data currently stored in SQL Server, and it's already in an NVARCHAR column, then either it's an NVARCHAR <= 4000 (in which case you can't lose any data, and should just change all instances of NVARCHAR(8000) to NVARCHAR(4000)), or it's an NVARCHAR(MAX) column, in which case you change all instances of NVARCHAR(8000) to NVARCHAR(MAX). Or just leave out those CASTs - do you really need them?
As an aside, you should probably change as NTEXT to as NVARCHAR(MAX) as well.
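For example, if the underlying column is NVARCHAR(MAX), the change might look like this (dbo.SomeTable and Notes are placeholder names for illustration):

```sql
-- Before:
--   CAST(Notes AS NVARCHAR(8000))
-- After, matching the column's actual declared type:
SELECT CAST(Notes AS NVARCHAR(MAX)) AS Notes
FROM dbo.SomeTable;
```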
What is the logic behind requiring the same length when using UNPIVOT?
This question may only be truly answerable by the people who worked on the implementation of UNPIVOT; you might be able to get an answer by contacting them through a support channel. The following is my understanding of the reasoning, which may not be 100% accurate:
T-SQL contains any number of instances of weird semantics and other counter-intuitive behaviours. Some of these will eventually go away as part of deprecation cycles, but others may never be 'improved' or 'fixed'. Quite aside from anything else, applications exist that depend on these behaviours, so backward compatibility has to be preserved.
The rules for implicit conversions and expression type derivation account for a significant proportion of the weirdness mentioned above. I do not envy the testers who have to ensure that the weird (and often undocumented) behaviours are preserved (under all combinations of SET session values and so on) for new versions.
That said, there is no good reason not to make improvements, and avoid past mistakes, when introducing new language features (with obviously no backward compatibility baggage). New features like recursive common table expressions (as mentioned by Andriy M in a comment) and UNPIVOT were free to have relatively sane semantics and clearly-defined rules.
There will be a range of views as to whether including the length in the type is taking explicit typing too far, but personally I welcome it. In my view, the types varchar(25) and varchar(50) are not the same, any more than decimal(8) and decimal(10) are. Special casing string type conversion complicates things unnecessarily and adds no real value, in my opinion.
One could argue that only implicit conversions that might lose data should be required to be explicitly stated, but there are edge-cases there too. Ultimately, a conversion is going to be needed, so we might as well make it explicit.
If the implicit conversion from varchar(25) to varchar(50) were allowed, it would just be another (most likely hidden) implicit conversion, with all the usual weird edge cases and SET setting sensitivities. Why not make the implementation the simplest and most explicit possible? (Nothing is perfect, however, and it is a shame that hiding varchar(25) and varchar(50) inside a sql_variant is allowed.)
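To illustrate the strictness, a minimal sketch (the table and column names here are made up):

```sql
CREATE TABLE #People
(
    PersonId  int         NOT NULL,
    Firstname varchar(25) NULL,
    Lastname  varchar(50) NULL
);

-- Fails: the UNPIVOT column list requires columns of identical type,
-- including length, so varchar(25) and varchar(50) conflict
SELECT U.PersonId, U.ColumnName, U.Value
FROM #People AS P
UNPIVOT (Value FOR ColumnName IN (Firstname, Lastname)) AS U;
```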
Rewriting the UNPIVOT with APPLY and UNION ALL avoids the (better) type behaviour because the rules for UNION are subject to backward compatibility, and are documented in Books Online as allowing different types so long as they are comparable using implicit conversion (for which the arcane rules of data type precedence are used, and so on).
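For instance, the same two lengths that UNPIVOT rejects are happily combined by UNION ALL, with the result column silently widened:

```sql
-- Compiles without complaint; the result column is typed varchar(50),
-- the larger of the two input types under the precedence rules
SELECT v = CONVERT(varchar(25), 'abc')
UNION ALL
SELECT CONVERT(varchar(50), 'defghij');
```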
The workaround involves being explicit about the data types and adding explicit conversions where necessary. This looks like progress to me :)
One way to write the explicitly-typed workaround:
SELECT
    U.PersonId,
    U.ColumnName,
    U.Value
FROM dbo.People AS P
CROSS APPLY
(
    VALUES (CONVERT(varchar(50), Lastname))
) AS CA (Lastname)
UNPIVOT
(
    Value FOR
    ColumnName IN (P.Firstname, CA.Lastname)
) AS U;
Recursive CTE example:
-- Fails
WITH R AS
(
    SELECT Dummy = 'A row'

    UNION ALL

    SELECT 'Another row'
    FROM R
    WHERE Dummy = 'A row'
)
SELECT Dummy
FROM R;

-- Succeeds
WITH R AS
(
    SELECT Dummy = CONVERT(varchar(11), 'A row')

    UNION ALL

    SELECT CONVERT(varchar(11), 'Another row')
    FROM R
    WHERE Dummy = 'A row'
)
SELECT Dummy
FROM R;
Finally, note that the rewrite using CROSS APPLY in the question is not quite the same as the UNPIVOT, because it does not reject NULL attributes.
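A small demonstration of that difference (the temporary table and its data are invented for illustration):

```sql
CREATE TABLE #T (PersonId int, Firstname varchar(50), Lastname varchar(50));
INSERT #T VALUES (1, 'Ann', NULL);

-- UNPIVOT: returns one row; the NULL Lastname attribute is rejected
SELECT U.PersonId, U.ColumnName, U.Value
FROM #T AS T
UNPIVOT (Value FOR ColumnName IN (Firstname, Lastname)) AS U;

-- CROSS APPLY rewrite: returns two rows, including the NULL value
SELECT T.PersonId, V.ColumnName, V.Value
FROM #T AS T
CROSS APPLY
(
    VALUES ('Firstname', T.Firstname),
           ('Lastname',  T.Lastname)
) AS V (ColumnName, Value);
```

Adding WHERE V.Value IS NOT NULL to the second query makes the two forms equivalent.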
Best Answer
If you want to know what data is there, then:

- Don't use LTRIM or RTRIM
- Use DATALENGTH() (for NTEXT, add / 2 to the end of that) instead of LEN
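As a sketch (dbo.Comments and its NTEXT column Body are hypothetical names):

```sql
-- DATALENGTH returns bytes, and NTEXT stores 2 bytes per character,
-- so divide by 2 for a character count. LEN is avoided because it
-- ignores trailing spaces (and does not accept NTEXT directly).
SELECT MAX(DATALENGTH(Body) / 2) AS MaxCharsUsed
FROM dbo.Comments;
```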
Now, converting from either TEXT or NTEXT to NVARCHAR won't result in data loss if the length is the same as or larger than what was calculated under the assumptions mentioned above.

HOWEVER, the real issue isn't what data is there; it's what the app allows for. If the app allows for more than 200 characters, then 200 isn't a safe choice. Make sure that all sources that can insert or update data in this column enforce a limit of 200 characters. If even one does not, then you need to work with the devs to make sure any sources of input are updated to reflect a limit of 200, or whatever you agree on (just because nobody has yet used even 200 characters doesn't necessarily imply that there isn't a valid case for going above that limit, if the app allows for it).
I just had the thought that there could be a slight issue with the maximum row size, since the TEXT, NTEXT, and IMAGE datatypes, by default, are a 16-byte pointer in the row with the actual data stored off-row. Since you only get approximately 8060 bytes per row, if you were less than 400 bytes from that limit, then changing TEXT (requiring 16 bytes) to NVARCHAR(200) (requiring up to 400 bytes) could potentially result in an error. However, I just tried it and it doesn't error; it allows going above the 8060-byte limit (keep in mind that the max row size is still approx. 8060 bytes, but any extra data will be placed on ROW_OVERFLOW pages).

Values for @TestString taken from: