Well, since you have data currently stored in SQL Server, and it's already in an NVARCHAR column, then either it's an NVARCHAR <= 4000 (in which case you can't lose any data, and should just change all instances of NVARCHAR(8000) to NVARCHAR(4000)), or it's an NVARCHAR(MAX) column, in which case you change all instances of NVARCHAR(8000) to NVARCHAR(MAX). Or just leave out those CASTs - do you really need them?
As an aside, you should probably change as NTEXT to as NVARCHAR(MAX) as well.
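For example, if the underlying column is NVARCHAR(MAX), the change might look like this (dbo.SomeTable and Notes are placeholder names for illustration):

```sql
-- Before:
--   CAST(Notes AS NVARCHAR(8000))
-- After, matching the column's actual declared type:
SELECT CAST(Notes AS NVARCHAR(MAX)) AS Notes
FROM dbo.SomeTable;
```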
What is the logic behind requiring the same length when using UNPIVOT?
This question may only be truly answerable by the people who worked on the implementation of UNPIVOT; you might be able to get an answer by contacting them through a support channel. The following is my understanding of the reasoning, which may not be 100% accurate:
T-SQL contains any number of instances of weird semantics and other counter-intuitive behaviours. Some of these will eventually go away as part of deprecation cycles, but others may never be 'improved' or 'fixed'. Quite aside from anything else, applications exist that depend on these behaviours, so backward compatibility has to be preserved.
The rules for implicit conversions and expression type derivation account for a significant proportion of the weirdness mentioned above. I do not envy the testers who have to ensure that the weird (and often undocumented) behaviours are preserved (under all combinations of SET session values and so on) for new versions.
That said, there is no good reason not to make improvements, and avoid past mistakes, when introducing new language features (with obviously no backward compatibility baggage). New features like recursive common table expressions (as mentioned by Andriy M in a comment) and UNPIVOT were free to have relatively sane semantics and clearly-defined rules.
There will be a range of views as to whether including the length in the type is taking explicit typing too far, but personally I welcome it. In my view, the types varchar(25) and varchar(50) are not the same, any more than decimal(8) and decimal(10) are. Special casing string type conversion complicates things unnecessarily and adds no real value, in my opinion.
One could argue that only implicit conversions that might lose data should be required to be explicitly stated, but there are edge-cases there too. Ultimately, a conversion is going to be needed, so we might as well make it explicit.
If the implicit conversion from varchar(25) to varchar(50) were allowed, it would just be another (most likely hidden) implicit conversion, with all the usual weird edge cases and SET setting sensitivities. Why not make the implementation the simplest and most explicit possible? (Nothing is perfect, however, and it is a shame that hiding varchar(25) and varchar(50) inside a sql_variant is allowed.)
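To illustrate the strictness, a minimal sketch (the table and column names here are made up):

```sql
CREATE TABLE #People
(
    PersonId  int         NOT NULL,
    Firstname varchar(25) NULL,
    Lastname  varchar(50) NULL
);

-- Fails: the UNPIVOT column list requires columns of identical type,
-- including length, so varchar(25) and varchar(50) conflict
SELECT U.PersonId, U.ColumnName, U.Value
FROM #People AS P
UNPIVOT (Value FOR ColumnName IN (Firstname, Lastname)) AS U;
```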
Rewriting the UNPIVOT with APPLY and UNION ALL avoids the (better) type behaviour because the rules for UNION are subject to backward compatibility, and are documented in Books Online as allowing different types so long as they are comparable using implicit conversion (for which the arcane rules of data type precedence are used, and so on).
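For instance, the same two lengths that UNPIVOT rejects are happily combined by UNION ALL, with the result column silently widened:

```sql
-- Compiles without complaint; the result column is typed varchar(50),
-- the larger of the two input types under the precedence rules
SELECT v = CONVERT(varchar(25), 'abc')
UNION ALL
SELECT CONVERT(varchar(50), 'defghij');
```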
The workaround involves being explicit about the data types and adding explicit conversions where necessary. This looks like progress to me :)
One way to write the explicitly-typed workaround:
SELECT
    U.PersonId,
    U.ColumnName,
    U.Value
FROM dbo.People AS P
CROSS APPLY
(
    VALUES (CONVERT(varchar(50), Lastname))
) AS CA (Lastname)
UNPIVOT
(
    Value FOR
    ColumnName IN (P.Firstname, CA.Lastname)
) AS U;
Recursive CTE example:
-- Fails
WITH R AS
(
    SELECT Dummy = 'A row'

    UNION ALL

    SELECT 'Another row'
    FROM R
    WHERE Dummy = 'A row'
)
SELECT Dummy
FROM R;

-- Succeeds
WITH R AS
(
    SELECT Dummy = CONVERT(varchar(11), 'A row')

    UNION ALL

    SELECT CONVERT(varchar(11), 'Another row')
    FROM R
    WHERE Dummy = 'A row'
)
SELECT Dummy
FROM R;
Finally, note that the rewrite using CROSS APPLY in the question is not quite the same as the UNPIVOT, because it does not reject NULL attributes.
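A small demonstration of that difference (the temporary table and its data are invented for illustration):

```sql
CREATE TABLE #T (PersonId int, Firstname varchar(50), Lastname varchar(50));
INSERT #T VALUES (1, 'Ann', NULL);

-- UNPIVOT: returns one row; the NULL Lastname attribute is rejected
SELECT U.PersonId, U.ColumnName, U.Value
FROM #T AS T
UNPIVOT (Value FOR ColumnName IN (Firstname, Lastname)) AS U;

-- CROSS APPLY rewrite: returns two rows, including the NULL value
SELECT T.PersonId, V.ColumnName, V.Value
FROM #T AS T
CROSS APPLY
(
    VALUES ('Firstname', T.Firstname),
           ('Lastname',  T.Lastname)
) AS V (ColumnName, Value);
```

Adding WHERE V.Value IS NOT NULL to the second query makes the two forms equivalent.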
Best Answer
If you want to know what data is there, then:

- Don't use LTRIM or RTRIM
- Use DATALENGTH() (for NTEXT, add / 2 to the end of that) instead of LEN
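As a sketch (dbo.Comments and its NTEXT column Body are hypothetical names):

```sql
-- DATALENGTH returns bytes, and NTEXT stores 2 bytes per character,
-- so divide by 2 for a character count. LEN is avoided because it
-- ignores trailing spaces (and does not accept NTEXT directly).
SELECT MAX(DATALENGTH(Body) / 2) AS MaxCharsUsed
FROM dbo.Comments;
```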
Now, converting from either TEXT or NTEXT to NVARCHAR won't result in data loss if the length is the same as or larger than what was calculated under the assumptions mentioned above.

HOWEVER, the real issue isn't what data is there; it's what the app allows for. If the app allows for more than 200 characters, then 200 isn't a safe choice. Make sure that all sources that can insert or update data in this column enforce a limit of 200 characters. If even one does not, then you need to work with the devs to make sure any sources of input are updated to reflect a limit of 200, or whatever you agree on (just because nobody has yet used even 200 characters doesn't necessarily imply that there isn't a valid case for going above that limit, if the app allows for it).
I just had the thought that there could be a slight issue with the maximum row size, since the TEXT, NTEXT, and IMAGE datatypes, by default, are a 16-byte pointer in the row with the actual data stored off-row. Since you only get approximately 8060 bytes per row, if you were less than 400 bytes from that limit, then changing TEXT (requiring 16 bytes) to NVARCHAR(200) (requiring up to 400 bytes) could potentially result in an error. However, I just tried it and it doesn't error; it allows going above the 8060-byte limit (keep in mind that the max row size is still approx. 8060 bytes, but any extra data will be placed on ROW_OVERFLOW pages).

Values for @TestString taken from: