I've been looking at how to decode Base64 with SQL Server and after searching around many solutions online (some from here) seem to be based upon this sort of method.
SELECT CAST(CAST('Base64StringHere' as XML ).value('.','varbinary(max)') AS VARCHAR(250))
When I have ASCII text this works perfectly. However when I have the following French text it gets corrupted (presumably due to the restrictions of VARCHAR).
Où est le café le plus proche?
T8O5IGVzdCBsZSBjYWbDqSBsZSBwbHVzIHByb2NoZT8=
And provides the following output.
Où est le café le plus proche?
I thought the relatively simple fix would be to change the CAST
to NVARCHAR
but this results in corruption again.
SELECT CAST(CAST('T8O5IGVzdCBsZSBjYWbDqSBsZSBwbHVzIHByb2NoZT8=' as XML ).value('.','varbinary(max)') AS NVARCHAR(250) )
썏₹獥⁴敬挠晡꧃氠汰獵瀠潲档㽥
My search engine skills may be failing me, but I cannot seem to find someone else who has my problem.
Any thoughts?
Best Answer
The problem is that you encoded a UTF-8 encoded string into Base64. So, decoding the Base64 gives you back the original UTF-8 sequence of bytes. SQL Server uses UTF-16 Little Endian only for
NVARCHAR
data, and even for XML. Hence,ù
is the 8-bit version of the two-byte UTF-8 sequence forù
(0xC3
and0xB9
).Fortunately, it is possible to convert a UTF-8 encoded string into UTF-16, or even into a non-Unicode Code Page (IF the Code Page supports all characters being converted). The trick is to convert the Base64 decoded bytes, in their text representation (even with incorrectly converted characters) into XML. The trick to this trick is that you need to add the
<?xml ...>
declaration (typically omitted) and specify the source encoding:Please note that this trick only works for converting from various source encodings into UTF-16 Little Endian (as that is how the XML datatype in SQL Server stores strings internally). This method cannot be used to convert UTF-16 into UTF-8 or some other non-SQL Server-supported encoding.
Below is an Inline-TVF encapsulating the steps shown above:
And then test:
Just use with
CROSS APPLY
if doing a set-based operation.