Why does SQL convert Unicode 9619 to ASCII code 166?
SQL Server is not employing any special custom logic here; it is using standard operating system services to perform the conversion.
Specifically, the SQL Server type and expression service (sqlTsEs) calls into the OS routine WideCharToMultiByte in kernel32.dll. SQL Server sets the input parameters to WideCharToMultiByte such that the routine performs a 'quick translation'. This is faster than requesting that a specific default character be used when no direct translation exists.
The quick translation relies on the target code page to perform a best-fit mapping for any unmatched characters, as mentioned in the link Martin Smith provided in a comment to the question:
Best-fit strategies vary for different code pages, and they are not documented in detail.
When the input parameters are set for a quick translation, WideCharToMultiByte calls the OS service GetMBNoDefault (source). Inspecting the SQL Server call stack when performing the conversion specified in the question confirms this.
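You can reproduce the best-fit result directly in T-SQL. A minimal sketch, assuming a database whose default collation is based on code page 1252 (e.g. SQL_Latin1_General_CP1_CI_AS):

```sql
-- U+2593 (DARK SHADE) has no exact equivalent in code page 1252,
-- so the OS best-fit mapping substitutes character 166 (broken bar).
SELECT
    NCHAR(9619)                             AS source_char,    -- the Unicode character
    CONVERT(varchar(1), NCHAR(9619))        AS best_fit_char,  -- result of the conversion
    ASCII(CONVERT(varchar(1), NCHAR(9619))) AS code_point;     -- 166
```

With a collation based on a different code page, the best-fit result may differ, since (as quoted below) the mapping is code-page specific.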
Regarding:
I tested doing an “add new column, set new=old, drop old, rename old to new” for both of the main two fields I want to convert from nvarchar(max) to varchar(max), and it took 81 minutes on our test server ... before running out of disk space ... it was too slow.
and
Another technique I plan to try is to create the new table def in a new schema, select into it from the old table, then move tables amongst schema and delete the old table.
Generally speaking, making a copy of the table with the ideal schema is my preferred approach. But if you barely have enough space to convert just the two columns, are you sure you have enough space to make a copy of the entire table?
Also, the new table would just need to have a different name. It wouldn't need to be in a different schema.
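That copy-and-swap approach might look something like the following sketch (the table and column names here are placeholders, not taken from the question):

```sql
-- Build the replacement table with the target schema up front.
CREATE TABLE dbo.MyTableNew
(
    Id   int          NOT NULL,
    Col1 varchar(max) NULL,  -- was nvarchar(max)
    Col2 varchar(max) NULL,  -- was nvarchar(max)
    CONSTRAINT PK_MyTableNew PRIMARY KEY CLUSTERED (Id)
);

-- Copy the data across; the nvarchar -> varchar conversion
-- happens implicitly on insert.
INSERT INTO dbo.MyTableNew (Id, Col1, Col2)
SELECT Id, Col1, Col2
FROM dbo.MyTable;

-- Swap the names once the copy is verified.
EXEC sp_rename 'dbo.MyTable',    'MyTableOld';
EXEC sp_rename 'dbo.MyTableNew', 'MyTable';
-- DROP TABLE dbo.MyTableOld;  -- only after verifying, to reclaim the space
```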
Since you are on Enterprise Edition, have you considered, or even looked at, enabling Data Compression? Not only would it have the effect you are looking for on the NCHAR / NVARCHAR fields, but it would also save space on fields of other types as well.
There are two types of compression: Row and Page. You should read up on them and run sp_estimate_data_compression_savings, the stored procedure that estimates what your savings would be.
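For example, estimating the savings could look like this (the schema and object names are placeholders):

```sql
-- Estimate PAGE compression savings for the clustered index.
-- Run again with N'ROW' to compare the two compression types.
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'MyTable',
    @index_id         = 1,     -- 1 = clustered index; NULL = all indexes
    @partition_number = NULL,  -- NULL = all partitions
    @data_compression = N'PAGE';
```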
Enabling compression can be done as an ONLINE operation, but it might require a bit of disk space. If you do not have the available space for this, then you might consider a hybrid approach where you build a copy of the table as TableNEW, with the clustered index already created, and created with compression enabled. Then you should be able to slowly fill TableNEW, and the data will compress as it goes in. Of course, you will want to use INSERT INTO...SELECT to do it in batches. The full benefit of the compression might not be realized until after you drop the original table and do a full rebuild on the clustered index of TableNEW.
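The batched fill could be sketched like this (the batch size, key column, and column names are assumptions for illustration):

```sql
-- Move rows in batches so a single huge transaction does not
-- overwhelm the log; rows compress as they land in the
-- pre-compressed clustered index of TableNEW.
DECLARE @BatchSize int = 50000,
        @LastId    int = 0;

WHILE 1 = 1
BEGIN
    INSERT INTO dbo.TableNEW (Id, Col1, Col2)
    SELECT TOP (@BatchSize) Id, Col1, Col2
    FROM dbo.TableOLD
    WHERE Id > @LastId
    ORDER BY Id;

    IF @@ROWCOUNT = 0 BREAK;

    SELECT @LastId = MAX(Id) FROM dbo.TableNEW;
END;

-- After dropping the original table, rebuild to realize
-- the full benefit of compression:
ALTER INDEX ALL ON dbo.TableNEW REBUILD;
```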
And keep in mind that there are scenarios where you might not save that much space, or saving the space isn't worth the increase in CPU activity. But, that all depends on a lot of factors so it is really something that should be tested on your system.
You could always take the approach of:
- Enable compression, either directly on the current table as an ONLINE operation (if there is enough space to support it), or into a separate table that has compression enabled.
- If you find that CPU usage has actually increased beyond the benefit of the space savings, then you have the option of building the table again with regular VARCHAR fields and no compression. Because compression was already enabled, you should definitely have room for this now.
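Enabling compression directly on the existing table as an ONLINE operation might look like the following (an Enterprise Edition feature; the table name is a placeholder):

```sql
-- Rebuild the table's clustered index with PAGE compression while
-- keeping the table available for reads and writes.
ALTER TABLE dbo.MyTable
REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
```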
But again, like anything we do, it should be tested. I have heard for years how horrible "XML parsing" is on CPU, and how bad compression is supposed to be, but in practice those concerns are quite often overstated. The only way to know is to test on your system. (P.S. Just in case it is not clear, this being a text-only medium: these final statements are not attacking what @Kin said in his answer about needing to be cautious of increased CPU activity. He is correct, at least to a degree. I am just reminding everyone to keep everything in perspective of the current hardware, software, and system setup.)
Best Answer
From my experience, simpler characters are more commonly used, e.g. 1-1 to mean one-to-one, 1-* for one-to-many, and *-* for many-to-many. You can also use the infinity symbol ∞ (U+221E) for the "many" side if you really want, but I prefer the simpler examples above.
By the way, that website is helpfully searchable, and you can find other arrow characters similar to what you're looking for as well, such as the following: