I have files on a Windows server with certain accented characters in their names. Windows Explorer displays the files normally, but running 'dir' at the command prompt with default settings displays substituted characters.
For example, the character ö is displayed as o" in the listing. This causes problems when accessing these files from other platforms over SMB, presumably because of conflicting encodings or code pages. The problem does not affect all files, and I don't know where the problem files came from.
Example:
E:\folder\files>dir
Volume in drive E is data
Volume Serial Number is 5841-C30E
Directory of E:\folder\files
07/05/2016 07:46 PM <DIR> .
07/05/2016 07:46 PM <DIR> ..
12/01/2015 11:12 AM 14,105 file with o" character.xlsx
01/22/2015 05:30 PM 11,598 file with correct ö character.xlsx
2 File(s) 25,703 bytes
2 Dir(s) 2,727,491,600,384 bytes free
I've changed file and directory names, but you'll get the idea.
Any ideas how the names could have gotten this way? Perhaps they were copied or created using another platform or tool?
How could I batch find and rename all the problem files? I looked at a couple of GUI renaming utilities but they don't see the problem and only work with the name shown in Windows Explorer.
The filesystem on the drive is ReFS. Could that have something to do with it?
Edit: I ran the following PowerShell command:
Y:\test>powershell -c Get-ChildItem ^|ForEach-Object {$x=$_.Name; For ($i=0;$i -lt $x.Length; $i++) {\"{0} {1} {2}\" -f $x,$x[$i],[int]$x[$i]}}
file with o¨ character.xlsx o 111
file with o¨ character.xlsx ¨ 776
(Output trimmed to show only the relevant part.)
So it looks like the second character really is a combining diaeresis (code point 776, i.e. U+0308) and not a straight quotation mark. As I understand it, that is what one would expect if this is a Unicode normalization issue.
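To illustrate the composed versus decomposed forms mentioned here, below is a minimal Python sketch. Python and the helper name `describe` are assumptions made for illustration; the thread itself uses PowerShell. It lists the code point and Unicode name of each character, much like the one-liner above does for file names.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(ord(ch), unicodedata.name(ch)) for ch in text]

# 'o' followed by U+0308, the decomposed form seen in the problem file names
decomposed = "o\u0308"
# NFC normalization composes the pair into the single character U+00F6
composed = unicodedata.normalize("NFC", decomposed)

for label, text in (("decomposed", decomposed), ("composed", composed)):
    for code, name in describe(text):
        print(label, code, name)
```

Both strings render as ö in most Unicode-aware viewers, yet they compare as different strings, which would explain why the two files in the listing look alike in Explorer but not in the console.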
Best Answer
I can reproduce your problem using the following simple PowerShell script:
The script above has been updated in two ways: first, it shows more information about composed and decomposed Unicode characters, i.e. their Unicode names (see the Get-CharInfo module); second, it embeds a very rough draft of a possible solution.
Output from the cmd prompt:
In fact, the dir output above looks like 1097217FormDsˇo¨u¨.txt in the cmd window. My Unicode-aware browser composes the strings as listed above, but a Unicode analyzer shows the characters as they really are, as does the latest image:
However, the next example shows the full extent of the problem: a for loop changes the combining accents into ordinary (spacing) ones:
==>
Here's a very rough draft of a possible solution (see the output above).
(ToDo: invoke Rename-Item only if necessary):
and its output (again, the strings are rendered composed here, while the image below shows what the cmd window really looks like):
Updated cmd output
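Separately from the PowerShell draft described above, the same idea (normalize each name to the composed form and rename only when the name actually changes) can be sketched in Python. This is an illustration under assumptions, not the answer's original script: the function names `plan_renames` and `rename_in_directory` are hypothetical, NFC is assumed to be the desired target form, and the dry-run default is a precaution added here.

```python
import os
import unicodedata

def plan_renames(names, form="NFC"):
    """Return (old, new) pairs for names that change under normalization."""
    pairs = []
    for old in names:
        new = unicodedata.normalize(form, old)
        if new != old:  # already normalized names are left alone
            pairs.append((old, new))
    return pairs

def rename_in_directory(directory, dry_run=True):
    """Rename entries of `directory` to NFC; only report them when dry_run is True."""
    handled = []
    for old, new in plan_renames(os.listdir(directory)):
        src = os.path.join(directory, old)
        dst = os.path.join(directory, new)
        if os.path.exists(dst):
            # A composed-form twin already exists; skip rather than overwrite.
            continue
        if not dry_run:
            os.rename(src, dst)
        handled.append((old, new))
    return handled
```

Calling `rename_in_directory(r"E:\folder\files")` with the default `dry_run=True` only returns the planned pairs, so the list can be reviewed before anything is renamed. The collision check matters because a composed and a decomposed name can coexist in one directory as distinct files, as the listing in the question shows.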