Windows – How to remove non-ASCII characters from filenames

Tags: batch-rename, filenames, rename, unicode, windows-7

I have several files whose names contain various Unicode characters.
I'd like to rename them so that they contain only the "printable" ASCII characters (32-126).

E.g.:

Läsmig.txt           //Before
L_smig.txt           //After
Mike’s Project.zip   //Before
Mike_s Project.zip   //After

Or, for bonus points, transliterate to the closest character:

Läsmig.txt           //Before
Lasmig.txt           //After
Mike’s Project.zip   //Before
Mike's Project.zip   //After

Ideally, I'm looking for an answer that doesn't require third-party tools.

(Edit: scripts are encouraged; I'm just trying to avoid niche shareware apps that have to be installed to work.)

PowerShell snippet that finds the files I want to rename:

gci -recurse | where {$_.Name -match "[^\u0020-\u007E]"}

Best Answer

I found a similar topic on Stack Overflow.

With the following code, most characters are translated to their "closest character". There are two exceptions: ’ and ß are left unchanged, because neither decomposes into a base letter plus a combining accent. (I couldn't test ’ myself, since I can't create a filename containing it at the prompt ;)

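function Remove-Diacritics {
  param ([String]$src = [String]::Empty)
  # FormD splits an accented letter into base letter + combining mark (ä -> a + U+0308)
  $normalized = $src.Normalize( [Text.NormalizationForm]::FormD )
  $sb = New-Object Text.StringBuilder
  $normalized.ToCharArray() | ForEach-Object {
    # Keep everything except the combining (non-spacing) marks
    if( [Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
      [void]$sb.Append($_)
    }
  }
  # Recompose whatever was decomposed but not removed
  $sb.ToString().Normalize( [Text.NormalizationForm]::FormC )
}

# Deepest paths first, so renaming a folder doesn't invalidate the paths of its children
$files = Get-ChildItem -Recurse | Where-Object {$_.Name -match "[^\u0020-\u007E]"} |
  Sort-Object { $_.FullName.Length } -Descending
$files | ForEach-Object {
  $newname = Remove-Diacritics $_.Name
  if ($_.Name -ne $newname) {
    # Build paths from the parent folder; String.Replace on FullName could also hit the folder part
    $dir = Split-Path -Parent $_.FullName
    $num = 1
    $nextname = Join-Path $dir $newname
    # Append (1), (2), ... until the name is free
    while (Test-Path -LiteralPath $nextname)
    {
      $next = ([IO.FileInfo]$newname).BaseName + " ($num)" + ([IO.FileInfo]$newname).Extension
      $nextname = Join-Path $dir $next
      $num += 1
    }
    Write-Output $nextname
    # -LiteralPath so names containing [ or ] aren't treated as wildcards
    Rename-Item -LiteralPath $_.FullName -NewName (Split-Path -Leaf $nextname)
  }
}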
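Since ’ and ß pass through Remove-Diacritics unchanged, one way to cover them is to strip diacritics first, then look up a small hand-made substitution table, and fall back to _ for anything else. This is a minimal sketch under that assumption: ConvertTo-PrintableAscii and the table contents are hypothetical, not part of the original answer, and the table only maps to characters Windows allows in file names (so no " for curly double quotes).

```powershell
# Hypothetical helper: transliterate a name to printable ASCII (32-126).
function ConvertTo-PrintableAscii {
  param([string]$Name)

  # Hypothetical substitution table for characters that have no accent to strip.
  # Keys are code points, so the script file's own encoding doesn't matter.
  $table = @{
    0x2018 = "'"    # left single quotation mark
    0x2019 = "'"    # right single quotation mark, as in Mike’s
    0x00DF = 'ss'   # sharp s
    0x00C6 = 'AE'   # capital ligature AE
    0x00E6 = 'ae'   # small ligature ae
  }

  # Decompose first, so ä becomes a + combining diaeresis
  $decomposed = $Name.Normalize([Text.NormalizationForm]::FormD)
  $sb = New-Object Text.StringBuilder
  foreach ($c in $decomposed.ToCharArray()) {
    $code = [int]$c
    if ([Globalization.CharUnicodeInfo]::GetUnicodeCategory($c) -eq [Globalization.UnicodeCategory]::NonSpacingMark) {
      continue                                # drop the combining accent
    }
    elseif ($code -ge 0x20 -and $code -le 0x7E) {
      [void]$sb.Append($c)                    # already printable ASCII
    }
    elseif ($table.ContainsKey($code)) {
      [void]$sb.Append($table[$code])         # known substitution
    }
    else {
      [void]$sb.Append('_')                   # anything else, e.g. ┤
    }
  }
  $sb.ToString()
}
```

In the loop above, it would simply take the place of Remove-Diacritics: `$newname = ConvertTo-PrintableAscii $_.Name`.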

Edit:

I added some code that checks whether the target filename already exists and, if it does, appends (1), (2), etc. (It isn't smart enough to detect a (1) already present in the name being renamed, so in that case you would get (1) (1). But as always... everything is programmable ;)
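The "(1) (1)" case can be avoided by stripping a trailing " (n)" from the base name before numbering. A minimal sketch, assuming the " (n)" suffix style used above; Get-UniquePath is a hypothetical name:

```powershell
# Hypothetical helper: return a path in $Directory for $Name that doesn't exist yet,
# numbering as "name (1).ext", "name (2).ext", ...
function Get-UniquePath {
  param([string]$Directory, [string]$Name)

  $candidate = Join-Path $Directory $Name
  if (-not (Test-Path -LiteralPath $candidate)) { return $candidate }

  $base = [IO.Path]::GetFileNameWithoutExtension($Name)
  $ext  = [IO.Path]::GetExtension($Name)
  # Drop a trailing " (digits)" so "a (1).txt" continues as "a (2).txt", not "a (1) (1).txt"
  $base = $base -replace ' \(\d+\)$', ''

  $num = 1
  do {
    $candidate = Join-Path $Directory ("{0} ({1}){2}" -f $base, $num, $ext)
    $num++
  } while (Test-Path -LiteralPath $candidate)
  $candidate
}
```

The whole while block in the loop could then become `$nextname = Get-UniquePath (Split-Path -Parent $_.FullName) $newname`.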

Edit 2:

Here is the last one for tonight...

This one uses a different function to replace the characters. I also added a line that changes characters it can't map, such as ß and ┤, to _.

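function Convert-ToLatinCharacters {
  param([string]$inputString)
  # Round-trip through the "Cyrillic" code page: characters it lacks are best-fit mapped
  # where Windows has a mapping (ä -> a); everything else comes back from ASCII as '?'
  [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding("Cyrillic").GetBytes($inputString))
}

# Deepest paths first, so renaming a folder doesn't invalidate the paths of its children
$files = Get-ChildItem -Recurse | Where-Object {$_.Name -match "[^\u0020-\u007E]"} |
  Sort-Object { $_.FullName.Length } -Descending
$files | ForEach-Object {
  $newname = Convert-ToLatinCharacters $_.Name
  # '?' marks characters that couldn't be mapped (and isn't allowed in Windows file names)
  $newname = $newname.Replace('?','_')
  if ($_.Name -ne $newname) {
    # Build paths from the parent folder; String.Replace on FullName could also hit the folder part
    $dir = Split-Path -Parent $_.FullName
    $num = 1
    $nextname = Join-Path $dir $newname
    # Append (1), (2), ... until the name is free
    while (Test-Path -LiteralPath $nextname)
    {
      $next = ([IO.FileInfo]$newname).BaseName + " ($num)" + ([IO.FileInfo]$newname).Extension
      $nextname = Join-Path $dir $next
      $num += 1
    }
    Write-Output $nextname
    # -LiteralPath so names containing [ or ] aren't treated as wildcards
    Rename-Item -LiteralPath $_.FullName -NewName (Split-Path -Leaf $nextname)
  }
}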
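For the non-bonus version of the question, where every character outside 32-126 simply becomes _ (Läsmig.txt → L_smig.txt), a single regex replace is probably enough and skips the code-page round trip. A minimal sketch; ConvertTo-UnderscoreAscii is a hypothetical name:

```powershell
# Hypothetical helper: replace every character outside printable ASCII (0x20-0x7E) with "_".
function ConvertTo-UnderscoreAscii {
  param([string]$Name)
  # -creplace (case-sensitive) as a precaution: with case-insensitive matching, a few
  # non-ASCII letters that case-fold to ASCII (e.g. the Kelvin sign, U+212A) may slip through
  $Name -creplace '[^\x20-\x7E]', '_'
}
```

It can be dropped into the same loop in place of Convert-ToLatinCharacters. Note that the regex works on UTF-16 code units, so a character outside the Basic Multilingual Plane (such as an emoji) becomes two underscores.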
