Control encoding of batch-created file

batch utf-8

I wanted to automate the creation of a directory tree file in Windows 10.

In PowerShell, I executed the following commands:

cd  C:\TreeTest
tree /f > .\TreeStructure.txt

The output was a pretty UTF-8 file:

[Image: pretty UTF-8 tree structure test]

Now I wanted to do the same thing in a batch file:

@echo off
cd  C:\TreeTest
tree /f > .\TreeStructure.txt

But the output from the batch file execution had the encoding screwed up:

[Image: bad encoding tree structure test]

Why is the encoding of the PowerShell output different from the output of the batch file?

I know that I can get an ASCII output by adding /a to the tree command, but I would prefer the pretty UTF-8 output to be saved to my tree file.

I tried changing the codepage by adding "chcp 65001" to my batch file, but it didn't change the file output.

Best Answer

LotPing's answer is right. Here is a more detailed explanation:

The > redirection operator (send specified stream to a file):

  • in PowerShell, the output file is encoded as UCS-2 LE with a BOM:

When you are writing to files, the redirection operators use Unicode encoding. If the file has a different encoding, the output might not be formatted correctly. To redirect content to non-Unicode files, use the Out-File cmdlet with its Encoding parameter.

  • in the Windows command prompt (cmd.exe):
    • cmd.exe /A (default): the output file is ANSI-encoded, and
    • cmd.exe /U: the output file is encoded as UCS-2 LE (no BOM):

The CMD Shell can redirect ASCII/ANSI (the default) or Unicode (UCS-2 le) but not UTF-8.
This can be selected by launching CMD /A or CMD /U.
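To check which of these encodings a redirected file actually has, it helps to look at its first bytes. The following is only a minimal sketch in Python: the helper name sniff_bom and the sample text are made up for illustration, and cp1252 is assumed as the ANSI code page (yours may differ).

```python
import codecs

def sniff_bom(data: bytes) -> str:
    """Classify bytes by their leading byte-order mark, if any.

    Hypothetical helper for illustration; it does not handle UTF-32.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16 LE with BOM"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 BE with BOM"
    return "no BOM"

text = "Folder PATH listing"

# The three cases described above, built by hand for comparison.
samples = {
    "PowerShell >": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    "cmd /U >": text.encode("utf-16-le"),   # two bytes per character, no BOM
    "cmd /A >": text.encode("cp1252"),      # assumed ANSI code page
}

for label, data in samples.items():
    print(f"{label:13} {sniff_bom(data):20} first bytes: {data[:4].hex(' ')}")
```

For a real file, pass the result of open(path, "rb").read(4) to the helper. Note that "no BOM" cannot by itself tell ANSI from BOM-less UCS-2 LE; the alternating zero bytes in the hex dump give the latter away for ASCII text.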

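However, tree.com is an old external utility, and the /U switch only affects the output of cmd's internal commands, so its output is not converted to Unicode. Running start "" cmd /U /C "tree>tree_U.txt" directly therefore still produces a garbled (mojibake), ANSI-encoded file. Passing the output through the internal type command under cmd /U does convert it, so the following cmd commands should do the trick: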

tree>"%temp%\auxTree.txt"
start "" cmd /U /C "type "%temp%\auxTree.txt">tree_Unicode.txt"
del "%temp%\auxTree.txt"

BTW, here are those pretty characters and their codes (garbled in ANSI):

Char Unicode  OEM  ANSI  UTF-8     Character_description
 ─   U+2500   196  n/a   0xE29480  Box Drawings Light Horizontal
 │   U+2502   179  n/a   0xE29482  Box Drawings Light Vertical
 └   U+2514   192  n/a   0xE29494  Box Drawings Light Up And Right
 ├   U+251C   195  n/a   0xE2949C  Box Drawings Light Vertical And Right
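
The table can be reproduced by decoding the four OEM byte values under different code pages. This is just an illustrative Python sketch; cp437 (OEM) and cp1252 (ANSI) are assumed code pages, and the list name rows is made up.

```python
# OEM byte values from the table above.
OEM_BYTES = [196, 179, 192, 195]

rows = []
for value in OEM_BYTES:
    raw = bytes([value])
    char = raw.decode("cp437")        # what tree.com meant (assumed OEM code page)
    as_ansi = raw.decode("cp1252")    # what an ANSI viewer shows instead (assumed)
    utf8_hex = char.encode("utf-8").hex().upper()
    rows.append((char, f"U+{ord(char):04X}", value, as_ansi, f"0x{utf8_hex}"))

for row in rows:
    print(*row, sep="  ")
```

The fourth column is the mojibake: the same byte that means a box-drawing character in the OEM code page is an accented letter or a superscript digit in the ANSI one.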