I have a file that I need to convert to Unix format, so I run dos2unix filename.txt. However, the converted file differs between CentOS and macOS.
I tried updating dos2unix on CentOS, but that didn't help.
CentOS:
-bash-4.1$ dos2unix -V
dos2unix 3.1 (Thu Nov 19 1998)
macOS:
m-c02xd0nmjgh7:test files s0c03h1$ dos2unix -V
dos2unix 7.4.0 (2017-10-10)
With Unicode UTF-16 support.
Without native language support.
Original file format:
<?xml version="1.0" encoding="utf-8"?>
<ACES version="3.0">
<Header>
<Company>Disc Brakes Australia</Company>
<SenderName>SEMA Data Co-op</SenderName>
<SenderPhone>888-958-6698 option 2</SenderPhone>
<TransferDate>2019-03-21</TransferDate>
<BrandAAIAID>DMWK</BrandAAIAID>
<DocumentTitle>SDC ACES XML File</DocumentTitle>
<EffectiveDate>2019-03-21</EffectiveDate>
<SubmissionType>FULL</SubmissionType>
<VcdbVersionDate>2019-02-22</VcdbVersionDate>
<QdbVersionDate>2019-02-22</QdbVersionDate>
<PcdbVersionDate>2019-02-22</PcdbVersionDate>
</Header>
<App action="A" id="1">
<BaseVehicle id="119723" />
<SubModel id="973" />
<EngineBase id="6067" />
<Region id="3" />
<Qty>2</Qty>
<PartType id="1896" />
<MfrLabel>T3 5000 Series T-Slot Slotted Rotor, Black Hat Test label 1</MfrLabel>
<Position id="22" />
<Part>DBA52120BLKS</Part>
</App>
<App action="A" id="2">
<BaseVehicle id="119723" />
<SubModel id="973" />
<EngineBase id="3930" />
<Region id="1" />
<Qty>2</Qty>
<PartType id="1896" />
<MfrLabel>T3 5000 Series T-Slot Slotted Rotor, Black Hat Test label 2</MfrLabel>
<Position id="22" />
<Part>DBA52120BLKS</Part>
</App>
<Footer>
<RecordCount>2</RecordCount>
</Footer>
</ACES>
m-c02xd0nmjgh7:test files s0c03h1$ od -bc FileName.XML | head -10
0000000 357 273 277 074 077 170 155 154 040 166 145 162 163 151 157 156
357 273 277 < ? x m l v e r s i o n
0000020 075 042 061 056 060 042 040 145 156 143 157 144 151 156 147 075
= " 1 . 0 " e n c o d i n g =
0000040 042 165 164 146 055 070 042 077 076 015 012 074 101 103 105 123
" u t f - 8 " ? > \r \n < A C E S
0000060 040 166 145 162 163 151 157 156 075 042 063 056 060 042 076 015
v e r s i o n = " 3 . 0 " > \r
0000100 012 040 040 074 110 145 141 144 145 162 076 015 012 040 040 040
\n < H e a d e r > \r \n
macOS file format after conversion:
m-c02xd0nmjgh7:test files s0c03h1$ dos2unix FileName.XML
dos2unix: converting file FileName.XML to Unix format...
m-c02xd0nmjgh7:test files s0c03h1$ od -bc FileName.XML | head -10
0000000 074 077 170 155 154 040 166 145 162 163 151 157 156 075 042 061
< ? x m l v e r s i o n = " 1
0000020 056 060 042 040 145 156 143 157 144 151 156 147 075 042 165 164
. 0 " e n c o d i n g = " u t
0000040 146 055 070 042 077 076 012 074 101 103 105 123 040 166 145 162
f - 8 " ? > \n < A C E S v e r
0000060 163 151 157 156 075 042 063 056 060 042 076 012 040 040 074 110
s i o n = " 3 . 0 " > \n < H
0000100 145 141 144 145 162 076 012 040 040 040 040 074 103 157 155 160
e a d e r > \n < C o m p
CentOS file format after conversion:
-bash-4.1$ dos2unix output.txt
dos2unix: converting file output.txt to UNIX format ...
-bash-4.1$ od -bc output.txt | head -10
0000000 357 273 277 074 077 170 155 154 040 166 145 162 163 151 157 156
357 273 277 < ? x m l v e r s i o n
0000020 075 042 061 056 060 042 040 145 156 143 157 144 151 156 147 075
= " 1 . 0 " e n c o d i n g =
0000040 042 165 164 146 055 070 042 077 076 012 074 101 103 105 123 040
" u t f - 8 " ? > \n < A C E S
0000060 166 145 162 163 151 157 156 075 042 063 056 060 042 076 012 040
v e r s i o n = " 3 . 0 " > \n
0000100 040 074 110 145 141 144 145 162 076 012 040 040 040 040 074 103
< H e a d e r > \n < C
I want the same result on CentOS as I get from dos2unix on the Mac, as shown above.
Best Answer
357 273 277 is the octal representation of the BOM (byte order mark) in UTF-8. The original file has a BOM. On one of your systems, dos2unix removes it.
In my Debian, man 1 dos2unix says:
If you have the same (or similar) options available, use them. Example:
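A minimal sketch of such an invocation, assuming a dos2unix build recent enough to accept -r (--remove-bom). The file name sample.xml and the has_bom helper are hypothetical and used only for illustration:

```shell
#!/bin/sh
# Hypothetical sample standing in for the real file: UTF-8 BOM plus CRLF line endings.
printf '\357\273\277<?xml version="1.0"?>\r\n<ACES/>\r\n' > sample.xml

# has_bom FILE: succeed if FILE starts with the UTF-8 BOM (octal 357 273 277).
has_bom() {
    [ "$(head -c 3 "$1" | od -An -b | tr -d ' \n')" = "357273277" ]
}

has_bom sample.xml && echo "BOM present before conversion"

# -r asks dos2unix to remove the BOM; -b would keep it instead.
# Older builds may not know these options, hence the fallback message.
if command -v dos2unix >/dev/null 2>&1; then
    dos2unix -r sample.xml || echo "this dos2unix did not accept -r"
fi

if has_bom sample.xml; then
    echo "BOM still present"
else
    echo "BOM removed"
fi
```

Running has_bom before and after the conversion gives a quick check that does not depend on reading od output by eye.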
But your dos2unix on CentOS is very old (1998-11-19? Over 20 years! This is even more awkward considering that the first CentOS release was in 2004). The changelog says -r and -b were added on 2014-07-07. Get a newer dos2unix.
Alternatively, look for bomstrip. The description of the bomstrip package in my Debian is:
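If neither a newer dos2unix nor bomstrip can be installed on the CentOS machine, standard tools can approximate the same result. A minimal sketch, assuming head, od, tail, sed and mktemp are available; strip_bom_crlf is a hypothetical helper, not part of dos2unix or bomstrip:

```shell
#!/bin/sh
# strip_bom_crlf FILE: hypothetical helper that removes a leading UTF-8 BOM
# (octal 357 273 277) and converts CRLF line endings to LF, rewriting FILE.
strip_bom_crlf() {
    file=$1
    cr=$(printf '\r')              # a literal carriage return for the sed pattern
    tmp=$(mktemp) || return 1
    if [ "$(head -c 3 "$file" | od -An -b | tr -d ' \n')" = "357273277" ]; then
        # Start output at byte 4 (skipping the BOM), then strip a trailing CR per line.
        tail -c +4 "$file" | sed "s/${cr}\$//" > "$tmp"
    else
        sed "s/${cr}\$//" "$file" > "$tmp"
    fi
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Hypothetical sample standing in for the real file.
printf '\357\273\277<?xml version="1.0"?>\r\n<ACES/>\r\n' > sample.xml
strip_bom_crlf sample.xml
od -bc sample.xml | head -4
```

The od output should then start with 074 077 170 (the characters < ? x) rather than 357 273 277, matching what the macOS conversion produced.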