Microsoft Excel – Bulk Fix for ‘Format Does Not Match Extension’ Error

csvmicrosoft excelsave as

Context

I have received around 1700 97-2003 excel files with a ton of data including some text information. I am trying to read them all into Stata before I clean and run some statistical analysis. But, Stata is unable to open them.
The files have a .xls format with large amounts of text in them which I want to keep, and convert them into .csv files (hence the csv tag).

What I've tried

  • On closer look, every time I manually open a file in Excel I get the error "Format does not match extension", I have to click on "Open anyway" and manually save it before the file can be imported into Stata. I can't possibly do this for 1700 files. I am using a windows OS and all files are saved locally on my drive.

I additionally tried to write a bulk line of code on stata to save all files as csv or atleast an xls that can be imported into stata but I have had zero luck. Everytime I try to save them, I get a corrupted csv or an xls with the exact same error that pops up as " try xlCreateBook()" on stata and the "format does not match extension" error when I manually try to open the file.

For reference, under is the stata code I wrote.

cd "$dropbox\project8\data\2008-2009\A.Monthwise\dakota"
local oldfiles : dir "." files "*.xls" , respectcase
mac list _oldfiles

foreach fn of local oldfiles {
  local oldfn `"`fn'"'
  local res = ustrregexm(`"`oldfn'"', "_(.*)\.xls", .)
  if `res'==1 {
    local newfn = "new_" + ustrregexs(1) + ".dta"
    di `"Renaming `oldfn' to `newfn'"'
    copy `"./`oldfn'"' `"./new/`newfn'"', public
     }
} 

Question

Is there a way I can fix this extesnion/format issue for 1700 files and convert everything in bulk, such that Stata can read it?

Best Answer

The error "format does not match extension" means that the files are not truly .xls. As Excel can still open them, they are probably of another format that is supported. My guess would be .xlsx.

You may test by using a hex-editor (such as HxD) to check the file signature, which is in the first few bytes:

  • xlsx: First 4 bytes are 50 4B 03 04

  • xls: First 8 bytes are D0 CF 11 E0 A1 B1 1A E1

To rename all the files in the current folder at once, you could use the Command Prompt (CMD) with the command:

ren *.xls *.xlsx

If your file-signature is not one of the above, please add it to your post. It will help identify these files.


As the files were HTML, not xsl or even xslx, it's possible to do mass-conversion in Excel using VBA.

The article How to convert multiple xls files to xlsx files in Excel? contains this script that worked for the poster:

Sub ConvertToXlsx()
'Updateby Extendoffice
Dim strPath As String
Dim strFile As String
Dim xWbk As Workbook
Dim xSFD, xRFD As FileDialog
Dim xSPath As String
Dim xRPath As String
Set xSFD = Application.FileDialog(msoFileDialogFolderPicker)
With xSFD
.Title = "Please select the folder contains the xls files:"
.InitialFileName = "C:\"
End With
If xSFD.Show <> -1 Then Exit Sub
xSPath = xSFD.SelectedItems.Item(1)
Set xRFD = Application.FileDialog(msoFileDialogFolderPicker)
With xRFD
.Title = "Please select a folder for outputting the new files:"
.InitialFileName = "C:\"
End With
If xRFD.Show <> -1 Then Exit Sub
xRPath = xRFD.SelectedItems.Item(1) & "\"
strPath = xSPath & "\"
strFile = Dir(strPath & "*.xls")
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Do While strFile <> ""
If Right(strFile, 3) = "xls" Then
Set xWbk = Workbooks.Open(Filename:=strPath & strFile)
xWbk.SaveAs Filename:=xRPath & strFile & "x", _
FileFormat:=xlOpenXMLWorkbook
xWbk.Close SaveChanges:=False
End If
strFile = Dir
Loop
Application.DisplayAlerts = True
Application.ScreenUpdating = True
End Sub
Related Question