Excel – n Excel formula to identify special characters in a cell

microsoft excelspecial charactersworksheet-function

We have about 3500 documents whose filenames need to be manually scrubbed to remove special characters like brackets, colons, semicolons, commas, etc.

I have a text file that I've dumped into excel, and I'm trying to create a column that flags the filename for modification if it includes special characters. The pseudocode formula would be

=IF (cellname contains [^a-zA-z_-0-9], then "1", else "0")

to flag the row if it contains any characters other than A-Z, 0-9, – or _, regardless of case.

Anyone know of something that may work for me? I'm hesitant to code and massive if statement if there's something quick and easy.

Best Answer

No code? But it's so short and easy and beautiful and... :(

Your RegEx pattern [^A-Za-z0-9_-] is used to remove all special characters in all cells.

Sub RegExReplace()

    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    RegEx.Global = True

    RegEx.Pattern = "[^A-Za-z0-9_-]"
    For Each objCell In ActiveSheet.UsedRange.Cells
        objCell.Value = RegEx.Replace(objCell.Value, "")
    Next

End Sub

Edit

This is as close as I can get to your original question.

enter image description here

The second code is a user-defined function =RegExCheck(A1,"[^A-Za-z0-9_-]") with 2 arguments. The first one is the cell to check. The second one is the RegEx pattern to check for. If the pattern matches any of the characters in your cell, it will return 1 otherwise 0.

You can use it like any other normal Excel formula if you first open VBA editor with ALT+F11, insert a new module (!) and paste the code below.

Function RegExCheck(objCell As Range, strPattern As String)

    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    RegEx.Global = True
    RegEx.Pattern = strPattern

    If RegEx.Replace(objCell.Value, "") = objCell.Value Then
        RegExCheck = 0
    Else
        RegExCheck = 1
    End If

End Function

For users new to RegEx I'll explain your pattern: [^A-Za-z0-9_-]

[] stands for a group of expressions
^ is a logical NOT
[^ ] Combine them to get a group of signs which should not be included
A-Z matches every character from A to Z (upper case)
a-z matches every character from a to z (lower case)
0-9 matches every digit
_ matches a _
- matches a - (This sign breaks your pattern if it's at the wrong position)
Related Question