Bash – How to search many files with a regular expression and output each match on its own line in a new file

Tags: bash, regular expression, search

I have thousands of source files, and I would like to find all text that matches a regular expression and then output each match on its own line in a resulting text file.

For instance:

// a.cs
string test = _.Text("Hello World!") + _.Text("Foo");
// b.cs
Debug.Log(_.ActionText("Bar"));

// results.txt
_.Text("Hello World")
_.Text("Foo")
_.ActionText("Bar")

Which command is capable of achieving this? Could you please show an example?

Best Answer

sed '/\n/P;//!s/_\.[^ ("]*Text([^)]*)/\n&\n/;D' files... >results.txt

...would probably work. Run on your example data, it prints:

_.Text("Hello World!")
_.Text("Foo")
_.ActionText("Bar")

All it does is attempt to enclose the first match on a line in \newlines. Whether or not that succeeds, it Deletes up to the first \newline in pattern space - which, for a non-matching line, removes it from output completely, but for a match deletes only up to the head of your pattern, and the script starts over from the top. If a \newline is then matched in pattern space - which can only happen when a match was just found and its head Deleted - sed Prints only up to the first occurring \newline, which sits at the tail of your matched string. The s///ubstitution is !not attempted while a \newline is already in pattern space, so the next Delete clears the just-printed match and the cycle starts again from the tail of the last match onward.
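
Written out one command per line - with comments that just annotate the same script, and \n escapes that assume GNU sed - it looks like this:

sed '
# a \newline in pattern space means a match was just isolated: Print up to it
/\n/P
# otherwise wrap the first match on the line in \newlines
//!s/_\.[^ ("]*Text([^)]*)/\n&\n/
# Delete through the first \newline (or the whole non-matching line) and go again
D
' files... >results.txt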

Depending on your sed you may need to use a literal \newline in place of the n in the right-hand substitution field, though (there is a sketch of that further down). But you should be able to do all of the file arguments at once - or, at least, very many at a time (depending on your ARG_MAX limit). You can just shell glob for those, or maybe do...

find /path -name pattern -exec sed script_above {} + >>results.txt

...because sed will treat all input files as a single stream.
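
If your sed does not understand \n on the right-hand side of the s///ubstitution (BSD/macOS sed, for example), the same script can be written with backslash-escaped literal \newlines instead - here sketched over a plain shell glob, where the ./*.cs glob is only an example:

sed '/\n/P;//!s/_\.[^ ("]*Text([^)]*)/\
&\
/;D' ./*.cs >results.txt

And if GNU grep happens to be available, its -o option already prints every match on its own line, so the whole extraction can also be done with it as a simpler alternative to the sed script:

grep -roh --include='*.cs' '_\.[^ ("]*Text([^)]*)' /path >results.txt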
