Well, here's a Wikipedia page on matching or replacing with Perl one-liners. I did this in Cygwin.
Perl can behave like grep or like sed.
The /s modifier makes dot match a newline.
The -0777 switch makes Perl apply the regular expression to the whole input at once instead of line by line.
\n can match a newline as well.
$ echo -e 'a\nb\nc\nd' | perl -0777 -pe 's/.*c//s'
d
user@comp ~
$ echo -e 'a\nb\nc\nd' | perl -pe 's/.*c//s'
a
b
d
Here is the other form, -ne with print $1:
user@comp ~
$ echo -e 'a\nb\nc\nd' | perl -ne 'print $1 if /(.*c)/s'
c
user@comp ~
$ echo -e 'a\nb\nc\nd' | perl -0777 -ne 'print $1 if /(.*c)/s'
a
b
c
user@comp ~
$
Also
$ echo xxx|perl -lne 'print ""'
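Perl's equivalent of sed's \0 or &, i.e. the whole match, is $&. The variable $_ is the current input line, which is the same text only when the pattern matches the entire line, as .* does in the examples here. Writing it as ${_} lets you put text directly before and after it without a space: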
$ echo xxx|perl -lne 'print "a${_}${_}a"'
axxxxxxa
and
$ echo xxx|perl -lpe 's/.*/a${_}${_}a"/'
axxxxxxa"
### Some further examples
$ cat t.t
<ul>
<li>item 1</li>
<li>item 2</li>
</ul>
$ perl -0777 -ne 'print $1 if /\<ul\>(.*?)\<\/ul>/s' t.t
<li>item 1</li>
<li>item 2</li>
user@comp ~
$ perl -0777 -ne 'print $1 if /(.*)/s' t.t
<ul>
<li>item 1</li>
<li>item 2</li>
</ul>
user@comp ~
$
An example of global matching with the -ne form (change "if" to "while"):
$ echo -e 'bbb' | perl -0777 -ne 'print $1 while /(b)/sg'
bbb
For the -pe form, just add g at the end (/sg or /gs, same thing):
$ echo -e 'aaa' | perl -0777 -pe 's/a/z/s'
zaa
user@comp ~
$ echo -e 'aaa' | perl -0777 -pe 's/a/z/sg'
zzz
Note: this question contrasts /s and -0777.
Those print $1 examples print only the captured text, not the whole line. This link, https://dzone.com/articles/perl-as-a-better-grep, has an example that prints the whole matching line: perl -wln -e "/RE/ and print;" foo.txt
In the lesson you linked to, you are asked to write a regex that captures the file name of these two
file_a_record_file.pdf
file_yesterday.pdf
and skips
testfile_fake.pdf.tmp
The simplest regex to do that is
(.*)\.pdf$
This means: match everything that ends in .pdf, but capture only the part before the extension (the file name).
So, why is capturing useful? That depends on the program you are using these regexes with. Capturing patterns allows you to save what you have captured as a variable. For example, using Perl, the first captured pattern is $1, the second $2, etc:
echo "Hello world" | perl -ne '/(.+) (.+)/; print "$2 $1\n"'
This will print "world Hello" because the first pair of parentheses captured Hello and the second captured world, but we then print $2 $1, so the two captures are swapped.
Other regex implementations allow you to refer to the captured patterns using \1, \2, etc. For example, GNU sed:
echo "Hello world" | sed 's/\(.*\) \(.*\)/\2 \1/'
So, in general, capturing patterns is useful when you need to refer to what they matched later on. This is known as backreferencing and is briefly explained a little later in the tutorials you are doing.
Best Answer
I like using all Powershell commands when I can. After a bit of testing, this is the best I can do.
The first three lines are just to make this easier to read; you can define the variables inside the actual commands if you want. The key to this code sample is the "Where-Object" command, which is a filter that accepts regular expression matching. It should be noted that regular expression support is a little weird. I found a PDF reference card here that has the supported characters on the left side.
[EDIT]
As "@Johannes Rössel" mentioned, you can also reduce the last two lines down to a single line.
The main difference is that Johannes's way does object filtering and my way does text filtering. When working with Powershell, it's almost always better to use objects.
[EDIT2]
As @smoknheap mentioned, the above scripts will flatten out the folder structure and put all your files in one folder. I'm not sure if there is a switch that retains folder structure. I tried the -Recurse switch and it doesn't help. The only way I got this to work is to go back to string manipulation and add in folders to my filter.
I'm sure that there is a more elegant way to do this, but from my tests it works. It gathers everything and then filters for both name matches and folder objects. I had to use the ToString() method to gain access to the string manipulation.
[EDIT3]
Now, if you want to report the paths to make sure you have everything correct, you can use the "Write-Host" command. Here's the code that will give you some hints as to what's going on.
This should return the relevant strings. If you get nothing somewhere, you'll know which item is having problems.
Hope this helps