Searching with YUM
You generally don't need globs (shell wildcards) when searching with yum search, since the search command already looks for substrings within package names and their summaries. How do I know this? yum search prints a message telling you so:
Name and summary matches only, use "search all" for everything.
NOTE: The string [cl-*] is technically a glob in the Bash shell.
So with search you generally just give fragments of the strings you want. Globs come into play when you're looking for particular packages, with YUM commands such as list and install.
For example:
$ yum list cl-* | expand
Loaded plugins: fastestmirror, langpacks, refresh-packagekit, tsflags
Loading mirror speeds from cached hostfile
* fedora: mirror.dmacc.net
* rpmfusion-free: mirror.nexcess.net
* rpmfusion-free-updates: mirror.nexcess.net
* rpmfusion-nonfree: mirror.nexcess.net
* rpmfusion-nonfree-updates: mirror.nexcess.net
* updates: mirror.dmacc.net
Available Packages
cl-asdf.noarch 20101028-5.fc19 fedora
cl-clx.noarch 0.7.4-4.3 home_zhonghuaren
cl-ppcre.noarch 2.0.3-3.3 home_zhonghuaren
The one caveat with globs is that if files in your current directory have names that also match cl-*, your shell will expand the glob before it is ever presented to YUM.
So instead of running yum list cl-* you'll actually be running yum list cl-file, if there's a file matching the glob cl-*.
For example:
$ ls cl-file
cl-file
$ yum list cl-*
Loaded plugins: fastestmirror, langpacks, refresh-packagekit, tsflags
Loading mirror speeds from cached hostfile
* fedora: mirror.steadfast.net
* rpmfusion-free: mirror.nexcess.net
* rpmfusion-free-updates: mirror.nexcess.net
* rpmfusion-nonfree: mirror.nexcess.net
* rpmfusion-nonfree-updates: mirror.nexcess.net
* updates: mirror.steadfast.net
Error: No matching Packages to list
You can guard against this happening by escaping the wildcard like so:
$ yum list cl-\* | expand
Loaded plugins: fastestmirror, langpacks, refresh-packagekit, tsflags
Loading mirror speeds from cached hostfile
* fedora: mirror.dmacc.net
* rpmfusion-free: mirror.nexcess.net
* rpmfusion-free-updates: mirror.nexcess.net
* rpmfusion-nonfree: mirror.nexcess.net
* rpmfusion-nonfree-updates: mirror.nexcess.net
* updates: mirror.dmacc.net
Available Packages
cl-asdf.noarch 20101028-5.fc19 fedora
cl-clx.noarch 0.7.4-4.3 home_zhonghuaren
cl-ppcre.noarch 2.0.3-3.3 home_zhonghuaren
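The expansion itself can be observed without yum. The following is a minimal sketch, assuming a POSIX shell with mktemp available; show_glob_args and glob_demo are hypothetical helper names, and cl-file is just a stand-in for whatever file happens to match.

```shell
# Print each argument on its own line, exactly as a command would receive it.
show_glob_args() {
    printf '%s\n' "$@"
}

# Run the three variants in a scratch directory that contains one matching file.
glob_demo() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        touch cl-file              # hypothetical file that happens to match cl-*
        show_glob_args cl-*        # unquoted: the shell substitutes the file name
        show_glob_args cl-\*       # escaped: the literal pattern is passed through
        show_glob_args 'cl-*'      # quoted: same effect as escaping
    )
    rm -rf "$dir"
}
```

Calling glob_demo prints cl-file once, followed by cl-* twice, which is why the escaped or quoted form is the one that reaches YUM intact.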
So what about the brackets?
I suspect you have files in your local directory that were matched when you used [cl-*] as an argument to yum search. After being matched by the shell, those file names were passed to the yum search command, where matches were then found.
For example:
$ ls cl-file
cl-file
$ yum search cl-*
Loaded plugins: fastestmirror, langpacks, refresh-packagekit, tsflags
Loading mirror speeds from cached hostfile
* fedora: mirror.dmacc.net
* rpmfusion-free: mirror.nexcess.net
* rpmfusion-free-updates: mirror.nexcess.net
* rpmfusion-nonfree: mirror.nexcess.net
* rpmfusion-nonfree-updates: mirror.nexcess.net
* updates: mirror.dmacc.net
======================================================================= N/S matched: cl-file =======================================================================
opencl-filesystem.noarch : OpenCL filesystem layout
Name and summary matches only, use "search all" for everything.
NOTE: The match above was made against my file's name, cl-file, and not against cl-* as I had intended.
- In POSIX awk, is there a builtin function which can achieve either of the two objectives?
No. You can achieve the same effect, but not with a single builtin function.
Does the match builtin function only find the leftmost and longest match?
Yes. match() returns the leftmost match, and regular expressions in POSIX awk (and GNU awk) are always greedy (i.e. the longest match at that position always wins).
To achieve the first objective, is it correct to repeatedly apply match to the suffix of the target string, where each suffix is created by finding a match and removing it, together with the prefix before it, from the target string?
Yes, but if you want 100% compatibility with gsub() the details are pretty tricky.
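As a rough illustration of that repeated-match idea, here is a minimal sketch that also reports where each match starts in the original string. It assumes the regex never matches the empty string (it simply stops if that happens), and find_all_matches is a hypothetical name, not something from the gist.

```shell
# find_all_matches STRING REGEX
# Prints "position:match" for each non-overlapping match, left to right.
find_all_matches() {
    awk -v str="$1" -v regex="$2" '
    BEGIN {
        offset = 0                      # characters already cut off the front of str
        while (str != "" && match(str, regex) > 0) {
            if (RLENGTH == 0) break     # assumption: empty matches are out of scope
            printf "%d:%s\n", offset + RSTART, substr(str, RSTART, RLENGTH)
            offset += RSTART + RLENGTH - 1
            str = substr(str, RSTART + RLENGTH)   # keep only the suffix after the match
        }
    }'
}
```

For example, find_all_matches 'a1b22c333' '[0-9]+' prints 2:1, 4:22 and 7:333 on separate lines. Two of the tricky details show up even here: a regex anchored with ^ would match again at the start of every suffix, and awk processes backslash escapes in values passed with -v.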
Is https://gist.github.com/mllamazing/a40946fcf8211a503bed a correct
implementation?
Mostly, if you remove the gsub line. The devil is in the details: the code will loop forever if regex is an empty string. Classic awk didn't allow empty regexps, but IIRC nawk did. To fix that you could do something like this:
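function FindAllMatches(str, regex, match_arr) {
    ftotal = 0;        # global: number of matches found
    ini = RSTART;      # save the caller's RSTART and RLENGTH
    leng = RLENGTH;
    delete match_arr;
    while (str != "" && match(str, regex) > 0) {
        match_arr[++ftotal] = substr(str, RSTART, RLENGTH)
        # skip past the match; step one character after an empty match
        str = substr(str, RSTART + (RLENGTH ? RLENGTH : 1))
    }
    RSTART = ini;      # restore the caller's values
    RLENGTH = leng;
}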
That's not 100% compatible with gsub(), however, because
$ echo 123 | awk '{ gsub("", "-") } 1'
-1-2-3-
while the function above finds only 3 matches (namely, it misses the match at the end).
You could try this instead:
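function FindAllMatches(str, regex, match_arr) {
    ftotal = 0;        # global: number of matches found
    ini = RSTART;      # save the caller's RSTART and RLENGTH
    leng = RLENGTH;
    delete match_arr;
    while (match(str, regex) > 0) {
        match_arr[++ftotal] = substr(str, RSTART, RLENGTH)
        if (str == "") break   # the empty match at the very end was recorded; stop
        # skip past the match; step one character after an empty match
        str = substr(str, RSTART + (RLENGTH ? RLENGTH : 1))
    }
    RSTART = ini;      # restore the caller's values
    RLENGTH = leng;
}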
This fixes the problem above, but it breaks other cases: if str = "123" and regex = "[1-9]*" the function finds two occurrences, 123 and the empty string at the end, while gsub() finds only one, 123.
There may be other similar differences that I can't be bothered to hunt down right now.
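One way to hunt for such differences is to compare the number of matches a loop finds with the number of substitutions gsub() reports, since gsub() returns that count. A minimal sketch, with gsub_count as a hypothetical helper name:

```shell
# gsub_count STRING REGEX
# Prints how many substitutions gsub() performs for REGEX on STRING.
gsub_count() {
    # Replacing each match with itself ("&") leaves the string unchanged,
    # so only the return value (the number of substitutions) matters here.
    awk -v str="$1" -v regex="$2" 'BEGIN { print gsub(regex, "&", str) }'
}
```

For instance, gsub_count 'a1b22c333' '[0-9]+' prints 3. Counts for regexes that can match the empty string are exactly the corner cases discussed above, so they are best checked against the awk you actually use rather than assumed.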
In Gawk, does array, after a call to patsplit(string, array, fieldpat, seps), store the matches as required in the second objective?
Mostly yes. However, corner cases related to regexps can be unexpectedly subtle. There may be some differences, as above.
Can the locations of the matches be found from array and seps, given that seps[i] is the separator string between array[i] and array[i+1]?
Yes.
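A minimal sketch of that calculation, assuming gawk 4.0 or later (where patsplit is available) and that seps[0] holds any text before the first match; patsplit_positions is a hypothetical name:

```shell
# patsplit_positions STRING REGEX
# Prints "position:match" for each match, using lengths from arr and seps.
patsplit_positions() {
    gawk -v str="$1" -v pat="$2" '
    BEGIN {
        n = patsplit(str, arr, pat, seps)
        pos = 1 + length(seps[0])       # seps[0]: text before the first match
        for (i = 1; i <= n; i++) {
            printf "%d:%s\n", pos, arr[i]
            # the next match starts after this match and the separator following it
            pos += length(arr[i]) + length(seps[i])
        }
    }'
}
```

For example, patsplit_positions 'a1b22c333' '[0-9]+' prints 2:1, 4:22 and 7:333 on separate lines. As noted above, corner cases such as patterns that can match the empty string may behave differently.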
Best Answer
...would probably work. Run on your example data it prints:
All it does is attempt to enclose the first match on a line in \newlines. Whether or not it succeeds, it Deletes up to the first \newline in pattern space - which for a non-matching line completely removes it from output, but for a match deletes only up to the head of your pattern, and the script starts again from the top. If a \newline is matched in pattern space - which can only happen if a match was just found and then Deleted - then sed prints only up to the first occurring \newline in pattern space - which is at the tail of your matched string. The s///ubstitution is !not attempted when there is a \newline already in pattern space, so the Delete command clears the already printed match and the cycle starts again from the tail of the last match on.
Depending on your sed you may need to use a literal \newline in place of the n in the right-hand substitution field, though. But you should be able to do all of the file arguments at once - or, at least, very many at a time (depending on your ARG_MAX limits). You can just shell glob for those, or maybe do...
...because sed will treat all input files as a single stream.
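The single-stream behaviour can be seen with a small sketch. It assumes plain sed without GNU's -s or -i options, which treat files separately; last_line_of_stream is a hypothetical helper name.

```shell
# last_line_of_stream FILE...
# With several files, sed sees only one "last line": that of the final file.
last_line_of_stream() {
    # $p prints the last line of the whole input stream, not of each file.
    sed -n '$p' "$@"
}
```

Given two files that end in a2 and b2 respectively, last_line_of_stream a b prints only b2. The argument-length limit mentioned above can be checked with getconf ARG_MAX.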