I'm running into weird behavior when trying to grep a man page on macOS. For example, the Bash man page clearly has an occurrence of the string NAME:
$ man bash | head -5 | tail -1
NAME
And if I grep for name I do get results, but if I grep for NAME I don't:
$ man bash | grep 'NAME'
$ man bash | grep NAME
I've tried other uppercase words that I know are in there, and searching for SHELL yields nothing whereas searching for BASH yields results.
What's going on here?
Update: Thanks for all the answers! I thought it worth adding the context in which I ran into this. I wanted to write a bash function to wrap man and, in cases where I've tried to look up the man page for a shell builtin, jump to the relevant section of the Bash man page. There might be a better way, but here's what I've got currently:
man () {
    case "$(type -t "$1")" in
        builtin)
            local pattern="^ *$1"
            if bashdoc_match "$pattern \+[-[]"; then
                command man bash | less --pattern="$pattern +[-[]"
            elif bashdoc_match "$pattern\b"; then
                command man bash | less --pattern="$pattern[[:>:]]"
            else
                command man bash
            fi
            ;;
        keyword)
            command man bash | less --hilite-search --pattern='^SHELL GRAMMAR$'
            ;;
        *)
            command man "$@"
            ;;
    esac
}

bashdoc_match() {
    command man bash | col -b | grep -l "$1" > /dev/null
}
Best Answer
If you add | sed -n l to that tail command, to show non-printable characters, you'll probably see that each character is written as X Backspace X. On modern terminals, the character ends up being written over itself (as Backspace, aka BS, aka \b, aka ^H, is the character that moves the cursor one column to the left) with no visible difference. But on ancient teletypewriters, that would cause the character to appear in bold as it gets twice as much ink.

Still, pagers like
more/less do understand that format to mean bold, so that's still what roff does to output bold text.

Some man implementations call roff in a way that those sequences are not used (or internally call col -b -p -x to strip them, as the man-db implementation does unless the MAN_KEEP_FORMATTING environment variable is set), and don't invoke a pager when they detect the output is not going to a terminal (so man bash | grep NAME would work there), but not yours.

You can use
col -b to remove those sequences (there are other types, such as _ BS X for underline, as well).

For systems using GNU
roff (like GNU or FreeBSD), you can avoid those sequences being used in the first place by making sure the -c -b -u options are passed to grotty, for instance by making sure the -P-cbu option is passed to groff.

One way to do that is to create a wrapper script called groff that passes that option along, and to put it ahead of /usr/bin/groff in $PATH.

With macOS'
man (also using GNU roff), you can create a man-no-overstrike.conf configuration file to the same effect, and call man with that file.

Still with GNU
roff, if you set the GROFF_SGR environment variable (or don't set the GROFF_NO_SGR variable, depending on how the defaults have been set at compile time), then grotty (as long as it's not passed the -c option) will use ANSI SGR terminal escape sequences instead of those BS tricks for character attributes. less understands them when called with the -R option.

FreeBSD's man calls
grotty with the -c option unless you're asking for colours by setting the MANCOLOR variable (in which case -c is not passed to grotty, and grotty reverts to the default of using ANSI SGR escape sequences there). With MANCOLOR set, man bash | grep NAME will work there.
On Debian, GROFF_SGR is not the default. If you set it yourself and pipe man's output to grep, however, man's stdout is not a terminal, so it takes it upon itself to also pass a GROFF_NO_SGR variable to grotty (I suppose so it can use col -bpx to strip the BS sequences, as col doesn't know how to strip the SGR sequences, even though it still does it with MAN_KEEP_FORMATTING), which overrides our GROFF_SGR. You can instead run the filter as man's pager, via MANPAGER, (in a terminal) to have the SGR escape sequences.
That time, you'll notice that some of those NAMEs do appear in bold on the terminal (and in a less -R pager). If you feed the output to sed -n l (MANPAGER='sed -n /NAME/l'), you'll see the bold text wrapped between \e[1m, the sequence to enable bold in ANSI-compatible terminals, and \e[0m, the sequence to revert all SGR attributes to the default.

On that text, grep NAME works as that text does contain NAME, but you could still have problems if looking for text where only part of it is in bold/underline...
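
For that partially bold case, one workaround is to strip the SGR sequences before searching. This is a minimal sketch, assuming the only escape sequences in the text are SGR ones of the form ESC [ ... m. The names strip_sgr and partly_bold are hypothetical helpers for this illustration, not something man or groff provides.

```shell
#!/bin/sh
# A literal ESC character (octal 033), built with printf.
esc=$(printf '\033')

# Hypothetical helper: delete SGR sequences such as ESC[1m, ESC[0m or ESC[4;1m.
strip_sgr() {
    sed "s/${esc}\[[0-9;]*m//g"
}

# Sample line where only "NA" is bold, so escape bytes sit between A and M.
partly_bold() {
    printf '\033[1mNA\033[0mME\n'
}

# The escape sequence in the middle of the word defeats a plain grep.
if partly_bold | grep -q NAME; then
    echo "raw: match"
else
    echo "raw: no match"
fi

# Once the SGR sequences are removed, the word is contiguous again.
if partly_bold | strip_sgr | grep -q NAME; then
    echo "stripped: match"
else
    echo "stripped: no match"
fi
```

This prints "raw: no match" followed by "stripped: match". The filter only handles SGR sequences, so other terminal escape sequences, if any were present, would pass through untouched.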