Regex Library – Determining the Regular Expression Library Supplied by the System

libraries, regular expression

I was attempting to use the following regexp in less yesterday: ^\+1[[:space:]]*$, which worked for me in grep. It didn't work in less, so I checked the man page to see what less does support, and found this:

/pattern
    Search forward in the file for the N-th line containing the pattern. N defaults to 1. The pattern is a regular expression, as recognized by the regular expression library supplied by your system. 
    The search starts at the first line displayed (but see the -a and -j options, which change this).

I asked this question in /dev/chat, and there wasn't much of a consensus (to me) on what library is used, or even the priority in choosing a library, let alone a way to actually check what is currently used. I currently use Fedora 30, but hopefully the answers are Linux-agnostic.

So, the questions are:

  1. How do I determine what regexp library is supplied by my system that less would use?
  2. What does it mean for a regexp library to be supplied by my system?
  3. What other utilities and programs does this supplied regexp library affect?
  4. If you mention any specific regexp libraries that could be/are used by the system, please provide a link to a page on that regexp library, if possible.

ldd shows

[unge@localhost ~]$ ldd "$(command -v less)"
    linux-vdso.so.1 (0x00007fff040e0000)
    libtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007f6733339000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f6733173000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f67333be000)

Best Answer

  1. If you’re referring to a less binary, less --version will tell you which regex implementation it’s using; for example

    $ less --version
    less 487 (GNU regular expressions)
    Copyright (C) 1984-2016  Mark Nudelman
    
    less comes with NO WARRANTY, to the extent permitted by law.
    For information about the terms of redistribution,
    see the file named README in the less distribution.
    Homepage: http://www.greenwoodsoftware.com/less
    

    At build time, the library is selected by the --with-regex option given to ./configure:

    --with-regex=LIB        select regular expression library (LIB is one of auto,none,gnu,pcre,posix,regcmp,re_comp,regcomp,regcomp-local) [auto]
    

    and traced in the build logs.

    Some of the implementations are available as separate libraries (pcre for example), others are included in the C library (gnu for example), one of them is included in the less source code (regcomp-local).

  2. I think the expression refers to whichever library was available on the system less was built on, in the context of the auto option at least. Once built, a given less binary won’t change its regex implementation.

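  3. None. The regex implementation is chosen when less is built and applies only to that less binary; other programs, such as grep, use whichever implementation they were built with.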

The supported libraries are:

  • POSIX regcomp (identified as “POSIX” in the version string);
  • PCRE (“PCRE”);
  • the GNU C library’s re_compile_pattern (“GNU”);
  • regcmp (“V8”);
  • Unix V8 regcomp, either provided by the system or less’ own copy (Henry Spencer’s implementation; “Spencer V8”);
  • re_comp (“BSD”).
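
The version-string check from item 1 can be scripted. What follows is only a minimal sketch: it assumes the first line of less --version has the same shape as the example above (the label in parentheses, followed by "regular expressions"), and the two function names are hypothetical helpers, not anything shipped with less. Labels that are not in the list above are reported as unknown rather than guessed.

```shell
#!/bin/sh
# Sketch: report which regex implementation a less binary was built with.
# less_regex_label and describe_regex_label are hypothetical helper names.

# Print the text between the first pair of parentheses on line 1 of stdin,
# e.g. "less 487 (GNU regular expressions)" -> "GNU regular expressions".
less_regex_label() {
    sed -n '1s/^[^(]*(\([^)]*\)).*$/\1/p'
}

# Map a version-string label to the library named in the list above.
describe_regex_label() {
    # Assumption: the label ends in " regular expressions"; strip it if present.
    name=${1% regular expressions}
    case "$name" in
        "POSIX")      echo "POSIX regcomp" ;;
        "PCRE")       echo "PCRE" ;;
        "GNU")        echo "GNU C library re_compile_pattern" ;;
        "V8")         echo "regcmp" ;;
        "Spencer V8") echo "V8 regcomp (Henry Spencer's implementation)" ;;
        "BSD")        echo "re_comp" ;;
        *)            echo "unknown (not in the list above)" ;;
    esac
}

# Only query less if it is actually installed.
if command -v less >/dev/null 2>&1; then
    label=$(less --version | less_regex_label)
    echo "less reports: $label"
    echo "library: $(describe_regex_label "$label")"
fi
```

Since the implementation is fixed at build time, the result only describes the particular less binary found first in PATH; a second less installed elsewhere may report something different.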