I've gotten an initial explanation of this test case from Stefan Seyfried, who wrote the paper this example was taken from. The problem here is that the CPU scheduler part of cgroups always aims to keep any available CPU busy; it never enforces a hard limit if everything fits at once.
In the case where two processes (high and low here) are running on two or more cores, the scheduler simply keeps high on one core and low on the other. Both then run all the time, at close to 100% usage, because they can do so without hitting the situation where the scheduler doesn't give them enough CPU time. cpu.shares scheduling only kicks in when there's a shortage.
In the second case, both processes are pinned to the same CPU. Then the CPU sharing logic has to do something useful with the relative cpu.shares numbers to balance them out, and it does that as hoped.
Hard limits on CPU usage aren't likely to appear until after the CFS Bandwidth Control patch hits. At that point it may be possible to get something more like what I was hoping for.
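A minimal sketch of the two scenarios, assuming cgroup v1 with the cpu controller mounted at /sys/fs/cgroup/cpu (the group names and share values are illustrative, and root is required):

```shell
# Create two groups with a 2:1 cpu.shares ratio (illustrative values).
mkdir /sys/fs/cgroup/cpu/high /sys/fs/cgroup/cpu/low
echo 2048 > /sys/fs/cgroup/cpu/high/cpu.shares
echo 1024 > /sys/fs/cgroup/cpu/low/cpu.shares

# Scenario 1: on a multi-core machine, each busy loop gets its own core,
# both run at ~100%, and the shares never come into play.
# Scenario 2: pin both loops to CPU 0 so they contend; only then does the
# scheduler divide time according to the 2:1 ratio.
taskset -c 0 sh -c 'while :; do :; done' &
echo $! > /sys/fs/cgroup/cpu/high/tasks
taskset -c 0 sh -c 'while :; do :; done' &
echo $! > /sys/fs/cgroup/cpu/low/tasks
```

Watching the two loops in top should show roughly a 2:1 CPU split in the pinned case, and ~100% each in the unpinned case.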
Standalone printf
Part of the "expense" of invoking a process is that several resource-intensive things have to happen:
- The executable has to be loaded from disk. This is slow because the disk has to be accessed to read the binary blob the executable is stored as.
- The executable is typically built against dynamic libraries, so files secondary to the executable also have to be loaded (i.e. more binary blob data read from disk).
- Operating system overhead. Each process you invoke incurs overhead: a process ID has to be created for it, and space in memory has to be carved out both to house the binary data loaded in the first two steps and to populate the various structures that store things such as the process's environment (environment variables, etc.).
An excerpt of an strace of /usr/bin/printf:
$ strace /usr/bin/printf "%s\n" "hello world"
execve("/usr/bin/printf", ["/usr/bin/printf", "%s\\n", "hello world"], [/* 91 vars */]) = 0
brk(0) = 0xe91000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a6b000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=242452, ...}) = 0
mmap(NULL, 242452, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd155a2f000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\357!\3474\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1956608, ...}) = 0
mmap(0x34e7200000, 3781816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x34e7200000
mprotect(0x34e7391000, 2097152, PROT_NONE) = 0
mmap(0x34e7591000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x191000) = 0x34e7591000
mmap(0x34e7596000, 21688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x34e7596000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a2e000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a2c000
arch_prctl(ARCH_SET_FS, 0x7fd155a2c720) = 0
mprotect(0x34e7591000, 16384, PROT_READ) = 0
mprotect(0x34e701e000, 4096, PROT_READ) = 0
munmap(0x7fd155a2f000, 242452) = 0
brk(0) = 0xe91000
brk(0xeb2000) = 0xeb2000
brk(0) = 0xeb2000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=99158752, ...}) = 0
mmap(NULL, 99158752, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd14fb9b000
close(3) = 0
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a6a000
write(1, "hello world\n", 12hello world
) = 12
close(1) = 0
munmap(0x7fd155a6a000, 4096) = 0
close(2) = 0
exit_group(0) = ?
Looking through the above you can get a sense of the additional resources /usr/bin/printf has to consume because it is a standalone executable.
Builtin printf
With the builtin version of printf, all the libraries it depends on, as well as its own binary code, were already loaded into memory when Bash was invoked, so none of that cost has to be paid again. Effectively, when you call one of Bash's builtin "commands", you're really making what amounts to a function call, since everything has already been loaded.
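One way to see the cost difference is to time a loop over each version (a rough illustration; exact numbers vary by machine and shell):

```shell
# Builtin printf: no fork/exec, effectively a function call inside Bash.
time for i in $(seq 1000); do printf '%s\n' 'hello' > /dev/null; done

# Standalone printf: each iteration forks and execs /usr/bin/printf,
# paying the process-creation and dynamic-loading costs every time.
time for i in $(seq 1000); do /usr/bin/printf '%s\n' 'hello' > /dev/null; done
```

You can confirm which version a bare printf resolves to with type -a printf; Bash prefers the builtin unless you give an explicit path.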
An analogy
If you've ever worked with a programming language such as Perl, it's the equivalent of calling system("mycmd") or using backticks (`mycmd`). When you do either of those things, you fork a separate process with its own overhead, versus using the functions offered through Perl's core.
Anatomy of Linux Process Management
There's a pretty good article on IBM developerWorks that breaks down the various aspects of how Linux processes are created and destroyed, along with the different C library functions involved. The article is titled Anatomy of Linux process management - Creation, management, scheduling, and destruction. It's also available as a PDF.
Best Answer
grep -i 'a' is equivalent to grep '[Aa]' in an ASCII-only locale. In a Unicode locale, character equivalences and conversions can be complex, so grep may have to do extra work to determine which characters are equivalent. The relevant locale setting is LC_CTYPE, which determines how bytes are interpreted as characters.
In my experience, GNU grep can be slow when invoked in a UTF-8 locale. If you know that you're searching for ASCII characters only, invoking it in an ASCII-only locale may be faster. I expect that grep -i 'a' and grep '[Aa]' would then produce indistinguishable timings.
That being said, I can't reproduce your finding with GNU grep on Debian jessie (but you didn't specify your test file). If I set an ASCII locale (LC_ALL=C), grep -i is faster. The effect depends on the exact nature of the string; for example, a string with repeated characters reduces the performance (which is to be expected).
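A rough way to test this yourself (the file contents are arbitrary, and en_US.UTF-8 is an assumed locale name; substitute one installed on your system, e.g. C.UTF-8):

```shell
# Build a ~7 MB ASCII test file of lowercase letters.
seq 1000000 | tr '0-9' 'a-j' > /tmp/grep-test

# Case-insensitive search under a UTF-8 locale vs. the C locale.
time LC_ALL=en_US.UTF-8 grep -ic 'abc' /tmp/grep-test
time LC_ALL=C grep -ic 'abc' /tmp/grep-test

# In the C locale, -i and an explicit bracket expression should behave alike.
time LC_ALL=C grep -c '[Aa][Bb][Cc]' /tmp/grep-test
```

The absolute timings depend heavily on your grep version and input, so compare the relative numbers rather than the raw values.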