Command-Line – When to Use xargs

The xargs command always confuses me. Is there a general rule for it?

Consider the two examples below:

$ \ls | grep Cases | less

lists the file names that match 'Cases', but replacing less with touch requires xargs:

$ \ls | grep Cases | touch
touch: missing file operand
Try `touch --help' for more information.

$ \ls | grep Cases | xargs touch

Best Answer

The difference is in what data the target program is accepting.

If you just use a pipe, the receiving program gets the data on STDIN (the standard input stream) as a raw stream that it can read through, for example one line at a time. However, some programs don't read their input from standard input; they expect it to be spelled out as arguments to the command. For example, touch takes a file name as a parameter on the command line, like so: touch file1.txt.

If you have a program that writes file names to standard output and you want to use them as arguments to touch, you have to use xargs. It reads the data from STDIN, splits it on whitespace (blanks and newlines, by default) and passes the resulting words as arguments to the command.
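
A quick way to see exactly which arguments xargs builds is to let it run printf instead of touch, so that each argument is printed in brackets. This is only a sketch with made-up file names, and it assumes the default xargs behaviour (no -0 or -d option):

```shell
#!/bin/sh
# Three hypothetical file names on STDIN; the second one contains a space.
# xargs appends every word it reads as an argument to printf, and printf
# reuses its format for each argument, printing one bracketed item per line.
printf 'a.txt\nb c.txt\nd.txt\n' | xargs printf '[%s]\n'
```

This prints [a.txt], [b], [c.txt] and [d.txt] on separate lines: the name "b c.txt" becomes two arguments, which is exactly the kind of surprise that makes plain xargs risky with arbitrary file names.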

These two things are equivalent:

# touch file1.txt
# echo file1.txt | xargs touch

Don't use xargs unless you know exactly what it's doing and why it's needed. Quite often there is a better way to do the job than forcing the conversion with xargs. The conversion is also fraught with pitfalls, such as file names containing spaces, quotes or newlines, which xargs splits apart or interprets as quoting.
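
Here is a sketch of the space pitfall and two alternatives, run in a throwaway directory with a hypothetical file name. The shell glob needs no conversion at all; the find -print0 | xargs -0 pair is supported by GNU and BSD tools but may be missing on some minimal systems:

```shell
#!/bin/sh
# Work in a throwaway directory so the demo never touches real files.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Hypothetical file whose name contains a space.
: > 'my Cases.txt'

# Naive pipeline: xargs splits "my Cases.txt" into the words "my" and
# "Cases.txt", so touch creates two new, unwanted files.
\ls | grep Cases | xargs touch
ls -1

# Remove the two unwanted files before trying the safer variants.
rm -f my Cases.txt

# Better: let the shell expand the glob; no text conversion is involved.
touch -- *Cases*

# If a pipeline is really needed, separate the names with NUL bytes,
# which cannot appear inside a file name.
find . -name '*Cases*' -print0 | xargs -0 touch --

# Clean up the throwaway directory.
cd / && rm -rf "$dir"
```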

Good way

Normally you can't print the first line of a file together with the lines that match a pattern using grep alone, but other tools can do it. AWK is one option, but you can also use sed, like this:

sed -e '1p' -e '/yourpattern/!d'

How it works:

The sed utility processes its input one line at a time, running the specified commands on each line. You can give several commands by passing several -e options. Each command can be prefixed with an address that specifies which lines it applies to.
"1p" is the first command. It uses the p command, which on its own would print every line. But we prefix it with a number that specifies the range it applies to: 1 means the first line. If you want to print more lines, you can use x,yp, where x is the first line to print and y is the last one. For example, to print the first 3 lines you would use 1,3p.
The next command is d, which on its own would delete every line from the buffer. Before it we put yourpattern between two / characters. This is the other way of addressing lines (the first was a line number, as with the p command): the command runs only on lines that match yourpattern. However, we put the ! character before the d command, which inverts the address. So now it removes all the lines that do not match the pattern.
At the end of each line's processing, sed prints whatever is left in the buffer. Since we removed the non-matching lines, only the matching ones are printed.

To sum up: we print the first line, then delete every input line that does not match our pattern. The remaining lines, the ones that do match, are printed.
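
To check this behaviour on predictable input, here is a small sketch with a made-up header and data lines, where syslog stands in for yourpattern. The second command also shows a range address:

```shell
#!/bin/sh
# Hypothetical input: a header line followed by some data lines.
input='HEADER
alpha
syslog one
beta
syslog two'

# 1p prints line 1 explicitly; /syslog/!d deletes every line that does not
# contain "syslog"; sed then auto-prints whatever was not deleted.
printf '%s\n' "$input" | sed -e '1p' -e '/syslog/!d'

# Same idea with a range address: print lines 1 to 2, plus the matches.
printf '%s\n' "$input" | sed -e '1,2p' -e '/syslog/!d'
```

The first command prints HEADER, syslog one and syslog two; the second also prints alpha after the header.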

First line problem

As mentioned in the comments, there is a problem with this approach: if the pattern also matches the first line, that line is printed twice (once by the p command and once because it matches). We can avoid this in two ways:

Adding a 1d command after 1p. As already mentioned, the d command deletes lines from the buffer, and we restrict it with the address 1, so it deletes only the first line (after 1p has already printed it). The command becomes sed -e '1p' -e '1d' -e '/yourpattern/!d'
Using a 1b command instead of 1p. This is a trick. The b command jumps to another command marked by a label, so the commands in between are skipped. If no label is given (as in our example), it jumps to the end of the script, skipping the remaining commands for that line. So the final d command can't remove the first line, and sed's automatic print outputs it exactly once.
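
A sketch reproducing the problem and both fixes, using made-up input whose first line also contains the pattern:

```shell
#!/bin/sh
# Hypothetical input where the first line ALSO matches the pattern.
input='syslog header
alpha
syslog data'

# Problem: line 1 is printed by 1p and again by the automatic print.
printf '%s\n' "$input" | sed -e '1p' -e '/syslog/!d'

# Fix 1: 1d deletes line 1 right after 1p printed it, so it appears once.
printf '%s\n' "$input" | sed -e '1p' -e '1d' -e '/syslog/!d'

# Fix 2: 1b jumps to the end of the script for line 1, skipping the d
# command; the automatic print then outputs it exactly once.
printf '%s\n' "$input" | sed -e '1b' -e '/syslog/!d'
```

The first command prints "syslog header" twice; both fixes print "syslog header" followed by "syslog data".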

Full example:

ps aux | sed -e '1b' -e '/syslog/!d'

Using semicolon

Some sed implementations can save you some typing by letting you separate commands with a semicolon instead of using multiple -e options. So if you don't care about portability, the command becomes ps aux | sed '1b;/syslog/!d'. It works at least in the GNU sed and BusyBox implementations.
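
A sketch comparing the two spellings on a hypothetical stand-in for ps aux output (real ps output differs between systems, so fixed text keeps the result predictable). With GNU sed both commands print the header line and the rsyslogd line:

```shell
#!/bin/sh
# Hypothetical stand-in for `ps aux` output.
sample='USER PID COMMAND
root 1 init
syslog 42 rsyslogd
root 99 bash'

# Portable form: one -e option per command.
printf '%s\n' "$sample" | sed -e '1b' -e '/syslog/!d'

# Shorter form: POSIX does not require sed to accept a semicolon
# right after b, so this relies on GNU sed or BusyBox behaviour.
printf '%s\n' "$sample" | sed '1b;/syslog/!d'
```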

Crazy way

Here is, however, a rather crazy way to do it with grep alone. It's definitely not optimal and I'm posting it just for learning purposes, but you could use it if, for example, you don't have any other tool on your system:

ps aux | grep -n '.*' | grep -e '\(^1:\)\|syslog'

How it works

First, we use the -n option to prefix each line with its line number. We want to number every line, so we match .* (anything, even an empty line). As suggested in the comments, we can also match '^'; the result is the same.
Then we use \|, which works as OR. Strictly speaking, this is a GNU extension to basic regular expressions; with -E (extended regular expressions) you would write a plain | instead. So a line matches if it starts with 1: (the first line) or contains our pattern (in this case syslog).
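
A sketch of the grep-only pipeline on made-up input, plus the equivalent filter written with grep -E, where alternation is a plain | instead of \|:

```shell
#!/bin/sh
# Hypothetical input standing in for `ps aux` output.
sample='USER PID COMMAND
root 1 init
syslog 42 rsyslogd
root 99 bash'

# Number every line, then keep line 1 ("1:...") or any line containing syslog.
# \| is alternation in GNU grep's basic regular expressions.
printf '%s\n' "$sample" | grep -n '.*' | grep -e '\(^1:\)\|syslog'

# The same filter as an extended regular expression (-E): plain |.
printf '%s\n' "$sample" | grep -n '.*' | grep -E '^1:|syslog'
```

Both pipelines print "1:USER PID COMMAND" and "3:syslog 42 rsyslogd". Note that ^1: does not match line 10 or 11, because the colon must come right after the 1.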

Line numbers problem

Now the problem is that we get these ugly line numbers in our output. If that's an issue, we can remove them with cut, like this:

ps aux | grep -n '.*' | grep -e '\(^1:\)\|syslog' | cut -d ':' -f2-

The -d option specifies the delimiter, and -f specifies the fields (or columns) we want to print. So we split each line on every : character and print only the 2nd and all subsequent fields. This effectively removes the first field together with its delimiter, which is exactly what we need.
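
A sketch of the cut step on hypothetical grep -n output. The second line deliberately contains a colon of its own, to show why -f2- (field 2 to the end) is used rather than -f2:

```shell
#!/bin/sh
# Hypothetical grep -n output; the second line's content has its own colon.
numbered='1:USER PID COMMAND
3:syslog 42 rsyslogd: started'

# -f2- keeps field 2 and everything after it, rejoined with ":",
# so only the leading "N:" is removed.
printf '%s\n' "$numbered" | cut -d ':' -f2-

# -f2 alone would also drop everything after the second colon.
printf '%s\n' "$numbered" | cut -d ':' -f2
```

The first command prints "syslog 42 rsyslogd: started" intact; the second truncates it to "syslog 42 rsyslogd".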
