Sed Expression – Deleting Lines with Repeated Fields

regular expression, sed, text processing

I was researching how to generate all non-repeating permutations of a set of numbers in Bash without using recursion, and I found this answer that worked, but I want to understand why.

Say you have three numbers: 1, 2, 3.

The following command will generate all possible non-repeating permutations:

printf "%s\n" {1,2,3}{1,2,3}{1,2,3} | sort -u | sed '/\(.\).*\1/d'
123
132
213
231
312
321

I understand what the printf with %s does when the argument is the brace expansions of the set {1, 2, 3} three times (which would print every single possible outcome).

I know that sort -u will output only unique lines.

I know that sed /<pattern>/d is used to delete any lines that match <pattern>.

Reading the pattern within the sed, I am somewhat confused. I know how to read regex but I don't see how this pattern works within the sed command.

\( = literal '('
.  = any character, once
\) = literal ')'
.* = any character, zero or more times
\1 = reference to first captured group

How, then, does the sed command remove lines with repeated values using this regex pattern? I don't understand how there can be a reference to a captured group when there isn't really one. Aren't the parentheses in the pattern being matched literally? Everything about this execution makes sense to me until the sed command.

Best Answer

sed uses basic regular expressions (BRE) by default, and in BRE the escaped forms \( and \) delimit a group, while unescaped ( and ) are the literal characters. So \(.\) is a capture group containing any one character. Then the .* just skips everything, and \1 matches whatever the group matched. If the whole lot can be made to match, then some character showed up twice, once for the group, and once for the backreference.
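As a quick check of the explanation above, here is a small sketch that feeds a few hand-picked sample lines (my own test data, not from the question) through the same sed expression. Only lines in which some character occurs twice should be deleted.

```shell
# Sample lines: 122 and 313 each contain a repeated character,
# so the BRE \(.\).*\1 matches them and the d command deletes them.
result=$(printf '%s\n' 123 122 313 321 | sed '/\(.\).*\1/d')

# Show what survived the filter.
printf '%s\n' "$result"
```

The group does not have to match the first character of the line: for 122 the regex engine places the group on the first 2, lets .* match nothing, and satisfies \1 with the second 2.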

In fact, if I'm not mistaken, that wouldn't even work with standard extended regular expressions, since (for whatever reasons) backreferences aren't supported in them. Backreferences are only mentioned under "BREs matching multiple characters", not under EREs, and in fact the same thing with ERE doesn't work on my macOS (it takes the \1 as meaning a literal number 1):

$ printf "%s\n" 122 321 | sed -E -e '/(.).*\1/d'
122

GNU tools do support backreferences in ERE, though.

(I don't think sort -u is necessary here, the combination of brace expansions should produce all combinations without duplicates.)
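One way to check that claim, assuming a shell with brace expansion such as Bash, is to run the pipeline with and without sort -u and compare the two results. The variable names here are only for illustration.

```shell
# Requires a shell with brace expansion (e.g. Bash).
with_sort=$(printf '%s\n' {1,2,3}{1,2,3}{1,2,3} | sort -u | sed '/\(.\).*\1/d')
without_sort=$(printf '%s\n' {1,2,3}{1,2,3}{1,2,3} | sed '/\(.\).*\1/d')

# Report whether dropping sort -u changed anything.
if [ "$with_sort" = "$without_sort" ]; then
  echo "identical output"
else
  echo "outputs differ"
fi
```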

Related Solutions

Concatenating fields from lines with different numbers of fields

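awk '{
  # m collects fields 3..NF-1 joined by "_"; s is the separator (empty before the first field)
  s = m = ""
  for (i = 3; i < NF; i++) {m = m s $i; s = "_"}
  # No middle fields on this line: use "_" as a placeholder
  if (m == "") m = "_"
  print $1, $2, m, $NF}'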
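To see what this awk program does, here is a small sketch that runs it on two made-up input lines (the sample data is an assumption, not taken from the original question): one with several middle fields and one with none.

```shell
# First line has middle fields c, d, e; second line has no middle fields.
result=$(printf '%s\n' 'a b c d e f' 'x y z' | awk '{
  s = m = ""
  for (i = 3; i < NF; i++) {m = m s $i; s = "_"}
  if (m == "") m = "_"
  print $1, $2, m, $NF}')

printf '%s\n' "$result"
```

The middle fields of the first line are joined into c_d_e, while the second line gets the "_" placeholder in the third column.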

Sed Text Processing – Sed Deletes All Lines Instead of Selected Line

The /etc/fstab file becomes empty because you're not only using -i but also -n.

The -n option turns off the "default p (print) command" which is usually triggered at the end of each cycle.

With -n, your script outputs nothing at all, even if you remove the -i option: the script contains no explicit p command, and -n suppresses the default print at the end of each cycle, which would otherwise have printed every line that wasn't deleted. Since the sed script outputs nothing, telling sed to make the changes in-place with -i empties the file.

So, to resolve your issue, remove -n from your command.
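The effect is easy to reproduce without touching a real file. This is a minimal sketch on made-up input (the sample lines and variable names are assumptions for illustration), comparing the same d command with and without -n.

```shell
sample=$(printf '%s\n' 'keep this' '/mnt/wsbackup entry' 'keep that')

# With -n and no p command, nothing is ever printed.
with_n=$(printf '%s\n' "$sample" | sed -n '/wsbackup/d')

# Without -n, every line that was not deleted is printed by default.
without_n=$(printf '%s\n' "$sample" | sed '/wsbackup/d')

printf 'with -n:    [%s]\n' "$with_n"
printf 'without -n: [%s]\n' "$without_n"
```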

Personally, I would have written the code as

mntdir=/mnt/wsbackup

cp /etc/fstab /etc/fstab.orig
sed "\\:$mntdir:d" /etc/fstab.orig >/etc/fstab

This allows us to keep a copy of the original fstab file.

Alternatively,

mntdir=/mnt/wsbackup

sed -i.orig "\\:$mntdir:d" /etc/fstab

which would do pretty much the same thing, depending on what sed implementation you use.

Wrap that in a test on grep -q -F -e "$mntdir" /etc/fstab (similarly to what you did already) if you need to avoid doing anything to the file if the $mntdir string is not found in it, i.e.

mntdir=/mnt/wsbackup

grep -q -F -e "$mntdir" /etc/fstab &&
sed -i.orig "\\:$mntdir:d" /etc/fstab
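To try this safely, the same commands can be pointed at a scratch copy instead of /etc/fstab. The sketch below assumes mktemp is available and uses two invented fstab-style lines; \: makes : the regex delimiter, so the slashes in $mntdir need no escaping.

```shell
mntdir=/mnt/wsbackup

# Scratch file standing in for /etc/fstab (contents are made up).
tmp=$(mktemp)
printf '%s\n' '/dev/sda1 / ext4 defaults 0 1' \
              "server:/export $mntdir nfs defaults 0 0" > "$tmp"

# Only edit the file if the mount point is actually mentioned in it.
grep -q -F -e "$mntdir" "$tmp" &&
sed -i.orig "\\:$mntdir:d" "$tmp"

remaining=$(cat "$tmp")                 # edited file
backup_lines=$(grep -c '' "$tmp.orig")  # line count of the untouched backup

printf '%s\n' "$remaining"
rm -f "$tmp" "$tmp.orig"
```

Note that this only works as written while $mntdir contains no : character and no regex metacharacters.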
