Sed Expression – Deleting Lines with Repeated Fields

regular expression, sed, text processing

I was researching how to generate all non-repeating permutations of a set of numbers in Bash without using recursion. I found this answer, which works, but I want to understand why.

Say you have three numbers: 1, 2, 3.

The following command will generate all possible non-repeating permutations:

$ printf "%s\n" {1,2,3}{1,2,3}{1,2,3} | sort -u | sed '/\(.\).*\1/d'
123
132
213
231
312
321

I understand what printf with %s\n does when its arguments are the brace expansion of the set {1,2,3} repeated three times: it prints every possible three-digit combination, one per line.

I know that sort -u will output only unique lines.

I know that sed /<pattern>/d is used to delete any lines that match <pattern>.

Reading the pattern within the sed, I am somewhat confused. I know how to read regex but I don't see how this pattern works within the sed command.

\( = literal '('
.  = any character, once
\) = literal ')'
.* = any character, zero or more times
\1 = reference to first captured group

How, then, does the sed command remove the lines with repeated values using this pattern? I don't understand how there can be a reference to a captured group when there isn't really one. Aren't the parentheses in the pattern matched literally? Everything about this command makes sense to me until the sed part.

Best Answer

sed uses basic regular expressions (BRE) by default, so \(.\) is a capture group containing any one character, not a pair of literal parentheses. The .* then skips any number of characters, and \1 matches whatever the group matched. If the whole pattern can be made to match, then some character appears twice on the line: once for the group and once for the backreference.
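As a small illustration of the point above, the following sketch runs the same pattern over a few hand-picked sample lines (the inputs are made up for the example), and then shows that unescaped parentheses in a BRE match literal parenthesis characters. The variable names kept and literal are only for illustration.

```shell
# BRE: \( \) delimit a group, \1 refers back to what the group matched.
# Lines with any repeated character (122, 313) are deleted.
kept=$(printf '%s\n' 123 122 313 321 | sed '/\(.\).*\1/d')
printf '%s\n' "$kept"
# prints:
# 123
# 321

# BRE: unescaped ( and ) are ordinary characters, so only the line
# that literally contains "(b)" is deleted.
literal=$(printf '%s\n' 'a(b)c' abc | sed '/(b)/d')
printf '%s\n' "$literal"
# prints:
# abc
```

So the reading in the question is inverted: in a BRE the backslash turns the parentheses into grouping operators, while in an ERE the backslash is what makes them literal.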

In fact, if I'm not mistaken, this wouldn't even work with standard extended regular expressions (ERE), since (for whatever reason) the standard doesn't define backreferences for them. Backreferences are only mentioned under "BREs matching multiple characters", not under EREs, and the same expression with ERE doesn't work on my macOS (it takes the \1 to mean a literal 1):

$ printf "%s\n" 122 321 | sed -E -e '/(.).*\1/d'
122

GNU tools do support backreferences in ERE, though.
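Since the behaviour differs between implementations, one way to find out what a given sed does is to probe it. This is only a rough sketch, assuming the local sed accepts -E at all; the variable name ere_backrefs is made up for the example, and the result depends on the system.

```shell
# Feed one line with a repeated character through the ERE form.
# If \1 works as a backreference, "122" is deleted and nothing is printed.
# If \1 is taken as a literal 1, nothing matches and "122" is printed.
if printf '%s\n' 122 | sed -E -e '/(.).*\1/d' | grep -q .; then
    ere_backrefs=no
else
    ere_backrefs=yes
fi
printf 'ERE backreferences in this sed: %s\n' "$ere_backrefs"
```

The BRE form from the question avoids the issue, because backreferences are part of standard BRE.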

(I don't think sort -u is necessary here, the combination of brace expansions should produce all combinations without duplicates.)
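A quick way to check that claim is to count the lines with and without sort -u, and then run the filter without the sort step. This is a sketch that assumes a shell with brace expansion, such as Bash; the variable names are only for illustration.

```shell
# Requires bash or another shell with brace expansion.
all=$(printf '%s\n' {1,2,3}{1,2,3}{1,2,3})

# tr strips the padding that some wc implementations add.
total=$(printf '%s\n' "$all" | wc -l | tr -d ' ')
unique=$(printf '%s\n' "$all" | sort -u | wc -l | tr -d ' ')
printf 'total=%s unique=%s\n' "$total" "$unique"
# prints: total=27 unique=27

# Same filter as in the question, with sort -u left out.
perms=$(printf '%s\n' "$all" | sed '/\(.\).*\1/d')
printf '%s\n' "$perms"
```

The two counts agree, and the brace expansion already emits the strings in sorted order for this input, so the filtered output is the same six lines as before.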
