I realize I'm asking a question similar to one that was already asked and answered, but I was not able to extrapolate the answer I needed because the regex and regex engine are different enough. I have hardware asset management logs which are pipe-delimited but have no major delimiter between endpoints. The logs look like this:


What I would like to do is replace every 6th | with a carriage return to look like this:


The closest I've gotten selects each endpoint, but I'm not quite sure how to use it in PowerShell.


I'm familiar with the replace command in PS and I'm imagining the end result would be something to this effect:

$hosts = $hosts -replace "<highspeed_low_drag_velcro_snap_regex_here>","\r\n"

Thanks in advance!

Best Answer

Ok, so this one's actually a little tricky. Arguably, regex isn't the best tool for the job, but it can do it.

-replace "(?<=^((\|[^|]*){5})+)\|","`n|"

I'll try to walk you through it:

  • Your text has a section you want to match and a section you want to replace. Traditionally, a regex replace substitutes the entire match, so you would use a capture group to copy part of the match into the replacement output. Another way is to use a lookaround, which is what I've done here. PowerShell (.NET) is one of the few regex flavors that support variable-length lookbehinds, so we're in luck.
  • The (?<=) section is a lookbehind. That means everything between the = and ) is matched but not replaced. So ^((\|[^|]*){5})+ is used as a condition - the replacement will only happen if this bit matches the text before the intended replacement.
  • The ^((\|[^|]*){5})+ section can be summed up as "from the start of the line (^), match one or more sets of five fields, where each field is a | followed by the text up to the next |".
    • The start of the line ^ is important - otherwise it can match anywhere in the line and there's no guarantee of how many |s came before.
    • Because | has a special meaning in regex, it needs to be escaped: \|. It does not need to be escaped when within a character class ([]).
    • [^|]* means "text up to the next |" — more technically, "as many characters other than | as possible" — more technically "repeat the [^|] character class as many times as possible, where that character class matches any character other than |".
    • * means "zero or more repetitions of the previous token, as many as possible". Here the previous token is the character class [^|].
    • So (\|[^|]*) means match | followed by as many characters as possible up till the next |. This will match |text
    • {5} means repeat the previous token exactly 5 times. It's exactly equivalent to copy-pasting the preceding token 5 times. So this will match |text|text|text|text|text
    • ((\|[^|]*){5})+ is one or more repetitions of that entire group. So it can match |text|text|text|text|text, |text|text|text|text|text|text|text|text|text|text, etc. - in multiples of 5. The reason we use + instead of * is we don't want to match the empty group and replace the very first |.
    • And that makes the entire lookbehind, meaning it will only replace a | that has a non-zero multiple of 5 |-prefixed fields behind it, counting from the start of the line.
  • The lookbehind is followed by \|, which is the actual text to replace: a single | preceded by whatever the lookbehind matched.
  • Taking your example |STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3, it will match the following:
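
One way to see which pipes are matched is to ask the regex engine for the match positions. This is a minimal sketch, assuming the sample line above is stored in a variable; `$line`, `$pattern`, `$hits` and `$report` are illustrative names, not anything from the original logs.

```powershell
# Sample record line from the walkthrough above.
$line = '|STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3'

# Same lookbehind pattern as in the answer; single quotes keep PowerShell from touching it.
$pattern = '(?<=^((\|[^|]*){5})+)\|'

$hits = [regex]::Matches($line, $pattern)

$report = foreach ($hit in $hits) {
    # Count the pipes up to and including the matched one to get its ordinal.
    $ordinal = [regex]::Matches($line.Substring(0, $hit.Index + 1), '\|').Count
    "pipe #$ordinal at index $($hit.Index)"
}
$report
# pipe #6 at index 37
# pipe #11 at index 74
```

Only the pipes that open the second and third records are matched; the very first pipe and the pipes inside each record are left alone.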


You'll notice here (if you haven't already) that you're actually trying to replace every 5th | minus the first, not every 6th. But the lookbehind method handles the "minus the first" situation fairly cleanly.

And now the replacement string.

  • Because this is PowerShell, when we want \n, we actually want `n because the PowerShell escape character is `. Note that this is only necessary in the replacement string; in the regex itself you would still use \n to pass that literal sequence to the regex engine.
  • And because you have a leading | on every line, we need to add a new | after the new line. This works out because your original lines do not end with a |, therefore there is nothing to replace at the end of the lines, therefore we don't end up with an extra new line nor trailing |.
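
As a quick check of both points, here is a minimal sketch run against the sample line from above. The variable names (`$line`, `$literal`, `$result`) are only for illustration.

```powershell
# Sample record line from the walkthrough above.
$line = '|STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3'

# In a replacement string, \n is not an escape: a backslash and an n are inserted literally.
$literal = 'a|b' -replace '\|', '\n'
$literal
# a\nb

# With the PowerShell escape `n inside double quotes, a real newline is inserted,
# followed by the | that restores the leading pipe of the next record.
$result = $line -replace "(?<=^((\|[^|]*){5})+)\|", "`n|"
$result
# |STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1
# |STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2
# |STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3
```

If Windows-style line endings are needed, using "`r`n|" as the replacement should work the same way.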

If you prefer the more traditional capture group method:

-replace "((?:[^|]+\|){4}[^|]+)\|","`$1`n|"

Figuring out how this works is left as an exercise to the reader ;) Tip: the $1 backreference has to be escaped (with `) because, inside a double-quoted string, PowerShell would otherwise expand it as a shell variable before the regex engine ever sees it.
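
For anyone who wants to check their reading of the exercise, here is a minimal sketch that runs the capture-group version against the same sample line. `$line`, `$escaped` and `$mixed` are illustrative names. The second form is only an alternative quoting style, not part of the answer above: a single-quoted '$1' needs no escaping, but the newline still has to come from a double-quoted string.

```powershell
# Sample record line from the walkthrough above.
$line = '|STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3'

# Double-quoted replacement: `$1 hides the $ from PowerShell so the regex engine receives $1.
$escaped = $line -replace "((?:[^|]+\|){4}[^|]+)\|", "`$1`n|"

# Alternative quoting: '$1' is literal in single quotes; "`n|" supplies the newline and pipe.
$mixed = $line -replace '((?:[^|]+\|){4}[^|]+)\|', ('$1' + "`n|")

$escaped
# |STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1
# |STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2
# |STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3

$escaped -ceq $mixed
# True
```

Note that this pattern uses [^|]+ rather than [^|]*, so it assumes every field is non-empty.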

