I realize I'm asking a question similar to one that was already asked and answered, but I was not able to extrapolate the answer I needed since the regex and the regex engine are different enough. I have hardware asset management logs which are pipe-delimited but are not major-delimited between endpoints, i.e. nothing separates one endpoint's record from the next. The logs look like this:
|STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3
What I would like to do is replace every 6th `|` with a carriage return to look like this:
|STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1
|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2
|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3
The closest I've gotten selects each endpoint, but I'm not quite sure how to use it in PowerShell.
[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*
I'm familiar with the replace command in PS and I'm imagining the end result would be something to this effect:
$hosts = $hosts -replace "<highspeed_low_drag_velcro_snap_regex_here>","\r\n"
Thanks in advance!
Best Answer
Ok, so this one's actually a little tricky. Arguably, regex isn't the best tool for the job, but it can do it.
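Assembled from the pieces described in the walkthrough, the full command could look like the sketch below. The sample string is the log line from the question and `$hosts` is the variable name used there; `$result` is a name introduced for this example, and reading the logs from a file is not shown.

```powershell
# Sample input taken from the question: one long line with no record separator.
$hosts = '|STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3'

# Lookbehind: from the start of the string, one or more complete sets of five "|field" groups.
# Only a "|" preceded by exactly such a prefix is replaced, by a newline plus a fresh "|".
$result = $hosts -replace "(?<=^((\|[^|]*){5})+)\|", "`n|"

$result
```

With this input, `$result` should hold the three endpoint records separated by line feeds, each still starting with `|`.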
I'll try to walk you through it:
- The `(?<=)` section is a lookbehind. That means everything between the `=` and `)` is matched but not replaced. So `^((\|[^|]*){5})+` is used as a condition: the replacement will only happen if this bit matches the text before the intended replacement.
- The `^((\|[^|]*){5})+` section can be summed up as "from the start of the line (`^`), match sets of five `|`s, each followed by the text up to the next `|`".
- The `^` is important; otherwise it can match anywhere in the line and there's no guarantee of how many `|`s came before.
- `|` has a special meaning in regex (alternation), so it needs to be escaped: `\|`. It does not need to be escaped when within a character class (`[]`).
- `[^|]*` means "text up to the next `|`"; more technically, "as many characters other than `|` as possible"; more technically still, "repeat the `[^|]` character class as many times as possible, where that character class matches any character other than `|`".
- `*` means "zero or more repetitions of the previous token, as many as possible".
- `(\|[^|]*)` means match `|` followed by as many characters as possible up till the next `|`. This will match `|text`.
- `{5}` means repeat the previous token exactly 5 times. It's exactly equivalent to copy-pasting the preceding token 5 times. So this will match `|text|text|text|text|text`.
- `((\|[^|]*){5})+` is one or more repetitions of that entire group. So it can match `|text|text|text|text|text`, `|text|text|text|text|text|text|text|text|text|text`, etc., in multiples of 5. The reason we use `+` instead of `*` is we don't want to match the empty group and replace the very first `|`.
- Altogether, the lookbehind only succeeds at a `|` with exactly a multiple of 5 `|`s behind it, from the start of the line.
- Finally, `\|` is the actual text to replace, preceded by the matched lookbehind.

Taking your example `|STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3`, it will match the `|` right before `STATUS2` and the `|` right before `STATUS3`.
You'll notice here (if you haven't already) that you're actually trying to replace every 5th `|` after the first one (the 6th, the 11th, and so on), not every 6th. But the lookbehind method handles the "minus the first" situation fairly cleanly.
And now the replacement string.
- Instead of `\n`, we actually want `` `n `` because the PowerShell escape character is `` ` ``. Note that this is only necessary in the replacement string; in the regex itself you would still use `\n` to pass that literal sequence to the regex engine.
- Since you want a leading `|` on every line, we need to add a new `|` after the new line. This works out because your original lines do not end with a `|`, therefore there is nothing to replace at the end of the lines, therefore we don't end up with an extra new line nor trailing `|`.

If you prefer the more traditional capture group method:
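One possible form of that variant is sketched here; it is an illustration rather than the only way to write it. `$text` and `$grouped` are placeholder names introduced for the example, and the trailing lookahead is an assumption of this sketch, added so the last record is left alone.

```powershell
# Placeholder input: two records of five fields each on one line.
$text = '|S1|H1|IP1|M1|W1|S2|H2|IP2|M2|W2'

# Group 1 captures five "|field" items; the lookahead requires another "|" to follow,
# so the final record is not matched and no trailing newline is added.
# "`$1" keeps PowerShell from expanding $1 as a variable; "`n" is a newline.
$grouped = $text -replace "((\|[^|]*){5})(?=\|)", "`$1`n"

$grouped
```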
Figuring out how this works is left as an exercise to the reader ;) Tip: the `$1` backreference has to be escaped (with `` ` ``) because otherwise PowerShell interprets it as a shell variable.
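The two escaping layers, PowerShell's and the regex engine's, can be seen in isolation with a small experiment. This is only an illustration; the sample strings and variable names are made up for it.

```powershell
# In a double-quoted PowerShell string the backtick is the escape character; "\" is literal.
$literal = 'a|b' -replace "\|", "\n"    # replacement is the two characters \ and n
$newline = 'a|b' -replace "\|", "`n"    # replacement is an actual line feed

$literal.Length   # 4: a, \, n, b
$newline.Length   # 3: a, LF, b

# Single quotes prevent variable expansion, so '$1' reaches the regex engine untouched;
# inside double quotes the same effect needs "`$1".
$single = 'ab' -replace '(a)', '[$1]'
$double = 'ab' -replace "(a)", "[`$1]"
$single -eq $double   # True
```

Single quotes are the simpler choice when the replacement needs only `$1`, but they cannot express `` `n ``, which is why a double-quoted replacement with an escaped `$1` is used when both are needed.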