PowerShell regex split on single and double white space

powershell regex

I have an issue that is probably something simple I'm overlooking or misunderstanding about regex alternation when matching single and double whitespace.

I'm using the shorthand character class \s in the alternation \s|\s\s with the PowerShell -split operator. I expect it to split at every single or double whitespace so that each word comes back as its own string on its own line, but I get extra empty lines.

Example Data and PowerShell Command

Note: This is an example of the data I'm working with. I have no control over its format, so it contains both single and double whitespace.

$Content = "Data is over here
and here is some down  under too"

$Content -split "\s|\s\s"

Result

Data
is
over
here

and
here
is
some
down

under
too

Expected Result

Data
is
over
here
and
here
is
some
down
under
too

Environment Specs

  • Windows 10 Pro x64
  • PowerShell 5.0

Question

I'd like to understand what's going on with the alternation pattern I'm using, but I'll accept a workaround if there's no definitive explanation.

Best Answer

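The regex engine tries the alternatives in \s|\s\s from left to right, so the single \s always matches first and \s\s is never used. Every whitespace character becomes its own delimiter, and two adjacent whitespace characters leave an empty string between them. Use a quantifier instead. \s{1,} (equivalent to \s+) matches one or more consecutive whitespace characters, such as spaces, tabs, and line breaks, as a single delimiter: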

$Content -split "\s{1,}"

Result:

PS C:\WINDOWS\system32> $Content = "Data is over here
and here is some down  under too"

$Content -split "\s{1,}"
Data
is
over
here
and
here
is
some
down
under
too

PS C:\WINDOWS\system32> 
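
To compare the two patterns side by side, here is a minimal sketch. It assumes the line break in the sample is stored as CR+LF (typical on Windows, and consistent with the empty line after "here" in the question's output), so the whitespace is written with explicit escapes. The variable names $original, $swapped, $greedy, and $unary are only illustrative.

```powershell
# Assumption: the line break in the sample data is CR+LF, written here as `r`n.
$Content = "Data is over here`r`nand here is some down  under too"

# Pattern from the question: each whitespace character is a separate delimiter,
# so CR+LF and the double space each leave one empty string behind.
$original = $Content -split '\s|\s\s'

# Putting the longer alternative first handles pairs, but a run of three or
# more whitespace characters would still produce empty strings.
$swapped = $Content -split '\s\s|\s'

# A quantifier treats any run of whitespace as one delimiter.
$greedy = $Content -split '\s+'

# The unary form of -split also splits on runs of whitespace.
$unary = -split $Content

$emptyCount = @($original | Where-Object { $_ -eq '' }).Count
"Original pattern: $($original.Count) items, $emptyCount empty"   # 13 items, 2 empty
"Swapped pattern:  $($swapped.Count) items"                       # 11 items
"Quantifier:       $($greedy.Count) items"                        # 11 items
"Unary -split:     $($unary.Count) items"                         # 11 items
```

If the data might start or end with whitespace, the binary form with \s+ still returns an empty first or last element. The unary form ignores leading and trailing whitespace, or the result can be filtered with Where-Object as shown for $emptyCount.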