In respond to:
12345678 TXT 19700101 0 100 20160624 100 Comment text
12345678 TXT 19700101 100 100,25 20160624 0,25 Comment text
12345678 TXT 19700101 100,25 100,5 20160624 0,25 Comment text
For 4th column:
^((?:\S+\s+){3}\d+)(\s)
to \1,0\2
^((?:\S+\s+){3}\d+,\d)(\s)
to \10\2
For 5th/7th column:
similar to above, just replace {3}
with {4}
/{6}
in the rule respectively
Explanation
The 1st rule appends ,0
to numbers without ,
. Now all numbers must have ,\d
.
The 2nd rule appends a 0
to those with single digit after comma.
As for (?:)
:non-capture group, the previous columns are already captured as \1
so additional capturing is unnecessary.
This only pads number to 2 decimal places. To pad an arbitrary amount, use the pad excessively, then trim
approach.
Final word?
In my opinion, plain regex as in notepad++ is inadequate for this task. Some basic scripting like bash or perl would have handled this with much higher readability.
Using lookahead is much more efficient and can deal with any number of alternations without increasing complexity:
- Ctrl+F
- "Find what":
^(?=.*\b35=D\b)(?=.*\bEUR/USD\b)(?=.*\b150=8\b).+$
- Check "Match case"
- Check "Wrap around"
- Check "Regular expression"
- Uncheck "
.
matches newline"
- Find All in Current Document
Explanation:
^ # Beginning of line
(?= # Start positive lookaead, make sure we have after:
.* # 0 or more any character but newline
\b # Word boundary to be sure not matching 135=DATA
35=D # Literally
\b # word boundary
) # End lookahead
(?= # Start positive lookaead, make sure we have after:
.* # 0 or more any character but newline
\b # Word boundary
EUR/USD # Literally
\b # Word boundary
) # End lookahead
(?= # Start positive lookaead, make sure we have after:
.* # 0 or more any character but newline
\b # Word boundary
150=8 # Literally
\b # Word boundary
) # End lookahead
.+ # One or more any character but newline
$ # End of line
Screen capture:
Best Answer
I'm not 100% clear on what your requirements are (that is, what the possible contents of the lines to be deleted are) but I've made a regex going off of the assumption that the line starts with between 1 and 3 numbers, followed by an optional full stop, and without anything else on the line (beginning or end).
Find what :
^\d{1,3}\.?\r?\n?
Replace with : (blank)
Tested on the following dataset
which leaves behind
Explanation
^
is a special character that matches the very start of the line only. This is just to make sure that the pattern isn't simply on the end of a longer line.\d
is used to match any numerical value (that is, 0-9).{1,3}
is used to modify the previous statement (in this case,\d
), and is used to specify the minimum and maximum number of times that you want that statement to match. Therefore, this line says that you want to match a number between 1 and 3 times (inclusive).\.
is used to match the dot character..
is a special character in regex, used to match any character (except a newline by default, although there is an option in Notepad++ to change this behaviour). Due to this, we need to escape it with a backslash to make sure that it is taken as.
"the character" and not.
"the matching pattern".?
is used to modify the last statement similar to{1,3}
, but this time it is used to say that the previous statement ([.]
) is optional (technically it says to match it between 0 and 1 times, but the end result is the same).\r?\n?
is used to match the new line as you mentioned already. The regex would work without this, but it wouldn't remove the line, only clear it (leaving a blank line behind). By making both\r
and\n
optional, this will become portable across Windows, Linux, and Mac.Finally, the reason that we leave the
Replace with :
field blank is simply because of the fact that we don't want anything to go back into the line that we're deleting.Hopefully this is what you were looking for and, if not, points you in the right direction.