sort operates on entire lines. By default, it sorts on the entire contents of each line, but -k can be used to sort on one or more fields within those lines, and -t can be used to change the delimiter between fields. I can't think of a case where using -t without also using -k makes any sense.
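As a minimal sketch of -t and -k working together (the sample data is made up for illustration), the following sorts comma-separated lines numerically on their second field:

```shell
# Hypothetical sample data: sort on the second comma-separated field, numerically.
# LC_ALL=C keeps the result independent of the current locale.
sorted=$(printf '%s\n' 'b,2' 'a,10' 'c,1' | LC_ALL=C sort -t',' -k2,2n)
printf '%s\n' "$sorted"
```

Here -k2,2n restricts the key to field 2 alone; a bare -k2 would make the key run from field 2 to the end of the line.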
Your second command, which is equivalent to:
printf "%s\n%s\n" "110,20,30,13" "kill,gill,burger" | sort -t',' -n
produces:
kill,gill,burger
110,20,30,13
which is what I'd expect. -t',' has no effect because it changes the field delimiter when you haven't told sort to operate on individual fields, so the whole line is still the sort key. The line starting with k is sorted before the line starting with 1 because, under the numerical ordering you requested with -n, a line with no leading number has the numerical value 0.
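To make the "value 0" behaviour visible, here is a small sketch that adds a negative line to the same data (the "-5,x" line is invented for the demonstration). The non-numeric line should land between the negative and the positive number:

```shell
# Under -n, a line with no leading number compares as 0.
# The "-5,x" line is a made-up addition to the original two lines.
ordered=$(printf '%s\n' '110,20,30,13' 'kill,gill,burger' '-5,x' | LC_ALL=C sort -n)
printf '%s\n' "$ordered"
```

LC_ALL=C is set only so that the comma is not interpreted as a thousands separator in some locales.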
AWK
Using GNU awk or mawk:
$ awk '$1~"^"word{printf("--\n%s",$0)}' word='are' RS='--\n' infile
--
are you happy
--
are(you hungry
too
This sets the variable word to the word to match at the beginning of the record, and RS (the record separator) to '--' followed by a newline (\n). Then, for any record whose first field starts with that word ($1~"^"word), it prints a formatted record: a leading '--' line followed by the record exactly as found.
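The command can be tried end to end with a small sample file. This is only a sketch: the input below is a guess at what infile might look like (the original file is not shown), and it assumes an awk that accepts a multi-character RS, such as GNU awk or mawk.

```shell
# Hypothetical sample input: each paragraph is introduced by a line holding only "--".
infile=$(mktemp)
printf '%s\n' '--' 'are you happy' '--' 'I am hungry' '--' 'are(you hungry' 'too' > "$infile"

# word and RS are assigned as operands, so they are set before the file is read.
result=$(awk '$1 ~ "^" word { printf("--\n%s", $0) }' word='are' RS='--\n' "$infile")
printf '%s\n' "$result"
rm -f "$infile"
```

The paragraph starting with "I am hungry" is dropped, and each kept record is printed with its trailing newline intact because RS consumed only the separator line.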
GREP
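Using grep (GNU grep, for the -P and -z options). Because -z makes the whole file a single record, -o is needed so that only the matched text is printed rather than the entire file; each match is terminated by a NUL character:
grep -Pzo -- '--\nare(?:[^\n]*\n)+?(?=--|\Z)' infile
grep -Pzo -- '(?s)--\nare.*?(?=\n--|\Z)\n' infile
grep -Pzo -- '(?s)--\nare(?:(?!\n--).)*\n' infile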
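A runnable sketch of the first pattern follows. It assumes GNU grep built with PCRE support, uses made-up sample input, and adds -o so that only the matched paragraphs are printed (with -z the whole file is one record, so without -o a match would print the entire file):

```shell
# Hypothetical sample input: each paragraph is introduced by a line holding only "--".
infile=$(mktemp)
printf '%s\n' '--' 'are you happy' '--' 'I am hungry' '--' 'are(you hungry' 'too' > "$infile"

# -z: read the file as one NUL-terminated record, so \n can appear in the pattern.
# -o: print only the matched text; tr removes the NUL that -z appends to each match.
matches=$(grep -Pzo -- '--\nare(?:[^\n]*\n)+?(?=--|\Z)' "$infile" | tr -d '\0')
printf '%s\n' "$matches"
rm -f "$infile"
```

The `--` before the pattern ends option parsing, which is required here because the pattern itself begins with two dashes.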
Description(s)
For the following descriptions, the PCRE option (?x) is used to add a lot of explanatory comments (and spaces) inline with the actual, working regex. If the comments (each running from a # up to the next newline) and most spaces are removed, the resulting string is still the same regex. This allows the regex to be described in detail in working code, which makes code maintenance a lot easier.
Option 1 regex (?x)--\nare(?:[^\n]*\n)+?(?=--|\Z)
(?x) # match the remainder of the pattern with the following
# effective flags: x
# x modifier: extended. Spaces and text after a #
# in the pattern are ignored
-- # matches the characters -- literally (case sensitive)
\n # matches a line-feed (newline) character (ASCII 10)
are # matches the characters are literally (case sensitive)
(?: # Non-Capturing Group (?:[^\n]*\n)+?
[^\n] # matches non-newline characters
* # Quantifier — Matches between zero and unlimited times, as
# many times as possible, giving back as needed (greedy)
\n # matches a line-feed (newline) character (ASCII 10)
) # Close the Non-Capturing Group
+? # Quantifier — Matches between one and unlimited times, as
# few times as possible, expanding as needed (lazy)
# A repeated capturing group will only capture the last iteration.
# Put a capturing group around the repeated group to capture all
# iterations or use a non-capturing group instead if you're not
# interested in the data
(?= # Positive Lookahead (?=--|\Z)
# Assert that the Regex below matches
# 1st Alternative --
-- # matches the characters -- literally (case sensitive)
| # 2nd Alternative \Z
\Z # \Z asserts position at the end of the string, or before
# the line terminator right at the end of the
# string (if any)
) # Closing the lookahead.
Option 2 regex (?sx)--\nare.*?(?=\n--|\Z)\n
(?sx) # match the remainder of the pattern with the following eff. flags: sx
# s modifier: single line. Dot matches newline characters
# x modifier: extended. Spaces and text after a # in
# the pattern are ignored
-- # matches the characters -- literally (case sensitive)
\n # matches a line-feed (newline) character (ASCII 10)
are # matches the characters are literally (case sensitive)
.*? # matches any character
# Quantifier — Matches between zero and unlimited times,
# as few times as possible, expanding as needed (lazy).
(?= # Positive Lookahead (?=\n--|\Z)
# Assert that the Regex below matches
# 1st Alternative \n--
\n # matches a line-feed (newline) character (ASCII 10)
-- # matches the characters -- literally.
| # 2nd Alternative \Z
\Z # \Z asserts position at the end of the string, or
# before the line terminator right at
# the end of the string (if any)
) # Close the lookahead parenthesis.
\n # matches a line-feed (newline) character (ASCII 10)
Option 3 regex (?xs)--\nare(?:(?!\n--).)*\n
(?xs) # match the remainder of the pattern with the following eff. flags: xs
# modifier x : extended. Spaces and text after a # in the pattern are ignored
# modifier s : single line. Dot matches newline characters
-- # matches the characters -- literally (case sensitive)
\n # matches a line-feed (newline) character (ASCII 10)
are # matches the characters are literally (case sensitive)
(?: # Non-capturing group (?:(?!\n--).)
(?! # Negative Lookahead (?!\n--)
# Assert that the Regex below does not match
\n # matches a line-feed (newline) character (ASCII 10)
-- # matches the characters -- literally
) # Close Negative lookahead
. # matches any character
) # Close the Non-Capturing group.
* # Quantifier — Matches between zero and unlimited times, as many
# times as possible, giving back as needed (greedy)
\n # matches a line-feed (newline) character (ASCII 10)
sed
$ sed -nEe 'bend
:start ;N;/^--\nare/!b
:loop ;/^--$/!{p;n;bloop}
:end ;/^--$/bstart' infile
Best Answer
Drav's awk solution is good, but that means running one sort command per paragraph. To avoid that, you could do:
Or you could do the whole thing in perl:
Note that above, separators are blank lines (for the awk one, lines with only space or tab characters; for the perl one, any horizontal or vertical spacing character) instead of empty lines. If you do want empty lines, you can replace !NF with !length or $0=="", and /\S/ with /./.
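The difference between the two awk tests can be checked in isolation. The sketch below is not the answer's original code (which is not reproduced here); it only counts, on made-up input, how many lines each condition treats as a separator:

```shell
# Made-up input: "a", a line holding only a space and a tab, an empty line, "b".
# !NF is true for any line with no fields (blank or empty);
# !length is true only for a line of zero length.
blank=$(printf 'a\n \t\n\nb\n' | awk '!NF { n++ } END { print n + 0 }')
empty=$(printf 'a\n \t\n\nb\n' | awk '!length { n++ } END { print n + 0 }')
printf 'blank (!NF): %s, empty (!length): %s\n' "$blank" "$empty"
```

With this input, !NF counts both the whitespace-only line and the empty line, while !length counts only the empty one.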