Ubuntu – Grep beginning of line

grep, regex, sed, text processing

I have a file with the following contents:

(((jfojfojeojfow 
//
hellow_rld
(((jfojfojeojfow
//
hellow_rld

How can I extract every line that starts with a parenthesis?

Best Answer

The symbol for the beginning of a line is ^. So, to print all lines whose first character is a (, you would want to match ^(:

grep
```
grep '^(' file
```
sed
```
sed -n '/^(/p' file
```
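As a quick check, here is a minimal sketch that rebuilds the sample from the question in a temporary file (the path comes from mktemp and is only for illustration) and runs the grep command on it. It also shows the -E form, where ( has to be escaped because extended regex treats it as a group opener.

```shell
# Recreate the sample file from the question in a throwaway temp file.
sample=$(mktemp)
printf '%s\n' '(((jfojfojeojfow' '//' 'hellow_rld' \
              '(((jfojfojeojfow' '//' 'hellow_rld' > "$sample"

# Basic regex (grep's default): '(' is a literal character.
bre_out=$(grep '^(' "$sample")

# Extended regex (-E): '(' opens a group, so it must be escaped.
ere_out=$(grep -E '^\(' "$sample")

# -c counts the matching lines instead of printing them.
count=$(grep -c '^(' "$sample")

printf '%s\n' "$bre_out"   # prints the two lines that start with (((
rm -f "$sample"
```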

Related Solutions

How to Grep for Multiple Patterns on Multiple Lines

Updated 18-Nov-2016: grep's behavior has changed. With the -P parameter, grep no longer supports the ^ and $ anchors (seen on Ubuntu 16.04 with kernel 4.4.0-21-generic). Note that this workaround has been flagged as not being a real fix.

$ grep -Pzo "begin(.|\n)*\nend" file
begin
Some text goes here.  
end

Note: for the other commands below, replace the '^' and '$' anchors with the newline character '\n'.
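A small sketch of the \n-based command, assuming GNU grep built with -P support. The sample file and its contents are made up for illustration. Because -z makes grep end each printed match with a NUL byte, the output is piped through tr before being stored in a variable.

```shell
# Hypothetical sample: one begin/end block surrounded by other lines.
sample=$(mktemp)
printf '%s\n' 'before' 'begin' 'Some text goes here.' 'end' 'after' > "$sample"

# Single quotes keep the shell from touching the pattern.
# tr -d '\0' drops the NUL byte that -z appends to each match.
block=$(grep -Pzo 'begin(.|\n)*\nend' "$sample" | tr -d '\0')

printf '%s\n' "$block"
rm -f "$sample"
```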

With the grep command:

grep -Pzo "^begin\$(.|\n)*^end$" file

If you don't want to include the patterns "begin" and "end" in the result, use grep with lookbehind and lookahead assertions.

grep -Pzo "(?<=^begin$\n)(.|\n)*(?=\n^end$)" file

You can also use \K instead of a lookbehind assertion.

grep -Pzo "^begin$\n\K(.|\n)*(?=\n^end$)" file

\K discards everything matched before it, so that part of the match is left out of the output.
The \n keeps the newlines next to "begin" and "end" out of the match, which avoids printing empty lines.
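To see \K and the lookahead at work, here is a sketch using the \n-based form of the pattern (no ^ or $ anchors), again assuming GNU grep with -P support and a made-up sample file.

```shell
# Hypothetical sample: one begin/end block surrounded by other lines.
sample=$(mktemp)
printf '%s\n' 'before' 'begin' 'Some text goes here.' 'end' 'after' > "$sample"

# begin\n is matched but dropped by \K; (?=\nend) checks for the closing
# marker without including it in the match.
inner=$(grep -Pzo 'begin\n\K(.|\n)*(?=\nend)' "$sample" | tr -d '\0')

printf '%s\n' "$inner"   # prints: Some text goes here.
rm -f "$sample"
```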

Or, as @AvinashRaj suggests, there are simpler grep commands:

grep -Pzo "(?s)^begin$.*?^end$" file

grep -Pzo "^begin\$[\s\S]*?^end$" file

(?s) tells grep to allow the dot to match newline characters.
[\s\S] matches any character that is either whitespace or non-whitespace.
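The *? in these two commands is a lazy quantifier, which matters when a file holds more than one block. The sketch below (assuming GNU grep with -P support, a made-up sample file, and the \n-based form of the pattern) compares it with the greedy *.

```shell
# Hypothetical sample with two begin/end blocks and a line between them.
sample=$(mktemp)
printf '%s\n' 'begin' 'A' 'end' 'middle' 'begin' 'B' 'end' > "$sample"

# Greedy: runs from the first "begin" to the last "end", so "middle" is included.
greedy=$(grep -Pzo 'begin\n[\s\S]*\nend' "$sample" | tr '\0' '\n')

# Lazy: stops at the nearest "end", so each block is a separate match.
lazy=$(grep -Pzo 'begin\n[\s\S]*?\nend' "$sample" | tr '\0' '\n')

printf 'greedy:\n%s\n\nlazy:\n%s\n' "$greedy" "$lazy"
rm -f "$sample"
```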

The versions that leave "begin" and "end" out of the output are as follows:

grep -Pzo "^begin$\n\K[\s\S]*?(?=\n^end$)" file # or grep -Pzo "(?<=^begin$\n)[\s\S]*?(?=\n^end$)"

grep -Pzo "(?s)(?<=^begin$\n).*?(?=\n^end$)" file

See the full test of all commands here (out of date, since grep's behavior with the -P parameter has changed).

Note:

^ marks the beginning of a line and $ marks the end of a line. They are placed around "begin" and "end" so that these words match only when they are alone on a line.
In two commands I escaped $ because, inside double quotes, the shell also uses it for command substitution ($(command)), which replaces the command with its output. Bash likewise treats $[ as the start of an older arithmetic-expansion form.
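One way to sidestep the escaping, sketched here with illustrative variable names, is to put the pattern in single quotes, where the shell expands nothing. Both assignments below end up holding the same string.

```shell
# Double quotes: $( would start a command substitution, so $ is escaped.
double_quoted="^begin\$(.|\n)*^end$"

# Single quotes: the shell leaves every character alone, no escaping needed.
single_quoted='^begin$(.|\n)*^end$'

printf '%s\n%s\n' "$double_quoted" "$single_quoted"
```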

From man grep:

-o, --only-matching
      Print only the matched (non-empty) parts of a matching line,
      with each such part on a separate output line.

-P, --perl-regexp
      Interpret PATTERN as a Perl compatible regular expression (PCRE)

-z, --null-data
      Treat the input as a set of lines, each terminated by a zero byte (the ASCII 
      NUL character) instead of a newline. Like the -Z or --null option, this option 
      can be used with commands like sort -z to process arbitrary file names.
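The effect of -z is easy to see by counting records. In this sketch (assuming GNU grep) the empty pattern matches every record, so -c simply reports how many records grep saw.

```shell
# Without -z, each newline-terminated line is a record.
lines=$(printf 'begin\ntext\nend\n' | grep -c '')

# With -z, records end at NUL bytes; this input has none, so the whole
# input is a single record and a pattern can span its newlines.
records=$(printf 'begin\ntext\nend\n' | grep -zc '')

printf 'lines=%s records=%s\n' "$lines" "$records"   # lines=3 records=1
```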

Ubuntu – Move pattern to beginning of line

Try sed with the following regex:

$ sed -i.bak 's_\(.*\),[[:blank:]]\([[:alpha:]]\+,[[:blank:]][[:alpha:]]\+[[:blank:]][[:digit:]]\+,[^,]\+$\)_\2 \1_' file.txt 
Friday, Mar 13,2015 16:59:42 blah, blah, blah
Friday, Mar 13,2015 16:51:11 yadi, yadi, yada

Here we use sed's group substitution (capture groups and backreferences) to get the desired output.

\(.*\) will match up to blah, blah, blah, as we have ,[[:blank:]] to match the , after it.
\([[:alpha:]]\+,[[:blank:]][[:alpha:]]\+[[:blank:]][[:digit:]]\+,[^,]\+$\) will match the remaining portion of the line (the portion we want to put at the start).

Then we have \2 \1 to put the second group first, then a space, and then the first group.

The original file will be backed up as file.txt.bak. If you don't want that, use just -i instead of -i.bak.
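To try the substitution without touching a file, here is a sketch that feeds one line through sed on stdin. The input line is an assumption, reconstructed from the output shown above. The regex is written in extended syntax (-E), so the groups and + need no backslashes; otherwise it is the same pattern.

```shell
# Assumed input line, reconstructed from the output shown above.
line='blah, blah, blah, Friday, Mar 13,2015 16:59:42'

# Group 1 is the leading text, group 2 is the date part; \2 \1 swaps them.
swapped=$(printf '%s\n' "$line" |
  sed -E 's_(.*),[[:blank:]]([[:alpha:]]+,[[:blank:]][[:alpha:]]+[[:blank:]][[:digit:]]+,[^,]+$)_\2 \1_')

printf '%s\n' "$swapped"   # Friday, Mar 13,2015 16:59:42 blah, blah, blah
```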

Although you will get the desired output, regex/sed is not the optimal solution in this case.

EDIT: If you have a line like [Internet disconnected] Friday, Mar 13,2015 15:48:34, try this:

$ sed -i.bak 's_\(.*[^,]\),*[[:blank:]]\([[:alpha:]]\+,[[:blank:]][[:alpha:]]\+[[:blank:]][[:digit:]]\+,[^,]\+$\)_\2 \1_' file.txt 
Friday, Mar 13,2015 15:48:34 [Internet disconnected]
Friday, Mar 13,2015 16:59:42 blah, blah, blah
Friday, Mar 13,2015 16:51:11 yadi, yadi, yada

In the previous regex we had \(.*\),[[:blank:]] (a comma and a blank after the first matching group). To handle the new line as well, the first matching group is now \(.*[^,]\), which ensures that it does not end with a comma, followed by ,* (zero or more commas). So the new sed command works for all of the cases mentioned.
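A sketch that runs the revised regex on both kinds of line, reading from stdin instead of editing a file. The second input line is an assumption reconstructed from the output above, the variable name fix is illustrative, and the pattern is written in extended syntax (-E) to cut down on backslashes.

```shell
# Revised regex: group 1 may not end in a comma, and ,* allows zero or more commas.
fix='s_(.*[^,]),*[[:blank:]]([[:alpha:]]+,[[:blank:]][[:alpha:]]+[[:blank:]][[:digit:]]+,[^,]+$)_\2 \1_'

# First line has no comma before the date; second line has one.
out=$(printf '%s\n' \
  '[Internet disconnected] Friday, Mar 13,2015 15:48:34' \
  'blah, blah, blah, Friday, Mar 13,2015 16:59:42' |
  sed -E "$fix")

printf '%s\n' "$out"
```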