I want to find the string
Time series prediction with ensemble models
in a PDF file using a shell script. I am using
pdftotext "$file" - | grep "$string"
where $file is the PDF file name and $string is the string above. This finds the line if the entire string is contained in a single line, but it can't find the string when it is split across lines, like:
Time series prediction with
ensemble models
How can I resolve this? I am new to Linux, so a detailed explanation is appreciated. Thanks in advance.
Best Answer
One possible way might be to replace grep with pcregrep (available from the 'universe' repository), which supports multiline matches, and then, instead of searching for the literal string, search for the Perl-compatible regular expression (PCRE)
Time\s+series\s+prediction\s+with\s+ensemble\s+models
where \s+ stands for one or more whitespace characters (including newlines). The bash shell's built-in string substitution capabilities can perform the latter step, by replacing each space in $string with \s+.
If you can't use pcregrep, then you might be able to get the output you want using plain grep with the -z switch: this tells grep to consider the input "lines" to be delimited by NUL characters rather than newlines, in this case effectively making it treat the whole input as a single line. So, for example, if you only want to print the matches (without context), you can add the -o switch.
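A minimal sketch of both approaches, since the original commands are not shown here. The function names phrase_to_regex and find_wrapped are hypothetical helpers made up for illustration. It assumes bash, a search phrase with no regex metacharacters of its own, and, for the fallback, a GNU grep built with PCRE support (the -P switch), which the text above does not mention explicitly.

```shell
#!/usr/bin/env bash
# Sketch only: find a phrase even when it wraps across lines.

# Hypothetical helper: replace every space in the phrase with \s+
# (one or more whitespace characters, newlines included) using
# bash's ${var//pattern/replacement} substitution.
phrase_to_regex() {
    local phrase=$1
    printf '%s\n' "${phrase// /\\s+}"
}

# Hypothetical helper: search the text on stdin for the phrase.
# Uses pcregrep -M (multiline) when it is installed. Otherwise falls
# back to GNU grep: -z treats the input as NUL-delimited (so the whole
# text is one "line"), -P enables PCRE, -o prints only the match.
find_wrapped() {
    local regex
    regex=$(phrase_to_regex "$1")
    if command -v pcregrep >/dev/null 2>&1; then
        pcregrep -M -e "$regex"
    else
        # grep -z ends each match with NUL; turn it back into a newline.
        grep -zoP -e "$regex" | tr '\0' '\n'
    fi
}

# Usage with the variables from the question:
#   pdftotext "$file" - | find_wrapped "$string"
```

Note that the two branches do not print exactly the same thing: pcregrep -M prints the full lines that the match spans, while grep -o prints only the matched text.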