Ubuntu – Extracting text from file and output the line into a file

bash, command-line

After using grep on an html file, I get the following output:

      <div id="v3060000-3062005" class="BLAH...>
      <div id="v50001027-50002018" class="BLAH...>
      <div id="v907200-907202" class="BLAH...>
      <div id="v20024011-20024012" class="BLAH...>

I need to extract the strings of numbers from the lines above and combine them into a URL such as:

http://x.y.z/3060000-3062005,50001027-50002018,907200-907202,20024011-20024012.mp3

May I know how I can do this using a shell script?

Best Answer

Normally, I would advise that you use a proper HTML parser to parse HTML.

However, this data looks pretty straightforward: using a double quote (optionally followed by "v") as the field separator, grab the 2nd field of each line, then join the pieces with commas.

result=$( grep ... file.html | awk -F'"v?' '{print $2}' | paste -sd, - )
echo "http://x.y.z/$result.mp3"
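
To try the pipeline without the original HTML file, here is a minimal sketch that feeds it the four sample lines from the question. The `input` and `url` variable names and the `class="x"` placeholder are assumptions made for illustration; the real grep command is left out because the question does not show it.

```shell
# Sample lines standing in for the grep output (the class value is a placeholder).
input='      <div id="v3060000-3062005" class="x">
      <div id="v50001027-50002018" class="x">
      <div id="v907200-907202" class="x">
      <div id="v20024011-20024012" class="x">'

# Split each line on a double quote optionally followed by "v",
# print the second field, then join the resulting lines with commas.
result=$(printf '%s\n' "$input" | awk -F'"v?' '{print $2}' | paste -sd, -)

url="http://x.y.z/$result.mp3"
echo "$url"
```

The trailing `-` tells paste to read standard input explicitly, which some non-GNU implementations require.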

Related Solutions

Ubuntu – How to remove the filename from wc -l output

I'd use:

<file wc -l

Unlike cat file | wc -l, this doesn't need to start an extra cat process and push the data through a pipe, so it runs faster:

% time </tmp/ramdisk/file wc -l     
8000000
wc -l < /tmp/ramdisk/file  0,07s user 0,06s system 97% cpu 0,132 total
% time cat /tmp/ramdisk/file | wc -l
8000000
cat /tmp/ramdisk/file  0,01s user 0,16s system 80% cpu 0,204 total
wc -l  0,09s user 0,10s system 94% cpu 0,203 total

(/tmp/ramdisk/file was stored in a ramdisk to take I/O and caching out of the equation.)

However, for small files the difference is negligible.

Yet another way would be:

wc -l file | cut -d ' ' -f 1

Which in my tests performs approximately the same as <file wc -l.
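
The two approaches can be compared side by side with a small sketch like the one below. The temporary file and the variable names are assumptions for illustration. It also assumes GNU wc, as shipped on Ubuntu; BSD-style wc pads the count with leading spaces, in which case the cut variant would return an empty first field.

```shell
# Create a throwaway three-line file to count.
tmpfile=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmpfile"

# wc reads standard input, so it has no filename to print.
count_redirect=$(wc -l < "$tmpfile")

# wc prints "count filename"; cut keeps only the first space-separated field.
count_cut=$(wc -l "$tmpfile" | cut -d ' ' -f 1)

echo "redirect: $count_redirect"
echo "cut:      $count_cut"

rm -f "$tmpfile"
```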

Ubuntu – grep command for a text file in multiple directories

To recursively search using grep, use the -R option.

To search for an exact string, use -F, so that 2* isn't treated as a regular expression.

To search only on specific filenames, use the --include option. Combined:

grep -FR --include=DATA.txt '2* x' main_directory > another_text_file
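
A minimal sketch of how the three options interact, assuming GNU grep and a made-up directory layout under a temporary directory (the subdirectories `a` and `b`, the file `notes.txt`, and the variable names are all hypothetical):

```shell
# Build a small directory tree to search.
workdir=$(mktemp -d)
mkdir -p "$workdir/main_directory/a" "$workdir/main_directory/b"

printf '2* x\nother\n' > "$workdir/main_directory/a/DATA.txt"  # contains the literal string
printf '22 x\n' > "$workdir/main_directory/b/DATA.txt"         # would match the regex 2* x, but not the fixed string
printf '2* x\n' > "$workdir/main_directory/b/notes.txt"        # has the string, but is skipped by --include

# -F: fixed string, -R: recurse, --include: only files named DATA.txt.
grep -FR --include=DATA.txt '2* x' "$workdir/main_directory" > "$workdir/another_text_file"

matches=$(cat "$workdir/another_text_file")
echo "$matches"
```

Only the line from `a/DATA.txt` should end up in the output file, prefixed with the path of the file it was found in.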
