I am extracting URLs from a website using cURL, as shown below.
curl www.somesite.com | grep "<a href=.*title=" > new.txt
My new.txt file looks like this.
<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">
<a href="http://websitenotneeded.com" title="something NOTNEEDED">
However, I need to extract only the below information.
<a href="http://website1.com" title="something">
<a href="http://website2.com" information="something" title="something">
I am trying to ignore the <a href lines that have information in them and those whose title ends with NOTNEEDED.
How can I modify my grep statement?
Best Answer
I'm not fully following how your example matches the description, but it sounds like what you want is this:
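A minimal sketch of such a filter, assuming the written description is what counts: drop every line that contains an information= attribute and every line whose title ends with NOTNEEDED. The function name filter_links is made up for illustration. The only grep options used are the standard -v (invert the match) and -e (supply more than one pattern).

```shell
# filter_links is a hypothetical helper name, not part of any tool.
# It reads HTML on stdin and prints the <a href ...> lines that survive.
filter_links() {
  # First grep: the same pattern as in the question.
  # Second grep: -v drops any line that matches either -e pattern.
  grep '<a href=.*title=' |
    grep -v -e 'information=' -e 'NOTNEEDED">'
}
```

It would take the place of the single grep in the original pipeline, for example curl www.somesite.com | filter_links > new.txt. The second pattern assumes title is the last attribute in the tag, as it is in the sample lines.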
So for your example:
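As a hedged illustration, here is the same idea run against the five sample lines from the question. It assumes the rule is exactly as described (no information= attribute, no title ending in NOTNEEDED); the variable names sample and result are only for this sketch.

```shell
# Sample lines copied from the new.txt listing in the question.
sample='<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">
<a href="http://websitenotneeded.com" title="something NOTNEEDED">'

# -v inverts the match; a line is dropped if it matches either -e pattern.
result=$(printf '%s\n' "$sample" | grep -v -e 'information=' -e 'NOTNEEDED">')
printf '%s\n' "$result"
# Prints:
# <a href="http://website1.com" title="something">
# <a href="http://website2.com" title="some_other_thing">
```

Note that this is not identical to the desired listing in the question, which keeps a website2 line that contains information=. Under the rule as written that line is dropped and the plain website2 line is kept instead.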