Grep to ignore patterns


I am extracting URLs from a website using cURL, as follows:

curl www.somesite.com | grep "<a href=.*title=" > new.txt

The resulting new.txt file looks like this:

<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">
<a href="http://websitenotneeded.com" title="something NOTNEEDED">

However, I need to extract only the following lines:

<a href="http://website1.com" title="something">
<a href="http://website2.com" information="something" title="something">

I am trying to ignore the <a href lines that have information in them and whose title ends with NOTNEEDED.

How can I modify my grep statement?

Best Answer

I'm not fully following your example and description, but it sounds like what you want is this:

$ grep -v "<a href=.*title=.*NOTNEEDED" sample.txt 
<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">
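If you read the description as dropping both kinds of line (those with an information attribute and those whose title ends with NOTNEEDED), grep -v accepts several -e patterns and discards any line that matches at least one of them. A minimal sketch, assuming the sample lines from the question; the function name filter_links is hypothetical:

```shell
# Hypothetical helper: reads HTML lines on stdin and prints the ones to keep.
# -v inverts the match; with several -e patterns, a line is dropped
# if it matches ANY of them:
#   'information='  -> the line carries an information attribute
#   'NOTNEEDED"'    -> an attribute value ends in NOTNEEDED (closing quote follows)
filter_links() {
  grep -v -e 'information=' -e 'NOTNEEDED"'
}

# The sample lines from the question, fed in via a here-document.
filter_links <<'EOF'
<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">
<a href="http://websitenotneeded.com" title="something NOTNEEDED">
EOF
```

Against a real file, pass the file name instead of the here-document, e.g. grep -v -e 'information=' -e 'NOTNEEDED"' new.txt. Anchoring on the closing quote keeps a title such as "NOTNEEDED here", where the word is not at the end.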

So for your example:

$ curl www.example.com | grep "<a href=.*title=" | grep -v NOTNEEDED > new.txt
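To check the two grep stages without hitting the network, you can feed a few sample lines through them. This is only a sketch: the here-document stands in for the curl output, and the function name keep_titled_links is hypothetical.

```shell
# Hypothetical helper wrapping the two pipeline stages:
#   stage 1 keeps only <a href=...> lines that also have a title= attribute;
#   stage 2 drops any of those lines that contain NOTNEEDED.
keep_titled_links() {
  grep "<a href=.*title=" | grep -v NOTNEEDED
}

# Sample input standing in for the page fetched by curl
# (the <html>/<body> lines are removed by stage 1).
keep_titled_links <<'EOF'
<html><body>
<a href="http://website1.com" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://websitenotneeded.com" title="something NOTNEEDED">
</body></html>
EOF
```

In the real pipeline, replace the here-document with the curl output: curl www.example.com | keep_titled_links > new.txt.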