Regex to parse URLs from text

regex

I have this regex:

[a-z]+[:.].*?\s

I run it on the following text:

regexbuddy.com
www.regexbuddy.com
http://regexbuddy.com cvc
http://www.regexbuddy.com cvcv
http://www.regexbuddy.com/ g
http://www.regexbuddy.com/index.html f
http://www.regexbuddy.com/index.html?source=library f
You can download RegexBuddy at http://www.regexbuddy.com/download.html. f
"www.domain.com/quoted URL with spaces"
http://10.2.2.1.2/ttxx/txt/gg v
support@regexbuddy.com

I need to match the following – the bolded text only:

How can I do that?

UPDATE

@slhck, your revised regex matches almost everything, except URLs that start with www., e.g.:
"www.domain.com/quoted URL with spaces"

I changed the regex so that it also matches a leading www. It now looks like this:

(https?)://.*(?=\s)|(www.).*?(?=\s)

Could you please review it and suggest whether there is a better way of matching these URLs?

Best Answer

If you don't want to include the trailing whitespace in the match, use a positive lookahead, which asserts that whitespace follows without consuming it:

[a-z]+[:.].*?(?=\s)

In your example, this would match:

regexbuddy.com
www.regexbuddy.com
http://regexbuddy.com
http://www.regexbuddy.com
http://www.regexbuddy.com/
http://www.regexbuddy.com/index.html
http://www.regexbuddy.com/index.html?source=library
http://www.regexbuddy.com/download.html.
www.domain.com/quoted
http://10.2.2.1.2/ttxx/txt/gg
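
As a quick check, the difference between the two patterns can be reproduced with Python's re module. This is only a sketch: SAMPLE is a shortened copy of the text from the question, and the variable names are made up for illustration.

```python
import re

# Shortened copy of the sample text from the question (assumption: every
# line ends with a newline, so the lookahead always finds whitespace).
SAMPLE = (
    "regexbuddy.com\n"
    "http://www.regexbuddy.com/index.html f\n"
    "You can download RegexBuddy at http://www.regexbuddy.com/download.html. f\n"
    '"www.domain.com/quoted URL with spaces"\n'
)

# Original pattern: the trailing \s is consumed and ends up in the match.
WITH_WHITESPACE = re.compile(r"[a-z]+[:.].*?\s")
# Lookahead pattern: (?=\s) only asserts that whitespace follows.
WITH_LOOKAHEAD = re.compile(r"[a-z]+[:.].*?(?=\s)")

consumed = WITH_WHITESPACE.findall(SAMPLE)
asserted = WITH_LOOKAHEAD.findall(SAMPLE)

for url in asserted:
    print(repr(url))
```

Note that the lookahead only removes the whitespace. The sentence-ending period after download.html is still part of the match, and the quoted URL is still cut off at its first space.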

To restrict the match to URLs that start with http or https, with an optional www., use something like:

(https?):\/\/(www\.)?[a-z0-9\.:].*?(?=\s)
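
A minimal Python sketch of how this stricter pattern behaves, assuming the same kind of input as in the question. The https line in SAMPLE is not part of the original text; it was added here only to exercise the optional s.

```python
import re

# Pattern from the answer: the scheme is required, "www." is optional.
HTTP_ONLY = re.compile(r"(https?):\/\/(www\.)?[a-z0-9\.:].*?(?=\s)")

SAMPLE = (
    "regexbuddy.com\n"
    "www.regexbuddy.com\n"
    "http://regexbuddy.com cvc\n"
    "https://www.regexbuddy.com/index.html f\n"  # added for illustration
    "support@regexbuddy.com\n"
)

# The pattern contains capturing groups, so re.findall would return
# tuples of groups; finditer + group(0) yields the whole match instead.
urls = [m.group(0) for m in HTTP_ONLY.finditer(SAMPLE)]
print(urls)
```

The bare domain, the www. line and the e-mail address are skipped because none of them contains a scheme.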

Here's John Gruber's regex for finding things that look like a URL, which appears to work quite well in your case:

(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
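
If you want to try it from Python, something along these lines should work. It is a sketch, not a reference implementation: the pattern is copied verbatim from above into a raw triple-quoted string (it contains both quote characters), and the TEXT sentence is invented for the demonstration.

```python
import re

# Gruber's pattern, copied verbatim. The leading (?i) makes it
# case-insensitive, so no extra flags are passed to re.compile.
GRUBER = re.compile(
    r"""(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))"""
)

# Invented test sentence (assumption, not from the question).
TEXT = (
    "Download at http://www.regexbuddy.com/download.html. "
    "Or visit www.regexbuddy.com today; mail support@regexbuddy.com"
)

# group(0) is the whole match; the inner groups only serve the
# balanced-parentheses handling.
found = [m.group(0) for m in GRUBER.finditer(TEXT)]
print(found)
```

Unlike the simpler patterns, this one leaves the sentence-ending period out of the match and does not pick up the domain part of the e-mail address.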

But honestly, all those approaches will only get you false matches sooner or later. If you need a regular expression to parse URLs, see this Stack Overflow question: What is the best regular expression to check if a string is a valid URL?
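
To illustrate what such false matches look like, here is a small Python sketch that runs the simple lookahead pattern on ordinary prose. The PROSE sentence is made up for the example and contains no URL at all.

```python
import re

# The simple lookahead pattern from the top of this answer.
SIMPLE = re.compile(r"[a-z]+[:.].*?(?=\s)")

# Plain English text without any URL (invented for illustration).
PROSE = "Note: see e.g. the manual.\n"

# Every result here is a false positive: any lowercase run followed by
# ":" or "." and then whitespace satisfies the pattern.
false_matches = SIMPLE.findall(PROSE)
print(false_matches)
```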
