I have this regex:
[a-z]+[:.].*?\s
I run it on the following text:
regexbuddy.com
www.regexbuddy.com
http://regexbuddy.com cvc
http://www.regexbuddy.com cvcv
http://www.regexbuddy.com/ g
http://www.regexbuddy.com/index.html f
http://www.regexbuddy.com/index.html?source=library f
You can download RegexBu ddy at http://www.regexbuddy.com/download.html. f
"www.domain.com/quoted URL with spaces"
http://10.2.2.1.2/ttxx/txt/gg v
support@regexbuddy.com
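To see why unwanted characters end up in the result, here is a minimal sketch that runs the original pattern over three of the sample lines. It uses Python's `re` module, which is an assumption; the question does not say which regex engine is in use. The helper name `matches` is hypothetical.

```python
import re

# The pattern from the question: a run of lowercase letters, then ':' or '.',
# then the shortest run of characters up to AND including one whitespace char.
ORIGINAL = re.compile(r"[a-z]+[:.].*?\s")

# A subset of the sample text, joined with newlines as in the question.
SAMPLE = "\n".join([
    "regexbuddy.com",
    "http://regexbuddy.com cvc",
    "support@regexbuddy.com",
])

def matches(pattern: re.Pattern, text: str) -> list[str]:
    """Hypothetical helper: return the full text of every non-overlapping match."""
    return [m.group(0) for m in pattern.finditer(text)]

for s in matches(ORIGINAL, SAMPLE):
    print(repr(s))
```

Each match ends with the whitespace character that `\s` consumed (a newline in the first case, a space in the second). The last line produces no match at all, because no whitespace follows it.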
I need to match the following (only the bolded text):
- regexbuddy.com
- www.regexbuddy.com
- http://regexbuddy.com cvc
- http://www.regexbuddy.com cvcv
- http://www.regexbuddy.com/ g
- http://www.regexbuddy.com/index.html f
- http://www.regexbuddy.com/index.html?source=library f
- You can download RegexBu ddy at http://www.regexbuddy.com/download.html. f
- "www.domain.com/quoted URL with spaces"
- http://10.2.2.1.2/ttxx/txt/gg v
- support@regexbuddy.com
How can I do that?
UPDATE
@slhck, your revised regex matches almost everything except when the URL starts with www, e.g.:
- "www.domain.com/quoted URL with spaces"
I made some changes to the regex to match the leading www. It now looks like this:
(https?)://.(?=\s)|(www.).?(?=\s)
Can you please review it and suggest whether there is a better way to match these URLs?
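A minimal Python sketch for reviewing the updated pattern. It assumes the pattern was meant to read `(https?)://.*?(?=\s)|(www.).*?(?=\s)`, i.e. that `*` quantifiers were lost when it was posted; that reading is an assumption. `VARIANT` is a hypothetical alternative, not something from the thread. It escapes the dot in `www\.` and swaps `(?=\s)` for `(?!\S)`, which also succeeds at the very end of the input.

```python
import re

# Assumed intended form of the update's pattern (two capture groups, unescaped dot).
UPDATED = re.compile(r"(https?)://.*?(?=\s)|(www.).*?(?=\s)")

# Hypothetical variant: "www\." only matches a literal dot, and (?!\S)
# ("not followed by a non-space") also succeeds at end of input, where
# (?=\s) has no character to look at.
VARIANT = re.compile(r"(https?)://.*?(?!\S)|(www\.).*?(?!\S)")

SAMPLE = "\n".join([
    "www.regexbuddy.com",
    "http://www.regexbuddy.com/ g",
    "http://10.2.2.1.2/ttxx/txt/gg",  # last line: nothing follows it
])

def full_matches(pattern: re.Pattern, text: str) -> list[str]:
    # finditer + group(0) returns the whole match; re.findall would return
    # tuples of the capture groups instead, because the pattern has groups.
    return [m.group(0) for m in pattern.finditer(text)]

print(full_matches(UPDATED, SAMPLE))
print(full_matches(VARIANT, SAMPLE))
```

With this sample, `UPDATED` misses the URL on the last line because nothing follows it, while `VARIANT` finds all three. The unescaped dot also lets `UPDATED` match words such as `wwwhat`, which `VARIANT` rejects.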
Best Answer
If you don't want to include the trailing whitespace in a match, use a lookahead, which checks for the whitespace without consuming it:
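A minimal Python sketch of the lookahead idea. It assumes the lookahead is applied to the question's original pattern, by turning the final `\s` into `(?=\s)`. The answer's exact regex is not reproduced here, so this rewrite is illustrative. The negative form `(?!\S)` is shown too; on this line it gives the same result.

```python
import re

line = "http://regexbuddy.com cvc"

# Original pattern: the final \s is part of the match, so the space is included.
consuming = re.search(r"[a-z]+[:.].*?\s", line)

# Assumed rewrite with a positive lookahead: (?=\s) only checks that
# whitespace follows; it does not add that whitespace to the match.
lookahead = re.search(r"[a-z]+[:.].*?(?=\s)", line)

# Negative lookahead form: "not followed by a non-whitespace character".
# Unlike (?=\s), it also succeeds at the very end of the input.
negative = re.search(r"[a-z]+[:.].*?(?!\S)", line)

print(repr(consuming.group(0)))  # 'http://regexbuddy.com '
print(repr(lookahead.group(0)))  # 'http://regexbuddy.com'
print(repr(negative.group(0)))   # 'http://regexbuddy.com'
```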
In your example, this would match:
To further match only http or https, and an optional www, use something like:
Here's John Gruber's regex to check for what looks like a URL, which appears to work quite well in your case:
But honestly, all of these approaches will give you false matches sooner or later. If you need a regular expression to parse URLs, see this Stack Overflow question: What is the best regular expression to check if a string is a valid URL?
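A minimal Python sketch of how even a narrow pattern goes wrong on this sample text. `NARROW` is a hypothetical pattern, not one from the thread: "http" or "https", then "://", then every following non-whitespace character.

```python
import re

# Hypothetical narrow pattern; it has no capture groups, so findall returns strings.
NARROW = re.compile(r"https?://\S+")

SAMPLE = "\n".join([
    "www.regexbuddy.com",
    "http://www.regexbuddy.com/index.html?source=library f",
    "You can download RegexBu ddy at http://www.regexbuddy.com/download.html. f",
])

found = NARROW.findall(SAMPLE)
print(found)
```

The sentence-ending period is swallowed into the second URL, and the bare `www.regexbuddy.com` on the first line is missed entirely. These are two small examples of the false matches and misses that a simple pattern produces.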