I am trying to use sed to extract the value part of one of the many key-value pairs in a URL's query string.
This is what I am trying:
echo 'http://www.youtube.com/watch?v=abc&g=xyz' | sed 's@^https?://(www.)?youtube.com/(watch\\?)?.*?v(=|/)([a-zA-Z0-9\-_]*)(&.*)?$@$4@'
but it always outputs the input URL as is.
What am I doing wrong?
Update 1
To clarify some issues:
- The regex is more complicated than it has to be because I am also trying to check the validity of the input and generate the output only if the input is valid, hence the stricter match.
- The desired output is the value of the key 'v' in the query string.
- I have been unable to find the version of sed that I am using, but it's the one that comes with Mac OS X (10.7.5).
- In my version of sed, $1, $2 etc. seem to be the matches; \1, \2 etc. give the error:
sed: 1: "s@^https?://(www.)?yout ...": \4 not defined in the RE
Not correct, as I found out later. Apologies for causing the confusion.
Update 2
I have updated the sed RE to make it more specific, based on a suggestion by @slhck below, but the issue remains as before.
Update 3
Based on the man page for this version of sed, it appears that this is a BSD-flavoured version.
Best Answer
Even simpler, if you just want the abc:
If you want the xyz:
EXPLANATION:
awk: a scripting language that automatically processes input files line by line, splitting each line into fields. So, when you process a file with awk, for each line the first field is $1, the second $2, and so on up to $N. By default, awk uses blanks as the field separator.
-F'[=&]': -F is used to change the field delimiter from blanks to something else. In this case, I am giving it a class of characters. Square brackets ([ ]) are used by many languages to denote character classes. So, specifically, -F'[=&]' means that awk should use both & and = as field delimiters.
Therefore, given the input string from your question, using & and = as delimiters, awk will read the following fields: $1 is http://www.youtube.com/watch?v, $2 is abc, $3 is g, and $4 is xyz.
So, all you need to do is print whichever one you want, for example {print $4}.
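The field splitting described above can be checked with a small sketch. The helper names show_fields and get_field are hypothetical, introduced only for illustration; the awk options are the ones discussed in this answer.

```shell
url='http://www.youtube.com/watch?v=abc&g=xyz'

# Hypothetical helper: list every field awk sees when both '=' and '&'
# act as field delimiters, each prefixed with its field number.
show_fields() {
    printf '%s\n' "$1" |
        awk -F'[=&]' '{ for (i = 1; i <= NF; i++) print "$" i " = " $i }'
}

# Hypothetical helper: print one field, chosen by number.
# With the URL above, field 2 is the value of v and field 4 the value of g.
get_field() {
    printf '%s\n' "$1" | awk -F'[=&]' -v n="$2" '{ print $n }'
}

show_fields "$url"
get_field "$url" 2
get_field "$url" 4
```

Note that this relies on the position of the key in the query string: if v is not the first parameter, field 2 will hold something else.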
You said you also want to check that the string is a valid YouTube URL. That is awkward with plain sed because, by default, if the line does not match the regex you give it, sed simply prints the entire line unchanged (you would need the -n option together with the p flag of the s command to suppress that). You can use a tool like Perl to only print if the regex matches:
Finally, to simply print abc you can use the standard UNIX tool cut:
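A minimal sketch of these last two suggestions follows. The helper names extract_v and first_value are hypothetical, and the Perl pattern is only an assumption about what counts as a valid URL: it is a simplified variant of the regex in the question that accepts only the v= form. Adjust it to whatever strictness you need.

```shell
# Hypothetical helper: print the value of v only when the whole line looks
# like a YouTube watch URL; print nothing otherwise.
# The pattern is an assumption, simplified from the one in the question.
extract_v() {
    printf '%s\n' "$1" |
        perl -ne 'print "$1\n" if m{^https?://(?:www\.)?youtube\.com/watch\?(?:.*&)?v=([A-Za-z0-9_-]+)(?:&.*)?$}'
}

# Hypothetical helper: no validation at all, just cut.
# Keep the text after the first '=', then the text before the first '&'.
# This is only correct when v is the first key in the query string.
first_value() {
    printf '%s\n' "$1" | cut -d '=' -f 2 | cut -d '&' -f 1
}

extract_v 'http://www.youtube.com/watch?v=abc&g=xyz'
extract_v 'http://example.com/watch?v=abc&g=xyz'
first_value 'http://www.youtube.com/watch?v=abc&g=xyz'
```

The Perl version finds v wherever it appears in the query string and stays silent on non-matching input, while the cut version is shorter but depends on the parameter order.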