Shell – How to fetch stock data with cURL so that it comes back *without* commas and spaces

command line, curl, osx, shell-script

What can I add to the script below so that it fetches the stock data and returns it WITHOUT any commas or spaces? For example, GOOG's outstanding shares come back as 675,000,000.

I want 675000000 in my txt file (no spaces, commas, or other punctuation). I still need decimal points for share prices, though.

cd desktop/quoteUpdate
while true
do
    curl -o quotes.txt -s "http://download.finance.yahoo.com/d/quotes.csv?s=avxl,goog,aapl&f=snl1c6j2s6f6"
    sed -i '.bak' 's/,/ /g' quotes.txt # replace commas with spaces
    echo UPDATED:
    date
    sleep 10
done

Best Answer

The problem is that, while the URL suggests the response is CSV, it really is not: the share volumes that contain commas are not properly quoted. So you will need to bring in some additional knowledge about the data. In this case, try changing the format of the output from:

http://download.finance.yahoo.com/d/quotes.csv?s=avxl,goog,aapl&f=snl1c6j2s6f6

producing:

"AVXL","ANAVEX LIFE SCIEN",0.1799,"-0.0041",    38,260,000,0,    23,703,000
"GOOG","Google Inc.",500.87,"+4.69",   678,365,000,67.911B,   572,967,000
"AAPL","Apple Inc.",109.80,"-0.42",  5,864,839,000,182.8B,  5,856,335,000

to e.g.:

http://download.finance.yahoo.com/d/quotes.csv?s=avxl,goog,aapl&f=sl1c6sj2ss6sf6

which yields:

"AVXL",0.1799,"-0.0041","AVXL",    38,260,000,"AVXL",0,"AVXL",    23,703,000
"GOOG",500.87,"+4.69","GOOG",   678,365,000,"GOOG",67.911B,"GOOG",   572,967,000
"AAPL",109.80,"-0.42","AAPL",  5,864,839,000,"AAPL",182.8B,"AAPL",  5,856,335,000

You can then parse this with e.g.:

sed 's/"[A-Z][^"]*",/ & /g' \
| awk -- '{
        gsub("\"", "", $2);
        gsub(",", "", $4);
        gsub(",", "", $8);
        print $1 $2 $4 "," $6 $8
    }'

which will give you something more CSV-like:

"AVXL",0.1799,-0.0041,38260000,0,23703000
"GOOG",500.87,+4.69,678365000,67.911B,572967000
"AAPL",109.80,-0.42,5864839000,182.8B,5856335000

The trick is that the ticker symbol is easy to match, so you can use it as an anchor wherever you need one.

The magic incantation above does this:

  • The sed invocation surrounds each occurrence of a ticker symbol (a double-quoted string beginning with a capital letter, followed by a comma) with spaces, effectively turning each line into a whitespace-separated list.

  • awk first removes the double quotes in field 2 (first line) and then the commas in fields 4 and 8 (second and third lines). Removing the quotes keeps the price change from being treated as a string instead of a floating-point number if you later process the output in a spreadsheet. The last line prints the modified fields and omits the now superfluous extra ticker symbols.
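To see the two steps at work without touching the network, here is a minimal sketch that feeds one of the sample response lines shown above through the same sed and awk commands. The function name normalize_quotes is made up for this illustration; the sed and awk bodies are the ones from the answer, with comments added.

```shell
#!/bin/sh
# normalize_quotes is a hypothetical helper name; it reads response
# lines on stdin and writes the cleaned-up lines to stdout.
normalize_quotes() {
    sed 's/"[A-Z][^"]*",/ & /g' \
    | awk -- '{
        gsub("\"", "", $2);   # unquote the price change in field 2
        gsub(",", "", $4);    # 678,365,000 -> 678365000
        gsub(",", "", $8);    # 572,967,000 -> 572967000
        print $1 $2 $4 "," $6 $8
    }'
}

# Sample line copied from the response shown above.
sample='"GOOG",500.87,"+4.69","GOOG",   678,365,000,"GOOG",67.911B,"GOOG",   572,967,000'
printf '%s\n' "$sample" | normalize_quotes
```

For this sample line the sketch prints "GOOG",500.87,+4.69,678365000,67.911B,572967000, which matches the GOOG row of the CSV-like output above.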

Thus in the end you can do it like this:

curl -s 'http://download.finance.yahoo.com/d/quotes.csv?s=avxl,goog,aapl&f=sl1c6sj2ss6sf6' \
| sed 's/"[A-Z][^"]*",/ & /g' \
| awk -- '{
        gsub("\"", "", $2);
        gsub(",", "", $4);
        gsub(",", "", $8);
        print $1 $2 $4 "," $6 $8
    }'

Note the \ backslashes at the end of the lines: they make sure that the commands are not invoked separately, but rather as if they were on one line. This notation is used to enhance readability. The backslashes are not needed inside the four-line AWK script, since that is enclosed in single quotes and the newlines are thus part of the quoted argument. And be sure to read some basic tutorials on UNIX shell scripting; it will save you lots of time later on.

Also note the quotes around the URL - these make sure that special characters (& in this case) don't get interpreted by the shell.
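The question's script turned the commas into spaces and wanted no quotes at all in quotes.txt. One possible variation of the same approach (a sketch, not part of the answer as given) is to strip every double quote first, remove the thousands separators as before, and only then turn the remaining commas into spaces. The function name to_space_separated is hypothetical.

```shell
#!/bin/sh
# to_space_separated is a hypothetical helper name; it expects lines in
# the sl1c6sj2ss6sf6 layout shown above on stdin.
to_space_separated() {
    sed 's/"[A-Z][^"]*",/ & /g' \
    | awk -- '{
        gsub("\"", "");          # drop all double quotes; awk re-splits the fields
        gsub(",", "", $4);       # remove thousands separators in field 4
        gsub(",", "", $8);       # remove thousands separators in field 8
        line = $1 $2 $4 "," $6 $8;
        gsub(",", " ", line);    # the remaining commas are real separators
        print line
    }'
}

# Sample line copied from the response shown above.
sample='"AVXL",0.1799,"-0.0041","AVXL",    38,260,000,"AVXL",0,"AVXL",    23,703,000'
printf '%s\n' "$sample" | to_space_separated
```

For the sample line this prints AVXL 0.1799 -0.0041 38260000 0 23703000, so the decimal points in the prices survive. Inside the loop from the question, piping the quoted curl -s command into to_space_separated and redirecting the result to quotes.txt would presumably replace both the curl -o line and the sed -i line.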
