Ubuntu – How to get wget to download the exact same web page HTML as a browser

wget

Using a web browser (IE or Chrome) I can save a web page (.html) with Ctrl-S, inspect it with any text editor, and see data in a table format. I want to extract one of those numbers, but from many, many web pages, too many to do manually. So I'd like to use wget to fetch those pages one after another and write another program to parse the .html and retrieve the number I want. But the .html file saved by wget, using the same URL as the browser, does not contain the data table. Why not? It is as if the server detects that the request is coming from wget rather than from a web browser and supplies a skeleton web page without the data table. How can I get the exact same web page with wget? Thanks!

MORE INFO:

An example of the URL I'm trying to fetch is:
http://performance.morningstar.com/fund/performance-return.action?t=ICENX&region=usa&culture=en-US
where the string ICENX is a mutual fund ticker symbol, which I will be changing to any of a number of different ticker symbols. This page shows a table of data when viewed in a browser, but the table is missing when the same URL is fetched with wget.

Best Answer

As roadmr noted, the table on this page is generated by JavaScript. wget doesn't support JavaScript; it just dumps the page as received from the server (i.e. before any JavaScript code runs), so the table is missing.

You need a headless browser that supports JavaScript, such as PhantomJS:

$ phantomjs save_page.js http://example.com > page.html

with save_page.js:

// save_page.js: open the URL given on the command line, let its JavaScript
// run, then print the rendered HTML to stdout.
var system = require('system');
var page = require('webpage').create();

page.open(system.args[1], function(status) {
    if (status !== 'success') {
        system.stderr.writeLine('Failed to load ' + system.args[1]);
        phantom.exit(1);
    }
    console.log(page.content);
    phantom.exit();
});
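
Since the question involves fetching the same Morningstar page for many ticker symbols, the phantomjs command can also be run in a shell loop. A rough sketch (the tickers after ICENX are just example placeholders):

$ for t in ICENX VFINX VTSMX; do phantomjs save_page.js "http://performance.morningstar.com/fund/performance-return.action?t=$t&region=usa&culture=en-US" > "$t.html"; done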

Then, if you just want to extract some text, the easiest option might be to render the page with w3m:

$ w3m -dump page.html

and/or modify the PhantomJS script so it dumps just the part you're interested in, as sketched below.
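
For the last option, a variant of save_page.js (saved as, say, extract_cell.js; the name is arbitrary) could use page.evaluate() to return the text of a single element instead of the whole document. The sketch below assumes a made-up CSS selector ('#returns_table td.ytd'); the real selector has to be found by inspecting the rendered page in the browser's developer tools, and if the table is filled in by a delayed AJAX call you may also need a short setTimeout before evaluating:

// extract_cell.js: open the URL, let its JavaScript run, then print the
// text of one element instead of the full HTML.
var system = require('system');
var page = require('webpage').create();

page.open(system.args[1], function(status) {
    if (status !== 'success') {
        system.stderr.writeLine('Failed to load ' + system.args[1]);
        phantom.exit(1);
    }
    // Runs inside the page context, after the page's own scripts have executed.
    var text = page.evaluate(function() {
        var cell = document.querySelector('#returns_table td.ytd'); // placeholder selector
        return cell ? cell.textContent : 'NOT FOUND';
    });
    console.log(text);
    phantom.exit();
});

Run it the same way as before:

$ phantomjs extract_cell.js http://example.com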