Firefox – How to tell Firefox to ignore unprintable characters

firefox, special characters

Edit: Summary

Apparently the intended character to display in this case is an "en-dash".

This page has a table halfway down showing that, for the en dash (–), some software converts the correct hex code 2013 to 0096 (see the first row of the table).

This answer on Stack Overflow explains that this is a mix-up between Windows-1252 and UTF-8.

This blog article reinforces this:

Character 150 (0x96) is the Unicode
character "START OF GUARDED AREA" in
the non-displayed C1 control character
range, but in the Windows-1252
encoding it's mapped to the
displayable character 0x2013 "en-dash"
(a short dash).
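
The two readings of byte 0x96 described in that quote can be checked with a few lines of Python. This is only an illustrative sketch using the standard library codecs; the variable names are made up for the example.

```python
import unicodedata

raw = bytes([0x96])  # the single byte in question

# Interpreted as Windows-1252, the byte is the en dash, U+2013.
as_cp1252 = raw.decode("cp1252")

# Interpreted as ISO-8859-1 (Latin-1), each byte maps to the code point
# with the same number, so 0x96 becomes the C1 control character U+0096.
as_latin1 = raw.decode("latin-1")

print(f"cp1252  -> U+{ord(as_cp1252):04X} {unicodedata.name(as_cp1252)}")
print(f"latin-1 -> U+{ord(as_latin1):04X} category {unicodedata.category(as_latin1)}")
```

The category is printed for the Latin-1 result because control characters have no Unicode name to look up.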

Others have struggled with this when producing content; this answer on Stack Overflow shows how to replace 0x0096 with 0x2013.
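
For anyone producing content, a replacement along those lines might look like the sketch below. It is not the code from the linked answer; `fix_c1_controls` is a hypothetical helper that assumes any character in the C1 range (U+0080 to U+009F) was really meant as the Windows-1252 character with the same byte value.

```python
def fix_c1_controls(text: str) -> str:
    """Replace C1 control characters (U+0080..U+009F) with the characters
    Windows-1252 assigns to the same byte values."""
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                # A few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined
                # in Python's cp1252 codec; leave those characters as-is.
                pass
        out.append(ch)
    return "".join(out)


sample = "Elastic \x96 Amazon EC2"
fixed = fix_c1_controls(sample)
print(fixed)
```

This only helps on the producing side, or in a script that rewrites a saved copy of a page. It does not change what Firefox itself renders.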

Google must realize this, because as stated in my original question below, Google's cached version of the Amazon page has – so it seems they are automatically correcting these mistakes on pages they cache.

I have tried setting my encoding to Windows-1252 but that does not help.

So now I guess my question is, how can I tell Firefox to ignore unprintable characters like these?


Original content below:


(Firefox 3.6.13 on Windows XP)

Every once in a while I notice an odd character on certain web pages when browsing the web. It is an outline of a box with a 4-digit number inside.

An example of a page that has these characters is:
http://aws.amazon.com/ec2/#highlights

After each section heading (Elastic, Completely Controlled, …) I see a box with the number "0096" inside. I looked at the cached version on Google, and Google has – in its place, so I'm guessing I should be seeing a dash there instead of the box with the numbers in it.

I have tried changing the character encoding in Firefox but haven't been able to find one that shows these characters correctly.

Is there a way to allow Firefox to view these characters?

Thanks in advance!

Edit – adding a screen shot of the "special" characters:

alt text

Edit #2 – tried in Ubuntu – new screenshots

I logged into my Ubuntu desktop and browsed to the Amazon page in Chrome and Firefox. Chrome completely ignores the character, even if I inspect the element or view the page source. Firefox in Ubuntu displays the character exactly like Firefox on my Windows XP box. I copied the character and played around with it at the command line – here is a screenshot of the results:

alt text

It looks like I can paste the character into this post as well: “

It is definitely not isolated to Windows XP. I tried setting the character encoding for my terminal to Windows 1252 (from Dennis' comment below), but then it just displays this character as a question mark.

I pulled the webpage down with wget and with curl, and both outputs show this character as: <96>
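
A saved copy of the page can be inspected for that raw byte with a short script. The sketch below uses a stand-in byte string rather than the real download (the `page` contents are an assumption made up for illustration), and checks whether the data is valid UTF-8 at all.

```python
# Stand-in for the bytes wget/curl would save; the real page is not fetched here.
page = b'<span class="product_highlights">Elastic</span> \x96 Amazon EC2'

# Count occurrences of the raw 0x96 byte.
count = page.count(b"\x96")

# A lone 0x96 is not a legal UTF-8 sequence, so strict decoding fails.
try:
    page.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(f"0x96 bytes: {count}, valid UTF-8: {valid_utf8}")

# Decoding the same bytes as Windows-1252 yields an en dash instead.
print(page.decode("cp1252"))
```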

It makes me wonder whether this character renders correctly for anyone. It appears WebKit just ignores it, my IE6 ignores it, and Firefox displays the box with the numbers in it. I would have to imagine the design team at Amazon can see it correctly?

It's not a huge deal to get these characters displaying correctly, but it would be nice to know if there is a solution to this.

Best Answer

0096 is most likely an ASCII reference to the ` char (grave accent), which can be displayed within HTML as &#96;
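
One way to check what such numeric references resolve to is Python's `html.unescape`, which is documented to follow the HTML5 rules for character references. This is a quick sketch and says nothing about Firefox's own parser. Note that decimal 96 and hexadecimal 0x96 (decimal 150) are different values.

```python
import html

# Decimal 96 is the ASCII grave accent (backtick).
backtick = html.unescape("&#96;")

# Decimal 150 is 0x96; HTML5 remaps this reference through Windows-1252.
dash_ref = html.unescape("&#150;")

# The named reference for the en dash.
named = html.unescape("&ndash;")

print(repr(backtick), repr(dash_ref), repr(named))
```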

Looking at your link however the HTML looks normal and there is no reference to &ndash;

...

<p><span class="product_highlights">Elastic</span>  Amazon <span class="caps">EC2</span> enables you to increase or decrease capacity within minutes, not hours or days. You can commission one, hundreds or even thousands of server instances simultaneously. Of course, because this is all controlled with web service APIs, your application can automatically scale itself up and down depending on its needs.</p> 


    <p><span class="product_highlights">Completely Controlled</span>  You have complete control of your instances. You have root access to each one, and you can interact with them as you would any machine. You can stop your instance while retaining the data on your boot partition and then subsequently restart the same instance using web service APIs. Instances can be rebooted remotely using web service APIs. You also have access to console output of your instances.</p> 


    <p><span class="product_highlights">Flexible</span>  You have the choice of multiple instance types, operating systems, and software packages.  Amazon <span class="caps">EC2</span> allows you to select a configuration of memory, <span class="caps">CPU</span>, instance storage, and the boot partition size that is optimal for your choice of operating system and application.  For example, your choice of operating systems includes numerous Linux distributions, Microsoft Windows Server and OpenSolaris.</p> 

...

Firefox should have no issues displaying the dash glyph as I just tested on 3.6.*...

<html>
    <head>
        <title>En dash test</title>
    </head>
    <body>
        My dash is &ndash;
    </body>
</html>

...copy and paste the above code in a test document and name it test.html and open it up in Firefox. It should display your dash without any problems.

EDIT: As Dave pointed out, 0x96 is the ANSI (Windows-1252) equivalent of the en dash. With this understanding, it appears that this is a parsing issue related to the doctype specification within the page itself. Check out this thread.

You could extract the HTML and modify the doctype to see whether this is indeed where the issue stems from. It is most likely a mix-up between encodings, i.e. ANSI -> Unicode; interpreted as Unicode, the value is a non-printable char.
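
A small experiment in that spirit could be set up as sketched below. Note that it varies the declared charset rather than the doctype, which is a substitution made for this example, since the encoding mix-up is the suspected cause. The file names, the template, and the `write_test_pages` helper are all hypothetical.

```python
import tempfile
from pathlib import Path

# Minimal page containing one raw 0x96 byte; %s is filled with the charset.
TEMPLATE = (
    b"<!DOCTYPE html>\n<html>\n<head>\n"
    b'<meta http-equiv="Content-Type" content="text/html; charset=%s">\n'
    b"<title>0x96 test</title>\n</head>\n<body>\n"
    b"<p>Elastic \x96 Amazon EC2</p>\n</body>\n</html>\n"
)


def write_test_pages(directory: Path) -> list[Path]:
    """Write one test page per declared charset and return the paths."""
    paths = []
    for charset in (b"windows-1252", b"utf-8"):
        path = directory / f"test-{charset.decode('ascii')}.html"
        path.write_bytes(TEMPLATE % charset)
        paths.append(path)
    return paths


pages = write_test_pages(Path(tempfile.mkdtemp()))
for p in pages:
    print(p)
```

Opening each generated file in the browser shows how the same byte is rendered under each declared encoding.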
