How to convert html entities to readable text

character encoding, html, text processing

I have text containing numeric HTML entities (for example, the entity for ę) and want to convert them to the real characters. I have emails, mostly from LinkedIn, that look like this:

chciałabym zapytać, czy rozważa Pan takze
udział w nowych projektach w Warszawie ? Obecnie poszukujemy
specjalisty javascript/architekta z bardzo dobrą
znajomością Angular.js do projektu, który dotyczy
systemu, służącego do monitorowania i
zarządzania flotą pojazdów. Zespół, do
którego poszukujemy

I'm using Claws Mail, and switching to the HTML view doesn't convert the entities to text. I tried copying the message and running:

xclip -o -sel clip | html2text | less

but it didn't convert the entities. Is there a way to get that text using command-line tools?

The only way I can think of is to use data:text/html,<PASTE THE EMAIL> and open it in a browser, but I would prefer the command line.

Best Answer

With Free recode (formerly known as GNU recode):

recode html < file

If you don't have recode or HTML::Entities and only need to decode &#x<hex>; entities, you could do it by hand with:

perl -Mopen=locale -pe 's/&#x([\da-f]+);/chr hex $1/gie'
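
If neither recode nor Perl is convenient, a similar decoder can be sketched with Python's standard library. This is not part of the original answer, only an alternative under the assumption that python3 is installed. The function name decode_entities and the sample string are made up for illustration. Unlike the hex-only substitution above, html.unescape also handles decimal (&#281;) and named (&oacute;) entities.

```python
import html


def decode_entities(text: str) -> str:
    """Replace named, decimal, and hexadecimal HTML entities with their characters."""
    return html.unescape(text)


# Hypothetical sample mixing hexadecimal and decimal numeric entities.
sample = "chcia&#x142;abym zapyta&#263;, czy rozwa&#380;a Pan"
print(decode_entities(sample))
```

To use the same call as a filter on the clipboard, something like this should work:

```shell
xclip -o -sel clip | python3 -c 'import html, sys; sys.stdout.write(html.unescape(sys.stdin.read()))' | less
```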