Categories
Development

Cheerio scraper escapes special symbols with html entities when performing .html()

As developers scrape data off the web, we use Node.js along with handy Cheerio scraper. When fetching .html() Cheerio parser returns the special symbols as HTML encoded entities, eg.:
ä as ä
ß as ß

Cheerio developer vindication of the parser action

(1) It’s not the job of a parser to preserve the original document. 
(2) .html() returns an HTML representation of the parsed document, which doesn’t have to be equal to the original document.
source.