Scraping HTML graphic elements: possibilities and limits

Question: “How do I set up a daily automatic scraping of www.pollen.com data into a Google sheet?” (link)

Answer: Originally I doubted whether SVG HTML elements could be scraped at all. After some trial and error, I realized that SVG elements are indeed scrapable: one can get their XPath and their child nodes. However, importXML() can scrape them only when they are part of the static HTML.
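To illustrate that SVG child nodes are reachable by XPath once they are in the static markup, here is a minimal Python sketch. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath (not the full XPath 1.0 that importXML() accepts). The bar-chart markup, the class name "bars" and the data-label attribute are hypothetical. Note that SVG elements live in the SVG namespace, so the query has to account for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical static markup: an inline SVG bar chart as it might appear
# in a page's raw HTML (assumed here to be well-formed XML).
SVG_MARKUP = """
<svg xmlns="http://www.w3.org/2000/svg" width="120" height="100">
  <g class="bars">
    <rect x="10" y="40" width="20" height="60" data-label="Mon"/>
    <rect x="50" y="20" width="20" height="80" data-label="Tue"/>
    <rect x="90" y="70" width="20" height="30" data-label="Wed"/>
  </g>
</svg>
"""

# SVG elements are in this namespace, so queries must use a prefix for it.
SVG_NS = {"svg": "http://www.w3.org/2000/svg"}


def bar_heights(markup: str) -> dict[str, float]:
    """Return {label: height} for every <rect> inside <g class="bars">."""
    root = ET.fromstring(markup.strip())
    # ElementTree's limited XPath supports descendant search and [@attr='value'] predicates.
    rects = root.findall(".//svg:g[@class='bars']/svg:rect", SVG_NS)
    return {r.get("data-label"): float(r.get("height")) for r in rects}


if __name__ == "__main__":
    print(bar_heights(SVG_MARKUP))
```

Running it prints {'Mon': 60.0, 'Tue': 80.0, 'Wed': 30.0}: the chart's values are recovered straight from the attributes of the SVG child nodes.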

[Image: SVG elements]

Static or dynamic

In the case of the site pollen.com, the data charts are loaded dynamically: the JavaScript that builds them is embedded in iframes. Google Sheets' importXML() function, like other similar functions, works only with the static XML/HTML source of a web page and does not execute JavaScript. A dynamic scraper is therefore needed in this case. See the list of scraping software; the first five entries in that list are dynamic scrapers.
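A rough way to tell which case you are in is to look at the raw page source that a static fetch returns. The Python sketch below assumes you already have that HTML as a string and counts <svg>, <iframe> and <script> tags with the standard html.parser module. The function name and the three labels it returns are my own, and the rule is only a heuristic: content inside an iframe is a separate document, which a static scraper of the parent page does not see.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts a few start tags in raw (static) HTML."""

    WATCHED = ("svg", "iframe", "script")

    def __init__(self) -> None:
        super().__init__()
        self.counts = {tag: 0 for tag in self.WATCHED}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; self-closing tags are routed here too.
        if tag in self.counts:
            self.counts[tag] += 1


def static_chart_status(html: str) -> str:
    """Heuristic guess at whether a chart is visible to a static scraper."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    if counter.counts["svg"]:
        return "static-svg"      # chart markup is already in the page source
    if counter.counts["iframe"] or counter.counts["script"]:
        return "likely-dynamic"  # chart is probably built later by JavaScript
    return "no-chart-found"


# Hypothetical page sources for illustration.
static_page = '<body><svg width="10" height="10"><rect width="5" height="5"/></svg></body>'
dynamic_page = '<body><iframe src="chart.html"></iframe><script src="chart.js"></script></body>'
print(static_chart_status(static_page))
print(static_chart_status(dynamic_page))
```

The first page is reported as "static-svg" (importXML() could reach it); the second as "likely-dynamic", which is the pollen.com situation where a dynamic scraper is required.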

Below is some information on the difference between SVG and Canvas.

Difference between SVG and Canvas

(from w3schools.com)

SVG is a language for describing 2D graphics in XML.

Canvas draws 2D graphics on the fly (with JavaScript).

SVG is XML based, which means that every element is available within the SVG DOM. You can attach JavaScript event handlers to an element. In SVG, each drawn shape is remembered as an object. If attributes of an SVG object are changed, the browser can automatically re-render the shape.

Canvas is rendered pixel by pixel. In Canvas, once the graphic is drawn, it is forgotten by the browser. If its position should be changed, the entire scene needs to be redrawn, including any objects that might have been covered by the graphic.
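For scraping, this difference is decisive: SVG shapes appear in the markup together with their attributes, whereas a Canvas chart leaves only the <canvas> tag itself, and its pixels are never part of the HTML. The Python sketch below (standard library only; the sample snippets are hypothetical) collects both kinds of element from static HTML. For simplicity it does not check that a shape actually sits inside an <svg> element.

```python
from html.parser import HTMLParser

# SVG shape elements whose attributes typically carry a chart's geometry.
SVG_SHAPES = {"rect", "circle", "ellipse", "line", "polyline", "polygon", "path", "text"}


class ShapeCollector(HTMLParser):
    """Collects SVG shape elements and <canvas> elements from static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.shapes = []    # (tag, attributes) for each SVG shape element found
        self.canvases = []  # attributes of each <canvas>; its drawing is not in the markup

    def handle_starttag(self, tag, attrs):
        if tag in SVG_SHAPES:
            self.shapes.append((tag, dict(attrs)))
        elif tag == "canvas":
            self.canvases.append(dict(attrs))


def collect(html: str) -> ShapeCollector:
    """Parse an HTML string and return the filled collector."""
    collector = ShapeCollector()
    collector.feed(html)
    collector.close()
    return collector


svg_page = '<svg><circle cx="5" cy="5" r="4"/><rect x="0" y="0" width="3" height="7"/></svg>'
canvas_page = '<canvas id="chart" width="300" height="150"></canvas>'
print(collect(svg_page).shapes)
print(collect(canvas_page).shapes, collect(canvas_page).canvases)
```

For the SVG snippet the collector returns the circle and the rect with all their attributes; for the Canvas snippet the shape list is empty, and only the id, width and height of the <canvas> tag are recoverable.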
