Web Scraper Shortcode Plugin Issue

Having reviewed the Web Scraper Shortcode, we now look at some issues with this WordPress plugin. It is a WordPress plugin that extracts a web page, or part of one, and inserts it into a custom WordPress-driven page.

Some users have pointed out that the plugin cannot extract certain elements of a web page. They wanted to get some finance info from this page, http://ca.finance.yahoo.com/q?s=rab.v&ql=1, and in particular this element: <div class=yfi_rt_quote_summary>.

To scrape this element I inserted element='div#yfi_rt_quote_summary' into the plugin shortcode, and it worked fine.

But another DOM (Document Object Model) element on the same page, <span id="yfs_j10_rab.v">62.94M</span>, was not scraped by the plugin.

The figure below shows the element in question inspected through Chrome Developer Tools:

So when I defined element='span#yfs_j10_rab.v' inside the web scraper shortcode, the plugin’s logic did not find the corresponding element.

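I suppose that’s because of the dot in yfs_j10_rab.v. In CSS selector syntax an unescaped dot starts a class selector, so the plugin would likely read span#yfs_j10_rab.v as a span with id yfs_j10_rab and class v, rather than as a span with the single id yfs_j10_rab.v.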
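To make that supposition concrete, here is a minimal Python sketch of how a simplified selector tokenizer would split the two spellings. It is not the plugin's code: parse_compound and TOKEN are hypothetical names, and the tokenizer handles only a tag, an id and classes. The backslash escape is standard CSS syntax, but I have not verified that the plugin accepts it.

```python
import re

# Simplified tokenizer for a compound selector such as "span#main.note".
# Illustration only: this is NOT the plugin's actual parsing code.
TOKEN = re.compile(r"(?P<kind>[#.]?)(?P<name>(?:\\.|[\w-])+)")

def parse_compound(selector):
    """Split a compound selector into its tag, id and class names."""
    parsed = {"tag": None, "id": None, "classes": []}
    for match in TOKEN.finditer(selector):
        # Remove the backslash from escaped characters such as "\."
        name = re.sub(r"\\(.)", r"\1", match.group("name"))
        kind = match.group("kind")
        if kind == "#":
            parsed["id"] = name
        elif kind == ".":
            parsed["classes"].append(name)
        else:
            parsed["tag"] = name
    return parsed

# Unescaped dot: the id stops at the dot and "v" becomes a class.
unescaped = parse_compound("span#yfs_j10_rab.v")
# Escaped dot: the whole string is kept as one id.
escaped = parse_compound(r"span#yfs_j10_rab\.v")

print(unescaped)
print(escaped)
```

With the unescaped selector the sketch looks for a span whose id is yfs_j10_rab and whose class is v, which does not exist on the page. That would explain the empty result.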

As a result, the Web Scraper Shortcode is not suitable for extracting DOM elements whose class or id name is a compound one, that is, made up of several parts separated by a dot (.).
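For comparison, a lookup that compares the id attribute as a plain string never parses the dot at all. The sketch below is a workaround outside the plugin, written with Python's standard html.parser module. TextById and text_by_id are hypothetical helper names, and the sample markup is just the span quoted above wrapped in a made-up div.

```python
from html.parser import HTMLParser

class TextById(HTMLParser):
    """Collect the text inside the first element whose id equals target_id.

    Sketch only: void tags such as <br> inside the target are not handled.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # nesting depth inside the matched element
        self.done = False   # True once the matched element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            # Plain string comparison, so the dot has no special meaning.
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

def text_by_id(html, target_id):
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()

# Made-up fragment around the span quoted in the article.
sample = '<div><span id="yfs_j10_rab.v">62.94M</span><span id="other">x</span></div>'
value = text_by_id(sample, "yfs_j10_rab.v")
print(value)
```

This only shows that the id itself is not the obstacle. The difficulty comes from writing that id inside a selector string.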
