Visual Web Ripper is a visual, multi-featured data extractor. It easily scrapes dynamic pages (including Ajax) and can output data to a variety of databases. The product is developed by Sequentum.
Visual Web Ripper is an excellent tool for automated web scraping. Its user-friendly visual project designer extracts complete data structures, such as product catalogues. If needed, Visual Web Ripper can repeatedly submit a form for all possible input values, which is important for searches over multiple values. Given the recent explosion of web content, it is also capable enough to extract data from highly dynamic websites, including AJAX websites.
While working with Visual Web Ripper, we found it simple to operate and highly convenient. In fact, the hierarchy of templates and content elements reminded us of the object-oriented design concepts used by developers. Additionally, Visual Web Ripper can extract data from websites that use CAPTCHA protection.
The basic idea behind creating a project in Visual Web Ripper is to build a set of templates and, under them, the content fields to extract. Once the project is set up, we define the data output format and launch the project. In the picture, one can see the basic hierarchy of templates and content elements:
The templates differ in type: a template might work as a list of links, crop an area of the page, navigate through “Next” links, or do something else.
After pointing to the data needed, one defines the data extraction fields (elements) for future harvesting.
The elements may undergo script transformation (if written) or built-in changes, but basically they are ripped with default options. Then, one needs to set the destination data source.
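To make the hierarchy described above more concrete, here is a minimal Python sketch of how templates and content elements could be modeled. It is only an illustration of the concept: the class names, the `kind` values and the selectors are our own assumptions and do not come from Visual Web Ripper's project format or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ContentElement:
    """One field to extract, with an optional transformation (hypothetical model)."""
    name: str
    selector: str  # assumed selector string, e.g. an XPath expression
    transform: Optional[Callable[[str], str]] = None

    def apply(self, raw: str) -> str:
        # Without a transformation the raw value is kept, i.e. the "default options".
        return self.transform(raw) if self.transform else raw


@dataclass
class Template:
    """A node in the project tree; 'kind' names are illustrative only."""
    kind: str  # e.g. "link_list", "page_area", "next_page"
    name: str
    elements: List[ContentElement] = field(default_factory=list)
    children: List["Template"] = field(default_factory=list)

    def outline(self, depth: int = 0) -> List[str]:
        # Render the tree as indented text lines, templates first, then their fields.
        lines = ["  " * depth + f"[{self.kind}] {self.name}"]
        for element in self.elements:
            lines.append("  " * (depth + 1) + f"- {element.name}")
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


# A made-up catalogue project: a link list, a product page area and a "Next" link.
project = Template("link_list", "Catalogue", children=[
    Template("page_area", "Product", elements=[
        ContentElement("title", "//h1"),
        ContentElement("price", "//span[@class='price']",
                       transform=lambda s: s.strip().lstrip("$")),
    ]),
    Template("next_page", "Next"),
])
```

Calling `"\n".join(project.outline())` gives an indented view of the tree, similar in spirit to the hierarchy shown in the project designer.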
A project can be run in debugging mode, in either the WebBrowser agent or the WebCrawler agent mode, and with a preview of the scraped data.
That’s how easy it can be.
Visual Web Ripper is able to extract data from even the most difficult websites. Its features for this include:
- proxying to hide the IP address,
- bypassing CAPTCHAs,
- submitting forms and logging in,
- using Internet Explorer to connect to a website, with a random time delay between requests (to stay undetected),
- working with dynamic page content (Ajax, JS).
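Two of the techniques in this list, proxying and random delays between requests, are general scraping practices and can be sketched in a few lines of Python. This is not how Visual Web Ripper implements them; the function names, the delay bounds and the injected `fetch` callable are assumptions made for illustration.

```python
import random
import time
import urllib.request
from typing import Callable, Iterable, List, Optional


def make_opener(proxy_url: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build a urllib opener that routes traffic through a proxy, if one is given."""
    handlers = []
    if proxy_url:
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    return urllib.request.build_opener(*handlers)


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_delay: float = 1.0,
    max_delay: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Fetch each URL in turn, pausing a random interval between requests."""
    rng = rng or random.Random()
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # An irregular pause makes the request pattern look less mechanical.
            sleep(rng.uniform(min_delay, max_delay))
        pages.append(fetch(url))
    return pages
```

In real use, `fetch` could wrap `make_opener(...).open(url).read().decode()`; passing `fetch` and `sleep` in as parameters keeps the pacing logic easy to test without touching the network.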
These are described in detail in the product's overview of highlighted features. The developers also provide a detailed manual, as well as rich video demonstrations (for every level of difficulty)! Multi-threading is implemented by the WebCrawler agent mode, which parses only HTML. This mode may be turned on in the “Action” tab under “Options” for the link templates (see picture below). It seems to complicate project building, and it doesn’t support dynamic content parsing. WebBrowser mode is unavoidable when a CAPTCHA bypass is needed, which slows data extraction several-fold.
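The trade-off behind a crawler-style mode (many threads, but only raw HTML is parsed, with no script execution) can be illustrated with a generic Python sketch. It does not reflect the WebCrawler agent's internals; `crawl`, `parse_title` and the injected `fetch` callable are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from typing import Callable, Iterable, List


class TitleParser(HTMLParser):
    """Collect the text of the <title> tag from raw HTML (no JavaScript is run)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_title(html: str) -> str:
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def crawl(urls: Iterable[str], fetch: Callable[[str], str], workers: int = 4) -> List[str]:
    """Fetch and parse pages in parallel threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda url: parse_title(fetch(url)), urls))
```

Because only the downloaded markup is parsed, anything a page builds with JavaScript after loading never reaches the parser, which is the limitation noted above for dynamic content.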
Visual Web Ripper is a feature-rich tool that simplifies your work in the modern web world. It is well suited to most tricky tasks and works steadily on common projects. It is easy for an inexperienced user to master. Yet for extra functionality or some difficult cases, the user may need to dig into some special techniques (e.g., XPath, Regex, programming scripts). Custom post-processing and a comprehensive API are also remarkable features. As the most recent tool we have evaluated, it stands on its own and functions dependably.
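For readers unfamiliar with the special techniques mentioned above, the following Python sketch shows how an XPath-style selection and a regular expression typically work together: the path picks the node, and the pattern cleans its text. The markup, the pattern and the function name are invented for this example and are unrelated to Visual Web Ripper's own scripting.

```python
import re
import xml.etree.ElementTree as ET

# A small, well-formed sample page fragment (made up for this example).
SNIPPET = """
<div>
  <div class="product"><h2>Blue Widget</h2><span class="price">Price: $19.99</span></div>
  <div class="product"><h2>Red Widget</h2><span class="price">Price: $5.00</span></div>
</div>
"""

# Capture the numeric part of a dollar amount such as "$19.99".
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")


def extract_products(markup: str) -> list:
    """Select product nodes by path, then clean the price text with a regex."""
    root = ET.fromstring(markup.strip())
    rows = []
    # ElementTree supports a limited XPath subset, enough for attribute predicates.
    for node in root.findall(".//div[@class='product']"):
        name = node.findtext("h2")
        price_text = node.findtext("span[@class='price']") or ""
        match = PRICE_RE.search(price_text)
        rows.append({"name": name, "price": float(match.group(1)) if match else None})
    return rows
```

The standard-library parser used here expects well-formed markup; real-world HTML usually needs a more tolerant parser before path expressions can be applied.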