There is a question I have wanted to shed some light on for a long time: “What if I need to scrape several URLs based on data in some external database?”
Category: Web Scraping Software
Sometimes it is necessary to use external data sources to provide parameters for the scraping process. For example, you have a database with a bunch of ASINs and you need to scrape all product information for each one of them. As far as Visual Web Ripper is concerned, an input data source can be used to provide a list of input values to a data extraction project.
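As a rough, tool-independent illustration of the idea, the sketch below reads a list of ASINs from a database and turns each one into a URL to scrape. It is not how Visual Web Ripper implements input data sources. The `products` table, the `asin` column, the in-memory SQLite database, and the URL template are all assumptions made for this example.

```python
import sqlite3

# Hypothetical URL pattern used only for illustration; adjust it to the target site.
URL_TEMPLATE = "https://www.amazon.com/dp/{asin}"


def load_asins(conn):
    """Return every ASIN stored in the (hypothetical) products table."""
    rows = conn.execute("SELECT asin FROM products ORDER BY asin").fetchall()
    return [row[0] for row in rows]


def build_urls(asins):
    """Turn each input value into one URL for the scraper to visit."""
    return [URL_TEMPLATE.format(asin=asin) for asin in asins]


def demo():
    """Build a throwaway in-memory database and derive the URL list from it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (asin TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO products (asin) VALUES (?)",
        [("B000000002",), ("B000000001",)],  # placeholder ASINs, not real products
    )
    urls = build_urls(load_asins(conn))
    conn.close()
    return urls


if __name__ == "__main__":
    for url in demo():
        print(url)
```

In a real project the in-memory database would be replaced by a connection to the actual data source, and each generated URL would be handed to the scraper as one input value.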
8+ Best CAPTCHA Solvers
In this post we want to share some CAPTCHA-solving software and services that we have encountered in our web scraping experience.
Let’s look at how to use Screen Scraper to scrape Amazon products from a list of ASINs stored in an external database.
DeCaptcher Review
Recently we came across a CAPTCHA solving service called DeCaptcher. As we proceed with the new Scraper Test Drive stage, we have encountered some off-the-shelf scrapers that use a third-party service for CAPTCHA solving. The DeCaptcher service is the most popular among them.
TEST DRIVE: CAPTCHA
The new Scraper Test Drive stage, called CAPTCHA, is on. What can the scrapers do to get past the “robot fighters”? Off-the-shelf scrapers are not designed to solve CAPTCHAs by default. Furthermore, some stated that “bypassing Captchas was compatible with Internet good ethics”. I agree with this, but for the full Scraper Test Drive taste, we still want to try out the scrapers.
During our CAPTCHA test drive, we realized that the JavaScript CAPTCHAs we used for testing (such as QapTcha and AJAX FANCY CAPTCHA) can be easily bypassed by sending a couple of simple POST requests.
Continuing our research on the most popular scraping software, I decided to compare Mozenda and Visual Web Ripper on a search trends basis. What is the frequency of searches for these products online over time? I used Google Trends for this purpose, and below I share the current statistics and conclusions.
Debuggex is an online Regex testing tool that visualizes how a Regex match proceeds. The visualization is useful both for learners exploring Regex and for experienced users who want to step forward or back through a match. It also highlights matches instantly, so there is no need to press any buttons to run a Regex pattern. This tool is one of a dozen online Regex testers.
Fiddler web sniffer Review
The Fiddler web sniffer software is one of several well-known HTTP debugging proxies. It works well, and the proxy nature of the sniffer makes it stand out among other web sniffing tools.