Sometimes it is necessary to use external data sources to provide parameters for the scraping process. For example, suppose you have a database with a list of ASINs and you need to scrape the product information for each one of them. In Visual Web Ripper, an input data source can be used to provide a list of input values to a data extraction project.
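Whatever tool ends up consuming the list, the ASINs usually have to be pulled out of the database and cleaned first. Below is a minimal, hypothetical Python sketch of that step. It assumes a SQLite database with a `products` table holding an `asin` column, assumes the `https://www.amazon.com/dp/<ASIN>` URL form, and writes an `asin,url` CSV that could be configured as an input data source if CSV is the format your project uses. The names `load_asins`, `product_url`, and `write_input_csv` are illustrative only.

```python
import csv
import io
import re
import sqlite3

# ASINs are 10 alphanumeric characters (e.g. B00EXAMPLE)
ASIN_RE = re.compile(r"^[A-Z0-9]{10}$")


def load_asins(conn: sqlite3.Connection) -> list[str]:
    """Read ASINs from a hypothetical `products(asin)` table, normalizing
    case/whitespace, dropping malformed values and duplicates."""
    rows = conn.execute("SELECT asin FROM products ORDER BY rowid").fetchall()
    cleaned = []
    for (raw,) in rows:
        asin = (raw or "").strip().upper()
        if ASIN_RE.match(asin):
            cleaned.append(asin)
    return list(dict.fromkeys(cleaned))  # de-duplicate, keep first-seen order


def product_url(asin: str) -> str:
    # Assumed URL pattern; change the domain for other Amazon marketplaces.
    return f"https://www.amazon.com/dp/{asin}"


def write_input_csv(asins: list[str], out) -> None:
    """Write one (asin, url) row per product so a scraper can iterate over them."""
    writer = csv.writer(out)
    writer.writerow(["asin", "url"])
    for asin in asins:
        writer.writerow([asin, product_url(asin)])


# Demo with an in-memory database and made-up values
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (asin TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("B00EXAMPLE",), (" b01example ",), ("bad-asin",), (None,)],
)
asins = load_asins(conn)
buf = io.StringIO()
write_input_csv(asins, buf)
print(buf.getvalue())
```

Doing the validation before the scraper runs means a malformed or empty value is dropped once, instead of producing a failed page request later in the extraction project.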
Author: admin
XPath in Examples
Here we’ll show how XPath works. Let’s take the following XML as a lab rat.
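The post's original XML sample is not reproduced here, so the following is a minimal sketch that uses a hypothetical `<catalog>` document and Python's standard `xml.etree.ElementTree` module. Keep in mind that ElementTree implements only a subset of XPath 1.0 (child and descendant paths, attribute and positional predicates); for full XPath expressions, lxml elements provide an `xpath()` method.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (stand-in for the post's own XML)
XML = """
<catalog>
  <product asin="B000000001" category="books">
    <title>Web Scraping Basics</title>
    <price>19.99</price>
  </product>
  <product asin="B000000002" category="electronics">
    <title>USB Cable</title>
    <price>4.50</price>
  </product>
  <product asin="B000000003" category="books">
    <title>XPath Pocket Guide</title>
    <price>9.95</price>
  </product>
</catalog>
"""

root = ET.fromstring(XML)

# .//title : every <title> element at any depth below the root
titles = [t.text for t in root.findall(".//title")]

# product[@category='books'] : filter <product> children by attribute value
book_asins = [p.get("asin") for p in root.findall("product[@category='books']")]

# product[2]/price : positional predicate, then step down to the child <price>
second_price = root.find("product[2]/price").text

print(titles)
print(book_asins)
print(second_price)
```

Positions in XPath are 1-based, so `product[2]` selects the second `<product>` (the USB cable), not the third.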
8+ Best CAPTCHA Solvers
In this post we want to share some of the CAPTCHA-solving software and services that we have encountered in our web scraping experience.
Let’s look at how to use Screen Scraper to scrape Amazon products, given a list of ASINs stored in an external database.
Free Website Backup
For simple web scraping jobs I often prefer a PHP + MySQL bundle, putting the project straight onto the web and working online. But working online raises a problem: how do you back up your results?
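One common do-it-yourself option for the MySQL side of such a project is a scheduled `mysqldump` run. The following is a hedged Python sketch, not the approach described in the post: it assumes `mysqldump` is on the PATH and that credentials are stored in a MySQL option file such as `~/.my.cnf`, so no password appears on the command line. The helper names and any database, user, or folder names you pass in are placeholders.

```python
import datetime as dt
import gzip
import shutil
import subprocess
from pathlib import Path


def backup_filename(db: str, when: dt.datetime) -> str:
    """Timestamped name so successive backups never overwrite each other."""
    return f"{db}-{when:%Y%m%d-%H%M%S}.sql.gz"


def mysqldump_cmd(db: str, user: str, host: str = "localhost") -> list[str]:
    # No --password here: mysqldump reads it from an option file
    # (e.g. the [client] section of ~/.my.cnf), keeping it off the command line.
    return ["mysqldump", "--single-transaction", f"--host={host}", f"--user={user}", db]


def backup(db: str, user: str, out_dir: str, host: str = "localhost") -> Path:
    """Stream mysqldump output into a gzip file; delete the file if the dump fails."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / backup_filename(db, dt.datetime.now())
    with subprocess.Popen(mysqldump_cmd(db, user, host), stdout=subprocess.PIPE) as proc:
        with gzip.open(target, "wb") as gz:
            shutil.copyfileobj(proc.stdout, gz)
    # Leaving the Popen block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        target.unlink(missing_ok=True)
        raise RuntimeError(f"mysqldump exited with status {proc.returncode}")
    return target
```

A call such as `backup("scraper_db", "scraper_user", "backups")` (placeholder names) could be run from a scheduler. `--single-transaction` dumps a consistent snapshot of InnoDB tables without locking them; the PHP files themselves would still need to be copied separately.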
How to Analyze Competitors
There is no doubt that you have to spend time on competitor analysis if you care about your business. Doing so, you may learn things about the market you work in that you had not noticed before. If you are just starting your business, you need to analyze your future competitors all the more, in order to understand how they run their business and what they are focusing on.
Recently I had to choose a tool for keyword monitoring, and after reviewing several online services I settled on Colibri. It has a nice design and an intuitive interface. If you are ready to take your SEO more seriously (and even to pay for it), this short review may help you make a choice.
DeCaptcher Review
Recently we came across a CAPTCHA-solving service called DeCaptcher. While working on the new Scraper Test Drive stage, we found that some off-the-shelf scrapers use a third-party service for CAPTCHA solving, and DeCaptcher is the most popular among them.
TEST DRIVE: CAPTCHA
The new Scraper Test Drive stage, called CAPTCHA, is on. What can the scrapers do to get past the “robot fighters”? Off-the-shelf scrapers are not designed for CAPTCHA solving by default. Furthermore, some stated that “bypassing Captchas was compatible with Internet good ethics”. I agree with this, but for the full Scraper Test Drive experience, we still want to try out the scrapers.
While doing our CAPTCHA TEST DRIVE, we realized that the JavaScript captchas we used for testing (such as QapTcha and AJAX FANCY CAPTCHA) can easily be bypassed by sending a couple of simple POST requests.