Often, you want to detect changes on some eBay offerings or get notified of the latest items of interest from craigslist in your area. Or, you want to monitor updates on a website (your competitor’s, for example) where no RSS feed is available. How would you do it, by visiting it over and over again? No, now there are handy tools for website change monitoring. We’ve evaluated some tools and would like to recommend the most useful ones that will make your monitoring job easy. Those tools nicely complement the web scraping software, service and plugins.
Dexi.io is a powerful scraping suite (SaaS). This cloud scraping service provides development, hosting and scheduling tools. The suite might be compared with Mozenda for making web scraping projects and running them in clouds for user convenience. Yet it includes the API, each scraper being a JSON definition similar to other services like Import.io and ParseHub.
Today I want to share my experience with Dexi Pipes. Pipes is a new kind of robot introduced by Dexi.io to integrate web data extraction and web data processing into a single seamless workflow. The main focus of the testing is to show how Dexi might leverage multi-threaded jobs for extraction of data from a retail website.
NB Pipes robots are available starting from PROFESSIONAL plans.
Scrapinghub is the developer-focused web scraping platform. It provides web scraping tools and services to extract structured information from online sources. The Scrapinghub platform also offers several useful services to collect organized data from the internet. Scrapinghub has four major tools – Scrapy Cloud, Crawlera, and Splash. We’ve decided to try the service. In this post we’ll review its main functionality and also share our experience with Scrapinghub.
The modern web requires you to spend huge amount of processing power to mine it for information. How could a start-up or a small business do comprehensive data crawling without having to build the giant server farms used by major search engines?