Categories
Challenge

Q&A with ScrapeHero

In this post we’d like to share an interview with ScrapeHero, a young web scraping service. We spoke with Tony Paul (marketing head), and this is what he had to say.

Categories
Web Scraping Software

Scraping software and services landscape

After almost 3 years of running this scraping blog and reviewing dozens of products, I’d like to use this short post to categorise the web scraping tools and methods available to the end user. Here are typical examples of scrapers in each category.

Categories
Challenge Development

CloudFlare – a limited feature anti-content-duplicate tool

Here we come to the next anti-scrape tool, called CloudFlare, formerly ScrapeShield.

The CloudFlare app has been developed by CloudFlare to guard a site’s content. Its features are limited in number, but it’s still an interesting tool to look at for anyone interested in web scraping.

Categories
Challenge Review

BotDefender Analysis

Here I’d like you to get familiar with an online scraping protection service called BotDefender. It’s interesting both to know how to use it (in case you want to protect your data) and to understand how it works in case you ever come across it while collecting data.

Categories
Development

A Simple Code that Extracts a Hotel List from Booking.com

In this post I will show you how easy it is to write Python code that extracts a hotel list from booking.com. The code stays simple thanks to Selenium WebDriver, which serves as the main data extraction tool here.

Categories
Review

OutWit Hub Review

OutWit Hub is a program that provides simple data extraction without requiring any programming skills or advanced technical knowledge. What impressed me about OutWit Hub is its general approach to data gathering: harvest everything (links, text, images, etc.) and then let the user choose what is needed (sift with scrapers). The program can follow links from page to page, so it works well when scraping has to run across chains of linked pages. UPDATE: OutWit Hub 4.0 is released!

Categories
Miscellaneous

Scraping with import.io Magic – The Future?

Over the last year or two, visual web scrapers have matured a lot. New companies like ParseHub, ScrapingHub and Kimono are bringing new tools to the market, while industry veterans like OutWit Hub, Visual Web Ripper and Mozenda continue to update their tooling for annotating and training scrapers and extracting web data.

Interestingly, something has changed now. Import.io has created a new tool which is a little bit different on the surface, and having spoken to them, a LOT different under the hood.

Categories
Development

Web Scraping with Python + Scrapy (blog series)

This is part 1 of a series dedicated to getting novices started with Scrapy, a simple web scraping framework for Python.

Categories
Uncategorized

import.io’s New Scraping Process and Features


The web scraping data platform import.io announced last week that it has secured $3M in funding from investors that include the founders of Yahoo! and MySQL.

They also released a new beta of their extraction tool, with some new features and a much cleaner and faster user experience.

Categories
Web Scraping Software

Using ProxyMesh with Visual Web Ripper

ProxyMesh is another rotating anonymous proxy service: it lets users stay anonymous by routing their requests through a network of continuously rotated proxy IP addresses. The service requires no software download, and it can easily be used in conjunction with Visual Web Ripper.
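
For readers who script their scrapers rather than use Visual Web Ripper, the same idea (send every request through one proxy endpoint and let the service rotate the outgoing IP) can be sketched with Python's standard library. This is only a rough illustration: the helper name `build_proxy_opener`, the host `proxy.example.com:31280` and the credentials are placeholders made up for this example, not values taken from ProxyMesh or from the post.

```python
import urllib.parse
import urllib.request


def build_proxy_opener(proxy_host, username, password):
    """Return a urllib opener that sends HTTP and HTTPS traffic through one proxy.

    proxy_host is "host:port"; all three arguments are hypothetical placeholders.
    """
    # Percent-encode the credentials so characters like "@" or ":" stay valid in the URL.
    user = urllib.parse.quote(username, safe="")
    pwd = urllib.parse.quote(password, safe="")
    proxy_url = f"http://{user}:{pwd}@{proxy_host}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # Placeholder endpoint and credentials; substitute the values from your own account.
    opener = build_proxy_opener("proxy.example.com:31280", "user", "secret")
    with opener.open("http://example.com/", timeout=10) as response:
        print(response.status)
```

Because the rotation happens on the proxy side, the client code does not change between requests; only the endpoint and credentials need to be configured once.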