webscraping.pro – Page 30

Tips & Tricks for Scraping Business Directories

Post author By admin
Post date April 9, 2015

Recently I received a question in my mailbox about scraping data aggregator sites (aka yellow pages) or business directories.
I replied to the sender directly, but our conversation on business directories was an interesting one that I thought you guys would find useful.

Here’s the question:

I am interested in scraping the database in such a website www.1881.no. My guess is that I would need a webdriver, like Selenium to do the job. I am very newbie to this field, but I believe if given some pointers, I can get some data out.

Could you please provide me with pointers on how to extract data from this website.
Sandeep

As a generic answer, I’ll provide you with some basics of scraping those business (and private life) directories.

Tags business directory

Development

How To Automatically Convert Files in Google Apps Script

Post author By admin
Post date April 9, 2015

The other day I was challenged to do some cloud conversion as a follow-up to a web scraping project with Google Apps Script (GAS)^[1]: namely, to take a Google Docs file and convert it into MS Word format.
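The post itself works in Google Apps Script. As a rough Python sketch of the same idea, assuming the file is reachable through the Google Drive API v3 `files.export` endpoint and that an OAuth access token is obtained elsewhere, the conversion comes down to requesting the document with the MS Word MIME type. The function name `docx_export_url` and the sample file ID are hypothetical.

```python
from urllib.parse import quote, urlencode

# MIME type that identifies the MS Word (.docx) format.
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def docx_export_url(file_id):
    """Build a Drive API v3 files.export URL that asks for a Google Doc as .docx.

    Assumption: the caller sends the request with a valid OAuth 2.0 bearer token.
    """
    query = urlencode({"mimeType": DOCX_MIME})
    safe_id = quote(file_id, safe="")
    return f"https://www.googleapis.com/drive/v3/files/{safe_id}/export?{query}"


# Hypothetical file ID, for illustration only.
print(docx_export_url("FILE_ID_PLACEHOLDER"))
```

Requesting that URL with the token in the `Authorization` header should return the converted file's bytes, which can then be saved with a `.docx` extension.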

Tags Google

Uncategorized

Writing next generation scraping scripts with Web Robots IDE

Post author By admin
Post date March 25, 2015

Most scraping solutions fall into two categories: visual scraping platforms targeted at non-programmers (Content Grabber, Dexi.io, Import.io, etc.) and scraping code libraries like Scrapy or PhantomJS, which require at least some knowledge of how to code.

Web Robots builds a scraping IDE that fills the gap in between. Code is not hidden, but is instead made simple to create, run and debug.

Tags scraping tool, service

Challenge

Q&A with ScrapeHero

In this post we’d like to share an interview with a young service called ScrapeHero. We’ve interviewed Tony Paul (marketing head) and this is what he had to say.

Tags web scraping

Web Scraping Software

Scraping software and services landscape

Post author By admin
Post date February 19, 2015

After almost 3 years of running this scraping blog and reviewing dozens of products, I’d like to use this short post to categorise the web scraping tools available to the end user. Here are typical examples of scrapers in each category.

Tags scraping tool, web scraping

Challenge Development

CloudFlare – a limited feature anti-content-duplicate tool

Post author By admin
Post date February 16, 2015

Here we come to the next anti-scrape tool, called CloudFlare, formerly ScrapeShield.


The CloudFlare app has been developed by CloudFlare to guard a site’s content. Its features are limited in number, but it’s still an interesting tool to look at for anyone interested in web scraping.

Tags anti-scrape, CloudFlare

Challenge Review

BotDefender Analysis

Here I’d like you to get familiar with an online scraping protection service called BotDefender. It’s interesting both to know how to use it (in case you want to protect your data) and to understand how it works in case you ever come across it while collecting data.

Tags anti-scrape

Development

A Simple Code that Extracts a Hotel List from Booking.com

Post author By admin
Post date February 5, 2015

In this post I will show you how easy it is to write Python code that extracts a hotel list from Booking.com. The code stays simple because Selenium WebDriver does the main work of data extraction.
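As a minimal sketch of the approach (not the code from the post itself), the snippet below loads a page with Selenium and collects the text of every element matching a CSS selector. The helper name `hotel_names`, the start URL and the `.hotel-name` selector are all hypothetical; the real selector has to be read from the live page markup and may change over time.

```python
def hotel_names(driver, url, selector, by="css selector"):
    """Load `url` and return the stripped, non-empty text of matching elements.

    `by` defaults to "css selector", the string value of Selenium's By.CSS_SELECTOR.
    """
    driver.get(url)
    elements = driver.find_elements(by, selector)
    texts = (element.text.strip() for element in elements)
    return [text for text in texts if text]


def main():
    # Imported here so the helper above can be exercised without a browser installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        # HYPOTHETICAL values: replace with a real search results URL and selector.
        names = hotel_names(driver, "https://www.booking.com/", ".hotel-name")
        for name in names:
            print(name)
    finally:
        # Always close the browser, even if the page fails to load.
        driver.quit()
```

Calling `main()` requires Selenium and a Firefox driver to be installed. Keeping the extraction in a small helper that accepts any driver object makes it possible to check the logic against a stub before pointing it at a real site.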

Tags Python, Selenium

Review

OutWit Hub Review

OutWit Hub is a program that provides simple data extraction without requiring any programming skills or advanced technical knowledge. What impressed me about OutWit Hub is its general approach to data gathering: harvest everything (links, text, images, etc.) and then let the user choose what is needed (sift by scrapers). The program can also follow links from page to page, which works well when scraping has to chain across multiple pages. UPDATE: OutWit Hub 4.0 is released!

Tags scraper, software