Categories
Miscellaneous

Simple way of HTML change monitoring

I recently came across this question in the Q&A section of a forum I belong to:

“I want to run once a day a script that will check whether the specific part of code has been changed, and if it did, we would get some return message (ideally directly to my email). What would be the easiest, simplest way to do that? I’ve read about web crawlers, web scrappers, but they seem to be doing far more than we need.”

Sure, if all you want is something as lightweight as monitoring a set of target pages for changes, then a ready-made monitoring tool is probably way more than you need. You need to keep it simple. So, here’s a quick solution with a Google spreadsheet.
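Independently of the spreadsheet approach, the core check the question asks for (has one specific part of the page changed since yesterday?) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the idea of locating the fragment by a start and end marker are hypothetical, and fetching the page, scheduling the daily run and sending the email are left out.

```python
import hashlib


def fragment_fingerprint(html: str, start_marker: str, end_marker: str) -> str:
    """Return a SHA-256 hex digest of the HTML between two markers (inclusive).

    The markers are hypothetical: any stable strings that surround the
    part of the page you care about.
    """
    start = html.find(start_marker)
    if start == -1:
        raise ValueError("start marker not found")
    end = html.find(end_marker, start + len(start_marker))
    if end == -1:
        raise ValueError("end marker not found")
    fragment = html[start:end + len(end_marker)]
    return hashlib.sha256(fragment.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: str, html: str,
                start_marker: str, end_marker: str) -> bool:
    """Compare today's fragment against the fingerprint stored yesterday."""
    return fragment_fingerprint(html, start_marker, end_marker) != previous_fingerprint
```

Storing only the fingerprint keeps the daily state tiny, and changes elsewhere on the page (ads, timestamps) do not trigger a false alarm as long as they fall outside the markers.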

Categories
Development

How To Automatically Convert files in Google Apps Script

The other day I was challenged to do some cloud converting, following the web scraping project with Google Apps Script (GAS)[1]. Namely, the task was to take a Google Doc file and convert it into MS Word format.

Categories
Development

Handling HTTP Cookies in cURL

Most developers get stuck on cookie handling in web scraping. Sure, it’s a tricky thing, and it was once my stumbling block too. So here, mainly for new scraping engineers, I’d like to share how to handle cookies in web scraping when using PHP. We’ve already done a post on scraping with cURL in PHP, so here we’ll focus only on the cookie side. A cookie is a small piece of data sent from a website and stored in a user’s web browser while the user is browsing that website. When the browser requests a page and a cookie is returned along with the web content, the browser does all the dirty work of storing the cookie and sending it back, in subsequent requests, to the server that rendered the page.
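To make that round trip concrete, here is a minimal sketch of what the browser does on your behalf. It is written in Python rather than PHP purely for illustration, and the `TinyCookieJar` class is a hypothetical helper, not part of any library; only `http.cookies.SimpleCookie` comes from the standard library. PHP’s cURL exposes the same idea through the `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE` options.

```python
from http.cookies import SimpleCookie


class TinyCookieJar:
    """Hypothetical helper: remembers cookies and replays them like a browser."""

    def __init__(self):
        self._cookies = {}  # cookie name -> value

    def store(self, set_cookie_header: str) -> None:
        """Parse one Set-Cookie response header and remember its name/value."""
        parsed = SimpleCookie()
        parsed.load(set_cookie_header)
        for name, morsel in parsed.items():
            self._cookies[name] = morsel.value

    def cookie_header(self) -> str:
        """Build the Cookie request header to send with the next request."""
        return "; ".join(f"{name}={value}" for name, value in self._cookies.items())


# The server's responses carry Set-Cookie headers...
jar = TinyCookieJar()
jar.store("sessionid=abc123; Path=/; HttpOnly")
jar.store("lang=en; Path=/")

# ...and the scraper sends the stored values back on the following request.
print(jar.cookie_header())
```

The sketch ignores expiry, domain and path matching, which a real browser or cURL’s cookie engine handles for you.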

Categories
Challenge Development

Tips & Tricks for Scraping Business Directories

Recently I received a question in my mailbox about scraping data aggregator sites (aka yellow pages) or business directories.
I replied to him directly, but our conversation on business directories was an interesting one that I thought you guys would find useful. 

Here’s the question:

I am interested in scraping the database in such a website www.1881.no. My guess is that I would need a webdriver, like Selenium to do the job. I am very newbie to this field, but I believe if given some pointers, I can get some data out.

Could you please provide me with pointers on how to extract data from this website.
Sandeep

As a generic answer, I’ll provide you with some basics of scraping those business (and private life) directories.

Categories
Uncategorized

Writing next generation scraping scripts with Web Robots IDE

Most scraping solutions fall into two categories: visual scraping platforms targeted at non-programmers (Content Grabber, Dexi.io, Import.io, etc.), and scraping code libraries like Scrapy or PhantomJS, which require at least some knowledge of how to code.

Web Robots builds a scraping IDE that fills the gap in between. Code is not hidden but is instead made simple to create, run and debug.

Categories
Challenge

Q&A with ScrapeHero

In this post we’d like to share an interview with a young service called ScrapeHero. We’ve interviewed Tony Paul (marketing head) and this is what he had to say.

Categories
Web Scraping Software

Scraping software and services landscape

After almost 3 years of running this scraping blog and reviewing dozens of products, in this short post I’d like to categorise the tools and means of web scraping available to the end user. Here are the typical examples of scrapers in those categories.

Categories
Challenge Development

CloudFlare – a limited feature anti-content-duplicate tool

Here we come to the next anti-scrape tool, called CloudFlare, formerly ScrapeShield.

The CloudFlare app has been developed by CloudFlare to guard a site’s content. Its features are limited in number, but it’s still an interesting tool to look at for anyone interested in web scraping.

Categories
Challenge Review

BotDefender Analysis

Here I’d like you to get familiar with an online scraping protection service called BotDefender. It’s interesting both to know how to use it (in case you want to protect your data) and to understand how it works in case you ever come across it while collecting data.