Recently I received a note asking about running search engine queries through web scraping software.
Recently Import.io introduced a new extraction technique called Magic. The Magic scraping method works by attempting to scrape all the information off the page automatically, in one shot. We covered it in another post early last year, and at the time we noted a few issues:
- The scraper only works on pages with more than one row of data, such as search results or category pages.
- It seems to have trouble with some JavaScript-heavy pages.
But now Import.io has released a second version of Magic which seems to have dealt with those obstacles. Not only that, but they have released an API for Magic that lets you see what’s going on behind the scenes.
Anyone should be able to pull data from the web and access it in the format they want. If a website does not have an API available, scraping is one of the only options for getting the data you need. But figuring out how to scrape data out of complicated HTML is a pain.
ParseHub is a new web browser extension that you can use to turn any dynamic and poorly structured website into an API, without writing code. ParseHub is a scraping tool that is designed to work on websites with JavaScript and Ajax; it is similar to web scraping tools such as Import.io and Kimono Labs.
UiPath is an Enterprise Robotic Process Automation (RPA) Software designed to empower companies to automate repetitive, manual, rules-based business processes. Any repetitive task a user performs on his computer, including data entry, legacy application integration, data or content migration, screen scraping and testing can be automated with UiPath.
As anyone who has spent any time in the scraping field will know, there are plenty of anti-scraping techniques on the market. And since I regularly get asked what the best way to prevent someone from scraping a site is, I thought I'd do a post rounding up some of the most popular methods. If you think we've missed any, please let me know in the comments below!
A simple way to monitor HTML changes
I recently came across this question in the Q&A section of a forum I belong to:
Sure, if all you want to do is something as lightweight as monitoring a set of target pages for changes, then a ready-made monitoring tool is probably more than you need. Keep it simple. So here's a quick solution using a Google spreadsheet.
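If a spreadsheet isn't handy, the same idea can be scripted in a few lines. Here is a minimal PHP sketch (not the spreadsheet method from the post) that fetches a page, hashes it, and compares the hash with the one saved on the previous run; the URL and hash-file path are placeholders, and a real check would likely strip out timestamps or ads before hashing.

```php
<?php
// Minimal change-monitoring sketch: fetch a page, hash it, and compare
// with the hash saved on the previous run. URL and file path are placeholders.

$url      = 'https://example.com/target-page';   // page to watch (placeholder)
$hashFile = __DIR__ . '/last_hash.txt';          // where the previous hash is kept

$html = file_get_contents($url);
if ($html === false) {
    die("Could not fetch $url\n");
}

$currentHash  = md5($html);
$previousHash = file_exists($hashFile) ? trim(file_get_contents($hashFile)) : '';

if ($currentHash !== $previousHash) {
    echo "Page has changed since the last check.\n";
    file_put_contents($hashFile, $currentHash);  // remember the new state
} else {
    echo "No changes detected.\n";
}
```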

The other day I was challenged to do some cloud converting as a follow-up to the web scraping project with Google Apps Script (GAS)[1]: namely, to take a Google Doc file and convert it into MS Word format.
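The post itself does the conversion with GAS, but as a rough illustration of the same step from outside Google's scripting environment, a shared Google Doc can be pulled down as .docx through its export URL. This is only a sketch: the file ID is a placeholder, and it assumes the document is shared so it can be fetched without authentication.

```php
<?php
// Rough illustration (not the GAS code from the post): download a Google Doc
// as .docx via its export URL. Works only if the document is shared so it can
// be fetched without authentication. The file ID is a placeholder.

$fileId = 'YOUR_GOOGLE_DOC_FILE_ID';
$url    = "https://docs.google.com/document/d/{$fileId}/export?format=docx";

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // export URLs may redirect
$docx = curl_exec($ch);
curl_close($ch);

if ($docx !== false) {
    file_put_contents('converted.docx', $docx);
    echo "Saved converted.docx\n";
}
```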
Handling HTTP Cookies in cURL
Most developers get stuck on cookie handling in web scraping. Sure, it's a tricky thing, and it was once my stumbling block too. So here, mainly for new scraping engineers, I'd like to share how to handle cookies in web scraping when using PHP. We've already done a post on scraping with cURL in PHP, so here we'll focus only on the cookie side. A cookie is a small piece of data sent from a website and stored in the user's web browser while the user is browsing that website. When the browser requests a page, cookies come back along with the web content; the browser does all the dirty work of storing them and sending them back, in subsequent requests, to the server that rendered the page.
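Here is a minimal sketch of the two cURL options that do the heavy lifting in PHP: CURLOPT_COOKIEJAR tells cURL where to write the cookies a server sets, and CURLOPT_COOKIEFILE tells it which file to read cookies from on the next request. The URLs are placeholders for whatever site you are scraping.

```php
<?php
// Cookie handling with cURL in PHP: the first request stores the cookies the
// server sets, and the second request sends them back. URLs are placeholders.

$cookieFile = __DIR__ . '/cookies.txt';   // cURL's Netscape-format cookie jar

$ch = curl_init('https://example.com/login-or-first-page');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);   // write cookies here on curl_close()
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);  // read cookies from here on each request
$firstPage = curl_exec($ch);
curl_close($ch);                                    // cookie jar is flushed to disk here

// A follow-up request reuses the stored cookies automatically.
$ch = curl_init('https://example.com/page-that-needs-the-session');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
$secondPage = curl_exec($ch);
curl_close($ch);

echo strlen($secondPage) . " bytes fetched with the saved cookies\n";
```

Pointing both options at the same file is the usual pattern: one handle stores the session, and the next one reuses it.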
Recently I received a question in my mailbox about scraping data aggregator sites (aka yellow pages) or business directories.
I replied to him directly, but our conversation on business directories was an interesting one that I thought you guys would find useful.
Here’s the question:
I am interested in scraping the database of a website such as www.1881.no. My guess is that I would need a webdriver, like Selenium, to do the job. I am very new to this field, but I believe that, given some pointers, I can get some data out.
Could you please provide me with pointers on how to extract data from this website?
Sandeep
As a generic answer, I'll provide you with some basics of scraping these business (and people) directories.
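To make those basics concrete, here is a generic sketch in PHP: fetch a results page with cURL and extract listing fields with XPath. The URL and the XPath expressions are hypothetical, since every directory lays out its result pages differently, so inspect the target HTML first. And if the listings are rendered client-side by JavaScript, as the reader suspects for 1881.no, a plain HTTP fetch won't see them and a webdriver such as Selenium is the better fit.

```php
<?php
// Generic directory-scraping sketch: fetch a search results page with cURL
// and pull out listing fields with XPath. The URL and the XPath expressions
// below are hypothetical; adjust them to the real markup of the target site.

$url = 'https://www.example-directory.com/search?q=plumber';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; my-scraper)');
$html = curl_exec($ch);
curl_close($ch);

if ($html === false) {
    die("Could not fetch $url\n");
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // directory HTML is rarely well-formed
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Hypothetical markup: each listing is a <div class="listing"> with a name and phone inside.
foreach ($xpath->query('//div[@class="listing"]') as $listing) {
    $name  = $xpath->query('.//h2[@class="name"]', $listing)->item(0);
    $phone = $xpath->query('.//span[@class="phone"]', $listing)->item(0);
    echo trim($name ? $name->textContent : '') . ' | '
       . trim($phone ? $phone->textContent : '') . "\n";
}
```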