Categories
Development

Extract browser’s Local Storage with Python

Some of you may be wondering whether it's possible to extract a web browser's Local Storage when web scraping with Python.
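As a minimal sketch of one way to approach it, assuming Selenium with a Chrome driver (the URL below is a placeholder): Local Storage is scoped per origin, so you load the page first and then read window.localStorage out through JavaScript executed in the browser.

# Minimal sketch, assuming Selenium + Chrome; the URL is a placeholder.
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com")

# Read every key/value pair out of window.localStorage as a Python dict
local_storage = driver.execute_script(
    "var items = {};"
    "for (var i = 0; i < localStorage.length; i++) {"
    "  var key = localStorage.key(i);"
    "  items[key] = localStorage.getItem(key);"
    "}"
    "return items;"
)
print(local_storage)
driver.quit()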

Categories
Development

Solve ReCaptcha with Selenium (Python)

I’ve already written about how the new No CAPTCHA ReCaptcha works, and I even had some success breaking it with iMacros browser automation. But the latest scraping tools are, for the most part, driven by Python, so now I want to try the same experiment with Selenium + Python.
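To give a rough idea of the Selenium side of that experiment (the page URL is a placeholder and the iframe/checkbox selectors are assumptions about the widget's markup), the first step is simply ticking the checkbox:

# Sketch only: the selectors are assumptions about the ReCaptcha widget's markup.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/recaptcha-demo")  # placeholder page with the widget

# The checkbox lives inside the widget's own iframe, so switch into it first
frame = driver.find_element(By.CSS_SELECTOR, "iframe[title='reCAPTCHA']")
driver.switch_to.frame(frame)
driver.find_element(By.ID, "recaptcha-anchor").click()

# If Google is not convinced, an image challenge appears at this point;
# dealing with that challenge is what the experiment is about.
driver.switch_to.default_content()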

Categories
Development SEO and Growth Hacking

ReCaptcha to be solved with iMacros

Recently I’ve been getting requests for a tutorial showing how to solve Google’s No CAPTCHA ReCaptcha. I introduced it before and promised to work out a script to automate solving it. And here’s what I’ve come up with.

Categories
Development Web Scraping Software

Content Grabber with free proxy account integration for business directories scrape

Professional data extraction requires adequate proxying to keep scraping robots anonymous. When extracting large data sets (over 1M records, e.g. business directories), a reliable and fast proxy service is needed.

Sequentum has released the Nohodo proxy service integration for Content Grabber. Nohodo provides a free account for Content Grabber users (up to 5,000 requests per month). The feature is available to both trial users and regular customers. Here’s how it works…
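The integration itself is configured inside Content Grabber's UI, but for readers scripting their own agents the underlying idea is simply routing every request through a proxy. A generic Python sketch (the proxy address, credentials and target URL are placeholders, not Nohodo's) looks like this:

# Generic proxied request, for illustration only; the proxy host, credentials
# and target URL are placeholders.
import requests

proxy = "http://user:password@proxy.example.com:8080"
proxies = {"http": proxy, "https": proxy}

resp = requests.get("https://example.com/business-directory?page=1",
                    proxies=proxies, timeout=30)
print(resp.status_code, len(resp.text))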

Categories
Development Miscellaneous

Import.io: Connector-GUIDs, User-GUIDs, API keys and how to get them?

Suppose I run a query against the import.io API:

$url = "https://query.import.io/store/connector/" . $connectorGuid . "/_query?_user=" . urlencode($userGuid) . "&_apikey=" . urlencode($apiKey);

“HI there can you please tell me that what are connector-guid, user-guid and api key in below given code and how to get them for any website?”

I came across this question on StackOverflow, and as an avid import.io user I thought I’d answer it here as well, in case any of you have the same issue. 
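For readers who prefer Python to PHP, here is the same URL construction with comments marking which value is which; the three placeholders are, of course, substituted with your own values.

# Python rendering of the PHP line above; the GUIDs and key are placeholders.
from urllib.parse import quote

connector_guid = "YOUR-CONNECTOR-GUID"  # identifies the connector/extractor built on import.io
user_guid = "YOUR-USER-GUID"            # identifies your import.io user account
api_key = "YOUR-API-KEY"                # the API key issued for that account

url = ("https://query.import.io/store/connector/" + connector_guid +
       "/_query?_user=" + quote(user_guid) + "&_apikey=" + quote(api_key))
print(url)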

Categories
Development Miscellaneous Web Scraping Software

Import.io Magic Method API

Recently Import.io introduced a new extraction technique called Magic. The Magic scraping method works by attempting to scrape all the information off a page automatically, in one shot. We covered it in another post early last year and noted a few issues at the time:

  • The scraper only works on pages with more than one row of data, such as search results or category pages.
  • It seems to have trouble with some JavaScript-heavy pages.

But now Import.io has released a second version of Magic which seems to have dealt with those obstacles. Not only that, but they have released an API for Magic that lets you see what’s going on behind the scenes.
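Without repeating the API documentation here, the general shape of such a call is a plain HTTP GET carrying the target page URL and your API key. In Python that looks roughly like the sketch below, where the endpoint path is a stand-in rather than the documented one.

# Rough shape of a Magic-style API call; the endpoint path is a stand-in,
# not the documented one, and the key/URL values are placeholders.
import requests

endpoint = "https://api.import.io/REPLACE-WITH-MAGIC-ENDPOINT"
params = {
    "url": "https://example.com/category/page-1",  # the page Magic should scrape
    "_apikey": "YOUR-API-KEY",
}

resp = requests.get(endpoint, params=params)
print(resp.status_code)
print(resp.json())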

Categories
Development

How To Automatically Convert Files in Google Apps Script


The other day I was challenged to do some cloud converting as a follow-up to the web scraping project with Google Apps Script (GAS)[1]: namely, to take a Google Doc file and convert it into MS Word format.
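The post walks through the GAS version; as a rough Python counterpart (assuming an OAuth access token with a Drive read scope, and a placeholder document ID), the same conversion can be done through the Drive v3 export endpoint:

# Export a Google Doc as .docx via the Drive v3 export endpoint. DOC_ID and
# ACCESS_TOKEN are placeholders; the token needs a Drive read scope.
import requests

DOC_ID = "YOUR-GOOGLE-DOC-ID"
ACCESS_TOKEN = "YOUR-OAUTH-ACCESS-TOKEN"
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

resp = requests.get(
    "https://www.googleapis.com/drive/v3/files/" + DOC_ID + "/export",
    params={"mimeType": DOCX_MIME},
    headers={"Authorization": "Bearer " + ACCESS_TOKEN},
)
resp.raise_for_status()

with open("converted.docx", "wb") as f:
    f.write(resp.content)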

Categories
Development

Handling HTTP Cookies in cURL

Many developers get stuck on cookie handling in web scraping. Sure, it’s a tricky thing, and it was once my stumbling block too. So here, mainly for new scraping engineers, I’d like to share how to handle cookies in web scraping when using PHP. We’ve already done a post on scraping with cURL in PHP, so here we’ll focus only on the cookie side. A cookie is a small piece of data sent from a website and stored in the user’s web browser while the user is browsing that website. When the browser requests a page, the cookie is returned along with the web content; the browser then does all the dirty work of storing the cookie and sending it back, in subsequent requests, to the server that rendered that web page.
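The post itself works through the PHP options; as a side note, the very same cURL cookie machinery is available from Python through the pycurl bindings, which makes for a compact illustration (the URL below is a placeholder):

# cURL cookie handling via pycurl: COOKIEFILE is where cURL reads cookies from,
# COOKIEJAR is where it saves cookies the server sets. The URL is a placeholder.
import pycurl
from io import BytesIO

body = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, "https://example.com/login")
c.setopt(pycurl.WRITEDATA, body)
c.setopt(pycurl.FOLLOWLOCATION, True)
c.setopt(pycurl.COOKIEFILE, "cookies.txt")  # send cookies already stored here
c.setopt(pycurl.COOKIEJAR, "cookies.txt")   # persist cookies set by the server
c.perform()
print(c.getinfo(pycurl.RESPONSE_CODE))
c.close()

# Later requests reusing the same cookie file carry the session cookies back
# to the server automatically, mirroring the browser behaviour described above.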

Categories
Challenge Development

Tips & Tricks for Scraping Business Directories

Recently I received a question in my mailbox about scraping data aggregator sites (aka yellow pages) or business directories.
I replied to him directly, but our conversation on business directories was an interesting one that I thought you guys would find useful. 

Here’s the question:

I am interested in scraping the database in such a website www.1881.no. My guess is that I would need a webdriver, like Selenium to do the job. I am very newbie to this field, but I believe if given some pointers, I can get some data out.

Could you please provide me with pointers on how to extract data from this website.
Sandeep

As a generic answer, I’ll provide you with some basics of scraping those business (and private life) directories.
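To make those basics concrete, here is a generic Python sketch, not specific to www.1881.no: the listing URL pattern and CSS selectors are hypothetical, and a JavaScript-heavy directory would indeed call for Selenium instead of plain HTTP requests.

# Generic directory-scraping skeleton; the URL pattern and selectors are
# hypothetical and must be adapted to the actual site.
import requests
from bs4 import BeautifulSoup

for page in range(1, 4):  # a few pages for illustration
    resp = requests.get("https://directory.example.com/companies?page=%d" % page)
    soup = BeautifulSoup(resp.text, "html.parser")
    for entry in soup.select(".listing-item"):        # hypothetical selector
        name = entry.select_one(".name")
        phone = entry.select_one(".phone")
        print(name.get_text(strip=True) if name else "",
              phone.get_text(strip=True) if phone else "")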
