Categories
Web Scraping Software

A tool to extract phone numbers from a list of URLs

Today I got a question from one of my readers asking if there is a good out-of-the-box solution for crawling multiple websites for contact information. 
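As a rough illustration of what such a tool does under the hood, here is a minimal Python sketch that pulls phone-like strings out of page text. It is not the tool discussed in the post: the pattern `PHONE_RE` and the helpers `extract_phone_numbers` and `extract_from_pages` are hypothetical names, the regular expression only covers common North American style formats with an optional country code, and downloading the pages is assumed to have happened already.

```python
import re

# Assumed pattern (not from the post): optional country code, a 3-digit area
# code with or without parentheses, then 3 + 4 digits with optional separators.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?\d{1,3}[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def extract_phone_numbers(text):
    """Return unique phone-like strings found in text, in order of appearance."""
    found = []
    for match in PHONE_RE.finditer(text):
        number = match.group(0)
        if number not in found:
            found.append(number)
    return found


def extract_from_pages(pages):
    """Map each URL to the phone numbers found in its already-downloaded text."""
    return {url: extract_phone_numbers(body) for url, body in pages.items()}
```

Calling `extract_from_pages` with a dictionary of URL to page text returns a dictionary of URL to the numbers found on that page. A real crawl would also need fetching, politeness delays and per-country number formats, which this sketch leaves out.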

Categories
Development

Solve ReCaptcha with Selenium (python)

I’ve already written about how the new No CAPTCHA ReCaptcha works, and even had some success breaking it with iMacros browser automation. But the latest scraping tools are, for the most part, driven by Python, so now I want to try the same experiment with Selenium + Python.

Categories
Development SEO and Growth Hacking

ReCaptcha to be solved with iMacros

Recently I’ve been getting requests for a tutorial showing how to solve Google’s No CAPTCHA ReCaptcha. I’ve introduced it before and promised to work out a script to automate solving it. Here’s what I’ve come up with.

Categories
SEO and Growth Hacking

Automate your social marketing by bulk tweeting all your blog posts

A good social presence is important for any successful blogger. But running a full-time blog and keeping up your tweet volume is incredibly time consuming. It would be so much more convenient if you could set up bulk tweets for all your posts. Recently, as I was doing some reCaptcha automation, I came up with an idea to use the iMacros browser plugin to automate just such a task. Here’s how I did it…

Categories
Development Web Scraping Software

Content Grabber with free proxy account integration for scraping business directories

Professional data extraction requires adequate proxying to keep scraping robots anonymous. When extracting large data sets (over 1M records, e.g. business directories), a reliable and fast proxy service is needed.

Sequentum has released the Nohodo proxy service integration for Content Grabber. Nohodo provides a free account for Content Grabber users (up to 5000 requests monthly). The feature is available for both trial users and regular customers. Here’s how it works…

Categories
Featured Web Scraping Software

Dexi.io Review

Dexi.io is a powerful scraping suite. This cloud scraping service provides development, hosting and scheduling tools. The suite might be compared with Mozenda for building web scraping projects and running them in the cloud for user convenience. Yet it also includes an API, with each scraper being a JSON definition, similar to other services like import.io, Kimono Labs and ParseHub.

Categories
Challenge

How to insert and configure reCAPTCHA v2 code in PHP

We’ve already introduced you to the theory behind the new NO CAPTCHA reCAPTCHA v2, but now we come to the practical integration part. Here we’ll share how to insert and configure “NO CAPTCHA reCAPTCHA” into a web page.

Categories
Challenge

No CAPTCHA reCaptcha challenge

Sooner or later a new generation of spam protection methods will emerge to block all unwanted site visitors. The recently launched Google “No CAPTCHA reCaptcha” or ReCaptcha v2.0 could just be such a method.

This new behaviour analysis tool is getting more and more attention both from site owners and from scraping engines that are trying to break it. Since Google does not reveal any secrets of its operation, we want to share with you the techniques used in this new smart analysis CAPTCHA that distinguishes between bots and humans. Let’s look inside.

Categories
Uncategorized

Search queries in a search engine for scraping

Recently I received a note asking about running search engine queries through web scraping software.

“I’m looking for a scraper program that can initiate search queries in a search engine automatically, using proxies would be an added benefit if possible.”  – Mike
Categories
Development Miscellaneous

Import.io: Connector-GUIDs, User-GUIDs, API keys and how to get them?

Suppose I run a query against the import.io API:

$url = "https://query.import.io/store/connector/" . $connectorGuid . "/_query?_user=" . urlencode($userGuid) . "&_apikey=" . urlencode($apiKey);

“HI there can you please tell me that what are connector-guid, user-guid and api key in below given code and how to get them for any website?”

I came across this question on StackOverflow, and as an avid import.io user I thought I’d answer it here as well, in case any of you have the same issue.