
Google Scraper Hints

Being the biggest scraper itself, Google doesn’t like it when somebody scrapes it. This makes life difficult for Google scrapers.

In this post I offer several hints on how to scrape Google safely (if you have still decided to do so).

Proxy

The first thing a Google scraper needs is a reliable proxy source, which lets you change your IP address. It goes without saying that any proxy you choose must be highly anonymous. It also needs to be fast, and it must not have been used to abuse Google before.

You can use any proxy solution, or read some reviews of proxy services first. A good service can deliver quality IPs that have never been used to access Google before.

Use anywhere from 50 to 150 proxies for continuous scraping, depending on the average size of the result set per search query. Some projects will inevitably require more proxies.

If you do not have enough IPs, do not scrape. If Google detects you, stop scraping.

Choosing the right time to change your IP is critical to scraping successfully. If you are receiving 300-1,000 results per keyword, change your IP after every keyword. If you are receiving fewer than 300 results, a single IP can be used to scrape several keywords, though you may need to add a delay or increase the number of proxies you are using.
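The rotation rule above can be sketched as a small planning function. This is only an illustration: the function name, the `keywords_per_ip` limit for light keywords, and the round-robin reuse of proxies are my assumptions, while the 300-result threshold comes from the rule itself.

```python
from itertools import cycle

# Threshold from the rule above: at 300+ results per keyword,
# switch to a new IP after that keyword.
ROTATE_THRESHOLD = 300


def assign_proxies(keywords, proxies, keywords_per_ip=3):
    """Pair each (keyword, expected_results) with a proxy.

    keywords_per_ip is a hypothetical limit on how many light
    keywords share one IP before it is rotated anyway.
    """
    pool = cycle(proxies)  # assumption: proxies are reused round-robin
    current = next(pool)
    used_on_current = 0
    plan = []
    for keyword, expected_results in keywords:
        plan.append((keyword, current))
        used_on_current += 1
        heavy = expected_results >= ROTATE_THRESHOLD
        if heavy or used_on_current >= keywords_per_ip:
            current = next(pool)
            used_on_current = 0
    return plan
```

For example, `assign_proxies([("a", 900), ("b", 100), ("c", 50)], ["p1", "p2"], keywords_per_ip=2)` puts the heavy keyword "a" alone on "p1" and lets "b" and "c" share "p2".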

Cookies

Clear all of your cookies after every IP change, or disable them entirely.
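One way to guarantee that no cookies survive an IP change is to build a new HTTP client, with its own empty cookie jar, for every proxy. The sketch below uses only the Python standard library; the function name and the proxy URL format are assumptions for illustration.

```python
import urllib.request
from http.cookiejar import CookieJar


def opener_for_proxy(proxy_url):
    """Return (opener, jar): a client bound to one proxy with an empty cookie jar.

    Calling this again for the next proxy yields a brand-new jar, so
    cookies set while using the previous IP are never sent again.
    """
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.HTTPCookieProcessor(jar),
    )
    return opener, jar
```

If you keep one client across IP changes instead, calling `jar.clear()` after each change has the same effect.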

Threads

Google scrapers should not use threads unless they are needed. Threads here means multiple scraping processes running at the same time. It is possible to scrape millions of results every day without them.
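A single-threaded scraper is just a loop that handles one keyword at a time. Here is a minimal sketch, assuming a caller-supplied `fetch` function and a random pause between requests; the delay bounds are placeholders, not recommended values.

```python
import random
import time


def scrape_sequentially(keywords, fetch, min_delay=2.0, max_delay=6.0,
                        sleep=time.sleep):
    """Fetch keywords one after another, pausing between requests.

    fetch is a hypothetical callable that takes a keyword and returns
    its results. sleep is injectable so the loop can be tested
    without actually waiting.
    """
    results = []
    for index, keyword in enumerate(keywords):
        if index > 0:
            # Pause before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        results.append((keyword, fetch(keyword)))
    return results
```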

Keywords

Add &num=100 to the search URL to raise the number of results per page to 100, the maximum.
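Building the URL with a proper encoder avoids mistakes with spaces and special characters in the keyword. A minimal sketch, where the function name is mine and `q` is the standard query parameter of Google's search URL:

```python
from urllib.parse import urlencode


def build_search_url(query, num=100):
    """Return a Google search URL asking for `num` results per page."""
    params = {"q": query, "num": num}
    return "https://www.google.com/search?" + urlencode(params)
```

For instance, `build_search_url("web scraping")` returns `https://www.google.com/search?q=web+scraping&num=100`.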

Append other keywords to your main search. Google makes it difficult to obtain more than 1,000 results for a single query. However, by combining the results of these narrower searches it is possible to obtain almost all URLs.
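The keyword-appending idea can be sketched in two steps: generate query variants, then merge the URL lists they return while dropping duplicates. Both function names and the example modifiers are hypothetical.

```python
def expand_query(main_query, modifiers):
    """Return the main query plus one variant per appended modifier."""
    return [main_query] + [f"{main_query} {modifier}" for modifier in modifiers]


def merge_unique(result_lists):
    """Merge several lists of URLs, keeping first-seen order and dropping repeats."""
    seen = set()
    merged = []
    for urls in result_lists:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

Calling `expand_query("python scraper", ["tutorial", "github"])` gives three queries; each one is scraped separately and the URL lists are passed to `merge_unique`.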

Blacklisting

Avoid gray- or blacklisting for reliable scraping. Never send more than 500 requests per IP address during a 24-hour period.
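The 500-requests-per-day limit is easier to respect if the scraper counts requests per IP itself. Below is a sketch of a sliding-window counter; the class name and its interface are assumptions, and only the default limit and window come from the rule above.

```python
import time
from collections import defaultdict, deque


class RequestBudget:
    """Track requests per IP inside a sliding time window."""

    def __init__(self, limit=500, window_seconds=24 * 60 * 60):
        self.limit = limit
        self.window = window_seconds
        self._history = defaultdict(deque)  # ip -> timestamps of past requests

    def allow(self, ip, now=None):
        """Record a request for `ip` and return True, or return False if over budget."""
        now = time.time() if now is None else now
        history = self._history[ip]
        # Forget requests that have fallen out of the window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.limit:
            return False
        history.append(now)
        return True
```

Call `budget.allow(ip)` before each request and switch to another proxy when it returns False.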

Captcha

If you get a captcha or virus warning, stop what you are doing right away. A captcha indicates that Google has detected your scraping activity. Increase the number of proxies. If you are already using more than 100, it might be necessary to switch to a different source for your IPs. Use the private proxy source listed above. It is possible to scrape Google constantly without ever being detected.
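Stopping "right away" is simplest when the scraper checks every response before moving on. The sketch below is a rough heuristic, not a reliable detector: the status codes, the "/sorry/" URL fragment, and the text markers are my assumptions about what a block page may look like, and `fetch` is a hypothetical callable returning `(status_code, final_url, body)`.

```python
# Assumed markers of a block page; adjust them to what you actually observe.
BLOCK_STATUS_CODES = (429, 503)
BLOCK_TEXT_MARKERS = ("captcha", "unusual traffic")


def looks_blocked(status_code, final_url, body):
    """Guess whether a response is a captcha or block page."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    if "/sorry/" in final_url:
        return True
    text = body.lower()
    return any(marker in text for marker in BLOCK_TEXT_MARKERS)


def scrape_until_blocked(keywords, fetch):
    """Scrape keywords in order and stop at the first blocked response.

    Returns (results, blocked_keyword); blocked_keyword is None if
    every keyword was scraped without a block.
    """
    results = {}
    for keyword in keywords:
        status_code, final_url, body = fetch(keyword)
        if looks_blocked(status_code, final_url, body):
            return results, keyword
        results[keyword] = body
    return results, None
```

When a blocked keyword is returned, retire the IP that was in use and resume from that keyword with a different one.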

Read more about the ReCaptcha v2 solving services.

5 replies on “Google Scraper Hints”

Good post Michael.
Often when reading your posts I’d like to just vote up/down quickly or rank a previous comment vs this comment system here.
For garnering more feedback from readers – have you looked at something like a http://disqus.com/websites/ or similar as well?
It seems to spur a wider community contribution and I’d like to see more feedback from your nicely constructed reviews.

Cheers,
Scraping.Pro-FanBoy
Andrew
