Categories
Development

Java library to scrape LinkedIn & its data affiliates

In this post we share a new Java library that helps to crawl and scrape LinkedIn companies. Get business directories scraped!

If you are concerned about the legal issues of scraping LinkedIn data, please refer to the following post: LinkedIn lost in court to data analytic company that scrapes LinkedIn’s public profiles info
Categories
Uncategorized

Handy Web Extractor

Handy Web Extractor is a simple tool for everyday web content monitoring. It periodically downloads a web page, extracts the necessary content and displays it in a window on your desktop. One may consider it data extraction software that occupies its own niche among scraping software and plugins.

It’s totally free and available for download.

Categories
Uncategorized

Scrape with Google App Script

In this post I want to tell you how I managed to complete the challenge of scraping a site with Google Apps Script (GAS).

Categories
Review

Test ReCaptcha 2.0 solving services

We’ve tested several captcha solving services. The test results are based on 1000 ReCaptchas 2.0 submitted to each service.

Service            Avg. solving time, seconds   Fastest solving time, seconds   Performance, %   Notes
DeathByCaptcha     41                           16                              96.8             Dec. 2019
2Captcha           63                           15                              95.2             Dec. 2019
CaptchaSolutions   111                          37                              78               Oct. 2017

Useful test code

2Captcha Test Code (Java)

CaptchaSolutions Test Code (Python)

Categories
Development

Simple Java email crawler

In this post we share a simple Java email crawler that crawls an input host (website), searches for all the emails on that host and stores them.

The script uses the jsoup library; the full project is available here.
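
The two core steps of such a crawler, pulling email addresses out of a page and deciding which links stay on the input host, can be sketched as follows. This is a minimal Python illustration using only the standard library, not the Java and jsoup code from the post. The regular expression, the function names and the sample markup are all assumptions made for the example, and fetching and storing are left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Deliberately simple pattern (an assumption); real-world addresses can be messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_emails(html):
    """Return the sorted, de-duplicated email-like strings found in the markup."""
    return sorted(set(EMAIL_RE.findall(html)))


def same_host_links(html, page_url):
    """Return absolute links from the page that stay on the page's own host."""
    host = urlparse(page_url).netloc
    collector = LinkCollector()
    collector.feed(html)
    absolute = (urljoin(page_url, href) for href in collector.links)
    return sorted({url for url in absolute if urlparse(url).netloc == host})


# Hypothetical page content, used instead of a live download.
sample = """
<a href="/contact">Contact</a>
<a href="https://other.example.org/">Elsewhere</a>
<a href="mailto:info@example.com">info@example.com</a>
<p>Sales: sales@example.com</p>
"""

print(extract_emails(sample))
print(same_host_links(sample, "https://example.com/index.html"))
```

A full crawler would keep a queue of the same-host links, download each one once, and append the extracted addresses to a file or database.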

Categories
Uncategorized

Smartproxy Review

Getting precise and localized data is becoming difficult. For some companies, advanced proxy networks are the only thing that keeps their intensive data gathering operations running.

Categories
Development

Crawling web pages with Netpeak Spider in conjunction with NetNut and GeoSurf proxies

Agree, it’s hard to overestimate the importance of information: “Master of information, master of situation”. Nowadays we have everything we need to become a “master of situation”, including tools like spiders and parsers that can scrape various data from websites. Today we will look at scraping Amazon with a web spider equipped with proxy services.

Categories
Legal

US court stated scraping, even when against TOS, is legal

Last month a legal case took place in a US court where four professors plus a media organization sued the US Government. The District Court for the District of Columbia concluded that moderate scraping, even when against ToS, is legal.

Categories
Challenge

How to insert and configure reCAPTCHA v2 code in PHP

We’ve already introduced you to the theory behind the new NO CAPTCHA reCAPTCHA v2, but now we come to the practical integration part. Here we’ll share how to insert and configure “NO CAPTCHA reCAPTCHA” into a web page.
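
The server-side half of such an integration comes down to one call: the token that the widget posts in the `g-recaptcha-response` form field is sent, together with the site's secret key, to Google's `siteverify` endpoint, which answers with JSON containing a `success` flag. The post itself works in PHP; the sketch below shows the same check in Python with the standard library only. The function names are assumptions for the example, and the secret and token must come from your own reCAPTCHA setup.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret, token, remote_ip=None):
    """Build the POST request for the siteverify endpoint without sending it."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional parameter
    data = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying a body makes urllib issue a POST.
    return urllib.request.Request(VERIFY_URL, data=data)


def is_success(response_body):
    """Return True only if the JSON reply reports success."""
    return json.loads(response_body).get("success") is True


def verify(secret, token, remote_ip=None):
    """Send the verification request and report whether the token was accepted."""
    request = build_verify_request(secret, token, remote_ip)
    with urllib.request.urlopen(request, timeout=10) as reply:
        return is_success(reply.read().decode("utf-8"))
```

In a form handler, `verify(secret, token)` would be called before processing the submission, and the submission rejected when it returns `False`.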

Categories
Development

How to connect Content Grabber with Proxy-connect

Consistent web scraping requires the use of multiple rotating proxies to prevent blocking and throttling by your target website. As an example scrape, let’s take Content Grabber, a visual scraper, and connect it to the Proxy-Connect rotating proxy service.
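
To make the idea of rotation concrete, here is a minimal Python sketch of client-side round-robin rotation over a proxy pool. It is independent of both products named above: a rotating proxy service typically does the switching on its own side behind a single endpoint, whereas this example switches in the scraper itself. The class name and the proxy addresses are hypothetical.

```python
import itertools
import urllib.request


class ProxyRotator:
    """Hands out proxies from a fixed pool in round-robin order."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self):
        """Return the next proxy address in the rotation."""
        return next(self._cycle)

    def build_opener(self):
        """Return a urllib opener that routes HTTP and HTTPS through the next proxy."""
        proxy = self.next_proxy()
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)


# Hypothetical pool; replace with the addresses your provider gives you.
rotator = ProxyRotator([
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
])

picked = [rotator.next_proxy() for _ in range(4)]
print(picked)
```

Calling `rotator.build_opener().open(url)` for each request would send consecutive requests through different proxies, so no single address carries the whole load.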