Tag: JAVA

Yelp scraping for high quality B2B leads

Post author By mihaschenko
Post date September 16, 2022

We recently scraped the Yelp business directory to acquire high-quality B2B leads (company + CEO info). The job required many techniques, including proxying, scraping the companies’ external sites, email verification and more.

Tags business directory, JAVA, web scraping

Challenge Development

Bypass GoDaddy Firewall thru VPN & browser automation

Post author By admin
Post date July 23, 2022

Recently we encountered a website that worked as usual in a browser, yet put up blocking measures as soon as we composed and ran a scraping script/agent against it.

In this post we’ll take a look at how the scraping process went and the measures we took to overcome the blocking.

Tags anti-scrape, automation, browser-automation, JAVA

Development

Auth proxy with JAVA

In this post we’ll show how to use an authenticated proxy (with login/pass) in a JAVA application.
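
As a rough illustration of the idea, here is a minimal sketch that uses only the JDK (Java 11+): the proxy is set on java.net.http.HttpClient and the login/pass is supplied through a java.net.Authenticator. It is one possible approach under our own assumptions, not necessarily the code from the post. The class name, the helper names and the command-line arguments are hypothetical.

```java
import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical example class; the names are ours, not from the post.
class AuthProxyExample {

    /** Builds an Authenticator that answers proxy challenges only. */
    static Authenticator proxyAuthenticator(String login, String pass) {
        return new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Never hand the proxy credentials to an origin server.
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(login, pass.toCharArray());
                }
                return null;
            }
        };
    }

    /** Builds an HttpClient that routes all requests through the given proxy. */
    static HttpClient buildClient(String proxyHost, int proxyPort,
                                  String login, String pass) {
        return HttpClient.newBuilder()
                .proxy(ProxySelector.of(new InetSocketAddress(proxyHost, proxyPort)))
                .authenticator(proxyAuthenticator(login, pass))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 5) {
            System.out.println(
                "usage: AuthProxyExample <proxyHost> <proxyPort> <login> <pass> <url>");
            return;
        }
        // Recent JDKs disable Basic auth for HTTPS tunnelling by default.
        // Clearing this property re-enables it; passing it as a -D flag on
        // the command line is the more reliable way to set it.
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");

        HttpClient client =
                buildClient(args[0], Integer.parseInt(args[1]), args[2], args[3]);
        HttpRequest request = HttpRequest.newBuilder(URI.create(args[4])).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```

Checking the requestor type keeps the proxy login/pass from being sent to a target site that happens to ask for HTTP authentication itself.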

Tags JAVA

Development

Backconnect Proxy Service with authorization in JAVA

Post author By admin
Post date December 25, 2020

Working with a Backconnect proxy service (Oxylab.io), we spent a long time looking for a way to authenticate with it. Originally we used JSoup to get the web pages’ content. Its proxy() method can be used when setting up the connection, yet it only accepts the host and port, so no authentication is possible. One of the options we found was the following:

Tags JAVA, proxy, service

Development

JAVA, Selenium, headless Chrome, JSoup to scrape data of the web

Post author By mihaschenko
Post date November 5, 2020

In this post we show how to scrape a JS-rendered website. The tools, as listed in the title, are JAVA with the Selenium library driving headless Chrome instances (download driver), and JSoup as the parser that extracts data from the acquired HTML.

Tags JAVA, scraper, Selenium

Development

JAVA library to scrape Linkedin & its data affiliates

Post author By admin
Post date June 2, 2020

In this post we want to share a useful new JAVA library that helps to crawl and scrape LinkedIn companies. Get business directories scraped!

If you are concerned about the legal issues of scraping LinkedIn data, please refer to the following post: Linkedin lost in court to data analytic company that scrapes Linkedin’s public profiles info

Tags business directory, JAVA, library, LinkedIn

Development

Simple JAVA email crawler

Post author By mihaschenko
Post date November 2, 2019

In this post we share the code of a simple Java email crawler. It crawls a given website for email addresses, with unlimited crawling depth. A previous post showed a simple Python email crawler.
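
To give a feel for what such a crawler involves, here is a minimal JDK-only sketch; it is not the post's actual code. It walks same-host links breadth-first with no depth limit and collects anything that looks like an email address. The class name, the simplified regexes and the fetcher parameter (a function mapping a URL to its HTML) are our assumptions. Passing the fetcher in lets the demo run against an in-memory "site" instead of the network.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; the names are ours, not from the post.
class SimpleEmailCrawler {

    // Deliberately simple patterns; real-world HTML and emails are messier.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the distinct, lower-cased email addresses found in the text. */
    static Set<String> extractEmails(String html) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = EMAIL.matcher(html);
        while (m.find()) {
            found.add(m.group().toLowerCase());
        }
        return found;
    }

    /** Returns absolute links that stay on the same host as the base URL. */
    static Set<String> extractLinks(String baseUrl, String html) {
        Set<String> links = new LinkedHashSet<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI resolved = base.resolve(m.group(1).trim());
                if (base.getHost() != null
                        && base.getHost().equalsIgnoreCase(resolved.getHost())) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed href: skip it.
            }
        }
        return links;
    }

    /** Breadth-first crawl with no depth limit; the visited set prevents loops. */
    static Set<String> crawl(String startUrl, Function<String, String> fetcher) {
        Set<String> emails = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // already crawled
            }
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // page could not be fetched
            }
            emails.addAll(extractEmails(html));
            for (String link : extractLinks(url, html)) {
                if (!visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return emails;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a website, so the demo needs no network.
        Map<String, String> site = Map.of(
                "http://example.com/",
                "<a href=\"/about\">About</a> contact: Info@Example.com",
                "http://example.com/about",
                "<a href=\"/\">Home</a> <a href=\"mailto:ceo@example.com\">Mail</a>");
        System.out.println(crawl("http://example.com/", site::get));
    }
}
```

For a real site the fetcher would perform an HTTP GET and return the response body. Since the depth is unlimited, the visited set is what guarantees termination.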

Tags crawling, JAVA

Development Guest posting

Captcha solving with Java and why you should avoid it

Post author By admin
Post date August 20, 2019

In this blog post we are going to show how you can solve [Re]captcha with Java and some third-party APIs, and why you should probably avoid them in the first place.
For the Python code (+ captcha API) see that post.

The post author is Kevin Sahin from ScrapingNinja.co.

Captcha solving

“Completely Automated Public Turing test to tell Computers and Humans Apart” is what captcha stands for. Captchas are used to prevent bots from accessing and performing actions on websites or applications.

The last one is the most used captcha mechanism, Google ReCaptcha v2. That’s why we are going to see how to “break” these captchas.

Tags captcha, JAVA, Recaptcha, scrape detection

Development

Creating REST API with Spring

Post author By admin
Post date September 26, 2018

Today we are going to discuss a large and engaging topic, the REST API, and build our own web application on Spring, the most popular Java framework. To start with, we will explain the two main concepts of this article: REST API and Spring. Note that both concepts are quite complex and, unfortunately, we can’t describe them fully, but in the article you will find links that will help you wherever you might get stuck.

Tags JAVA, structured APIs