Java library to scrape LinkedIn & its data affiliates

Post author By admin
Post date June 2, 2020

In this post we want to share a useful new Java library that crawls and scrapes LinkedIn company pages. Get business directories scraped!

If you are concerned about the legal side of scraping LinkedIn data, please refer to the following post: LinkedIn lost in court to a data analytics company that scrapes LinkedIn’s public profile info

Library

The library offers the following two classes for scraping LinkedIn:

AuthorizationInLinkedin
It logs in either with a cookie or with an email/password pair. The credentials are stored in the txtFilexExample/data.properties file.
ScrapeLinkedinCompanyData
Gets data from several sources:
* linkedin.com
* crunchbase.com (company email and phone number)
* bing.com (longitude and latitude)
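
To illustrate the cookie or email/password choice, here is a minimal sketch of how credentials could be read from a properties file. It is not the library's actual code: the class name LinkedinCredentials and the property keys cookie, email and password are assumptions made for this example, so check the real data.properties file for the keys the library expects.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper for illustration; not part of the library. */
final class LinkedinCredentials {
    enum Mode { COOKIE, EMAIL_PASSWORD }

    final Mode mode;
    final String cookie;   // set only when mode == COOKIE
    final String email;    // set only when mode == EMAIL_PASSWORD
    final String password; // set only when mode == EMAIL_PASSWORD

    private LinkedinCredentials(Mode mode, String cookie, String email, String password) {
        this.mode = mode;
        this.cookie = cookie;
        this.email = email;
        this.password = password;
    }

    /**
     * Reads credentials in java.util.Properties format.
     * The keys "cookie", "email" and "password" are assumed names.
     * A non-empty cookie wins; otherwise email and password are both required.
     */
    static LinkedinCredentials load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String cookie = props.getProperty("cookie", "").trim();
        if (!cookie.isEmpty()) {
            return new LinkedinCredentials(Mode.COOKIE, cookie, null, null);
        }
        String email = props.getProperty("email", "").trim();
        String password = props.getProperty("password", "");
        if (email.isEmpty() || password.isEmpty()) {
            throw new IllegalArgumentException(
                "Provide either a cookie or both email and password");
        }
        return new LinkedinCredentials(Mode.EMAIL_PASSWORD, null, email, password);
    }

    public static void main(String[] args) {
        // In real use the Reader would come from the properties file on disk,
        // e.g. Files.newBufferedReader(Path.of("txtFilexExample/data.properties")).
        LinkedinCredentials creds =
            load(new StringReader("email=user@example.com\npassword=secret\n"));
        System.out.println("Login mode: " + creds.mode);
    }
}
```

Preferring the cookie when both are present is a design choice of this sketch: a session cookie avoids submitting the login form on every run.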

Some comments

The Java code drives a Selenium ChromeDriver instance.
crunchbase.com blocks Selenium-driven browsers, so for it we use a simple scraping process without Selenium. The JSoup library is also used to parse the fetched HTML.
The classes read URLs/links from text files and write the output data to the output stream.
To do: add support for proxies/proxy services.
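
The "URLs from a text file in, results to the output stream" flow mentioned above can be sketched roughly as follows. This only illustrates the pattern and is not the library's code: the class UrlListRunner, the tab-separated output format and the skipping of lines that start with # are all assumptions of this example, and the scraper is passed in as a plain function so the sketch stays independent of Selenium and JSoup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical runner for illustration; not part of the library. */
final class UrlListRunner {

    /** Reads one URL per line, skipping blank lines and '#' comments (an assumed convention). */
    static List<String> readUrls(Reader source) {
        List<String> urls = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                urls.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return urls;
    }

    /** Applies the scraper to each URL and prints "url<TAB>result" to the given stream. */
    static void run(List<String> urls, Function<String, String> scraper, PrintStream out) {
        for (String url : urls) {
            out.println(url + "\t" + scraper.apply(url));
        }
    }

    public static void main(String[] args) {
        // In real use the Reader would wrap a text file of company links.
        List<String> urls = readUrls(new StringReader(
            "# companies to scrape\nhttps://www.linkedin.com/company/example\n"));
        // Placeholder scraper: a real one would fetch and parse the page.
        run(urls, url -> "not scraped (placeholder)", System.out);
    }
}
```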

Download the library code from here.

Tags business directory, JAVA, library, LinkedIn
