Java library to scrape LinkedIn & its data affiliates

In this post we share a new Java library that helps crawl and scrape LinkedIn company pages. Get business directories scraped!

If you are concerned about the legal issues of scraping LinkedIn data, please refer to the following post: Linkedin lost in court to data analytic company that scrapes Linkedin’s public profiles info

Library

The library offers the following two classes for scraping LinkedIn:

  • AuthorizationInLinkedin
    It authorizes either through a cookie or through email/password. The credentials are stored in the txtFilexExample/data.properties file.
  • ScrapeLinkedinCompanyData 
    Gets data from several sources:
    * linkedin.com
    * crunchbase.com (company email and phone number)
    * bing.com (longitude and latitude)
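
The cookie or email/password settings that AuthorizationInLinkedin relies on could be read from a properties file roughly as follows. This is only a minimal sketch using the standard java.util.Properties class: the key names (cookie, email, password) and the CredentialsSketch class are assumptions made for illustration, and the library's real data.properties file may be laid out differently.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class CredentialsSketch {

    /** Hypothetical holder: either a session cookie or an email/password pair. */
    public static final class Credentials {
        public final String cookie;
        public final String email;
        public final String password;

        Credentials(String cookie, String email, String password) {
            this.cookie = cookie;
            this.email = email;
            this.password = password;
        }

        /** True when a cookie is present, so no email/password login is needed. */
        public boolean usesCookie() {
            return !cookie.isEmpty();
        }
    }

    /** Loads credentials from properties text; the key names are assumed, not the library's. */
    public static Credentials load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String cookie = props.getProperty("cookie", "").trim();
        String email = props.getProperty("email", "").trim();
        String password = props.getProperty("password", "");
        // Require one complete way to authorize.
        if (cookie.isEmpty() && (email.isEmpty() || password.isEmpty())) {
            throw new IllegalArgumentException("Provide either a cookie or both email and password");
        }
        return new Credentials(cookie, email, password);
    }

    public static void main(String[] args) {
        // In real use the Reader would come from the properties file on disk.
        Credentials c = load(new StringReader("email=user@example.com\npassword=secret\n"));
        System.out.println(c.usesCookie() ? "cookie login" : "email/password login");
    }
}
```

Preferring the cookie when both are present is a design choice of this sketch, not something the library documents.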

Some comments

  • The Java code uses a Selenium ChromeDriver instance.
  • crunchbase.com is Selenium-proof, so we scrape it with a simple process that does not go through Selenium. The JSoup library is also used to parse the fetched HTML.
  • The classes take URLs/links from text files and write the output data to the output stream.
  • To do: add support for proxies/proxy services.
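
The "URLs from a text file in, results to the output stream out" flow described above might look something like the sketch below. It uses only the Java standard library. The class name UrlBatchSketch, the tab-separated output format, and the scrape function passed in are placeholders of ours, not the library's actual API; the real classes would run the Selenium/JSoup scraping at that step.

```java
import java.io.BufferedReader;
import java.io.PrintStream;
import java.io.StringReader;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class UrlBatchSketch {

    /** Reads one URL per line, trimming whitespace and skipping blank lines. */
    public static List<String> readUrls(BufferedReader reader) {
        return reader.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    /** Applies the supplied scrape step to each URL and prints one tab-separated line per URL. */
    public static void process(List<String> urls, Function<String, String> scrape, PrintStream out) {
        for (String url : urls) {
            out.println(url + "\t" + scrape.apply(url));
        }
    }

    public static void main(String[] args) {
        // In real use the reader would wrap a text file of company links.
        BufferedReader input = new BufferedReader(new StringReader(
                "https://www.linkedin.com/company/example-one\n\n"
                + "  https://www.linkedin.com/company/example-two  \n"));
        // Stand-in for the real scraping step.
        process(readUrls(input), url -> "not scraped (placeholder)", System.out);
    }
}
```

Passing the scrape step in as a function keeps the file handling separate from the browser automation, which also makes it easy to try out without a ChromeDriver.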

Download the library code from here.
