In this post we’ll show how to leverage an authenticated proxy (with login/password) in a Java application.
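As a minimal sketch of one common approach (the host, port, and credentials below are placeholders, not the post’s actual values), the whole JVM can be pointed at the proxy via system properties, with a java.net.Authenticator supplying the login and password:

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

public class ProxyAuthDemo {

    public static void main(String[] args) {
        // Hypothetical proxy endpoint and credentials -- replace with your own.
        String proxyHost = "proxy.example.com";
        String proxyPort = "60000";
        String proxyUser = "user";
        String proxyPass = "pass";

        // Route the JVM's HTTP and HTTPS traffic through the proxy.
        System.setProperty("http.proxyHost", proxyHost);
        System.setProperty("http.proxyPort", proxyPort);
        System.setProperty("https.proxyHost", proxyHost);
        System.setProperty("https.proxyPort", proxyPort);

        // Since JDK 8u111 Basic auth for HTTPS tunneling is disabled by default;
        // clearing this property re-enables it for authenticated proxies.
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");

        // Supply the login/password whenever the proxy asks for authentication.
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(proxyUser, proxyPass.toCharArray());
            }
        });

        // Any java.net or Jsoup request made after this point goes through the proxy.
    }
}
```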
How do you handle cookies, the user-agent, and headers when scraping with Java? For this we’ll use a static class,
ScrapeHelper, that handles all of it easily. The class uses Jsoup library methods to fetch data from the server and parse the HTML into a DOM document.
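The sketch below illustrates the idea of such a helper built on Jsoup; the class layout and method names are an assumption for illustration and may differ from the post’s exact code:

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a static scraping helper built on Jsoup.
public final class ScrapeHelper {

    // Cookies received from the server, re-sent with every subsequent request.
    private static final Map<String, String> cookies = new HashMap<>();

    private static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36";

    private ScrapeHelper() { }

    public static Document fetch(String url) throws IOException {
        Connection connection = Jsoup.connect(url)
                .userAgent(USER_AGENT)
                .header("Accept-Language", "en-US,en;q=0.9")
                .cookies(cookies)
                .timeout(10_000);

        Connection.Response response = connection.execute();
        cookies.putAll(response.cookies());   // keep the session alive across requests
        return response.parse();              // HTML -> DOM document
    }
}
```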
While working with a backconnect proxy service (Oxylabs.io), we spent a long time looking for a way to authorize against it. Originally we used Jsoup to fetch the web pages’ content. Its proxy() method can be used when setting up a connection, yet it only accepts a host and port, so no authentication is possible there. One of the options that we found was the following:
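Since Jsoup does let you add arbitrary request headers, one workaround is to pair proxy() with a Basic Proxy-Authorization header. This is a sketch of that option, not necessarily the post’s final solution: the endpoint and credentials are placeholders, and the header approach only reaches the proxy for plain-HTTP targets (for HTTPS targets the Authenticator-based approach shown earlier is needed):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class JsoupProxyAuth {

    public static void main(String[] args) throws Exception {
        // Hypothetical backconnect proxy endpoint and credentials.
        String proxyHost = "proxy.example.com";
        int proxyPort = 60000;
        String login = "user";
        String password = "pass";

        // proxy() takes only host and port, so the credentials go into a
        // Basic Proxy-Authorization header instead.
        String auth = Base64.getEncoder()
                .encodeToString((login + ":" + password).getBytes(StandardCharsets.UTF_8));

        Document doc = Jsoup.connect("http://example.com/")
                .proxy(proxyHost, proxyPort)
                .header("Proxy-Authorization", "Basic " + auth)
                .get();

        System.out.println(doc.title());
    }
}
```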
In this post we want to share with you a new, useful Java library that helps to crawl and scrape LinkedIn companies. Get business directories scraped!
In this post we share the code of a simple Java email crawler. It harvests email addresses from a given website with unlimited crawling depth. A previous post showed a simple email crawler in Python.
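A condensed sketch of the idea (not the post’s full code; the seed URL is a placeholder): crawl internal links breadth-first with no depth limit and collect anything matching an email pattern:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.net.URL;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailCrawler {

    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    public static void main(String[] args) throws Exception {
        String startUrl = "https://example.com/";           // hypothetical seed
        String host = new URL(startUrl).getHost();

        Set<String> visited = new HashSet<>();
        Set<String> emails = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);

        while (!queue.isEmpty()) {
            String url = queue.poll();
            if (!visited.add(url)) continue;                 // skip already-seen pages
            try {
                Document doc = Jsoup.connect(url).get();

                // Collect every email address appearing in the page text.
                Matcher m = EMAIL.matcher(doc.text());
                while (m.find()) emails.add(m.group());

                // Follow internal links only, with no depth limit.
                for (Element link : doc.select("a[href]")) {
                    String next = link.absUrl("href");
                    if (next.contains(host) && !visited.contains(next)) queue.add(next);
                }
            } catch (Exception e) {
                // Ignore pages that fail to download or parse.
            }
        }
        emails.forEach(System.out::println);
    }
}
```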
In this blog post we are going to show how you can solve [Re]captcha with Java and some third-party APIs, and why you should probably avoid them in the first place.
For the Python code (+ captcha API) see that post.
“Completely Automated Public Turing test to tell Computers and Humans Apart” is what captcha stands for. Captchas are used to prevent bots from accessing and performing actions on websites or applications.
The most widely used captcha mechanism is Google ReCaptcha v2, which is why we are going to look at how to “break” these captchas.
Today we are going to discuss a large and engaging topic, the REST API, and build our own web application based on the most popular Java framework, Spring. To start, we will explain the two main concepts of this article: REST APIs and Spring. Note that both concepts are quite complex and we can’t describe them fully here, but in the article you will find links that will help you through the places where you might get stuck.
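To give a flavor of what we are building, here is a minimal sketch of the shape such an application takes (class names and the endpoint are illustrative, not the article’s exact code): a Spring Boot entry point plus a REST controller exposing a single resource:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

// Minimal illustrative Spring Boot application exposing one REST endpoint.
@SpringBootApplication
public class DemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}

@RestController
class GreetingController {

    // GET /greeting/{name} returns a plain-text greeting.
    @GetMapping("/greeting/{name}")
    public String greeting(@PathVariable String name) {
        return "Hello, " + name + "!";
    }
}
```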
Web scraping or crawling is the act of fetching data from a third-party website by downloading and parsing the HTML code to extract the data you want. It can be done manually, but generally the term refers to the automated process of downloading the HTML content of a page, parsing/extracting the data, and saving it into a database for further analysis or use.
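Those three steps — download, parse/extract, save — look roughly like this in Java with Jsoup (a sketch with a placeholder URL; the extraction target and storage step are illustrative only):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScrapeExample {

    public static void main(String[] args) throws Exception {
        // 1. Download the HTML content of a page (placeholder URL).
        Document doc = Jsoup.connect("https://example.com/").get();

        // 2. Parse/extract the data of interest, here every link on the page.
        for (Element link : doc.select("a[href]")) {
            String text = link.text();
            String href = link.absUrl("href");

            // 3. Save it for further use -- printed here for brevity;
            //    a real scraper would write to a database instead.
            System.out.println(text + " -> " + href);
        }
    }
}
```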
The LinkedIn API doesn’t allow you to publish into groups unless you are their administrator. That restriction was introduced to curb spamming, but if you are a member of several groups on a similar topic and want to share some interesting information with all of them, you have to do it manually, group by group, and eventually that becomes tedious. In this post I’ll show you a simple way to automate this process in C# using Selenium WebDriver.