In this post we show how to scrape a JS-rendered website. The tools, as seen in the header, are Java with the Selenium library driving headless Chrome instances (download the driver) and jsoup as the parser that extracts data from the acquired HTML.
Question: What is Selenium web scraping?
Answer: A picture is worth a thousand words:
So, you write a program in Python, PHP, Java, Ruby, or whatever language you use in order to browse(), select(), click(), submit(), save(), etc. on target web pages.
Since Selenium WebDriver was created for browser automation, it can easily be used for scraping data from the web. In this post we will consider some advantages and drawbacks of using WebDriver for web scraping.
Recently I got a question:
How do I get past the dynamic “load more” button using a Python web scraper?
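One common way to handle such a button is to keep clicking it until it disappears. Below is a minimal sketch of that loop, not a definitive solution. The helper name click_load_more, the pause length, the URL and the CSS selector are all assumptions made for illustration.

```python
import time


def click_load_more(find_button, max_clicks=50, pause=1.0):
    """Click the button returned by find_button() until it is gone.

    find_button is any callable returning an object with a click()
    method, or None once the button is no longer on the page.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()
        if button is None:
            break
        button.click()
        clicks += 1
        time.sleep(pause)  # give the page time to load the next batch
    return clicks


# Possible wiring with Selenium (URL and selector are hypothetical):
#
#   from selenium import webdriver
#   from selenium.webdriver.common.by import By
#
#   driver = webdriver.Chrome()
#   driver.get("https://example.com/list")
#
#   def find_button():
#       found = driver.find_elements(By.CSS_SELECTOR, "button.load-more")
#       return found[0] if found and found[0].is_displayed() else None
#
#   click_load_more(find_button)
#   html = driver.page_source  # now contains everything that was loaded
```

The max_clicks cap is there so that a button which never disappears cannot trap the scraper in an endless loop. A fixed pause is the simplest option; an explicit wait on the newly loaded items would be more robust.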
I have already written several articles on how to use WebDriver for web scraping, but I have never touched on the topic of changing WebDriver’s IP address. Nevertheless, this topic is crucial when it comes to web scraping, and here I’d like to show you an example of using proxies with WebDriver in Python (you can easily adapt it to your language’s API).
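To give a rough idea of what this looks like, here is a minimal sketch that routes Chrome through a proxy using its --proxy-server switch. It assumes Selenium 4 with Chrome. The helper names and the proxy address are placeholders, and the full article may take a different route.

```python
def proxy_argument(host, port, scheme="http"):
    """Build Chrome's --proxy-server switch for the given proxy."""
    return f"--proxy-server={scheme}://{host}:{port}"


def make_proxied_driver(host, port, scheme="http"):
    """Start Chrome with all of its traffic sent through the proxy."""
    # Imported here so proxy_argument() works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(host, port, scheme))
    return webdriver.Chrome(options=options)


# Example (placeholder address, replace with a proxy you control):
#
#   driver = make_proxied_driver("127.0.0.1", 8080)
#   driver.get("https://example.com")
#   driver.quit()
```

To change the IP address, quit the driver and start a new one with a different proxy. Note that this switch does not carry a username and password, so proxies that require authentication need a different approach.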
The LinkedIn API doesn’t allow you to publish to groups unless you are their administrator. That was done to curb spamming, but if you are a member of several groups on a similar topic and want to share some interesting information with all of them, you have to do it manually, group by group, and eventually it becomes tedious. In this post I’ll show you a simple way to automate this process in C# using Selenium WebDriver.
I have already written several articles on how to use Selenium WebDriver for web scraping, and all those examples were for Windows. But what if you want to run your WebDriver-based scraper somewhere on a headless Linux server, for example on a Virtual Private Server with SSH-only access? Here I will show you how to do it in several simple steps.
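As one possible illustration, here is a sketch that starts Chrome in its own headless mode, which needs no display server. It assumes Selenium 4 and a recent Chrome installed on the server. The helper names are made up for this example, and the steps in the full article may differ (for instance, by using a virtual display).

```python
def headless_chrome_arguments(width=1920, height=1080):
    """Chrome switches commonly used on a server without a display."""
    return [
        "--headless=new",           # run without opening a visible window
        "--no-sandbox",             # often needed when running as root
        "--disable-dev-shm-usage",  # avoid a small /dev/shm in containers
        f"--window-size={width},{height}",
    ]


def make_headless_driver():
    """Start a headless Chrome session."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in headless_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


# Example, run over SSH:
#
#   driver = make_headless_driver()
#   driver.get("https://example.com")
#   print(driver.title)
#   driver.quit()
```

Setting an explicit window size is worthwhile because headless sessions otherwise start with a small default viewport, which can change how a responsive page renders.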
Recently I published an article on how to solve CAPTCHAs in C# using the DeathByCaptcha service, and I promised to offer examples in other languages as well. In this post I’ll offer a Java project that does the same thing.