webscraping.pro – Page 15

JAVA library to scrape Linkedin & its data affiliates

Post author By admin
Post date June 2, 2020

In this post we share a new, useful Java library that helps crawl and scrape LinkedIn companies. Get business directories scraped!

If you are concerned about the legal issues of scraping LinkedIn data, please refer to the following post: Linkedin lost in court to data analytic company that scrapes Linkedin’s public profiles info

Tags business directory, JAVA, library, LinkedIn

Guest posting Review

Octoparse 8 vs Octoparse 7 comparison – what’s new in 8.1

Post author By admin
Post date May 29, 2020

Our brand-new version, Octoparse 8 (OP 8), came out a few weeks ago. To help you understand the differences between OP 8 and OP 7, we have covered all the updates in this article.

Tags Octoparse, scraping tool, web scraping

Review

Oxylabs.io at a glance

Oxylabs.io is an experienced player in the proxy market. In the past few years, they have significantly expanded their proxy pool.

Right now they have a residential proxy pool with over 60M IPs and over 2M datacenter proxies. Their residential proxies cover every country in the world (!) and offer city-level targeting. Oxylabs datacenter proxies come from 82 locations and feature 7850 subnets.

Oxylabs is mainly focused on businesses, and this is reflected in their subscription plans. Recently, however, they introduced a Fast-Checkout feature that lets customers purchase residential proxies in a few clicks. Together with a recently added smaller plan ($300/month for 20GB of traffic), this makes Oxylabs much more attractive for smaller customers as well.

Tags proxy, service

Development

Scrape a JS Lazy load page by Python requests

Post author By admin
Post date April 29, 2020

A page that loads its content via JS is usually scraped with Selenium or another browser emulator. Yet, for a certain shopping website, we’ve found a way to perform the scrape with pure Python requests.
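
One common way to do this, which may or may not be the approach the post takes, is to call the JSON endpoint that the page's own JS requests as the user scrolls. The sketch below assumes a hypothetical endpoint that accepts `page` and `limit` query parameters and returns `{"items": [...]}`; the real names have to be read from the browser's network tab.

```python
from typing import Any, Dict, Iterator, List


def iter_lazy_items(session: Any, api_url: str, page_size: int = 20,
                    max_pages: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield items from a paged JSON endpoint until a page comes back empty.

    `session` is any object with a requests-style
    `.get(url, params=..., timeout=...)` method, e.g. `requests.Session()`.
    """
    for page in range(1, max_pages + 1):
        # "page", "limit" and "items" are hypothetical names (assumptions).
        resp = session.get(api_url,
                           params={"page": page, "limit": page_size},
                           timeout=10)
        resp.raise_for_status()
        items: List[Dict[str, Any]] = resp.json().get("items", [])
        if not items:
            break  # an empty page means there is nothing left to load
        yield from items
```

With the requests package installed, this could be called as `iter_lazy_items(requests.Session(), "https://example.com/api/products")`, where the URL is a placeholder. The `max_pages` cap keeps the loop from running forever if the endpoint never returns an empty page.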

Tags Python

Review

NetNut.io Review

The most successful enterprises are always the ones that manage to stay a step ahead of their rivals. To remain ahead, you have to be able to access industry information faster and more consistently than anybody else. This is especially true for the e-commerce and online retail industries, where price competition is extremely fierce. Thus, even the smallest improvements in information processes can result in large changes in outcomes.

Tags proxy

Legal Monetize

What is legal: scrape, or scrape & sell, or code a scraper

Post author By admin
Post date April 13, 2020

Which of the following is illegal:
(1) Scrape emails from a site and send one email to each address.
(2) Scrape emails from a website and sell them.
(3) Make a scraping script and sell it without using it.
Note: The target website’s Terms of Use (ToU) state that no one may crawl/scrape it.

Tags legal, web scraping

Uncategorized

Netpeak Software sales and offers

Post author By admin
Post date April 9, 2020

If you haven’t met Netpeak Spider and Checker yet, let us explain why they are worth your attention. These tools help SEOs and webmasters with in-depth SEO auditing, website and search engine scraping, comprehensive analysis, data aggregation from top SEO services (Ahrefs, Moz, SimilarWeb, Whois, …), and much more. Netpeak (April 2020 Special Offer)

Tags SEO

Development

Bulk db prepared insert with rollback even if 1 record fails, PHP

Post author By admin
Post date April 8, 2020

Recently I needed to do a bulk insert into a database using a prepared statement. The requirement was that if even one record failed, all records would be rolled back and an error returned. That way no data is affected by faulty code and/or wrong input data.
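
The post itself is about PHP. Purely as an illustration of the same all-or-nothing idea, here is a minimal sketch in Python with the standard sqlite3 module. The `products` table and its columns are hypothetical, and this is not the post's actual code.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def bulk_insert(conn: sqlite3.Connection,
                rows: Iterable[Tuple[str, int]]) -> Optional[str]:
    """Insert all rows or none; return None on success, else an error message."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            # Parameterized statement reused for every row
            # (hypothetical table and column names).
            conn.executemany(
                "INSERT INTO products (name, price) VALUES (?, ?)", rows
            )
        return None
    except sqlite3.Error as exc:
        return str(exc)
```

If any row violates a constraint, the rows inserted before it in the same call are rolled back too, so the table is left exactly as it was.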

Tags PHP

Review

ScrapingBee, an API for web scraping

Post author By admin
Post date April 1, 2020

The web is becoming increasingly difficult to scrape. There are more and more websites using single page application frameworks like Vue.js / Angular.js / React.js and you need to use headless browsers to extract data from those websites.

Using headless Chrome on your local computer is easy, but scaling to dozens of Chrome instances in production is a difficult task. There are many problems: you need powerful servers with plenty of RAM, and you’ll run into random crashes, zombie processes…

Tags scraping tool, web scraping

Development

Problem scraping javascript site – help needed

Post author By admin
Post date March 31, 2020

Problem

I am trying to scrape the page https://tienda.mercadona.es/categories/112. I have installed Docker and followed all the required steps given in the post. Splash works well, but the spider does not, and I don’t know why. The IP in the splash_url is correct, but when I run scrapy shell “webpage” I can’t see the complete page in the response object, i.e., the page has not been rendered correctly.
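
One thing worth checking, offered as a guess about this setup rather than a confirmed diagnosis: `scrapy shell <url>` sends a plain request, so the page is fetched directly and never passes through Splash. Pointing the shell at Splash's `render.html` HTTP endpoint instead should return the rendered page. The sketch below builds such a URL. The `http://localhost:8050` base and the 2-second `wait` are assumptions and must match the actual splash_url and page.

```python
from urllib.parse import urlencode


def splash_render_url(page_url: str,
                      splash_base: str = "http://localhost:8050",
                      wait: float = 2.0) -> str:
    """Build a Splash render.html URL that returns the JS-rendered HTML."""
    # "url" and "wait" are arguments of Splash's render.html endpoint;
    # the base address and wait time here are assumed values.
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_base.rstrip('/')}/render.html?{query}"


print(splash_render_url("https://tienda.mercadona.es/categories/112"))
```

The printed URL can then be passed, in quotes, to `scrapy shell` in place of the original page address.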

Tags Javascript