Categories
Development

Scrape a JS Lazy load page by Python requests

Pages that load their content with JavaScript are usually scraped with Selenium or another browser emulator. Yet for one shopping website we’ve found a way to scrape it with pure Python requests.

Categories
Review

NetNut.io Review

The most successful enterprises are always the ones that manage to stay a step ahead of their rivals. And to remain ahead, you have to be able to access industry information faster and more consistently than anybody else. This is especially true for the e-commerce and online retail industries, where price competition is extremely fierce. Thus, even the smallest improvements in information processes can lead to large changes in outcomes.

Categories
Legal Monetize

What is legal: scrape, or scrape & sell, or code a scraper

Which of the following is illegal:
(1) Scrape emails from a site and send one email to each address.
(2) Scrape emails from a website and sell them.
(3) Make a scraping script and sell it without using it.
Note: The target website Terms of Use (ToU) state that no one can crawl/scrape it.

Categories
Uncategorized

Netpeak Software sales and offers

If you haven’t met Netpeak Spider and Checker yet, let us explain why they are worth your attention. These tools help SEOs and webmasters with in-depth SEO auditing, website and search engine scraping, comprehensive analysis, data aggregation from top SEO services (Ahrefs, Moz, SimilarWeb, Whois,…), and much more.

Categories
Development

Bulk db prepared insert with rollback even if 1 record fails, PHP

Recently I needed to make a bulk insert into a db with a prepared statement query. The task was to do it so that if even one record fails, all records are rolled back and an error is returned. That way no data is affected by faulty code and/or wrong data provided.
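
The post itself is about PHP; the snippet below is only a rough Python sketch of the same all-or-nothing idea, using the standard `sqlite3` module instead. The `items` table, its columns, and the `bulk_insert` helper are made up for illustration and are not taken from the original post.

```python
import sqlite3


def bulk_insert(conn, rows):
    """Insert every row or none. Returns (ok, error_message)."""
    # Parameterized statement, reused for each row (hypothetical table/columns).
    sql = "INSERT INTO items (id, name) VALUES (?, ?)"
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if any exception is raised inside it.
        with conn:
            conn.executemany(sql, rows)
    except sqlite3.Error as exc:
        return False, str(exc)
    return True, None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.commit()

# The third row repeats a primary key, so the whole batch should be rejected.
ok, err = bulk_insert(conn, [(1, "a"), (2, "b"), (2, "dup")])
count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(ok, err, count)
```

After the failed batch the table should still be empty, because the two valid rows were rolled back together with the faulty one. In PHP the same pattern is usually built around PDO's `beginTransaction()`, `commit()` and `rollBack()`.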

Categories
Review

ScrapingBee, an API for web scraping

The web is becoming increasingly difficult to scrape. More and more websites use single-page application frameworks like Vue.js / Angular.js / React.js, and you need headless browsers to extract data from those websites.

Using headless Chrome on your local computer is easy. But scaling to dozens of Chrome instances in production is a difficult task. There are many problems: you need powerful servers with plenty of RAM, and you’ll run into random crashes, zombie processes…