Categories
Development

Fast scrape of a simple website using Node.js, Apify & Cheerio scraper

We recently built a scraper that extracts data from a static site. By a static site, we mean one that does not use JavaScript to load or transform on-page data. If you are interested in scraping a JS-rendered site, please read the following: Scraping a Javascript-dependent website with puppeteer. Technologies […]

Categories
Uncategorized

Chromium Command Line switches

When we use Selenium or Node.js + Puppeteer to run [headless] Chrome/Chromium, we might need to launch the browser with some extra functionality or conditions. Below you’ll find all kinds of conditions and their explanations. How to use command line switches? The Chromium Team has made a page on which they briefly explain how to use these switches.
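As a rough illustration of how switches are passed, here is a minimal Python sketch that turns a mapping of switch names into a command line. The helper name `build_chromium_args` and the binary name `chromium` are assumptions made for this example, not part of any library. The switches shown (`--headless`, `--disable-gpu`, `--window-size`) are common Chromium switches.

```python
def build_chromium_args(binary, switches):
    """Build an argv list such as ["chromium", "--headless", "--window-size=1280,720"].

    `switches` maps a switch name (without the leading dashes) to its value,
    or to None for switches that take no value.
    """
    args = [binary]
    for name, value in switches.items():
        if value is None:
            args.append(f"--{name}")          # flag-style switch
        else:
            args.append(f"--{name}={value}")  # switch with a value
    return args


if __name__ == "__main__":
    # "chromium" is an assumed binary name; it differs between systems.
    argv = build_chromium_args(
        "chromium",
        {"headless": None, "disable-gpu": None, "window-size": "1280,720"},
    )
    print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run`. The switch strings themselves (everything after the binary name) are also what Selenium and Puppeteer expect in their browser launch arguments.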

Categories
Development Guest posting Web Scraping Software

Octoparse Alternatives

Let me tell you what you already know! Octoparse is a great web scraping tool! But like every great tool, it has its limitations. At times, you may wonder if there are any alternatives to Octoparse. We wondered the same and put together this blog post to provide you with a short list of Octoparse alternatives along […]

Categories
Development

Selenium Web Scraping in simple words

Question: What is Selenium web scraping? Answer: A picture is worth a thousand words. So, you write a program in Python, PHP, Java, Ruby or whatever language you use in order to browse(), select(), click(), submit(), save(), etc., on target web pages.

Categories
Development

LinkedIn scraping guidelines

The LinkedIn crawl success rate is low; a single request made by a bot might require several retries to succeed. So, here we share the crucial LinkedIn scraping guidelines. Rate limit: Limit the crawling rate for LinkedIn. The acceptable approximate frequency is 1 request every second, i.e. 60 requests per minute. Public pages only: LinkedIn […]
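The one-request-per-second limit mentioned above can be sketched as a small throttle. This is a minimal Python sketch, not the post's own code: the class name `Throttle` and its parameters are made up for the example, and the fetching itself is left out.

```python
import time


class Throttle:
    """Allow at most one call per `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the throttle can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous permitted call

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(min_interval=1.0)  # 1 request per second, 60 per minute
    for page in ["page-1", "page-2", "page-3"]:  # placeholder identifiers
        throttle.wait()
        print("would fetch", page)  # replace with the actual request
```

Calling `wait()` before every request keeps the crawl at or below the stated rate. Retries for failed requests should go through the same throttle so that they also count toward the limit.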

Categories
Challenge

Most popular web scraping targets and how to scrape them

Online marketplaces: In marketplaces, people offer their products for sale, similar to garage sales but online (e.g. eCrater, www.1188.no). They are easy to scrape since they are usually free and do not tend to protect their data. Business directories: These are usually huge online directories targeted at a general audience (e.g. Yellow Pages). They do protect their […]

Categories
Uncategorized

Crawler vs Scraper vs Parser

In this post we share the differences between a crawler, a scraper and a parser.

Categories
Web Scraping Software

The present trends in web scraping tools

Recently I got a question from one of the blog readers. After I replied to it, I decided to share it with a wider audience. Question: Hi, I found your [web]scraping.pro site and found it very helpful, then realized the web scraper solutions rating was from 2014. What is the best solution for today? I have […]

Categories
Development

Web Scraping with Node.js

Web scraping has been steadily growing in popularity for many years now. Freelance sites are overcrowded with orders connected with this controversial data extraction process. Today we will combine two new and revolutionary directions in web development. So, let’s consider an elegant and modern way to scrape data from websites with Node.js!