DataFlowKit review

Recently we came across a new service that helps users scrape modern web 2.0 sites. It's a simple, convenient and easy-to-learn service: https://dataflowkit.com
Let’s first highlight some of its outstanding features:

  1. Visual online scraper tool: point, click and extract.
  2. JavaScript rendering: any interactive site can be scraped by headless Chrome running in the cloud
  3. Open-source back-end
  4. Scrape a website behind a login form
  5. Web page interactions: Input, Click, Wait, Scroll, etc.
  6. Proxy support, incl. geo-targeted proxying
  7. Scraper API
  8. Follow robots.txt directives
  9. Export results to Google Drive, Dropbox, MS OneDrive.
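DataFlowKit's own robots.txt handling isn't described here. As a generic illustration of what "following robots.txt" means for a scraper, the sketch below uses Python's standard `urllib.robotparser`. The robots.txt content, the example.com URLs and the `my-scraper` user agent are made-up assumptions. A real scraper would call `set_url()` and `read()` to fetch the site's actual file instead of parsing an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content directly instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str, agent: str = "my-scraper") -> bool:
    """Return True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in ("https://example.com/products/1", "https://example.com/private/admin"):
        print(url, "->", "allowed" if allowed(rp, url) else "disallowed")
    # Seconds the site asks crawlers to wait between requests (None if unset).
    print("crawl delay:", rp.crawl_delay("my-scraper"))
```

A polite scraper checks `allowed()` before each request and sleeps for the crawl delay between requests to the same host.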

Node.js, Puppeteer, Apify for Web Scraping (Xing scrape) – part 2

In this post we share the practical implementation (code) of the Xing companies scrape project using Node.js, Puppeteer and the Apify library. The first post, describing the project objectives, algorithm and results, is available here.

You can look at the scrape algorithm here.

Using Modern Tools such as Node.js, Puppeteer, Apify for Web Scraping (Xing scrape)

I want to share with you a practical implementation of modern tools for scraping JS-rendered websites (pages loaded dynamically by JavaScript). You can read more about scraping JS-rendered content here.
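"JS-rendered" means the content is not in the HTML the server sends. A script inserts it after the page loads in a browser. The sketch below illustrates this with Python's standard `html.parser`. The page markup, the `/api/companies` endpoint and the company names are all hypothetical. `RENDERED_HTML` is a hand-written stand-in for the DOM a headless browser (such as Puppeteer's headless Chrome) would produce after running the script.

```python
from html.parser import HTMLParser

# Hypothetical raw HTML as returned by a plain HTTP request: the list is empty
# and is only filled in by JavaScript once a browser executes the script.
RAW_HTML = """
<html><body>
  <ul id="companies"></ul>
  <script>
    fetch('/api/companies').then(r => r.json()).then(names => {
      const ul = document.getElementById('companies');
      names.forEach(n => {
        const li = document.createElement('li');
        li.textContent = n;
        ul.appendChild(li);
      });
    });
  </script>
</body></html>
"""

# Hypothetical stand-in for the DOM after a headless browser ran the script.
RENDERED_HTML = (
    '<html><body><ul id="companies">'
    "<li>Company A</li><li>Company B</li>"
    "</ul></body></html>"
)


class ListItemCollector(HTMLParser):
    """Collect the text of every <li> element in the markup it is fed."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self.items[-1] += data


def extract_items(html):
    """Return the stripped text of all <li> elements found in `html`."""
    collector = ListItemCollector()
    collector.feed(html)
    collector.close()
    return [item.strip() for item in collector.items]


if __name__ == "__main__":
    print("raw HTML:     ", extract_items(RAW_HTML))
    print("rendered HTML:", extract_items(RENDERED_HTML))
```

The same parser finds nothing in the raw response but finds both items in the rendered DOM. This is why tools such as Puppeteer run a real (headless) browser before extracting data.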

Headless browser Python scraper on PythonAnywhere

Recently I decided to use pythonanywhere.com for running Python scripts against JS-heavy websites.

Originally I tried to use the dryscrape library, but it didn't work, and their helpful support team explained to me: “…unfortunately dryscrape depends on WebKit, and WebKit doesn’t work with our virtualisation system.”