Categories
Uncategorized

Chromium Command Line switches

When we run [headless] Chrome/Chromium with Selenium or Node.js + Puppeteer, we may need to launch the browser with extra functionality or conditions. Below you’ll find all kinds of switches and their explanations. How do you use command line switches? The Chromium team has a page that briefly explains how to use them.
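As a rough illustration of what "launching with switches" means, here is a minimal Python sketch that assembles a Chromium command line. The helper `build_chromium_command`, the binary name `chromium`, and the target URL are assumptions made for this example; the three switches shown are commonly used Chromium switches, but check the Chromium team's page for the ones your version supports.

```python
import shlex


def build_chromium_command(binary, switches, url):
    """Return the argv list that launches `binary` with the given switches.

    `switches` maps a switch name to its value, or to None for a bare flag.
    This helper is hypothetical and exists only for this example.
    """
    args = [binary]
    for name, value in switches.items():
        # A switch is written either as --name or as --name=value.
        args.append(f"--{name}" if value is None else f"--{name}={value}")
    args.append(url)
    return args


command = build_chromium_command(
    "chromium",  # assumed binary name; yours may be "google-chrome" or a full path
    {"headless": None, "disable-gpu": None, "window-size": "1280,800"},
    "https://example.com",  # placeholder URL
)

# Print the command as it would be typed in a shell.
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command)` to actually start the browser, assuming the binary is installed and on your PATH.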

Categories
Development Guest posting Web Scraping Software

Octoparse Alternatives

Let me tell you what you already know! Octoparse is a great web scraping tool! But like every great tool, it has its limitations. At times, you may wonder if there are any alternatives to Octoparse. We wondered the same and put together this blog post to provide you with a short list of Octoparse alternatives along […]

Categories
Development

Selenium Web Scraping in simple words

Question: What is Selenium web scraping? Answer: A picture is worth a thousand words. In short, you write a program in Python, PHP, Java, Ruby, or whatever language you use, to browse(), select(), click(), submit(), save(), etc. on target web pages.
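The browse, select, click, submit, save sequence can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the helper names `search_and_save` and `run_with_chrome`, the URL `https://example.com/search`, and the form field named `q` are all hypothetical. The Selenium calls themselves (`get`, `find_element`, `click`, `send_keys`, `submit`, `page_source`, `quit`) are part of the Selenium 4 Python bindings.

```python
def search_and_save(driver, url, locator, query, out_path):
    """Drive a WebDriver-like object through browse, select, click, submit, save."""
    driver.get(url)                        # browse: open the target page
    field = driver.find_element(*locator)  # select: locate a form field
    field.click()                          # click: focus the field
    field.send_keys(query)                 # type the query into it
    field.submit()                         # submit: send the enclosing form
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(driver.page_source)       # save: store the resulting HTML
    return driver.title


def run_with_chrome():
    """Run the flow in headless Chrome (needs Selenium and Chrome installed)."""
    # Imported here so the helper above can be used without Selenium present.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        return search_and_save(
            driver,
            "https://example.com/search",  # hypothetical page with a search form
            (By.NAME, "q"),                # hypothetical field name
            "web scraping",
            "result.html",
        )
    finally:
        driver.quit()  # always close the browser, even on errors
```

Passing the driver in as an argument keeps the scraping steps separate from browser setup, so the same function can be reused with another browser or with a stub driver.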

Categories
Review

DataFlowKit review

Recently we encountered a new service that helps users scrape the modern web 2.0. It’s a simple, comfortable, easy-to-learn service: https://dataflowkit.com. Let’s first highlight some of its outstanding features: a visual online scraper tool (point, click and extract); JavaScript rendering, so any interactive site can be scraped by headless Chrome running in the cloud; open-source […]

Categories
Development

Scraping JavaScript protected content

Here we come to a new milestone: scraping JavaScript-driven or JS-rendered websites. Recently a friend of mine got stumped trying to get the content of a website using the PHP simplehtmldom library. He kept failing and finally found out that the site was saturated with JavaScript code. The anti-scrape JavaScript […]

Categories
Challenge

Is there any way to skip CAPTCHA?

JavaScript-powered CAPTCHA

Most of the answers to this question on internet forums come from services that automatically solve CAPTCHAs. They offer to solve the CAPTCHA rather than to skip it entirely.

Categories
Development Guest posting

Captcha solving with Java and why you should avoid it

In this blog post we are going to show how you can solve [Re]captcha with Java and some third-party APIs, and why you should probably avoid them in the first place. For the Python code (+ captcha API) see that post. The post author is Kevin Sahin from ScrapingNinja.co. Captcha solving: “Completely Automated Public Turing test to tell Computers and […]

Categories
Development

Is there any way to skip CAPTCHA?

  Is there a way to skip CAPTCHA?

Categories
Uncategorized

How to detect your site is being scraped?

In the age of the modern web there are a lot of data hunters: people who want to take the data on your website and reuse it. The reasons someone might want to scrape your site vary widely, but regardless, it is important for website owners to know if it is happening. […]

Categories
Review SaaS

Scrapinghub review

Scrapinghub is a developer-focused web scraping platform. It provides web scraping tools and services to extract structured information from online sources. The Scrapinghub platform also offers several useful services to collect organized data from the internet. Scrapinghub has four major tools: Scrapy Cloud, Portia, Crawlera, and Splash. We’ve decided to try the service. In this […]