Categories
Development Web Scraping Software

My experience of choosing a web scraping platform for a company-critical data feed

Recently we worked with an online e-commerce startup that needed scraping of government tenders/RFPs. Since the project is immense, we had to switch from handmade scripted extractors to an enterprise-grade scraping platform. Below I share my experience with the scraping platforms as a feature table.

 | Octoparse | Dexi.io | Mozenda | Sequentum SaaS | Import.io
Able to set up robot/agent | 3 min | 3 failures in a row | "For some insight, we are working with customers in managed service engagements for large scale, mission critical web integration requirements - so we no longer have a SaaS tool offering. We have a heavy focus in digital commerce and work with customers on use cases in ecomm/retail, travel/hospitality, and tickets/events." - customer service
Support response | 12 hours. It does excellent job. | 12 hours | 12 hours | 12 hours
Base64 encoding | no | Using a JavaScript step; btoa() is a function that takes a string and encodes it to Base64. | yes, one can encode the given value in the Transformation Script of any command
Robot/agent development assistance | yes
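
The Base64 row mentions btoa() in a JavaScript step. For comparison, here is a minimal Python sketch of the same encoding, which may be handy when post-processing scraped values outside a platform. The helper names to_base64 and from_base64 are illustrative assumptions and not part of any platform listed above.

```python
import base64


def to_base64(value: str) -> str:
    """Encode text to Base64, assuming UTF-8 input.

    Note: the browser's btoa() only accepts Latin-1 characters,
    so results can differ for other characters.
    """
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def from_base64(encoded: str) -> str:
    """Reverse of to_base64, again assuming UTF-8 text."""
    return base64.b64decode(encoded).decode("utf-8")
```

For plain ASCII input such as "hello", this produces the same string that btoa("hello") would.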
Categories
Challenge Featured Review SaaS

Data collectors to scrape tough websites

Recently we encountered a powerful new scraping service called Data Collector [of Bright Data]. A live test and a thorough drill-down are coming soon. For now, we want to highlight its main features, which have strongly impressed us.

Categories
Development Guest posting Web Scraping Software

Octoparse Alternatives

Let me tell you what you already know! Octoparse is a great web scraping tool! But like every great tool, it’s got its limitations. At times, you may wonder if there are any alternatives to Octoparse. We wondered the same and put together this blog to provide you with a short list of Octoparse alternatives along with their features and distinguishing factors. Let’s get started!

Categories
Development

Selenium Web Scraping in simple words

Question: What is Selenium web scraping?

Answer: A picture is better than 1000 words: [Selenium main diagram]

So, you write a program in Python, PHP, Java, Ruby, or whatever language you use, in order to browse(), select(), click(), submit(), save(), etc. on target web pages.
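
As a rough illustration of those verbs in Python, the sketch below walks through browse, select, submit, and save (click() works the same way on any located element). The URL, the field name "q", the "h3" selector, and the function names are hypothetical placeholders. The main function accepts any object that exposes Selenium's WebDriver methods get, find_element, and find_elements.

```python
# Locator strategy strings. These are the values behind Selenium's By.NAME
# and By.CSS_SELECTOR constants, spelled out so the sketch itself has no
# import-time dependency on the selenium package.
BY_NAME = "name"
BY_CSS = "css selector"


def search_and_collect(driver, url, query, out_path):
    """Browse to a page, fill in a search box, submit it, and save result titles."""
    driver.get(url)                              # browse()
    box = driver.find_element(BY_NAME, "q")      # select() the input (hypothetical field name)
    box.clear()
    box.send_keys(query)
    box.submit()                                 # submit() the enclosing form
    titles = [el.text for el in driver.find_elements(BY_CSS, "h3")]
    with open(out_path, "w", encoding="utf-8") as fh:  # save()
        fh.write("\n".join(titles))
    return titles


def run_with_chrome():
    """Run the sketch in a real browser; needs the selenium package and Chrome."""
    from selenium import webdriver  # imported lazily so the module loads without Selenium

    driver = webdriver.Chrome()
    try:
        return search_and_collect(
            driver, "https://example.com/search", "tenders", "titles.txt"
        )
    finally:
        driver.quit()  # always close the browser, even on errors
```

Keeping the steps in a function that takes the driver as a parameter makes it easy to swap Chrome for another browser, or for a stub during testing.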

Categories
Development

LinkedIn scraping guidelines

The LinkedIn crawl success rate is low; a single request that a bot makes might require several retries to succeed. So, here we share the crucial LinkedIn scraping guidelines.

  1. Rate limit
    Limit the crawling rate for LinkedIn. The acceptable approximate frequency is: 1 request every second, 60 requests per minute.
  2. Public pages only
    LinkedIn allows bots to access public pages only; private pages cannot be crawled.
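
The rate limit and the need for retries can be sketched in Python as a throttle that spaces requests at least one second apart, plus a bounded retry loop. The names Throttle and fetch_with_retries, the retry count of 4, and the fetch callable are illustrative assumptions; plug in whatever HTTP client you use.

```python
import time


class Throttle:
    """Space calls at least `min_interval` seconds apart (1 s = 60 requests per minute)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request; None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_with_retries(fetch, url, throttle, max_attempts=4):
    """Call fetch(url) until it succeeds, throttling every attempt.

    `fetch` is any callable that returns a response or raises on failure.
    """
    last_error = None
    for _ in range(max_attempts):
        throttle.wait()
        try:
            return fetch(url)
        except Exception as exc:  # deliberately broad for a sketch
            last_error = exc
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts") from last_error
```

Retries go through the same throttle as first attempts, so a failing page never pushes the crawler above the one-request-per-second budget.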
Categories
Challenge

Most popular web scraping targets and how to scrape them

  1. Online marketplaces
    In marketplaces, people offer their products for sale, similar to garage sales but online (e.g. eCrater, www.1188.no).
    Easy to scrape since they are usually free and do not tend to protect their data.
  2. Business directories
    These are usually huge online directories targeted at a general audience (e.g. Yellow Pages). They do protect their data to avoid duplication and loss of audience. See some posts on this.
Categories
Review

DataFlowKit review

Recently we encountered a new service that helps users to scrape the modern web 2.0. It’s a simple, comfortable, easy to learn service – https://dataflowkit.com
Let’s first highlight some of its outstanding features:

  1. Visual online scraper tool: point, click and extract.
  2. JavaScript rendering: scrape any interactive site with headless Chrome running in the cloud
  3. Open-source back-end
  4. Scrape a website behind a login form
  5. Web page interactions: Input, Click, Wait, Scroll, etc.
  6. Proxy support, incl. Geo-target proxying
  7. Scraper API
  8. Follows the directives of robots.txt
  9. Export results to Google Drive, Dropbox, MS OneDrive.
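
Feature 8 refers to honoring robots.txt. How DataFlowKit implements this is not described here; as a generic illustration, Python's standard urllib.robotparser module can answer the same question. The rules, the helper name is_allowed, and the bot name "demo-bot" below are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt body; a real scraper would download it from the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these sample rules, a URL under /private/ is rejected while other paths on the same site are allowed.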
Categories
Guest posting Review

Octoparse 8 vs Octoparse 7 comparison – what’s new in 8.1

Our brand new version Octoparse 8 (OP 8) just came out a few weeks ago. To help you get a better understanding of what the differences between OP 8 and 7 are, we have included all the updates in this article.

Categories
Legal Monetize

What is legal: scrape, or scrape & sell, or code a scraper

Which of the following is illegal:
(1) Scrape emails from a site and send one email to each address.
(2) Scrape emails from a website and sell them.
(3) Make a scraping script and sell it without using it.
Note: The target website Terms of Use (ToU) state that no one can crawl/scrape it.

Categories
Review

ScrapingBee, an API for web scraping

The web is becoming increasingly difficult to scrape. More and more websites use single-page application frameworks like Vue.js / Angular.js / React.js, and you need headless browsers to extract data from those websites.

Using headless Chrome on your local computer is easy. But scaling to dozens of Chrome instances in production is a difficult task. There are many problems: you need powerful servers with plenty of RAM, and you’ll run into random crashes, zombie processes…