Categories
Challenge Development

Oxylabs’ Web Scraper API – Experience & more

Experience

We’ve succesfully tested the Web-Scraper-API of Oxylabs. It did well to get data off the highly protected sites. One eg. is Zoro.com protected with Akamai, DataDome, CloudFlare and ReCaptcha! See the numerical results here.

Categories
Guest posting Review

Octoparse 8 vs Octoparse 7 comparison – what’s new in 8.1

Our brand new version Octoparse 8 (OP 8) just came out a few weeks ago. To help you get a better understanding of what the differences between OP 8 and 7 are, we have included all the updates in this article.

Categories
Review

ScrapingBee, an API for web scraping

scrapingbee_smallThe web is becoming increasingly difficult to scrape. There are more and more websites using single page application frameworks like Vue.js / Angular.js / React.js and you need to use headless browsers to extract data from those websites.

Using headless Chrome on your local computer is easy. But scaling to dozens of Chrome instances in production is a difficult task. There are many problems, you need powerful servers with plenty of RAM, you’ll get into random crashes, zombie processes…

Categories
Web Scraping Software

The present trends in web scraping tools

 

Recently I got a question from one of the blog readers. After I replied to it, I decided to share it with a wider audience.
Question:

Hi,

I found your [web]scraping.pro site and found it very helpful, then realized the web scraper solutions rating was from 2014.  What is the best solution for today?   I have lots of sites I need to scrape, mainly search then drill-down sites.   I would like to be able to schedule the scraping to run on a daily basis.  Is there a direction you could point me?  I’m a seasoned developer by trade but am seeing all these point and click solutions (e.g. import.io) and am wondering if I should stick with Node.JS or .NET or if I should investigate some of these GUI scrapers of today.
Categories
Guest posting Web Scraping Software

A revolutionary web scraping software to boost your business

If you were an Amazon seller, would you want to know the listing price of a product of all competitors? Since you don’t have direct access to the Amazon database, you are out of luck and have to browse and click through every listing in order to construct a table of sellers and prices. A web scraping tool comes in handy. It automatically downloads your desired information such as product name, seller’s name, price, etc. However, web scraping that requires coding skill can be painful for professionals in IT, SEO, marketing, e-commerce, real estate, hospitality, etc.

It seems beyond one’s job description if he/she needs to learn how to code in order to obtain certain useful data from the web. For example, I have a friend who graduated in Mass Communication and works as a content marketer. She wants to scrape some data from the web, so she decided to learn Python herself. It took her two weeks to come up with a page of messy codes. Not only did she waste time on learning Python, but she also lost the time she could have used for doing her real work.

Categories
Miscellaneous

Scraping HTML graphic elements: possibilities and limits

Question: “How do I set up a daily automatic scraping of www.pollen.com data into a Google sheet?” (link)

Answer: Originally I doubted if svg HTML elements are scrapable. After some trial and error experience I realized, that svg elements are indeed scrapable; one can get their xPath, children nodes. Yet, they are scrapable by importXML() when being static html.

Categories
Development

Dexi.io addon functionality (images & geocoding)

dexi-io-geo-maps-logoThe Dexi.io web scraping service has remade its functionality by adding [paid plan] addons. Through addons, more features are made available to customers, e.g. more step types/pipe actions. Those features also allow the integration of scrape results to data stores and endpoints like PostgreSQL, MySQL, Amazon S3 and other. 

Categories
Web Scraping Software

Octoparse 7.0 – a free web scraping tool for non-developers

octoparse has recently launched a brand new version 7.0, which has turned out to be the most revolutionary upgrade in the past two years, with not only a more user-friendly UI, but also some of the advanced features make web scraping even easier. In this post, I will walk through some of the new features/changes made available in this new version, with respect to how a beginner, even one without any coding background, can approach this web scraping tool.

Categories
Miscellaneous Web Scraping Software

Hotel: scrape prices, Q&A

 

Question

I want to extract the hotel name and the current room price of some hotels daily from https://www.expedia.ca/Hotel-Search?#&destination=Quebec,%20Quebec,%20Canada&startDate=06/11/2016&endDate=07/11/2016&regionId=&adults=2

I am a small hotel owner and want those info quite often, and hope I can do it with codes automatically in someway.  You are expert in this field, what is the easiest ways to get those information?  Can you give me some example codes?

Categories
Web Scraping Software

Dexi.io – how to improve performance

Intro

Some may argue that extracting 3 records per minute is not fast enough for an automated scraper (see my last post on Dexi multi-threaded jobs). However, you should realize that Dexi extractor robots behave like a full-blown modern browser and fetch all the resources that crawled pages load (CSS, JS, fonts, etc.).
In terms of performance, an extractor robot might not be as fast as a pure HTTP scraping script, but its advantage is the ability to extract data from dynamic websites which require running JavaScript code in order to generate a user-facing content. It will also be harder for anti-bot mechanisms to detect and block it.