Categories
Challenge SaaS

My experience with Zyte AI spiders, part 2

I’ve described my initial experience with Zyte AI spiders leveraging Zype API and Scrapy Cloud Units. You might find it here. Now I’d share more sobering report of what happened with the data aggregator scrape.

Categories
Development

Google Sheets or MS Excel to scrape business directories ?

We’ve already stated some Tips and Tricks of scraping business directories or data aggregators sites. Yet recently someone has asked us to do aggregators’ scraping in the context of Google Sheets and/or MS Excel.

Categories
Challenge Development

Yelp scraping for high quality B2B leads

Recently we’ve performed the Yelp business directory scrape for acquiring high quality B2B leads (company + CEO info). This forced us to apply many techniques like proxying, external company site scrape, email verification and more.

Categories
Development

Scrapy to get dynamic business directory data thru API

In this post I want to share on how one may scrape business directory data, real estate using Scrapy framework.

Categories
Challenge SaaS

Web Scraper IDE to scrape tough websites

Recently we encountered a new powerful scraping service called Web Scraper IDE [of Bright Data]. The life-test and thorough drill-in are coming soon. Yet now we want to highlight its main features that has badly (in positive sense, strongly) impressed us.

Categories
Challenge Development

Business directory simple scraper (python) at pythonanywhere

business directoryMy goal was to retrieve data from a web business directory.

Since the business directories scrape is the most challenging task (beside SERP scrape) there are some basic questions for me to answer:

  1. Is there any scrape protection set at that site?
  2. How much data is in that web business directory?
  3. What kind of queries can I run to find all the directory’s items?
Categories
Development

JAVA library to scrape Linkedin & its data affiliates

In this post we want to share with you a new useful JAVA library that helps to crawl and scrape Linkedin companies. Get business directories scraped!

If you are considering the Linkedin data scrape legal issues, please refer to the following post: Linkedin lost in court to data analytic company that scrapes Linkedin’s public profiles info
Categories
Development

Python LinkedIn downloader

We’ve done the Linkedin scraper that downloades the free study courses. They include text data, exercise files and 720HD videos. The code does not represent the pure Linkedin scraper, a business directory data extractor. Yet, you might grasp the main thoughts and useful techniques for your Linkedin scraper development.

Categories
Uncategorized

How to scrape Yellow Pages with ScreenScraper Chrome Extension

Recently I was asked to help with the job of scraping company information from the Yellow Pages website using the ScreenScraper Chrome Extension. After working with this simple scraper, I decided to create a tutorial on how to use this Google Chrome Extension for scraping pages similar to this one. Hopefully, it will be useful to many of you.

Categories
Development

Node.js, Puppeteer, Apify for Web Scraping (Xing scrape) – part 2

In the post we share the practical implementation (code) of the Xing companies scrape project using Node.js, Puppeteer and the Apify library. The first post, describing the project objectives, algorithm and results, is available here.

The scrape algorithm you can look at here.