Categories
Guest posting SaaS

The Importance of Transparency and Trust in Data and Generative AI

Sharing an informative article by Sarah McKenna (CEO of Sequentum & Forbes Technology Council Member), The Importance Of Transparency And Trust In Data And Generative AI. It includes factors for responsible data collection (aka scraping) and web data usefulness for AI post processing. She touches on security, adherence to regulatory requirements, bias prevention, governance, auditability, vendor evaluation and more. 

In the age of data-driven decision-making, the quality of your outcomes depends on the quality of the underlying data. Companies of all sizes seek to harness the power of data, tailored to their specific needs, to understand the market, pricing, opportunities, etc. In this data-rich environment, using generic or unreliable data not only carries intangible costs that prevent companies from achieving their full potential, it also has real, tangible costs.

Categories
Development SaaS

My experience with Zyte AI spiders, part 1

Recently I was given a bunch of sites to scrape, most of them simple e-commerce sites. I decided to try Zyte AI-powered spiders. To use them, I had to apply for a Zyte API subscription and access to Scrapy Cloud. Zyte AI proved to be a good choice for fast data extraction and delivery through its spiders, which are Scrapy spiders. Below you can see my experience and the results.
I have done another “experience” post on the Zyte platform usage.

Categories
Challenge SaaS

My experience with Zyte AI spiders, part 2

I’ve described my initial experience with Zyte AI spiders leveraging Zyte API and Scrapy Cloud Units. You might find it here. Now I’d like to share a more sobering report of what happened with the data aggregator scrape.

Categories
Challenge SaaS

Web Scraper IDE to scrape tough websites

Recently we encountered a powerful new scraping service called Web Scraper IDE [of Bright Data]. A live test and a thorough drill-in are coming soon. For now, we want to highlight the main features that have strongly impressed us.

Categories
SaaS

Web Page Change Tracking

Often, you want to detect changes on some eBay offerings or get notified of the latest items of interest from craigslist in your area. Or, you want to monitor updates on a website (your competitor’s, for example) where no RSS feed is available. How would you do it, by visiting it over and over again? No, now there are handy tools for website change monitoring. We’ve evaluated some tools and would like to recommend the most useful ones that will make your monitoring job easy. Those tools nicely complement the web scraping software, service and plugins.

Categories
Review SaaS

Dexi.io review

Dexi.io is a powerful scraping suite (SaaS). This cloud scraping service provides development, hosting and scheduling tools. The suite might be compared with Mozenda for building web scraping projects and running them in the cloud for user convenience. Yet it also includes an API, with each scraper being a JSON definition, similar to other services like Import.io and ParseHub.

Categories
Development SaaS

Dexi Pipes: multi-threaded web scraping of site aggregators

Today I want to share my experience with Dexi Pipes. Pipes is a new kind of robot introduced by Dexi.io to integrate web data extraction and web data processing into a single seamless workflow. The main focus of the testing is to show how Dexi might leverage multi-threaded jobs for extraction of data from a retail website.
NB: Pipes robots are available starting from PROFESSIONAL plans.

Categories
Review SaaS

Scrapinghub review

Scrapinghub is a developer-focused web scraping platform. It provides tools and services to extract structured, organized information from online sources. Its major tools include Scrapy Cloud, Crawlera, and Splash. We’ve decided to try the service. In this post we’ll review its main functionality and also share our experience with Scrapinghub.

Categories
Miscellaneous SaaS

CloudScrape to transform into Dexi.io

We have already written some posts on CloudScrape, a web scraping service startup based in Copenhagen, Denmark. The service now has a new look and new features for data extraction and business intelligence, along with a new name: Dexi.io.

Categories
SaaS

80legs Review – Crawler for rent in the sky

80legs offers a crawling service that allows users to (1) easily compose crawl jobs and (2) run those crawl jobs in the cloud over a distributed computer network.

The modern web requires you to spend a huge amount of processing power to mine it for information. How could a start-up or a small business do comprehensive data crawling without having to build the giant server farms used by major search engines?