Categories
Development

reCaptcha solving by 2Captcha service in Puppeteer & Selenium

The 2Captcha service has published practical guides for solving reCaptcha in Puppeteer and Selenium using the grid method. See the repos below:

  1. https://github.com/2captcha/puppeteer-recaptcha-solver-using-clicks
  2. https://github.com/2captcha/selenium-recaptcha-solver-using-grid
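As we understand the grid approach, the solver receives the reCaptcha image grid and answers with the numbers of the tiles to click, and the bot then clicks those tiles. The Java sketch below covers only the coordinate arithmetic of that last step. The class and method names are hypothetical, and the 1-based, left-to-right, top-to-bottom tile numbering is an assumption to check against the 2Captcha documentation. The linked repos use their own code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns grid-cell numbers (as a grid-style solver might
// return them) into click points on the challenge image.
// ASSUMPTION: cells are numbered from 1, left to right, top to bottom.
class GridClicks {

    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public String toString() {
            return "(" + x + ", " + y + ")";
        }
    }

    // left/top/width/height: bounding box of the challenge image in page
    // coordinates; rows/cols: grid size (e.g. 3x3 or 4x4).
    static List<Point> clickPoints(double left, double top, double width, double height,
                                   int rows, int cols, int[] cells) {
        double cellW = width / cols;
        double cellH = height / rows;
        List<Point> points = new ArrayList<>();
        for (int cell : cells) {
            if (cell < 1 || cell > rows * cols) {
                throw new IllegalArgumentException("cell out of range: " + cell);
            }
            int index = cell - 1;   // switch to 0-based numbering
            int row = index / cols; // row-major layout
            int col = index % cols;
            // Aim at the centre of the tile rather than its edge.
            points.add(new Point(left + (col + 0.5) * cellW, top + (row + 0.5) * cellH));
        }
        return points;
    }
}
```

The resulting points would then be passed to whatever click API the browser driver offers (Puppeteer or Selenium), offset by the page scroll if needed.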

Categories
Development SaaS

My experience with Zyte AI spiders, part 1

Recently I was given a bunch of sites to scrape, most of them simple e-commerce sites. I decided to try Zyte's AI-powered spiders. To use them, I had to apply for a Zyte API subscription and access to Scrapy Cloud. Zyte AI proved to be a good choice for fast data extraction and delivery through its spiders, which are Scrapy spiders. Below you can see my experience and the results.
I have written another “experience” post on using the Zyte platform.
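Since the AI spiders required a Zyte API subscription, it may help to see what a bare Zyte API call looks like outside Scrapy. The Java sketch below only builds the request. The endpoint `https://api.zyte.com/v1/extract`, the Basic-auth scheme (API key as user name, empty password) and the `"product": true` field reflect our reading of the public Zyte API docs and should be treated as assumptions; the class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of a Zyte API automatic-extraction request.
// ASSUMPTION: endpoint, auth scheme and the "product" field follow the
// public Zyte API docs as we understand them; verify before relying on this.
class ZyteRequest {

    static final String ENDPOINT = "https://api.zyte.com/v1/extract";

    // Minimal JSON string escaping, enough for ordinary URLs.
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static HttpRequest productRequest(String apiKey, String pageUrl) {
        // HTTP Basic auth: the API key is the user name, the password is empty.
        String token = Base64.getEncoder()
                .encodeToString((apiKey + ":").getBytes(StandardCharsets.UTF_8));
        String body = "{\"url\": " + jsonString(pageUrl) + ", \"product\": true}";
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Authorization", "Basic " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and the JSON response parsed with any JSON library.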

Categories
Challenge SaaS

My experience with Zyte AI spiders, part 2

I’ve described my initial experience with Zyte AI spiders leveraging Zyte API and Scrapy Cloud Units. You can find it here. Now I’d like to share a more sobering report of what happened with the data aggregator scrape.

Categories
Development

Crawling web pages with Netpeak Spider in conjunction with MarsProxies, NetNut and IPRoyal proxies

You’ll agree that it’s hard to overestimate the importance of information: “Master of information, master of situation”. Nowadays we have everything we need to become a “master of the situation”, including tools such as spiders and parsers that can scrape various data from websites. Today we will look at scraping Amazon with a web spider equipped with proxy services.

Categories
Development

Choosing the Best Proxies for Web Scraping

From eCommerce and market research to competitive analysis and more, web scraping has become an integral part of data collection. And for some, it’s the secret sauce for success.

But with great scraping power comes great responsibility. 

Web scraping can result in IP bans and other harsh restrictions. To avoid these issues, many turn to proxies, which act as intermediaries between your requests and the target website. In this article, we’ll explore the top 3 proxy types for web scraping and focus on the key benefits of each type. Let’s go!
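To make the idea of proxies as intermediaries concrete, here is a minimal Java sketch of a hypothetical rotating pool: each request goes through the next proxy in the list, so consecutive requests reach the target from different IP addresses. The class name and the round-robin policy are our own illustration; many commercial backconnect services rotate IPs behind a single gateway instead, in which case no client-side pool is needed.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin proxy pool: each call to next() hands out the
// following proxy in the list, wrapping around at the end.
class ProxyPool {
    private final List<InetSocketAddress> addresses;
    private final AtomicInteger counter = new AtomicInteger();

    ProxyPool(List<InetSocketAddress> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("proxy list is empty");
        }
        this.addresses = List.copyOf(addresses); // defensive, immutable copy
    }

    // Thread-safe: the atomic counter lets several scraper threads share a pool.
    Proxy next() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int i = Math.floorMod(counter.getAndIncrement(), addresses.size());
        return new Proxy(Proxy.Type.HTTP, addresses.get(i));
    }
}
```

A connection would then be opened with something like `URI.create(url).toURL().openConnection(pool.next())`.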

Categories
Guest posting

Bright Data’s Business Capabilities

Bright Data offers its customers a full suite of real-time data collection tools that help them gain and maintain a competitive market edge. Bright Data prides itself on its ethical and 100% legally compliant approach.

Categories
Review

A Comprehensive Examination of NetNut Proxy Services

When traversing the digital landscape for high-quality proxy services, the name NetNut often emerges as a frontrunner. This review aims to delve into the facets of NetNut’s offerings, emphasizing the distinctive aspects that set it apart in an intensely competitive environment. 

An Introduction to NetNut

In the world of proxies, NetNut.io has built a reputation for speed, reliability, and robustness. Leveraging its partnership with a major US data carrier, NetNut offers a residential proxy network that is both unique and high-performing. The company’s network is expansive, with coverage spanning millions of IPs globally, which gives it an impressive degree of scalability.

Categories
Development Web Scraping Software

My experience choosing a web scraping platform for a company-critical data feed

Recently we engaged with an online e-commerce startup that needed government tenders/RFPs scraped. Since the project is immense, we had to switch from hand-made scripted extractors to an enterprise-grade scraping platform. Below I share my experience with the scraping platforms as a feature table.

Feature | Octoparse | Dexi.io | Mozenda | Sequentum SaaS | Import.io
Able to set up robot/agent | 3 min | 3 failures in a row | "For some insight, we are working with customers in managed service engagements for large scale, mission critical web integration requirements - so we no longer have a SaaS tool offering. We have a heavy focus in digital commerce and work with customers on use cases in ecomm/retail, travel/hospitality, and tickets/events." (customer service)
Support response | 12 hours; it does an excellent job | 12 hours | 12 hours | 12 hours
Base64 encoding | no | using a JavaScript step: btoa() is a function that takes a string and encodes it to Base64 | yes, one can encode the given value in the Transformation Script of any command
Robot/agent development assistance | yes
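The Base64 row refers to the JavaScript btoa() function. If the same value has to be reproduced outside a platform, for example to check an agent's output, a Java equivalent is short. The class below is a hypothetical helper, not part of any of the compared tools; it mirrors btoa()'s rule that only characters up to U+00FF are accepted, each encoded as a single byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical Java counterparts of the browser's btoa()/atob().
class Base64Like {

    // btoa() accepts only characters U+0000..U+00FF and treats each as one
    // byte, which is exactly the ISO-8859-1 encoding.
    static String btoa(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                throw new IllegalArgumentException("character outside Latin-1 at index " + i);
            }
        }
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Inverse of btoa(): decode Base64 and map each byte back to one character.
    static String atob(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.ISO_8859_1);
    }
}
```

For example, `Base64Like.btoa("hello")` gives `aGVsbG8=`, the same string btoa("hello") produces in a browser.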

Categories
Development

Backconnect Proxy Service with authorization in Java

While working with a backconnect proxy service (Oxylabs.io), we spent a long time looking for a way to authenticate with it. Originally we used JSoup to get the web pages’ content. Its proxy() method can be used when setting up the connection, but it only accepts the host and port, so no authentication is possible. One of the options we found was the following:
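For illustration only, and not necessarily the option the authors settled on, one common JDK-level approach is to register a default java.net.Authenticator that answers proxy challenges. The minimal Java sketch below assumes an HTTP-type proxy with Basic authentication; the class name, host, port and credentials are hypothetical placeholders. It also clears the jdk.http.auth.tunneling.disabledSchemes system property, because JDK 8u111 and later disable Basic authentication for HTTPS tunnelling through a proxy by default.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Sketch: authenticate against a backconnect proxy with plain JDK classes.
// Host, port and credentials passed in are placeholders (hypothetical).
class ProxyAuth {

    // Registers a JVM-wide Authenticator that answers only proxy challenges
    // coming from the given proxy host and port.
    static void install(String user, String password, String proxyHost, int proxyPort) {
        // Re-enable Basic auth for HTTPS tunnelling (CONNECT). This must be set
        // before the first HTTP connection is made in the JVM.
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                boolean isOurProxy = getRequestorType() == RequestorType.PROXY
                        && proxyHost.equalsIgnoreCase(getRequestingHost())
                        && proxyPort == getRequestingPort();
                // Returning null means "no credentials" for any other challenge.
                return isOurProxy
                        ? new PasswordAuthentication(user, password.toCharArray())
                        : null;
            }
        });
    }

    // Fetches a page through the proxy and returns the body as a UTF-8 string.
    static String fetch(String pageUrl, String proxyHost, int proxyPort) throws IOException {
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(pageUrl).toURL().openConnection(proxy);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Call `ProxyAuth.install(...)` once at start-up, then fetch pages with `ProxyAuth.fetch(...)`. Since JSoup has traditionally used HttpURLConnection internally, a default Authenticator may also take effect for connections made with its proxy() method, but that is worth verifying against the JSoup version in use.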

 

Categories
Review

DataFlowKit review

Recently we came across a new service that helps users scrape the modern Web 2.0. It’s a simple, convenient, easy-to-learn service: https://dataflowkit.com
Let’s first highlight some of its outstanding features:

  1. Visual online scraper tool: point, click and extract.
  2. JavaScript rendering: any interactive site can be scraped by headless Chrome running in the cloud
  3. Open-source back-end
  4. Scrape a website behind a login form
  5. Web page interactions: Input, Click, Wait, Scroll, etc.
  6. Proxy support, incl. Geo-target proxying
  7. Scraper API
  8. Follow robots.txt directives
  9. Export results to Google Drive, Dropbox, MS OneDrive.
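Item 8 says the service follows robots.txt. As a rough idea of what such a check involves, here is a deliberately simplified Java sketch of our own, not DataFlowKit's code: it reads only the `User-agent: *` group(s), treats Allow/Disallow values as plain path prefixes, and lets the longest match win, with Allow winning ties as RFC 9309 recommends. Wildcards (`*`, `$`), agent-specific groups and crawl-delay are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified robots.txt rules for the "*" user agent only.
// Wildcards, agent-specific groups and crawl-delay are not handled.
class RobotsRules {
    private final List<String> allow = new ArrayList<>();
    private final List<String> disallow = new ArrayList<>();

    RobotsRules(String robotsTxt) {
        boolean inStarGroup = false;
        boolean lastWasAgent = false;
        for (String raw : robotsTxt.split("\\R")) {
            String line = raw.replaceFirst("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (key.equals("user-agent")) {
                // Consecutive User-agent lines share one group.
                inStarGroup = (lastWasAgent && inStarGroup) || value.equals("*");
                lastWasAgent = true;
                continue;
            }
            lastWasAgent = false;
            // An empty Disallow value means "allow everything", so skip it.
            if (!inStarGroup || value.isEmpty()) continue;
            if (key.equals("allow")) allow.add(value);
            else if (key.equals("disallow")) disallow.add(value);
        }
    }

    // Longest matching prefix wins; on a tie, Allow wins.
    boolean isAllowed(String path) {
        int bestAllow = longestPrefix(allow, path);
        int bestDisallow = longestPrefix(disallow, path);
        return bestDisallow < 0 || bestAllow >= bestDisallow;
    }

    private static int longestPrefix(List<String> rules, String path) {
        int best = -1;
        for (String rule : rules) {
            if (path.startsWith(rule)) best = Math.max(best, rule.length());
        }
        return best;
    }
}
```

A production crawler would also need wildcard matching and per-agent groups, which is why dedicated robots.txt parsers exist.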