
Oxylabs’ Web Scraper API – Experience & more

Experience

We’ve successfully tested Oxylabs’ Web Scraper API. It did well at getting data from heavily protected sites. One example is Zoro.com, which is protected with Akamai, DataDome, CloudFlare and ReCaptcha. See the numeric results below.

Code

Below is Java code that calls Web Scraper API through curl.
URL, USERNAME and PASSWORD in the code are to be substituted with real values.

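import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.BufferedReader;
import java.io.IOException;

public class WebScraperApiExample {

    // Sends the target URL to Web Scraper API through curl and parses the returned HTML.
    static Document scrape() throws IOException {
        // Build the JSON payload with org.json so quoting and escaping are handled for us.
        String payload = new JSONObject()
                .put("source", "universal")
                .put("url", "URL")
                .put("geo_location", "United States")
                .toString();
        // Pass the arguments as a list: Runtime.exec(String) splits on whitespace and ignores quotes.
        Process process = new ProcessBuilder("curl", "--user", "USERNAME:PASSWORD",
                "https://realtime.oxylabs.io/v1/queries",
                "-H", "Content-Type: application/json",
                "-d", payload).start();
        try {
            StringBuilder builder = new StringBuilder();
            String str;
            // Process.inputReader() requires Java 17 or newer.
            try (BufferedReader bufferedReader = process.inputReader()) {
                while ((str = bufferedReader.readLine()) != null)
                    builder.append(str).append("\n");
            }
            JSONObject jsonObject = new JSONObject(builder.toString());
            // The scraped page HTML sits in results[0].content of the response.
            return Jsoup.parse(jsonObject.optJSONArray("results").optJSONObject(0).optString("content"));
        } finally {
            process.destroy();
        }
    }
}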
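
Shelling out to curl requires the curl binary on the machine that runs the code. As a possible alternative, the same request could be sent with the JDK's built-in java.net.http.HttpClient (Java 11+). The sketch below assumes the same endpoint, payload fields and Basic authentication that the curl command uses. The class and method names (RealtimeClient, buildPayload, basicAuth, buildRequest, send) are hypothetical and not part of any Oxylabs SDK, and the JSON escaping is deliberately minimal.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class; not part of any Oxylabs SDK.
class RealtimeClient {

    // Builds the same JSON body that the curl command sends with -d.
    static String buildPayload(String url, String geoLocation) {
        return "{\"source\": \"universal\", \"url\": \"" + escape(url)
                + "\", \"geo_location\": \"" + escape(geoLocation) + "\"}";
    }

    // Minimal JSON string escaping: backslashes and double quotes only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Equivalent of curl's --user option: an HTTP Basic Authorization header value.
    static String basicAuth(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // Assembles the POST request without sending it.
    static HttpRequest buildRequest(String user, String password, String url) {
        return HttpRequest.newBuilder(URI.create("https://realtime.oxylabs.io/v1/queries"))
                .header("Content-Type", "application/json")
                .header("Authorization", basicAuth(user, password))
                .POST(HttpRequest.BodyPublishers.ofString(buildPayload(url, "United States")))
                .build();
    }

    // Sends the request and returns the raw JSON response body.
    static String send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling RealtimeClient.send(RealtimeClient.buildRequest("USERNAME", "PASSWORD", "URL")) with real values should return the JSON response as a string, which can then be handed to JSONObject and Jsoup in the same way as the curl output.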

Extra help

What if you need to click buttons (in a browser runtime) during scraping, or perform similar actions? Web Scraper API supports this through Browser Instructions.
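
As a rough illustration of what such a request might look like, the sketch below builds a payload that turns on rendering and attaches a click followed by a short wait. The field names used here (render, browser_instructions, type, selector, wait_time_s) are assumptions that should be checked against the current Oxylabs documentation. The class name, helper methods and the XPath in main are hypothetical.

```java
import java.util.List;

// Hypothetical helper; field names are assumptions to verify against the Oxylabs documentation.
class BrowserInstructionsPayload {

    // One "click" instruction that targets an element by XPath.
    static String click(String xpath) {
        return "{\"type\": \"click\", \"selector\": {\"type\": \"xpath\", \"value\": \""
                + escape(xpath) + "\"}}";
    }

    // One "wait" instruction that pauses for the given number of seconds.
    static String waitSeconds(int seconds) {
        return "{\"type\": \"wait\", \"wait_time_s\": " + seconds + "}";
    }

    // Minimal JSON string escaping: backslashes and double quotes only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Request body with rendering switched on and the instructions attached in order.
    static String build(String url, List<String> instructions) {
        return "{\"source\": \"universal\", \"url\": \"" + escape(url)
                + "\", \"render\": \"html\", \"browser_instructions\": ["
                + String.join(", ", instructions) + "]}";
    }

    public static void main(String[] args) {
        // Prints the payload only; no request is sent.
        System.out.println(build("URL", List.of(click("//button[@id='load-more']"), waitSeconds(2))));
    }
}
```

The resulting string would replace the plain payload passed to curl with -d in the code above.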

Numeric results

Site               Bot protection                            Success rate   Avg. response time, sec   Notes
Zoro.com           Akamai, DataDome, CloudFlare, ReCaptcha   90%            11
Asd-lighting.com   CloudFlare, ReCaptcha                     94%            1.4
Bijurdelimon.com   ReCaptcha                                 100%           3.7
Note that these sites were scraped with Oxylabs' Web Scraper API without JavaScript rendering.
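
The post does not say how these figures were measured. One plausible way to derive them is sketched below: success rate as the share of attempts that returned the page, and average response time as the mean over all attempts (an assumption; averaging only successful attempts would be equally defensible). The class and method names are hypothetical, and the values passed in are up to the reader.

```java
// Hypothetical helper for summarising a batch of scrape attempts.
class ScrapeStats {

    // Share of successful attempts, in percent. Returns 0 for an empty batch.
    static double successRatePercent(boolean[] outcomes) {
        if (outcomes.length == 0) return 0.0;
        int ok = 0;
        for (boolean outcome : outcomes) {
            if (outcome) ok++;
        }
        return 100.0 * ok / outcomes.length;
    }

    // Mean response time in seconds over all attempts. Returns 0 for an empty batch.
    static double averageSeconds(double[] seconds) {
        if (seconds.length == 0) return 0.0;
        double sum = 0.0;
        for (double s : seconds) {
            sum += s;
        }
        return sum / seconds.length;
    }
}
```

For instance, three successes out of four attempts would give a success rate of 75%.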

Other useful Web Scraper API resources

Python Virtual Env
Windows Mac OS Linux
Free Worth to test
  1. Scheduler (repo): oxylabs/Oxylabs-Web-Scraper-API-Scheduler (github.com)
  2. Open-source project: oxylabs/oxylabs-readme: Oxylabs repository collections’ guide. (github.com)
