Experience
We’ve successfully tested the Oxylabs Web Scraper API. It performed well at extracting data from heavily protected sites. One example is Zoro.com, which is protected by Akamai, DataDome, Cloudflare and reCAPTCHA. See the numeric results below.
Code
Below is Java code that calls the Web Scraper API by running cURL as an external process.
Replace URL, USERNAME and PASSWORD in the code with real values.
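```java
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.BufferedReader;
import java.io.IOException;

public class WebScraperApiClient {

    // Sends a request to the Web Scraper API realtime endpoint via cURL and parses the returned HTML.
    public static Document scrape() throws IOException {
        // Each argument is passed separately: Runtime.exec(String) splits on whitespace
        // and does not honor quotes, which breaks the JSON body.
        ProcessBuilder builder = new ProcessBuilder(
                "curl", "--user", "USERNAME:PASSWORD",
                "https://realtime.oxylabs.io/v1/queries",
                "-H", "Content-Type: application/json",
                "-d", "{\"source\": \"universal\", \"url\": \"URL\", \"geo_location\": \"United States\"}");
        // Send curl's progress/error output to our stderr so the pipe never fills up
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();
        try {
            StringBuilder response = new StringBuilder();
            // Process.inputReader() requires Java 17+
            try (BufferedReader reader = process.inputReader()) {
                String line;
                while ((line = reader.readLine()) != null) {
                    response.append(line).append('\n');
                }
            }
            // The scraped page HTML is in results[0].content
            JSONObject json = new JSONObject(response.toString());
            String html = json.getJSONArray("results").getJSONObject(0).getString("content");
            return Jsoup.parse(html);
        } finally {
            process.destroy();
        }
    }
}
```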
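If you'd rather not depend on an external curl binary, the same request can be sent with Java's built-in java.net.http.HttpClient (Java 11+). This is a sketch we have not run against the live API: the class and method names are hypothetical, while the endpoint, JSON fields and credentials mirror the cURL command above (curl's --user sends HTTP Basic authentication by default). The method returns the raw JSON response, from which results[0].content holds the page HTML.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class: a curl-free sketch of the same realtime request.
public class RealtimeHttpClientSketch {

    static final String ENDPOINT = "https://realtime.oxylabs.io/v1/queries";

    // Builds the same JSON body as the cURL example, escaping backslashes and quotes in the target URL
    static String buildPayload(String targetUrl) {
        String escaped = targetUrl.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"source\": \"universal\", \"url\": \"" + escaped + "\", \"geo_location\": \"United States\"}";
    }

    // HTTP Basic auth header value, equivalent to curl's --user USERNAME:PASSWORD
    static String basicAuth(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Assembles the POST request without sending it
    static HttpRequest buildRequest(String username, String password, String targetUrl) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .header("Authorization", basicAuth(username, password))
                .POST(HttpRequest.BodyPublishers.ofString(buildPayload(targetUrl)))
                .build();
    }

    // Sends the request and returns the raw JSON response body
    static String query(String username, String password, String targetUrl) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildRequest(username, password, targetUrl), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IllegalStateException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

For example, query("USERNAME", "PASSWORD", "https://example.com") returns the JSON text that the cURL version parses with org.json and Jsoup. Escaping the URL before embedding it also avoids a broken body when the target URL contains quotes.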
Extra help
What if you need to click buttons (in a browser runtime) during scraping, or perform similar actions? The Web Scraper API supports this through Browser Instructions.
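Below is a sketch of what a request body with browser instructions might look like, built with plain string concatenation so it needs only the standard library. The field names (render, browser_instructions, the click and wait instruction types, a selector with an XPath value, and wait_time_s) reflect our reading of Oxylabs' Browser Instructions documentation and are assumptions to verify against the current docs. The class name, helper names, target URL and XPath are hypothetical. The resulting string could replace the body passed with -d in the cURL command.

```java
// Hypothetical helper class; field names are assumptions based on Oxylabs' Browser Instructions docs.
public class BrowserInstructionsSketch {

    // Escapes backslashes and double quotes so values can sit inside a JSON string literal
    static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a body that renders the page in a browser, clicks an element found by XPath, then waits
    static String clickAndWaitPayload(String targetUrl, String buttonXpath, int waitSeconds) {
        return "{"
                + "\"source\": \"universal\", "
                + "\"url\": \"" + esc(targetUrl) + "\", "
                // Assumption: instructions only run when JavaScript rendering is enabled
                + "\"render\": \"html\", "
                + "\"browser_instructions\": ["
                + "{\"type\": \"click\", \"selector\": {\"type\": \"xpath\", \"value\": \"" + esc(buttonXpath) + "\"}}, "
                + "{\"type\": \"wait\", \"wait_time_s\": " + waitSeconds + "}"
                + "]}";
    }

    public static void main(String[] args) {
        // Hypothetical target page and button
        System.out.println(clickAndWaitPayload("https://example.com", "//button[@id='load-more']", 2));
    }
}
```

Building the instruction list in code keeps the click target and wait time as parameters, so the same helper can be reused across pages that load more content on a button press.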
Numeric results
Website | Bot protection | Success rate | Avg. response time, s | Notes
---|---|---|---|---
Zoro.com | Akamai, DataDome, Cloudflare, reCAPTCHA | 90% | 11 |
Asd-lighting.com | Cloudflare, reCAPTCHA | 94% | 1.4 |
Bijurdelimon.com | reCAPTCHA | 100% | 3.7 |
Other useful Web Scraper API resources
Worth testing:
- Scheduler (repository): oxylabs/Oxylabs-Web-Scraper-API-Scheduler (github.com)
- Open-source project: oxylabs/oxylabs-readme, a guide to Oxylabs’ repository collections (github.com)