
Amazon scrape tip

We recently needed to scrape Amazon data in large quantities. So, first of all, I tested how well this data aggregator is protected against bots. For that I used the Anti-bot channel of the Scraping Enthusiasts Discord server.

Since Amazon is a huge data aggregator, we recommend that readers get acquainted with the post Tips & Tricks for Scraping Business Directories.

Antibot test results

1. Product page, https://www.amazon.com/Columbia-Terminal-Tackle-Sleeve-Vivid/dp/B0787PGJMV

✅ All good, none of these detected:

Shape Security, DataDome, Distil, Imperva, Incapsula, PerimeterX, Akamai, FingerprintJS,
FingerprintJS Pro, Kasada, WhiteOps, ShieldSquare, ThreatMetrix, F5, Cloudflare, Arkose Labs,
Human Security, Sift, Ocule, Cheq, TrafficGuard, Reblaze, Forter Protection, Meetrics Check,
reCAPTCHA, generic fingerprinting & bot detection

2. Category page, https://www.amazon.com/stores/page/95998422-7946-40A2-9D38-34944D2351BA

Results:

⚠️ Generic Antibot detected:
https://www.amazon.com/stores/page/95998422-7946-40A2-9D38-34944D2351BA

⚠️ Canvas Fingerprinting detected:
https://m.media-amazon.com/images/I/81PUvRgN2sL.js?AUIClients/FWCIMAssets

Tip

Judging by these results, Amazon protects category pages rather than product / detail pages. So it seems reasonable to use a JS-rendering scraper (a headless browser) for category pages and a regular HTTP scraper for product pages.
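To illustrate the tip, here is a minimal Python sketch of routing URLs to one of the two scraper types. It only decides which fetcher to use and does not fetch anything. The function names and the path patterns (`/dp/<10-character ID>` for product pages) are my assumptions, inferred solely from the two URLs tested above, not from any Amazon documentation. Anything that does not look like a product page is sent to the JS-render queue as the safer default.

```python
import re
from urllib.parse import urlparse

# Assumption: product/detail pages contain "/dp/" followed by a
# 10-character ID, as in the product URL tested above.
PRODUCT_PATH = re.compile(r"/dp/[A-Z0-9]{10}(?:/|$)")


def choose_fetcher(url: str) -> str:
    """Return "http" for product pages, "js_render" for everything else."""
    path = urlparse(url).path
    if PRODUCT_PATH.search(path):
        return "http"
    # Category/store pages showed anti-bot and canvas fingerprinting,
    # so they (and any unknown page type) go to the JS-render queue.
    return "js_render"


def split_by_fetcher(urls):
    """Group URLs into two queues, one per scraper type."""
    queues = {"http": [], "js_render": []}
    for url in urls:
        queues[choose_fetcher(url)].append(url)
    return queues


if __name__ == "__main__":
    sample = [
        "https://www.amazon.com/Columbia-Terminal-Tackle-Sleeve-Vivid/dp/B0787PGJMV",
        "https://www.amazon.com/stores/page/95998422-7946-40A2-9D38-34944D2351BA",
    ]
    for kind, items in split_by_fetcher(sample).items():
        print(kind, items)
```

The "http" queue can then be fed to a plain HTTP client, and the "js_render" queue to a headless browser of your choice.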
