Today, I got acquainted with the Node.js [and Python] bots garden/zoo, which provides modern bots running different kinds of browsers (Firefox, Chrome, headless or not) through different automation frameworks (Puppeteer, Selenium, Playwright) in several programming languages.
Antoine Vastel's Bots repo currently contains the following bots:
- Playwright (NodeJS): Chromium, Webkit (Safari), Firefox
- Playwright extra stealth (NodeJS): Chromium (will be updated when it becomes stable)
- Puppeteer (NodeJS): Chromium, Firefox, Android (emulation), iPhone (emulation)
- Puppeteer extra stealth (NodeJS): Chromium
- Pyppeteer stealth (Python): Chromium
- Selenium (NodeJS): Chromium, Firefox
- Selenium stealth (Python): Chrome
- Undetected Chromedriver (Python): Chrome
- Ferrum (Ruby): Chrome
- Watir (Ruby): Chrome, Safari (macOS)
- Simple HTTP module/library (NodeJS + Cheerio): Sequential, Parallel, Sequential using NordVPN, HTTP proxies
- Simple HTTP module/library (Python requests/aiohttp + BeautifulSoup): Sequential, Parallel (x2 implementations)
- Simple HTTP module/library (Golang standard library + goquery): Sequential, Parallel
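To make the "Sequential" versus "Parallel" distinction in the simple HTTP bots concrete, here is a minimal Python sketch. It is not the repo's code: the function names `crawl_sequential`, `crawl_parallel` and `fake_fetch` are hypothetical, and the network call is replaced by a stand-in so the example runs offline. In a real bot, `fetch` would wrap something like `requests.get(url).text`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def crawl_sequential(urls: Iterable[str], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Fetch each URL one after another, in input order."""
    return {url: fetch(url) for url in urls}


def crawl_parallel(
    urls: Iterable[str], fetch: Callable[[str], str], max_workers: int = 4
) -> Dict[str, str]:
    """Fetch URLs concurrently with a thread pool; results keep input order."""
    url_list: List[str] = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the inputs.
        bodies = list(pool.map(fetch, url_list))
    return dict(zip(url_list, bodies))


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request (assumption, not repo code)."""
    return f"<html><title>{url}</title></html>"


if __name__ == "__main__":
    pages = ["https://example.com/a", "https://example.com/b"]
    # Both strategies return the same mapping; only the timing differs.
    assert crawl_sequential(pages, fake_fetch) == crawl_parallel(pages, fake_fetch)
```

From a detection point of view, the two variants mostly differ in request rate and timing, which is presumably why the repo ships both.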
To be added:
- Playwright Firefox/WebKit
- Selenium Firefox, in NodeJS as well as in other programming languages such as Python.
- Examples for bot frameworks that provide mechanisms against bot detection solutions.
More for browser masking
The headers directory contains data related to HTTP headers. For the moment, it contains:
- A list of ~16K user-agents;
- Accept headers for the main browsers;
- Accept-Encoding headers for the main browsers;
- Header names for the main browsers;
- Fetch metadata request headers.
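As a rough illustration of how such header data might be combined, the sketch below assembles a header set for a top-level page navigation. The helper names (`pick_user_agent`, `build_navigation_headers`) and the sample values are assumptions made for this example; a real bot would load the user-agents and per-browser header values from the repo's data files, whose exact format is not described here.

```python
import random
from typing import Dict, Optional, Sequence

# Illustrative sample only (assumption); the real list holds ~16K entries.
SAMPLE_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]


def pick_user_agent(user_agents: Sequence[str], rng: Optional[random.Random] = None) -> str:
    """Pick one user-agent at random; pass a seeded rng for reproducibility."""
    rng = rng or random.Random()
    return rng.choice(list(user_agents))


def build_navigation_headers(
    user_agent: str, accept: str, accept_encoding: str
) -> Dict[str, str]:
    """Combine the header families listed above into one request header set."""
    return {
        "User-Agent": user_agent,
        "Accept": accept,
        "Accept-Encoding": accept_encoding,
        # Fetch metadata values for a user-initiated, top-level navigation.
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-User": "?1",
        "Sec-Fetch-Dest": "document",
    }


if __name__ == "__main__":
    headers = build_navigation_headers(
        pick_user_agent(SAMPLE_USER_AGENTS),
        accept="text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        accept_encoding="gzip, deflate, br",
    )
    for name, value in headers.items():
        print(f"{name}: {value}")
```

Note that the values should stay consistent with one another: pairing a Chrome user-agent with Firefox-style `Accept` values is exactly the kind of mismatch a detector can pick up, so the random choice here is only a starting point.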
Bonus: posts on bot detection.