Selenium Web Scraping in simple words

Question: What is Selenium web scraping?

Answer: A picture is better than 1000 words: selenium main diagram

So, you make a program with Python, PHP, JAVA, Ruby and whatever language you use in order to browse(), select(), click(), submit(), save(), etc., target web pages.

2 main advantages

The browsing is done by a real web browser (it might be headless, that is without a graphical user interface, GUI). Thus, all in-page JS is executed and a web page has all the DOM items that it should have.
The web masters do not consider that scraping, as the scraping, hence robots, and scraping activity, are very much hidden, not detected.

2 main disadvantages

Many resources (RAM, CPU) are needed to run a well-equipped browser. See a scrape speed comparison table:

Chromium headless instance by Selenium* HTTP requests

Setup time, ms 45000 5

Log-in time, ms 105000 13

1 page load time, ms 6 10

*based on TripAdvisor scrape
Source.
Some advanced business directories’ sites (eg. Amazon, Linkedin, TripAdvisor) set up a Selenium-like browsing detection, thus preventing the web scraping.

Свежие записи

Свежие комментарии

Архивы

Рубрики

	Chromium headless instance by Selenium*	HTTP requests
Setup time, ms	45000	5
Log-in time, ms	105000	13
1 page load time, ms	6	10

2 main advantages

2 main disadvantages

Leave a Reply Cancel reply