
A Simple Code that Extracts a Hotel List from Booking.com

In this post I will show you how easy it is to write a Python script that extracts a list of hotels from booking.com. The code stays simple because Selenium WebDriver, which drives a real browser, does the heavy lifting of the data extraction.


Let’s say we need to extract the names of hotels in Berlin. All we really need to do is fill out the search form on the booking.com home page and click the Search button.

This time I will start with a complete code snippet, and then I will explain what each part means.

Here is the code that fills the form, clicks the Search button, extracts hotel names and prints the result on the screen:

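from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

# Open Firefox and load the booking.com home page
driver = webdriver.Firefox()
driver.get('http://booking.com')
# Type the destination and wait (up to 1 s) for the autocomplete list to appear
driver.find_element(By.CSS_SELECTOR, "#destination").send_keys("Berlin")
WebDriverWait(driver, 1, poll_frequency=0.1).\
    until(lambda drv: len(drv.find_elements(By.CSS_SELECTOR, "ul.ui-autocomplete li")) > 0)
# Pick the first suggestion, tick "I don't have specific dates yet" and search
driver.find_element(By.CSS_SELECTOR, "ul.ui-autocomplete li").click()
driver.find_element(By.CSS_SELECTOR, "#availcheck").click()
driver.find_element(By.CSS_SELECTOR, "#searchbox_btn").submit()
# Print the name of every hotel on the first page of results
for link in driver.find_elements(By.CSS_SELECTOR, "a.hotel_name_link"):
    print(link.text)
# Close the browser
driver.quit()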

1. Opening the home page

First, we need to create a WebDriver and open the booking.com home page. Let’s use the Firefox WebDriver, since it ships with the selenium package (Firefox itself must be installed, along with geckodriver, which recent Selenium releases can fetch automatically):

driver = webdriver.Firefox()
driver.get('http://booking.com')

2. Selecting the destination

Then we need to type “Berlin” in the destination box and select the first item from the drop-down list:
(Screenshot: booking.com destination drop-down)
Here is the code that does it:

driver.find_element(By.CSS_SELECTOR, "#destination").send_keys("Berlin")
WebDriverWait(driver, 1, poll_frequency=0.1).\
    until(lambda drv: len(drv.find_elements(By.CSS_SELECTOR, "ul.ui-autocomplete li")) > 0)
driver.find_element(By.CSS_SELECTOR, "ul.ui-autocomplete li").click()

The first line makes the WebDriver type “Berlin” into the text box with id="destination".

The second line is a bit more complicated. It makes the WebDriver wait until the autocomplete list appears on the page. This is necessary because when you type the name of the city, booking.com makes an AJAX request to the server to get a list of destinations matching what you typed. The until() method of WebDriverWait calls the condition (defined here in the lambda function) repeatedly, every 0.1 seconds in our case, and returns as soon as the condition is true. If the timeout (1 second in our case) expires first, it raises a TimeoutException instead.

The third line simply clicks on the first item in the list of proposed destinations.

3. Checking the checkbox

The next line checks the “I don’t have specific dates yet” checkbox to get a list of hotels independent of any dates:

driver.find_element(By.CSS_SELECTOR, "#availcheck").click()
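One caveat: click() toggles a checkbox, so if the box is already ticked for some reason, the line above would untick it. A small guard avoids that. ensure_checked is a hypothetical helper; it relies only on the is_selected() and click() methods that Selenium's WebElement provides.

```python
def ensure_checked(checkbox):
    """Tick a checkbox only if it is not ticked yet.

    checkbox is expected to be a Selenium WebElement; any object with
    is_selected() and click() methods works.
    """
    if not checkbox.is_selected():
        checkbox.click()  # click() toggles, so only click an unticked box
```

With Selenium 4 the call would be ensure_checked(driver.find_element(By.CSS_SELECTOR, "#availcheck")). Calling it twice leaves the box ticked, which a bare click() would not.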

4. Clicking the Search button

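Finally, we need to start the search. Calling submit() on the Search button submits the form that contains it, which has much the same effect as clicking the button: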

driver.find_element(By.CSS_SELECTOR, "#searchbox_btn").submit()

5. Extracting the hotel names

(Screenshot: booking.com search results)

After the search request has been submitted and the results page has loaded, we can extract all the hotel names listed in the search results and print them on the screen:

for link in driver.find_elements(By.CSS_SELECTOR, "a.hotel_name_link"):
    print(link.text)

Note, though, that for the sake of simplicity I didn’t implement moving to the next page of the search results. You can add this yourself by making the WebDriver click the “Next page” link at the bottom of the page.

How to find web page elements

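You have probably noticed that to get the web page elements to work with, I use the find_element_by_css_selector and find_elements_by_css_selector functions. (Selenium 4.3 removed these helpers; in current versions the same calls are written find_element(By.CSS_SELECTOR, ...) and find_elements(By.CSS_SELECTOR, ...), with By imported from selenium.webdriver.common.by.) Both take a CSS selector as a parameter. find_elements_by_css_selector returns a list of all matching elements, which is simply empty if nothing matches, while find_element_by_css_selector returns the first matching element and raises a NoSuchElementException if there is none.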

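You can easily find an element’s id, class or other attributes for a selector in your web browser’s developer tools. This is also how you would update the selectors if booking.com changes its page layout:
(Screenshot: Firefox Inspector)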
In Chrome and Firefox you can open these tools by pressing Ctrl+Shift+I; in Internet Explorer, press F12.

There you are. If you have any questions or suggestions, feel free to comment below!

I would also like to offer you a video showing what I was talking about:
