Turn any interactive website into an API with ParseHub

Anyone should be able to pull data from the web and access it in the format they want. If a website does not have an API, scraping is one of the only options for getting the data you need. But figuring out how to scrape data from complicated HTML is a pain.

ParseHub is a new web browser extension that you can use to turn any dynamic and poorly structured website into an API, without writing code. It is designed to work on websites that rely on JavaScript and Ajax, and it is similar to web scraping tools such as Import.io and Kimono Labs.

The ParseHub tool will identify relationships between elements, extract all of the data and provide it to you in a spreadsheet or through an easily accessible API. Both the scrapers and the data are cloud hosted. All you have to do is download the ParseHub browser extension and start extracting the data you want. Watch this 60-second video of ParseHub.

The challenge of dynamic websites

For static web pages, building your own scraper isn’t that hard. However, it can take weeks to get data that loads dynamically with JavaScript on an interactive website: the sites with the navbar that makes you cringe, the nested dropdowns, the data stuck inside a map, or the government site that’s partying like it’s 1999.

“There are dozens of tools that can extract data from simple, well-structured sites. But we couldn’t find anyone that could tackle the edge cases. So we set out to build the most powerful and flexible tool, that can handle any website thrown at it.” – Serge Toarca, CEO, ParseHub

For developers, ParseHub gives you full control over how you select, structure and modify elements, so you don’t have to hunt through your browser’s web inspector. You can use ParseHub to log into websites, automatically fill out forms, loop through search queries, click on interactive maps, and handle infinite scrolling, dropdowns and pop-ups. Regex is built in as a handy tool for parsing specific text from a webpage. You can schedule your data to update as often as every minute, and you instantly see the results of your scrape as you build your project.
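
As a rough illustration of the kind of text parsing that regex is used for, the Python sketch below pulls a price out of a scraped string. The sample text, the pattern and the extract_price helper are made up for this example; they are not part of ParseHub, and the regex syntax ParseHub accepts may differ from Python's.

```python
import re

# Hypothetical text, as it might appear in a scraped listing element.
raw_text = "Entire home in Toronto - $129 per night (4.8 stars)"

# A dollar sign, then digits that may contain comma separators,
# then optional cents.
PRICE_PATTERN = re.compile(r"\$(\d[\d,]*(?:\.\d{2})?)")


def extract_price(text):
    """Return the first dollar amount in text as a float, or None if absent."""
    match = PRICE_PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting to a number.
    return float(match.group(1).replace(",", ""))


print(extract_price(raw_text))
```

Returning None when nothing matches keeps a missing price distinct from a price of zero, which matters when the same pattern runs over many listings.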

Check out a few ways you can get data from dynamic websites with ParseHub below. Follow along with this tutorial to build a simple movie listing website using the ParseHub API. ParseHub has packages available in Node.js, PHP and Python built by an awesome community.

Example 1: Easily choose dates and conditions on a search page

Use ParseHub to search for rentals in 5 different cities. With the ParseHub extension you can enter the check-in and check-out dates and select the number of guests from a dropdown.

  1. Go to airbnb.com. Click “create new project” in the ParseHub browser extension.
  2. Use the loop tool and fill in the text boxes so that the loop reads: for each “city” in “locations”.
  3. Click on the settings tab and under “Starting value” enter your list of cities like so: {"locations": ["Toronto", "London", "San Francisco", "New York", "Austin"]}
  4. Select the location box and use the input tool. In the input value, enter “city” and select “expression” as the input type.
  5. Select the check out box and use the input tool to enter a date. Now, select the check in box and use the input tool again to enter another date.
  6. Select the guests dropdown. Use the click tool to open the dropdown. Select “1 Guest” and use the click tool again.
  7. Select the “Search” button and use the navigate tool to tell ParseHub to click on the button and show all of the results for your search. Create a new template called “results” and select whatever results you want on the next page.
  8. Click “Get Data” and run once or create a schedule to run multiple times.
  9. Download your data in CSV or JSON or use your API key to interact with the project.
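
The starting value in step 3 has to be valid JSON, and quotes pasted from a word processor often are not. One way to avoid typing it by hand is to generate it, as in the minimal Python sketch below. The build_starting_value helper is a hypothetical name used only for this example; the "locations" key matches the loop in step 2.

```python
import json

cities = ["Toronto", "London", "San Francisco", "New York", "Austin"]


def build_starting_value(locations):
    """Return a JSON string suitable for pasting into the starting value box."""
    # Trim stray whitespace and skip empty entries so each city is searched as typed.
    cleaned = [name.strip() for name in locations if name.strip()]
    return json.dumps({"locations": cleaned})


starting_value = build_starting_value(cities)
print(starting_value)
```

Because json.dumps always emits straight double quotes, the printed string can be pasted into the settings tab as-is.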

Here is a video that will show you how to get data behind a search box and handle dropdowns.

Example 2: Scrape data from restaurants and pop-ups on a map

Use ParseHub to select points on the map and get data that appears after you click on each location.

  1. Go to Bing Maps and search for restaurants in your area. Click “create new project” in the ParseHub browser extension.
  2. Select one of the blue points on the map, hold SHIFT and select another point on the map. All of the points are now selected. Use the list tool to put all of the restaurant data on separate rows or JSON objects.
  3. Use the click tool. Notice how a pop-up appears. Now use the wait tool and set it for 1.5 seconds to wait for the pop-up to load information.
  4. Use the select tool to click on the restaurant name. Use the extract tool to add the text into your results.
  5. Use the select tool again to click on the contact info for each restaurant. Use the extract tool to add the text into your results.
  6. Click “Get Data” and run once or create a schedule to run multiple times.
  7. Download your data in CSV or JSON or use your API key to interact with the project.
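
Once the data is downloaded as JSON, it is often convenient to flatten it into rows yourself. The Python sketch below assumes an export with one object per restaurant; the "restaurant", "name" and "contact" keys are hypothetical and depend on how the selections in your own project are named.

```python
import csv
import io
import json

# Hypothetical export; the real key names come from your project's selections.
sample_json = """
{"restaurant": [
  {"name": "Cafe Uno", "contact": "555-0100"},
  {"name": "Noodle Bar"}
]}
"""


def rows_to_csv(records, columns):
    """Write a list of dicts as CSV text, leaving missing fields blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",              # value used when a record lacks a column
        extrasaction="ignore",   # skip keys that are not in columns
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


data = json.loads(sample_json)
csv_text = rows_to_csv(data["restaurant"], ["name", "contact"])
print(csv_text)
```

Pop-ups do not always show every field, so the sketch fills missing values with an empty string rather than failing on the first incomplete restaurant.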

Watch the following video to learn how to get data from maps.

https://youtu.be/cA5LSp_cUCc

Have a website in mind that is super laggy, complicated and frustrating to get data from? Send it to the ParseHub team and see if they can do it!

Free plan limit

ParseHub is free to use on five websites/projects at one time, so you can get data from and host 5 websites a month for free. To extract data from more than 5 websites, you have to upgrade to one of the paid plans. However, you can delete a project when you are done with a website and start a new one; the limit only applies to the number of websites running at one time. You can review the paid plans ParseHub offers here.

Enterprise solutions

If you want to host hundreds of scrapers, the ParseHub team specializes in enterprise solutions and can set you up with a custom plan, depending on how often and how fast you need to crawl target websites.

Conclusion

ParseHub is best for complex, JavaScript-heavy websites for which building a custom scraper is a pain. Check out this great tutorial on the pros and cons of ParseHub by Damian Cannon and see exactly how ParseHub can handle most websites through these videos. The ParseHub team spent a long time perfecting a powerful tool that can get data from most websites. Now, with the addition of a beautiful UI, anyone can have the world of data at their fingertips.

This is a Guest Post by Angelina Fomina, ParseHub founder.
