Scraping with import.io Magic – The Future?

Over the last year or two, visual web scrapers have matured considerably. New companies like ParseHub, Scrapinghub and Kimono are bringing new tools to the market, while industry veterans like OutWit Hub, Visual Web Ripper and Mozenda continue to update their great tooling for annotating/training scrapers and extracting web data.

Now something has changed. Import.io has created a new tool which is a little bit different on the surface and, having spoken to them, I can say it is a LOT different under the hood.

Introducing import.io Magic

1) Paste the web address you want to scrape data from on their homepage

2) The results will appear like this:

[Screenshot: Magic scraping results]

Why is it different?

This is the first fully automatic data scraper, and the only web scraping solution that requires ZERO input/training/annotation from the user and does not need the website to conform to a particular page type, pattern or standard.

How to use it:

  1. Enter the URL
    1. Download as CSV (limit of 20 pages)
    2. Create an API to the data
  2. Access the data
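
Once the CSV is downloaded, it can be read with any CSV library. Here is a minimal Python sketch using only the standard library. The column names (title, price, link) and the sample rows are made up for illustration; the real columns depend on the page that was scraped.

```python
import csv
import io

# Hypothetical sample of a downloaded CSV; real column names will
# depend on the page that was scraped.
SAMPLE_CSV = """title,price,link
Widget A,9.99,http://example.com/a
Widget B,14.50,http://example.com/b
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = load_rows(SAMPLE_CSV)
for row in rows:
    # Every value arrives as a string, so convert numbers explicitly.
    print(row["title"], float(row["price"]))
```

For a real download, open the file with open(path, newline="") and pass the file object to csv.DictReader instead of the io.StringIO wrapper.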

Pros

  • Faster and easier than other comparable platforms for a lot of ‘basic’ scrapes.
  • It works from any browser (even mobile) and doesn’t need a plugin.
  • Free

Cons

  • The scraper only works on pages with more than one row of data, such as search results pages and category pages.
  • It seems to have trouble with some JavaScript pages.
  • Limit of 20 pages of pagination on CSV download (create and use an API to get more)
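
To get a feel for why a fully automatic scraper might need more than one row of data (the first limitation above), here is a toy Python sketch. It is NOT import.io's algorithm, which isn't described in this post; it is just an assumed heuristic that looks for elements sharing the same tag and class. The names SignatureCounter and repeated_signatures and the two sample pages are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class SignatureCounter(HTMLParser):
    """Count how often each (tag, class) pair opens in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        css_class = dict(attrs).get("class") or ""
        self.counts[(tag, css_class)] += 1


def repeated_signatures(html, min_rows=2):
    """Return the signatures that occur at least min_rows times.

    Assumed heuristic for illustration only: repeated signatures are
    treated as candidate "rows" of data.
    """
    parser = SignatureCounter()
    parser.feed(html)
    return {sig: n for sig, n in parser.counts.items() if n >= min_rows}


# Hypothetical listing page: three items with the same structure.
LISTING = (
    '<ul><li class="item">A</li><li class="item">B</li>'
    '<li class="item">C</li></ul>'
)
# Hypothetical detail page: a single record, nothing repeats.
DETAIL = '<div class="product"><h1>A</h1><p>Only one record here</p></div>'

print(repeated_signatures(LISTING))
print(repeated_signatures(DETAIL))
```

The listing page yields one repeated signature to work with, while the single-record page yields none, so a pattern-based approach like this has nothing to latch onto there.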

Conclusion

It could be the future for a big portion of web scraping. There are still a lot of improvements to be made before it can replace the traditional desktop scraper apps, but this new direction has good potential if import.io can keep improving on its already impressive technology.
