
Meet new OutWit Hub 4.0!

As it does every year, OutWit has just released a major upgrade of its extraction program, OutWit Hub. This version brings a number of interesting new features, some of which I cover in this post.

We remind you that OutWit offers a good price for our readers!

Loading Time

The first thing worth noting when you open version 4.0 is that the loading time has been reduced substantially: when opening the application, you hardly notice the reading of your automator library anymore, however large it is. This is good news if you have hundreds of scrapers and macros in your profile.

Contacts

The interface layout itself hasn’t changed much, except for one thing: in the list of views on the left side panel, ‘emails’ has been replaced by ‘contacts’. If you browse to the contact page of a website and click on this view, you will see why. By default, the program still extracts the emails from the pages you visit, but if you wish, it will also try to associate with each one all the contact information it can recognize (phone/fax number, physical address, name, URL…, even Twitter, Facebook or Skype IDs, if available).

It offers several navigation functions to grab the contacts, from the fast exploration of a site to the systematic browsing of lists of links. Of course, like any automatic recognition system, it is not 100% accurate, but the results are pretty convincing (at least for the US, UK, Canada, Australia and Western Europe). If you need to be sure all the fields are grabbed the way you want, you can always create a custom scraper.
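To give a rough idea of what "associating contact information with each email" involves, here is a minimal Python sketch. It is not OutWit Hub's recognition engine, whose internals are not described here; the patterns, the `extract_contacts` function and the one-contact-per-line assumption are all mine, chosen for illustration only.

```python
import re

# Deliberately simple patterns (assumptions for this sketch).
# Real-world contact recognition has to handle many more formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_contacts(text):
    """Return one dict per email found, paired with the first phone
    number on the same line (or None when the line has no phone)."""
    contacts = []
    for line in text.splitlines():
        phone_match = PHONE_RE.search(line)
        phone = phone_match.group() if phone_match else None
        for email in EMAIL_RE.findall(line):
            contacts.append({"email": email, "phone": phone})
    return contacts


sample = (
    "Sales: Jane Doe, jane@example.com, tel. +1 555 010 0199\n"
    "Support: help@example.com"
)
contacts = extract_contacts(sample)
for contact in contacts:
    print(contact)
```

Even this toy version shows why such recognition can never be perfect: phone formats differ from country to country, and deciding which number belongs to which email is a heuristic.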

Link Highlighting

Another visible change is in the browser: series of links are highlighted as you hover over them in a Web page. (You can disable the function in the view menu, if you wish.) It allows you to access a list of exploration and extraction functions directly from the browser panel. For instance, if you are on a search engine result page and roll over one of the result links with your mouse, all the other result links of the page will be highlighted and you will be able to right-click and ask the program to browse through them automatically.

The right-click menu on the page contains a large number of new functions (too many to list here), including options to outline or indent the page content, to extract the links of the page, or to paste links imported from another application.

Export Panel

Located beside the extracted data, the export panel now offers several new file formats: XML, JSON, vCards and SQL UPDATE queries have been added to HTML, CSV, TXT, SQL INSERT and Excel, which were already available in the previous version. All of them come with a real-time preview of what the exported data will look like.
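If you have never compared these formats side by side, the following Python sketch renders the same two records as CSV, JSON and SQL INSERT statements. It only illustrates the formats and says nothing about how OutWit Hub generates its files; the sample records, the `contacts` table name and the helper functions are hypothetical.

```python
import csv
import io
import json

# Hypothetical extracted records, used only for this illustration.
rows = [
    {"name": "Jane Doe", "email": "jane@example.com"},
    {"name": "Pat O'Neil", "email": "pat@example.com"},
]


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Render the rows as an indented JSON array of objects."""
    return json.dumps(rows, indent=2)


def sql_literal(value):
    """Quote a value as an SQL string literal, doubling single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def to_sql_insert(rows, table="contacts"):
    """Render one INSERT statement per row for the given table name."""
    columns = ", ".join(rows[0])
    statements = []
    for row in rows:
        values = ", ".join(sql_literal(v) for v in row.values())
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return "\n".join(statements)


print(to_csv(rows))
print(to_json(rows))
print(to_sql_insert(rows))
```

Note the apostrophe in the second name: CSV leaves it alone, JSON keeps it as is inside double quotes, and SQL needs it doubled. This is the kind of detail a real-time preview lets you check before exporting.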

Several new settings have also been added to the preferences. The most useful is a fairly complete renaming format for exports and downloaded files, which saves a lot of post-processing time.

Scraping Toolkit

The scraping toolkit got its share of new directives and functions in the upgrade: more than 20 new features, including figure normalization, access to cookies, automatic scrolling and pauses. Together they make it possible to build advanced scrapers that handle tricky cases.
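As an illustration of what figure normalization can mean in practice, here is a small Python sketch that converts figures written with different separators into plain numbers. This is my own assumption about the general idea, not a description of OutWit Hub's directive; the `normalize_figure` function and its heuristics are hypothetical.

```python
import re


def normalize_figure(raw):
    """Convert a figure such as '1,234.50', '1.234,50' or '€ 1 234,5' to a float.

    Heuristics (assumptions of this sketch):
    - when both '.' and ',' appear, the one occurring last is the decimal mark;
    - a lone separator that repeats, or is followed by exactly three digits,
      is treated as a thousands separator, so '1.234' becomes 1234.0.
    """
    # Keep only digits, separators and the minus sign.
    s = re.sub(r"[^\d.,-]", "", raw)
    if "," in s and "." in s:
        decimal = "," if s.rfind(",") > s.rfind(".") else "."
        thousands = "." if decimal == "," else ","
        s = s.replace(thousands, "").replace(decimal, ".")
    elif "," in s or "." in s:
        sep = "," if "," in s else "."
        head, _, tail = s.rpartition(sep)
        if s.count(sep) > 1 or len(tail) == 3:
            s = s.replace(sep, "")  # thousands separator
        else:
            s = head + "." + tail  # decimal mark
    return float(s)


for text in ["1,234.50", "1.234,50", "€ 1 234,5", "12,000", "3.14"]:
    print(text, normalize_figure(text))
```

The ambiguous case is a figure like '1.234', which could be either one thousand two hundred thirty-four or a decimal number. A scraper that knows the locale of the source site can resolve it properly, which is one reason a dedicated directive is handy.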

Conclusion

OutWit Hub has been around for many years now and has become a very solid and powerful tool for a broad range of needs. User-suggested features are added regularly, so it is a good idea to use the feedback system on outwit.com, or to ask your questions and share your experience right here.

One reply on “Meet new OutWit Hub 4.0!”

Great post. I loved this software. It did make things easier here.
I've found little info on building scrapers. I mean, is there an active forum where people discuss best solutions?

I've been trying to build a scraper that would use the #scrollToEnd# directive until a certain pattern appears, and not only once. Still trying. :0)
