As it does every year, OutWit has just released a major upgrade of its extraction program, OutWit Hub. This version brings a number of interesting new features, some of which I'm going to cover in this post.
We remind you that OutWit offers a good price for our readers!
Loading Time
Contacts
The interface layout itself hasn't changed much, except for one thing: in the list of views on the left side panel, 'emails' was replaced by 'contacts'. If you browse to the contact page of a website and click on this view, you will see why. By default, the program still extracts the email addresses from the pages you visit, but if you wish, it will also try to associate with each one all the contact information it can recognize (phone/fax number, physical address, name, URL… even Twitter, Facebook or Skype IDs, if available).
It offers several navigation functions to grab the contacts, from the fast exploration of a site to the systematic browsing of lists of links. Of course, like any automatic recognition system, it is not 100% accurate, but the results are pretty convincing (at least for the US, UK, Canada, Australia and Western Europe). If you need to be sure all the fields are grabbed the way you want, you can always create a custom scraper.
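To give a rough idea of what "associating contact information with an email" can mean, here is a minimal Python sketch. It is not how OutWit Hub works internally (the post does not describe that); the regular expressions, the `extract_contacts` function and the same-line pairing rule are all assumptions made for illustration.

```python
import re

# Deliberately simple patterns (illustrative assumptions, not OutWit's rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(text):
    """Return one dict per email found, paired with the first phone
    number that appears on the same line (or None if there is none)."""
    contacts = []
    for line in text.splitlines():
        phone_match = PHONE_RE.search(line)
        phone = phone_match.group() if phone_match else None
        for email in EMAIL_RE.findall(line):
            contacts.append({"email": email, "phone": phone})
    return contacts


sample = (
    "Sales: jane@example.com, tel. +1 (555) 010-2030\n"
    "Support: help@example.org"
)

for contact in extract_contacts(sample):
    print(contact)
```

A real recognizer has to cope with many national phone and address formats, which is probably why the results vary by country and why a custom scraper remains the fallback.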
Link Highlighting
Another visible change is in the browser: series of links are highlighted as you hover over them in a Web page. (You can disable the function in the View menu, if you wish.) This gives you access to a list of exploration and extraction functions directly from the browser panel. For instance, if you are on a search engine result page and roll over one of the result links with your mouse, all the other result links on the page will be highlighted, and you will be able to right-click and ask the program to browse through them automatically.
The right-click menu on the page contains a large number of new functions (too many to list here), including the ability to outline or indent the page content, to extract the links of the page, or to paste links imported from another application.
Export Panel
Located beside the extracted data, the export panel now offers several new file formats: XML, JSON, vCard and SQL UPDATE queries have been added to HTML, CSV, TXT, SQL INSERT and Excel, which were available in the previous version. All of them come with a real-time preview of what the exported data will look like.
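For readers unfamiliar with some of these formats, the following Python sketch renders the same extracted record as CSV, JSON, a SQL INSERT and a SQL UPDATE. The exact layout of OutWit Hub's own output is not described here, so the table name `contacts`, the column names, the key column used in the UPDATE and every function name are assumptions chosen for illustration.

```python
import csv
import io
import json


def sql_literal(value):
    """Quote a value as a SQL string literal, doubling single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def to_csv(rows):
    """Render rows (a list of dicts with identical keys) as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Render rows as an indented JSON array."""
    return json.dumps(rows, indent=2)


def to_sql_insert(rows, table):
    """One INSERT statement per row: creates new records."""
    columns = ", ".join(rows[0])
    statements = []
    for row in rows:
        values = ", ".join(sql_literal(v) for v in row.values())
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return "\n".join(statements)


def to_sql_update(rows, table, key):
    """One UPDATE statement per row: modifies the record matching `key`."""
    statements = []
    for row in rows:
        assignments = ", ".join(
            f"{col} = {sql_literal(val)}" for col, val in row.items() if col != key
        )
        statements.append(
            f"UPDATE {table} SET {assignments} WHERE {key} = {sql_literal(row[key])};"
        )
    return "\n".join(statements)


rows = [{"email": "jane@example.com", "name": "Jane O'Neil"}]

print(to_csv(rows))
print(to_json(rows))
print(to_sql_insert(rows, "contacts"))
print(to_sql_update(rows, "contacts", key="email"))
```

The practical difference between the two SQL formats is that INSERT adds new rows to a table, while UPDATE refreshes rows that already exist, which is handy when you re-scrape a source you have already loaded into a database.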
Several new settings have also been added to the preferences. The most useful is a fairly complete renaming format for exported and downloaded files, which saves a lot of post-processing time.
One reply on “Meet new OutWit Hub 4.0!”
Great post. I loved this software. It did make things easier here.
I've found little info on building scrapers. I mean, is there an active forum where people discuss best solutions?
I've been trying to build a scraper that would use the #scrollToEnd# directive until a certain pattern would appear, not only once. Still trying. :0)