Today I got a question from one of my readers asking if there is a good out-of-the-box solution for crawling multiple websites for contact information.
I am looking to extract phone numbers from a list of URLs, and I didn’t find a tool which enables me to look specifically for a phone number. Is there a tool with a good quality/price ratio that can enable me to look for phone numbers on different websites?
One of the problems is that sometimes the phone number is on the homepage, and sometimes it is on a «contact» page.
The tool that I can recommend is Web Data Extractor. This scraper does not require you to define extraction patterns for each website’s pages. It’s simply a mass data-gathering utility. It blindly gathers data and then sifts through it (links, emails, phones, etc.) based on patterns, most likely regex patterns.
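To give a feel for what pattern-based sifting looks like, here is a minimal Python sketch. It is not how Web Data Extractor works internally (I don’t know that). The `CANDIDATE_RE` pattern, the 7 to 15 digit limits, and the `extract_phones` name are all my own assumptions for illustration, and real-world phone formats will need a stricter or locale-aware pattern.

```python
import re

# Assumed, deliberately loose pattern: an optional "+", then a run of digits,
# spaces, dots, hyphens, and parentheses that starts and ends with a digit.
CANDIDATE_RE = re.compile(r"\+?\(?\d[\d ().-]{5,}\d")


def extract_phones(text):
    """Return phone-like strings found in text, normalized to digits only."""
    phones = []
    for match in CANDIDATE_RE.finditer(text):
        raw = match.group()
        digits = re.sub(r"\D", "", raw)
        # Assumed sanity check: keep only candidates with 7 to 15 digits.
        if 7 <= len(digits) <= 15:
            normalized = ("+" if raw.startswith("+") else "") + digits
            if normalized not in phones:
                phones.append(normalized)
    return phones


sample = "Call us at +1 (555) 123-4567 or 555.987.6543. Founded in 2009."
print(extract_phones(sample))  # ['+15551234567', '5559876543']
```

A loose pattern like this will also pick up order numbers and similar digit runs, so expect to review the results by hand.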
Its Pro version should be able to handle many different URLs: “Pro version of WDE doesn’t have any limits – feel free to process thousands of sites, gigabytes of data”.
And it seems they offer a discount for it if you ask :-).
Another tool that might help is Outwit Hub. It’s a more capable tool, yet it’s not quite as straightforward for collecting phone numbers. Besides, you would probably need to filter the phone numbers out of the general contact info. So this is plan B. 🙂
In addition to Outwit Hub, there are other powerful web scrapers which I have reviewed here, but they’re not as simple a solution, because you’ll need to program them for each website and then have a system for pulling the phone numbers out of the data you get back.
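If you do go the do-it-yourself route, the reader’s point about the number sometimes living on a «contact» page instead of the homepage can be handled by first collecting links that look like contact pages. Below is a rough Python sketch using only the standard library. The `ContactLinkFinder` class, the `find_contact_links` helper, and the rule "the URL or the link text contains the word contact" are my own assumptions, not something any of the tools above is known to do.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ContactLinkFinder(HTMLParser):
    """Collects <a> links whose href or link text mentions 'contact'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._current_href = None
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # href may be missing, in which case this stays None.
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._current_text).lower()
            if "contact" in self._current_href.lower() or "contact" in text:
                # Resolve relative links against the page they came from.
                url = urljoin(self.base_url, self._current_href)
                if url not in self.links:
                    self.links.append(url)
            self._current_href = None


def find_contact_links(html, base_url):
    """Return absolute URLs of links that look like contact pages."""
    finder = ContactLinkFinder(base_url)
    finder.feed(html)
    return finder.links


homepage = '<a href="/about">About</a> <a href="/contact-us">Get in touch</a>'
print(find_contact_links(homepage, "https://example.com"))
# ['https://example.com/contact-us']
```

You would download the homepage, search it for phone numbers, and then repeat the search on each link this returns. Sites in other languages will need more keywords than just "contact".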