Email extraction is a common task in web scraping. Here I review GSA Email Spider, a useful program designed for collecting emails, phone numbers and fax numbers from the web.
Some useful features of Email Spider
- Extracts emails starting from a URL as well as from search results for a given keyword
- Phone and fax numbers are collected too
- Automated email sender
- Harvests emails with the help of search engines (300+ included)
- Supports HTTPS websites
- Supports SSL-only email providers (like Google Mail)
- Allows using a proxy in the crawling process
- Can send emails directly using an internal SMTP server
- Can get around anti-spider protection (e.g. by using a random user agent string)
- Collects emails with related extra information (e.g. addresses)
- Has many filters for conditional extraction (like specifying keywords or excluding some domain names)
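To make the core idea of email extraction concrete, here is a minimal Python sketch that pulls addresses out of a chunk of page text with a regular expression. It is only an illustration: the pattern, the `extract_emails` function and the sample string are my own assumptions, and nothing here reflects how GSA Email Spider is implemented internally.

```python
import re

# Deliberately simplified pattern (an assumption for this sketch);
# real address syntax is far more permissive than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text):
    """Return unique, lowercased addresses in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result

# Hypothetical page fragment used only for demonstration.
sample = 'Contact <a href="mailto:Info@Example.com">info@example.com</a> or sales@example.org.'
print(extract_emails(sample))
```

A real crawler would also have to fetch pages, follow links and cope with obfuscated addresses, which this sketch leaves out.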
How it works
The program has a simple dialog-based interface. First, as I mentioned earlier, you choose between starting with a keyword or with a URL. Then you can tune the extraction process with dozens of settings in the Options tab.
For example, to narrow your email search you can set up an additional filter on which emails to scrape.
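Conceptually, this kind of filtering comes down to keeping or dropping each address according to a few rules. The Python sketch below shows one possible way to do it; the function name `filter_emails`, its parameters and the sample addresses are hypothetical and are not taken from the program's settings.

```python
def filter_emails(emails, excluded_domains=(), required_keywords=()):
    """Keep addresses that are not on an excluded domain and, if keywords
    are given, contain at least one of them (case-insensitive)."""
    excluded = {d.lower() for d in excluded_domains}
    keywords = [k.lower() for k in required_keywords]
    kept = []
    for email in emails:
        address = email.lower()
        domain = address.rsplit("@", 1)[-1]
        if domain in excluded:
            continue  # domain is explicitly excluded
        if keywords and not any(k in address for k in keywords):
            continue  # none of the required keywords appear in the address
        kept.append(email)
    return kept

# Hypothetical input list used only for demonstration.
found = ["info@example.com", "sales@spam.test", "support@example.org"]
print(filter_emails(found, excluded_domains=["spam.test"]))
print(filter_emails(found, required_keywords=["support"]))
```

The first call drops the address on the excluded domain; the second keeps only the address containing the keyword.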
After everything is set up, press the Start button and the email extraction process will begin. When I ran the demo version I used the keywords “php”, “scrape” and “cookie”, and the extraction results were as follows:
- extraction time with 1000 results per search engine was approx. 28 hours
- 227,555 URLs were searched
- 49,071 emails & phones were gathered
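To put these figures in perspective, the short calculation below derives rough rates from them. It assumes the run took exactly 28 hours, although that figure is only approximate, so the results are ballpark values.

```python
hours = 28          # approximate run time reported above
urls = 227_555      # URLs searched
contacts = 49_071   # emails and phones gathered

urls_per_hour = urls / hours
urls_per_second = urls / (hours * 3600)
contacts_per_url = contacts / urls

print(f"{urls_per_hour:.0f} URLs per hour")
print(f"{urls_per_second:.2f} URLs per second")
print(f"{contacts_per_url:.3f} contacts per URL")
```

That works out to roughly 8,100 URLs per hour (a little over 2 per second) and about one contact for every four to five URLs searched.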
Though the demo version is limited to only 1000 search results per search engine, I was still impressed with the total number of emails that the spider could extract.
The Email Spider not only extracts emails from the web but can also automatically send messages to the extracted addresses (this feature is available in the full version only). The settings of this feature are shown in the picture below:
GSA Email Spider is a really good helper for email and phone extraction. Though simple, it is smart enough (thanks to its large number of options) to sift out only the relevant information. As an additional feature, the built-in automailer lets you easily send several emails based on a single template.