The A1 scraper by Microsys is a program mainly used to scrape websites and extract data in bulk for later use in web services. It extracts text, URLs and other content using multiple regular expressions (Regexes) and saves the output to a CSV file. This tool can be compared with other web harvesting and web scraping services.
How it works
- Go to the Scan website tab and enter the site's URL on the Path subtab.
- Press the 'Start scan' button; the crawler will find text, links and other data on the website and cache them.
Important: URLs you scrape data from must pass both the analysis filters and the output filters, which are configured on the Analysis filters and Output filters subtabs respectively. These filters must be set during the website analysis stage (mode).
- Go to the Scraper Options tab.
- Enter the Regex(es) into the Regex input area.
- Define the name and path of the output CSV file.
- The scraper automatically finds and extracts the data according to Regex patterns.
The results for all the given URLs are stored in a single CSV file. Note that the full set of regular expressions is run against every page scraped.
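The workflow above, where every regex is applied to every cached page and all matches go into one CSV file, can be sketched roughly in Python. This is an illustrative approximation only; the sample pages, pattern names and CSV layout are assumptions, not A1's actual internals:

```python
import csv
import io
import re

# Sample cached pages standing in for the crawl results
# (assumption: in A1, these come from the website scan stage).
pages = {
    "https://example.com/a": '<a href="https://example.com/b">Next</a> <h1>Page A</h1>',
    "https://example.com/b": "<h1>Page B</h1> contact: b@example.com",
}

# As in A1, the full set of regexes is run against every scraped page.
patterns = {
    "title": re.compile(r"<h1>(.*?)</h1>"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

# One CSV output for all given URLs (an in-memory buffer here,
# where A1 writes to the file you named in Scraper Options).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["url", "pattern", "match"])
for url, html in pages.items():
    for name, pattern in patterns.items():
        for match in pattern.findall(html):
            writer.writerow([url, name, match])

print(buffer.getvalue())
```

Because each pattern runs over each page, the CSV grows with pages × patterns × matches, which is also why a large regex set slows the parsing stage down.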
Additional scraper features
Using the scraper as a website crawler also provides:
- URL filtering.
- Adjusting the crawl speed to match service needs rather than server load.
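The URL-filtering idea, where a URL must pass both an analysis filter and an output filter before its data is kept, can be illustrated with hypothetical regex-based filters (the filter expressions and function below are assumptions for illustration; A1 configures filters through its own subtabs):

```python
import re

# Hypothetical filters expressed as regexes (illustrative only).
analysis_filter = re.compile(r"^https://example\.com/")  # which URLs get analyzed
output_filter = re.compile(r"/products/")                # which URLs reach the output

def passes_filters(url: str) -> bool:
    # A URL must satisfy both filters for its data to be scraped and output.
    return bool(analysis_filter.search(url)) and bool(output_filter.search(url))

print(passes_filters("https://example.com/products/1"))  # True
print(passes_filters("https://example.com/about"))       # False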
If you need to extract data from a complex website, just disable Easy mode by pressing the corresponding button. A1 Scraper's full tutorial is available here.
The A1 Scraper is good for mass gathering of URLs, text and other data under multiple conditions. However, this scraping tool relies solely on Regex expressions, which can greatly increase parsing time.