In this post we explain the differences between a crawler, a scraper and a parser.
A crawler is a web bot that visits a set of web pages (one might call them nodes) and accumulates their links (urls), deriving new urls from each new web page [html] that it visits. A crawler may or may not save the pages' content to a data store. It does not go deep (e.g. into detail pages) unless explicitly programmed to.
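As a rough illustration of that link-accumulating loop, here is a minimal Python sketch. It is not taken from any particular project: the names `LinkExtractor`, `crawl`, `fetch` and `max_pages`, the `example.com` pages, and the rule of staying on the start url's domain are all assumptions made for the example. Pages are served from an in-memory dictionary so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns every same-domain url discovered.

    `fetch` is a hypothetical callable: url -> html string, or None on failure.
    """
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited += 1
        if html is None:
            continue
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            link = urljoin(url, href)  # resolve relative links
            # Assumption: only follow links on the starting domain.
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)


# Made-up pages standing in for a real site.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="https://other.example/x">X</a>',
    "https://example.com/b": '<a href="/">home</a>',
}

urls = crawl("https://example.com/", PAGES.get)
print(urls)
```

Note that the function only returns urls; it stores nothing about the pages themselves, which is the part a scraper would take over.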
A scraper is a bot that visits the web pages of a given set of urls. Unlike a crawler, it does not collect new urls. Instead, it visits pre-collected urls and retrieves the relevant data to save into a data store.
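For contrast with a crawler, a scraper loop might look like the Python sketch below. The names `scrape`, `extract_title` and `fetch`, and the sample pages, are hypothetical; a plain list stands in for the data store, and a page title stands in for the "relevant data". In real use `fetch` could wrap something like `urllib.request.urlopen`.

```python
import re

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Very small extractor: returns the page title, or None if absent."""
    match = TITLE_RE.search(html)
    return {"title": match.group(1).strip() if match else None}


def scrape(urls, fetch, extract=extract_title):
    """Visit only the given urls and return one record per fetched page.

    `fetch` is a hypothetical callable: url -> html string, or None on failure.
    """
    records = []
    for url in urls:
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be fetched
        record = extract(html)
        record["url"] = url
        records.append(record)  # a real scraper would write to a database
    return records


# Made-up pages; the link inside the second one is deliberately ignored.
PAGES = {
    "https://example.com/item/1": "<html><title>Item one</title></html>",
    "https://example.com/item/2": "<html><title> Item two </title><a href='/item/3'>next</a></html>",
}

records = scrape(list(PAGES), PAGES.get)
print(records)
```

The link to `/item/3` is never followed: the scraper only touches the urls it was handed.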
A parser is an [offline] program that processes or analyzes given data and turns it into proper data structures. It extracts information from [unstructured] data, whether taken from a data store or directly from the web (e.g. HTML). Consider the following HTML fragment, supposedly scraped from a web page at url="https://battery-store.com/Batteries+Plus+Calcium-f4d67gh":
<form id="form-2345609">
<div id="item-2345609" >Batteries Plus Calcium 12V 74Ah 680A battery AK-ZP57412</div>
<label name="price" currency="US" >48.08</label>
<label name="price" currency="CA" hidden >53.00</label>
<input type="hidden" id="sku-YU23809" name="SKU" >
<input type="hidden" id="csrf" value="dca4545878573fe5de89ddffaba5aa051a3b" >
<input type="submit" value="Order" name="submit" >
</form>
A parser can turn it into a useful data item:
[{ "id":2345609, "name": "Batteries Plus Calcium 12V 74Ah 680A battery AK-ZP57412", "sku":"YU23809", "price_us":"48.08", "price_ca":"53.00", "url":"https://battery-store.com/Batteries+Plus+Calcium-f4d67gh" }]
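One way such a conversion could be done is sketched below in Python, using the standard library's `html.parser`. The class name `ItemParser` and the extraction rules (the `item-` and `sku-` id prefixes, the `currency` attribute on price labels) are assumptions read off this single sample, not a description of how the site is actually structured.

```python
from html.parser import HTMLParser

URL = "https://battery-store.com/Batteries+Plus+Calcium-f4d67gh"

SNIPPET = """<form id="form-2345609">
<div id="item-2345609" >Batteries Plus Calcium 12V 74Ah 680A battery AK-ZP57412</div>
<label name="price" currency="US" >48.08</label>
<label name="price" currency="CA" hidden >53.00</label>
<input type="hidden" id="sku-YU23809" name="SKU" >
<input type="hidden" id="csrf" value="dca4545878573fe5de89ddffaba5aa051a3b" >
<input type="submit" value="Order" name="submit" >
</form>"""


class ItemParser(HTMLParser):
    """Builds one item dict from a product form like the sample above."""

    def __init__(self, url):
        super().__init__()
        self.item = {"url": url}
        self._field = None  # key that the next text node should fill

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        elem_id = attrs.get("id", "")
        if tag == "div" and elem_id.startswith("item-"):
            self.item["id"] = int(elem_id[len("item-"):])
            self._field = "name"
        elif tag == "label" and attrs.get("name") == "price":
            # e.g. currency="US" becomes the key "price_us"
            self._field = "price_" + attrs.get("currency", "").lower()
        elif tag == "input" and elem_id.startswith("sku-"):
            self.item["sku"] = elem_id[len("sku-"):]

    def handle_data(self, data):
        if self._field and data.strip():
            self.item[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        self._field = None  # never carry a pending field past its element


parser = ItemParser(URL)
parser.feed(SNIPPET)
parser.close()
item = parser.item
print(item)
```

Fields that are not needed, such as the CSRF token and the submit button, are simply ignored. Prices are kept as strings, as in the data item above.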
Often a scraper includes the parser functionality itself.
See the examples of simple email crawlers (Python, Java) and a scraping project in which the scraper and crawler functionality go side by side. In that project a crawler gathers the [domain] urls and processes each one depending on whether it is a detail page or a search result.