When working with Apify crawlers, you need to initialize a RequestQueue. How do you fill a RequestQueue from a text file?
Given
A text file with the URLs to crawl, one per line. In our case it’s categories.txt, and each line is appended to a base URL. We’ll use the line-reader Node package to open the file and iterate over it line by line.
Install line-reader:
npm i --save line-reader
Since the requestQueue methods return a Promise, the function applied to each line of the file has to wait for addRequest to finish when it adds that line's URL to the requestQueue.
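To illustrate the point about waiting for each line, here is a minimal sketch that swaps line-reader for Node's built-in readline module, whose async iterator lets us await every addRequest call in order. The helper name enqueueFromFile and the fakeQueue object are hypothetical and only stand in for a real queue; the sketch assumes the queue exposes an addRequest({ url }) method that returns a Promise.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper (not part of Apify): reads filePath line by line and
// awaits queue.addRequest() for each non-empty line before reading the next.
async function enqueueFromFile(queue, filePath, baseUrl) {
    const rl = readline.createInterface({
        input: fs.createReadStream(filePath),
        crlfDelay: Infinity, // treat \r\n as a single line break
    });
    let added = 0;
    for await (const line of rl) {
        const suffix = line.trim();
        if (!suffix) continue; // skip blank lines
        await queue.addRequest({ url: baseUrl + suffix });
        added += 1;
    }
    return added; // number of URLs handed to the queue
}

// Stand-in for a request queue (NOT the Apify class); it only records URLs.
const fakeQueue = {
    urls: [],
    async addRequest({ url }) {
        this.urls.push(url);
    },
};
```

Inside Apify.main, a call such as await enqueueFromFile(requestQueue, 'categories.txt', base_url) would return only after every line has been added, so the queue can safely be inspected afterwards.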
The code
const Apify = require('apify');
const lineReader = require('line-reader');

const queue_name = 'ebinger';
const base_url = 'https://www.ebinger.com/';
Apify.main(async () => {
    const requestQueue = await Apify.openRequestQueue(queue_name);
    // eachLine returns immediately, so wrap it in a Promise and wait until all lines are queued
    await new Promise((resolve, reject) => {
        lineReader.eachLine('categories.txt', (line, last, next) => {
            //console.log('adding ', line);
            const url = base_url + line.trim();
            // next() requests the following line; on failure, reject and stop reading
            requestQueue.addRequest({ url: url }).then(() => next(), (err) => { reject(err); next(false); });
        }, (err) => (err ? reject(err) : resolve()));
    });
    const { totalRequestCount, handledRequestCount, pendingRequestCount, name } = await requestQueue.getInfo();
    console.log(`RequestQueue "${name}" with requests:`);
    console.log(' handledRequestCount:', handledRequestCount);
    console.log(' pendingRequestCount:', pendingRequestCount);
    console.log(' totalRequestCount:', totalRequestCount);
...