Suppose there is a table like the one below (one data row only):
Blows Minute (BPM) | Speed (RPM) | Power, PSI | Flow, PSI | Tool Sys |
---|---|---|---|---|
0-2500 | 0-250 | 1.8 HP | 2.6-13.2 GPM | SDS Max |
How do you scrape it using cheerio.js as the parser?
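One possible approach, as a minimal sketch: read the header cells and the data cells separately, then pair them by position. It assumes the table is ordinary HTML with a header row followed by a single data row. The names `rowToRecord`, `scrapeSingleRowTable` and `tableSelector` are illustrative and not taken from the original post, and `$` is assumed to be the object returned by `cheerio.load(html)`.

```javascript
// Pair header labels with the cell values of one row (by position).
function rowToRecord(headers, cells) {
  const record = {};
  headers.forEach((header, i) => {
    // Missing cells become null so every header still has a key.
    record[header] = cells[i] !== undefined ? cells[i] : null;
  });
  return record;
}

// Assumption: `$` comes from cheerio.load(html) and `tableSelector`
// (hypothetical) points at a table whose first <tr> holds the headers
// and whose second <tr> holds the values.
function scrapeSingleRowTable($, tableSelector) {
  const rows = $(tableSelector).find('tr');
  const textOf = (row) =>
    row.find('th, td').map((i, cell) => $(cell).text().trim()).get();
  return rowToRecord(textOf(rows.eq(0)), textOf(rows.eq(1)));
}
```

With cheerio installed, this could be called as `scrapeSingleRowTable(cheerio.load(html), 'table')`, which for a table like the one above should give an object keyed by the header labels.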
When scraping, I often collect images or other sequential elements into an array. Afterwards, I need to remove duplicate elements from that array. The trick is to convert the array to a Set and then use spread syntax to turn it back into an array.
let links = [];
$('div.items').each((index, el) => {
  const link = $(el).attr('href');
  links.push(link);
});
// remove repeating links
links = [...new Set(links)];
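To see what the Set round trip does and does not remove, here is a small standalone sketch. The sample values are made up and stand in for whatever the scraper collected.

```javascript
// Sample data standing in for scraped values (hypothetical URLs).
const collected = ['/a', '/b', '/a', undefined, '/b/', undefined];

// A Set keeps the first occurrence of each value, in insertion order.
const unique = [...new Set(collected)];

console.log(unique); // [ '/a', '/b', undefined, '/b/' ]
```

Values are compared exactly, so `/b` and `/b/` both survive, and a single `undefined` stays in the result. Deduplication alone does not clean up missing attributes.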
See also: How to remove empty or undefined elements from an array.
When scraping, I often collect images or other sequential elements into an array. Afterwards, I need to remove empty elements from it.
let images = [];
$('div.image').each((index, el) => {
  const url = $(el).attr('src');
  images.push(url);
});
// remove invalid images: missing src, empty string, or a URL containing 'undefined'
images = images.filter(function (img) {
  return img && !img.includes('undefined');
});
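The filter above can be combined with the Set trick into one helper. This is only a sketch: `cleanUrls` is a hypothetical name, the sample URLs are made up, and the `typeof` check is an extra guard added here so that non-string values do not throw on `.includes`.

```javascript
// Hypothetical helper: drop unusable entries first, then drop duplicates.
function cleanUrls(urls) {
  const valid = urls.filter(
    (url) => typeof url === 'string' && url !== '' && !url.includes('undefined')
  );
  return [...new Set(valid)];
}

const raw = ['/img/1.jpg', undefined, '', '/img/undefined', '/img/1.jpg', '/img/2.jpg'];
console.log(cleanUrls(raw)); // [ '/img/1.jpg', '/img/2.jpg' ]
```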
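// Strip selected HTML tags from a string.
function strip_tags(str) {
  // tags to unwrap: the tag itself is removed, its inner content is kept
  const tags = ['a', 'em', 'div', 'span', 'p', 'i', 'button', 'img'];
  // tags to drop together with their content
  const tagsAndContent = ['picture', 'script', 'noscript', 'source'];
  for (const tag of tagsAndContent) {
    // [\s\S] also matches newlines, so multi-line blocks are removed too
    const regex = new RegExp('<' + tag + '\\b[\\s\\S]*?</' + tag + '>', 'gi');
    str = str.replace(regex, '');
  }
  for (const tag of tags) {
    // \b keeps 'a' from matching <article> and 'p' from matching <pre>
    const regex1 = new RegExp('<' + tag + '\\b[^>]*>', 'gi');
    const regex2 = new RegExp('</' + tag + '>', 'gi');
    str = str.replace(regex1, '').replace(regex2, '');
  }
  return str;
}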
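A short usage sketch may help show the difference between the two tag lists. The function below is a hypothetical, parameterized variant (`stripTags`, with `unwrap` and `drop` arguments that are not in the original), written so the example runs on its own. Regex-based stripping like this is only suitable for simple, well-formed markup.

```javascript
// Hypothetical variant: `drop` tags are removed with their content,
// `unwrap` tags are removed but their inner text is kept.
function stripTags(html, unwrap = ['a', 'span', 'p'], drop = ['script']) {
  let out = html;
  for (const tag of drop) {
    // Remove the whole element, including anything inside it.
    out = out.replace(new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, 'gi'), '');
  }
  for (const tag of unwrap) {
    // Remove only the opening and closing tags.
    out = out.replace(new RegExp(`</?${tag}\\b[^>]*>`, 'gi'), '');
  }
  return out;
}

const sample = '<p>Hello <a href="/x">world</a><script>track()</script></p>';
console.log(stripTags(sample)); // Hello world
```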