Categories
Development

Cheerio.js, get items from html table into object

Suppose there is a table like below (1 info row only):

Blows
Minute (BPM)
Speed (RPM) Power, PSI Flow, PSI
Tool Sys
0-2500 0-250 1.8 HP 2.6-13.2 GPM SDS Max

How to scrape it using cheerio.js as a parser?

Case 1 (1 row only)

Categories
Development

How to remove from JS array duplicate elements

Often at scraping I collect images or other sequential elements into an array. Yet, afterwards I need to remove duplicate elements from an array. The magic is to make it a Set and then use Spread syntax to turn it back to array.

links = [];
$('div.items').each((index, el) => { 
    let link = $(el).attr('href');						     
    links.push(link);
}); 
// remove repeating links
links = [...new Set(links)]

See also How to remove from an array empty or undefined elements.

Categories
Development

How to remove from JS array empty or `undefined` elements

Often at scraping I collect images or other sequential elements into an array. Yet, afterwards I need to remove empty elements from it.

images = [];
$('div.image').each((index, el) => { 
    let url = $(el).attr('src');						     
    images.push(url);
}); 
// remove invalid images
images = images.filter(function(img){
    return img && !img.includes('undefined')
});

See also How to remove from array duplicate elements.

Categories
Development

Strip HTML tags with and without inner content in JavaScript

function strip_tags(str){
   const tags = ['a', 'em', 'div', 'span', 'p', 'i', 'button', 'img' ];
   const tagsAndContent = ['picture', 'script', 'noscript', 'source'];  	 
   for(tag of tagsAndContent){ 
      let regex = new RegExp( '<' + tag+ '.*?</' + tag + '>', 'gim');
      str = str.replace( regex ,"");
   }
   for(tag of tags){
      let regex1 = new RegExp( '<' + tag+ '.*?>', 'gim');
      let regex2 = new RegExp( '</' + tag+ '>', 'gim');
      str = str.replace(regex1,"").replace(regex2,""); 
   } 
   return str;
}