Tag: HTML

Cheerio.js: get items from an HTML table into an object

Posted by admin on April 16, 2021

Suppose there is a table like the one below, with a single data row:

Blows per Minute (BPM)	Speed (RPM)	Power, HP	Flow, GPM	Tool System
0-2500	0-250	1.8 HP	2.6-13.2 GPM	SDS Max

How can it be scraped using Cheerio.js as the parser?

Case 1 (1 row only)

Tags: Cheerio, HTML, parse

Development

How to remove duplicate elements from a JS array

Posted by admin on January 26, 2021

When scraping, I often collect images or other repeated elements into an array, and afterwards I need to remove the duplicates. The trick is to convert the array to a Set, which stores each value only once, and then use spread syntax to turn it back into an array.

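// assumes the page is already loaded, e.g. const $ = cheerio.load(html);
let links = [];
// select the <a> elements inside each item: a <div> itself has no href attribute
$('div.items a').each((index, el) => {
    const link = $(el).attr('href');
    links.push(link);
});
// remove repeating links: a Set keeps only unique values
links = [...new Set(links)];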
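A standalone sketch with a hard-coded array (the URLs are placeholders) shows two details of the Set trick: the first occurrence of each value is kept in its original order, and values are compared exactly, so near-duplicates survive. The `uniqueBy` helper and the trailing-slash rule are hypothetical illustrations of deduplicating by a normalized key.

```javascript
// Hypothetical scraped links; some appear more than once
const links = [
  'https://example.com/a',
  'https://example.com/b',
  'https://example.com/a',
  'https://example.com/c/',
  'https://example.com/c',
];

// A Set keeps the first occurrence of each value, in insertion order
const uniqueLinks = [...new Set(links)];

// A Set compares values exactly, so '.../c/' and '.../c' are both kept.
// To merge near-duplicates, track a normalized key instead.
function uniqueBy(items, keyFn) {
  const seen = new Set();
  return items.filter(item => {
    const key = keyFn(item);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// example rule (an assumption): ignore a trailing slash
const uniqueIgnoringSlash = uniqueBy(links, url => url.replace(/\/$/, ''));

console.log(uniqueLinks);
console.log(uniqueIgnoringSlash);
```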

Tags: HTML, Javascript

Development

How to remove empty or `undefined` elements from a JS array

Posted by admin on January 26, 2021

When scraping, I often collect images or other repeated elements into an array, and afterwards I need to remove the empty or `undefined` entries, which appear when a matched element lacks the attribute being read.

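// assumes the page is already loaded, e.g. const $ = cheerio.load(html);
let images = [];
// select the <img> elements inside each container: a <div> itself has no src attribute
$('div.image img').each((index, el) => {
    const url = $(el).attr('src');
    images.push(url);
});
// remove invalid images: drop falsy values (undefined, null, '')
// and URLs that contain the literal string 'undefined'
images = images.filter(img => img && !img.includes('undefined'));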
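To see what each part of the filter removes, here is a standalone sketch with a hand-made array (all values are hypothetical). It also compares the rule above with the shorter `filter(Boolean)`, which drops falsy values but not strings like `'https://example.com/undefined'` that result from concatenating a missing value into a URL.

```javascript
// Hypothetical values collected while scraping
const images = [
  'https://example.com/1.jpg',
  undefined,                        // element without a src attribute
  '',                               // empty src=""
  'https://example.com/undefined',  // URL built from a missing value
  null,
  'https://example.com/2.jpg',
];

// Same rule as above: keep truthy strings that do not contain 'undefined'
const validImages = images.filter(img => img && !img.includes('undefined'));

// filter(Boolean) alone drops undefined, null and '',
// but keeps the broken '.../undefined' URL
const truthyOnly = images.filter(Boolean);

console.log(validImages);
console.log(truthyOnly);
```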

Tags: HTML, Javascript

Development

Strip HTML tags with and without inner content in JavaScript

Posted by admin on December 15, 2020

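The function below removes some tags together with everything inside them (for example `<script>` blocks), and removes only the markup of others while keeping their text.

// Regex-based stripping: fine for quick cleanup of scraped snippets, but not a
// full HTML parser (e.g. a '>' inside an attribute value will confuse it).
function strip_tags(str){
   // tags to remove while keeping their inner text;
   // 'source' is a void element with no closing tag, so it belongs here
   const tags = ['a', 'em', 'div', 'span', 'p', 'i', 'button', 'img', 'source'];
   // tags to remove together with their inner content
   const tagsAndContent = ['picture', 'script', 'noscript'];
   for (const tag of tagsAndContent) {
      // [\s\S]*? also matches across line breaks, unlike .*?
      const regex = new RegExp('<' + tag + '\\b[\\s\\S]*?</' + tag + '>', 'gi');
      str = str.replace(regex, '');
   }
   for (const tag of tags) {
      // one pattern for both the opening tag (with attributes) and the closing tag;
      // \b keeps 'p' from matching '<pre>' and 'i' from matching '<iframe>'
      const regex = new RegExp('</?' + tag + '\\b[^>]*>', 'gi');
      str = str.replace(regex, '');
   }
   return str;
}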

Tags: HTML, Javascript