Recently I received a request on how to sum the total play hours of the YouTube videos in a search result. I’ve made a simple JS iterator that reads the hours/min/sec of each video from the page’s HTML in the browser and sums them up.
See the code below:
// global var to accumulate total play hours
var hours = 0;

// count hours for a single page of results through the span selector values
function domCounter(selector) {
    var a = document.querySelectorAll(selector);
    var hour = 0, min = 0, sec = 0;
    for (var i = 0; i < a.length; i++) {
        var time = a[i].innerHTML.split(':');
        // console.log(time);
        if (time.length == 2) {
            /* only mins and secs (do not reset the hours summed so far) */
            min += parseInt(time[0], 10);
            sec += parseInt(time[1], 10);
        } else if (time.length == 3) {
            /* hours, mins and secs are present */
            hour += parseInt(time[0], 10);
            min += parseInt(time[1], 10);
            sec += parseInt(time[2], 10);
        }
    }
    // return fractional hours so that rounding errors do not pile up page after page
    return hour + (min + sec / 60) / 60;
}

// click the "Next" pagination button
function NextBtnClick() {
    var list = document.getElementsByClassName("yt-uix-button-content");
    for (var i = 0; i < list.length; i++) {
        if (list[i].textContent == 'Next »') {
            list[i].click();
            break;
        }
    }
    return 1;
}

// main counter that is set to an interval;
// prev_href starts empty so that the first page is counted too
var loop = 0, prev_href = '';
var count = setInterval(function () {
    loop++;
    if (prev_href == location.href) {
        console.log('The same url(' + prev_href + '), no move of the scraper.');
        // you might add some more of the logic
    } else {
        var newhours = domCounter('span.video-time');
        prev_href = location.href;
        hours += newhours;
        console.log('loop ' + loop + ';\ntotal ' + hours.toFixed(1) + ' hours counted;');
    }
    NextBtnClick();
}, 3000);
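The duration parsing inside domCounter can also be pulled out into a small helper that works on plain strings, which makes it easy to check outside the browser. The following is only a sketch: the names durationToSeconds and totalHours are hypothetical, and it assumes the labels look like "m:ss" or "h:mm:ss"; anything else is counted as zero.

```javascript
// Convert a duration label such as "4:05" or "1:02:03" to seconds.
// Assumption: labels that are not m:ss or h:mm:ss contribute 0 seconds.
function durationToSeconds(label) {
    var parts = label.trim().split(':');
    if (parts.length < 2 || parts.length > 3) {
        return 0;
    }
    var seconds = 0;
    for (var i = 0; i < parts.length; i++) {
        var n = parseInt(parts[i], 10);
        if (isNaN(n)) {
            return 0;
        }
        // shift the running total one unit up (sec -> min -> hour) and add
        seconds = seconds * 60 + n;
    }
    return seconds;
}

// Sum a list of duration labels and return the total in fractional hours.
function totalHours(labels) {
    var seconds = 0;
    for (var i = 0; i < labels.length; i++) {
        seconds += durationToSeconds(labels[i]);
    }
    return seconds / 3600;
}
```

In the console, the labels could be collected with something like Array.prototype.map.call(document.querySelectorAll('span.video-time'), function (el) { return el.textContent; }) and passed to totalHours.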
You may try the console scraper on the search results for “north pole fauna”. Open the YouTube search results, open the web developer tools (F12, or Ctrl+Shift+I in Opera), and paste the code into the console, in a way similar to that described in this post.
Large number of search results – proxying needed
Since YouTube does not tolerate frequent requests from a single IP, proxies are needed when scraping many search pages (over 30). If no proxies are used, YouTube changes the page layout after about 33 loops: it removes the Next button and limits the number of results on a page.
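Without proxies, the scraper can at least notice when the Next button is gone and stop instead of looping forever on the same page. Below is a minimal sketch of that check; findNextButton and stepOrStop are hypothetical helper names, and the sketch assumes the button is still recognised by its 'Next »' text as in the code above.

```javascript
// Find the first element whose text equals the pagination label.
// `elements` is any array-like of objects with a `textContent` property.
function findNextButton(elements, label) {
    for (var i = 0; i < elements.length; i++) {
        if (elements[i].textContent.trim() === label) {
            return elements[i];
        }
    }
    return null;
}

// One step of the loop: click "Next" if it is present, otherwise call stop().
// Returns true if the scraper moved on, false if it stopped.
function stepOrStop(elements, label, stop) {
    var btn = findNextButton(elements, label);
    if (btn === null) {
        stop();
        return false;
    }
    btn.click();
    return true;
}
```

Inside the interval callback, the call NextBtnClick() could then be replaced by stepOrStop(document.getElementsByClassName('yt-uix-button-content'), 'Next »', function () { clearInterval(count); }), so that the interval is cleared once the layout changes.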
This scraper is good for extracting the play length of up to 600 search results. There is room for improvement. Your questions and suggestions are welcome.
6 replies on “Traverse and count youtube videos total play length”
Why did you code in Javascript?
I coded in JS since it’s a simple client-side language, completely independent of any server.
Do you have a method to implement proxying, or are you simply noting that it’s required after 33-ish loops?
So far I have not applied proxying, but I would like to. Matt, can you help with any suggestion or code for proxying in JS?
Also, looks like your link to “this post” is broken, above.
Matt, thank you. Fixed.