Try the code
See the scripts below.
function areCookiesEnabled()
{
    // Trust navigator.cookieEnabled where the browser reports it.
    var cookieEnabled = (navigator.cookieEnabled) ? true : false;
    // Older browsers don't expose navigator.cookieEnabled, so fall back
    // to writing a test cookie and checking whether it sticks.
    if (typeof navigator.cookieEnabled == "undefined" && !cookieEnabled)
    {
        document.cookie = "test";
        cookieEnabled = (document.cookie.indexOf("test") != -1) ? true : false;
    }
    return cookieEnabled;
}
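A quick usage sketch (the alert message is just an illustration):

// Usage sketch: react on page load when cookies appear to be disabled.
if (!areCookiesEnabled()) {
    alert("Cookies are disabled; parts of this site may not work.");
}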
Navigator is the interface that represents the state and the identity of the user agent. It allows scripts to query it and to register themselves to carry on some activities. A Navigator object can be retrieved using the read-only window.navigator property.
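For instance, a script can query it directly (a minimal illustration):

// Minimal illustration: query the user agent through window.navigator.
console.log(navigator.userAgent);     // browser identification string
console.log(navigator.cookieEnabled); // whether cookies are enabled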
function isStorageEnabled() {
    try {
        // Write, read back and remove a test value; any failure
        // (e.g. storage disabled or quota exceeded) throws.
        sessionStorage.setItem("test", "value");
        if (sessionStorage.getItem("test") == "value") {
            sessionStorage.removeItem("test");
            return true;
        } else {
            return false;
        }
    } catch (err) {
        return false;
    }
}
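A usage sketch (the fallback behaviour is just an example):

// Usage sketch: only persist state when sessionStorage is available.
if (isStorageEnabled()) {
    sessionStorage.setItem("visited", "true");
} else {
    console.warn("sessionStorage unavailable; keeping state in memory only.");
}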
I’ve made a simple Node.js server on a VDS:
var http = require('http');
// Define the port in the outer scope so both listen() and the log can use it.
let port = 9999;
http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
}).listen(port, '0.0.0.0');
console.log('Server running at port ' + port);
It works, outputting:
Server running at port 9999
yet I can’t reach it at the VPS/VDS IP where the code resides: http://webscraping.pro:9999/. How to solve that?
Content is the most basic way to attract traffic: without a certain amount of quality content, neither Google nor visitors will be interested in your website, because there is little value to be gained from browsing it.
There are two main coding-free solutions for extracting content from websites to build your content base: choose one, or a combination of them, and have a try!
We recently composed a scraper that extracts data from a static site. By a static site, we mean one that does not use JS scripting to load or transform on-page data.
When working with Apify crawlers, it’s necessary to initialize a RequestQueue. How do we fill the RequestQueue from a txt file?
We start from a text file with URLs to crawl; in our case it’s categories.txt. We’ll use the line-reader Node package to open and iterate over the file line by line.
To install line-reader:
npm i --save line-reader
Since requestQueue methods return a Promise, when iterating over the lines of the file we need an async callback, so that each line can be awaited as it is added to the requestQueue as a URL.
const Apify = require('apify');
const lineReader = require('line-reader');

const queue_name = 'ebinger';
const base_url = 'https://www.ebinger.com/';

Apify.main(async () => {
    const requestQueue = await Apify.openRequestQueue(queue_name);
    // Wrap eachLine in a Promise so we only inspect the queue
    // after every line has been read and added.
    await new Promise((resolve, reject) => {
        lineReader.eachLine('categories.txt', function (line, last, cb) {
            //console.log('adding ', line);
            let url = base_url + line.trim();
            // addRequest returns a Promise; tell line-reader to continue once it settles.
            requestQueue.addRequest({ url: url }).then(() => cb(), reject);
        }, (err) => err ? reject(err) : resolve());
    });
    var { totalRequestCount, handledRequestCount, pendingRequestCount, name } = await requestQueue.getInfo();
    console.log(`RequestQueue "${name}" with requests:`);
    console.log(' handledRequestCount:', handledRequestCount);
    console.log(' pendingRequestCount:', pendingRequestCount);
    console.log(' totalRequestCount:', totalRequestCount);
...
When we use Selenium or Node.js + Puppeteer to run [headless] Chrome/Chromium, we might need to launch browsers with some extra functionality or conditions. Below you’ll find all kinds of such conditions (command-line switches) and their explanations.
The Chromium Team has made a page on which they briefly explain how to use these switches.
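As a minimal sketch, launching Chromium through Puppeteer with a few common switches might look like this (the particular flags are illustrative assumptions, not a recommended set):

// Minimal sketch: pass Chromium command-line switches via Puppeteer's args option.
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        headless: true,
        args: [
            '--no-sandbox',            // run without the sandbox (e.g. inside Docker)
            '--disable-gpu',           // disable GPU hardware acceleration
            '--window-size=1920,1080', // set the initial window size
        ],
    });
    const page = await browser.newPage();
    await page.goto('https://example.com');
    console.log(await page.title());
    await browser.close();
})();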
Recently I noticed a question about extracting emails, phones, and links (URLs) from text fragments, and I immediately decided to write this short post.
Each of the following: emails, phones, links, forms a category that matches a certain text pattern. What are these text patterns? They are regexes, aka regex patterns, short for regular expressions. E.g. most emails fit the following regex pattern:
^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$
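As a minimal sketch, extracting all emails from a text fragment with this pattern might look like the following (the ^ and $ anchors are dropped and the global flag added, since we scan free text rather than match a whole string):

// Minimal sketch: scan free text for email-like substrings.
const text = 'Write to sales@example.com or support@example.org for details.';
const emailRegex = /[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+/g;
console.log(text.match(emailRegex)); // [ 'sales@example.com', 'support@example.org' ]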
This post will share with you the difference between the production and development builds of a PWA. If you are not familiar with PWAs (Progressive Web Applications), please visit that blog post.