Categories
Challenge

How to insert and configure reCAPTCHA v2 code in PHP

We’ve already introduced you to the theory behind the new NO CAPTCHA reCAPTCHA v2, but now we come to the practical integration part. Here we’ll share how to insert and configure “NO CAPTCHA reCAPTCHA” into a web page.

Categories
Uncategorized

Search queries in a search engine for scraping

Recently I received a note with a question about running search engine queries through web scraping software.

“I’m looking for a scraper program that can initiate search queries in a search engine automatically, using proxies would be an added benefit if possible.” – Mike

Categories
Development

How To Automatically Convert Files in Google Apps Script


The other day I was challenged to do some cloud conversion following a web scraping project with Google Apps Script (GAS)[1]: namely, to take a Google Doc file and convert it into MS Word format.

Categories
Uncategorized

Writing next generation scraping scripts with Web Robots IDE

Most scraping solutions fall into two categories: visual scraping platforms targeted at non-programmers (Content Grabber, Dexi.io, Import.io, etc.), and scraping code libraries like Scrapy or PhantomJS, which require at least some knowledge of how to code.

Web Robots builds a scraping IDE that fills the gap in between. The code is not hidden but instead made simple to create, run and debug.

Categories
Miscellaneous

Ethical issues of using employee monitoring software

Employee monitoring software has become commonplace. Many apps take screenshots, capture keystrokes and mouse movements, monitor active applications and visited sites and, in extreme cases, can even take pictures using the webcam. It seems fair to track what your employees do when they are being paid for their time. After all, if they exchange their time for money, it seems fair for the employer to know what it is paying for. So why does it still feel morally inappropriate in some cases? The question is far from theoretical. If the wrong decision is made, a company may face lawsuits, experience a backlash from its employees and a drop in overall productivity (the opposite of what was intended), or suffer damage to its image. Let’s review in more detail which employee monitoring practices can be considered valid and which should be avoided.

Categories
Uncategorized

Using Data Mining in Web Traffic Analysis

This short essay is about data mining methods applied in web traffic analysis and other business intelligence. It also provides a modern look at data mining in light of the Big Data era.

For a site owner, business blogger or e-commerce entity, there are always some variables of interest concerning web traffic and statistics. How would you predict future values of those variables? They might include the number of visitors to a target website, the time each visitor spends on the site, and whether or not the visitor reaches the site’s goals. It is worth noting that these web traffic and site performance analyses are not subject to stringent time constraints. Data mining techniques seek to identify relationships between the variable of interest and the other variables in a data sample. There are at least 3 analysis models for data mining that we consider here.
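
As a rough illustration of relating a variable of interest to another variable in a sample, here is a minimal sketch that fits a straight line to daily visitor counts and extrapolates one day ahead. The visitor numbers are made up for illustration, the function names are hypothetical, and a least-squares trend line is only one simple option; it is not necessarily one of the models the essay goes on to discuss.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Illustrative (made-up) sample: day number vs. visitors on that day.
days = [1, 2, 3, 4, 5]
visitors = [100, 110, 120, 130, 140]

slope, intercept = fit_line(days, visitors)
forecast = predict(slope, intercept, 6)  # extrapolate to day 6
print(f"Estimated visitors on day 6: {forecast:.0f}")
```

Real traffic is rarely this regular, so a fit like this is best treated as a baseline to compare richer models against.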

Categories
Uncategorized

Data, Information, Knowledge: what’s the difference?

Have you ever thought that there is a difference between such terms as “data”, “information” and “knowledge”? People often mix and misuse them, which is not a problem in daily life, but when it comes to Data Mining it’s good to distinguish them. Here I’ll try to show the difference in a comprehensible way.

Categories
Development

Selenium IDE and Web Scraping

Selenium is a web application testing framework that supports a wide variety of browsers and platforms, with bindings for Java, .NET, Ruby, Python and other languages. In this post we touch on the basic structure of the framework and how it can be applied in Web Scraping.

Categories
SaaS

80legs Review – Crawler for rent in the sky

80legs offers a crawling service that allows users to (1) easily compose crawl jobs and (2) run those crawl jobs in the cloud over a distributed computer network.

The modern web requires you to spend a huge amount of processing power to mine it for information. How could a start-up or a small business do comprehensive data crawling without having to build the giant server farms used by major search engines?

Categories
SEO and Growth Hacking

How to leverage Web Scraping for SEO

Eppie Vojt speaks at the SEOmoz Meetup on leveraging scraping for site SEO. Techniques: XPath and Regex in Google Docs to fetch links and more.
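
The talk applies XPath and Regex inside Google Docs. The same two-step idea can be sketched in Python: an XPath expression selects the link elements, and a regular expression then pulls the domain out of each URL. The sample markup, function names and pattern below are assumptions made for illustration. The standard library parser used here supports only a subset of XPath and needs well-formed markup, so real-world HTML would normally call for a dedicated HTML parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page used only for this sketch.
SAMPLE_PAGE = """<html><body>
<a href="https://example.com/page-one">One</a>
<a href="https://example.org/page-two?ref=abc">Two</a>
<p>No link here</p>
</body></html>"""

# Captures the host part that follows the scheme.
DOMAIN_RE = re.compile(r"^https?://([^/]+)")


def extract_links(markup):
    """Return the href of every <a> element that has one, via XPath."""
    root = ET.fromstring(markup)
    return [a.get("href") for a in root.findall(".//a[@href]")]


def extract_domains(links):
    """Return the domain of each absolute http(s) link, via regex."""
    domains = []
    for link in links:
        match = DOMAIN_RE.match(link)
        if match:
            domains.append(match.group(1))
    return domains


links = extract_links(SAMPLE_PAGE)
domains = extract_domains(links)
print(links)
print(domains)
```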