Categories
Development Miscellaneous Web Scraping Software

Import.io Magic Method API

Recently, Import.io introduced a new extraction technique called Magic. The Magic method works by attempting to scrape all the information off a page automatically, in one shot. We covered it in another post early last year, and back then we noted a few issues:

  • The scraper only works on pages with more than one row of data, such as search results pages, category pages, etc.
  • It seems to have trouble with some JavaScript pages.

But now Import.io has released a second version of Magic, which seems to have dealt with those obstacles. Not only that, they have also released an API for Magic that lets you see what's going on behind the scenes.

Categories
Miscellaneous

A Simple Way to Monitor HTML Changes

I recently came across this question in the Q&A section of a forum I belong to:

“I want to run once a day a script that will check whether the specific part of code has been changed, and if it did, we would get some return message (ideally directly to my email). What would be the easiest, simplest way to do that? I’ve read about web crawlers, web scrapers, but they seem to be doing far more than we need.”

Sure, if all you want to do is something as lightweight as monitoring a set of target pages for changes, then a ready-made monitoring tool is probably far more than you need. Keep it simple: here's a quick solution using a Google Spreadsheet.
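The post's own solution is built on a Google Spreadsheet. For readers who would rather run a small script, here is a minimal sketch of the same idea in Python (an alternative, not the author's method): download the page, cut out the watched fragment, hash it, and compare the hash with the one saved on the previous run. It assumes the "specific part of code" can be delimited by two fixed marker strings; the marker strings, the state-file name and the helper names are all hypothetical.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch(url):
    """Download a page and decode it (falls back to UTF-8 if no charset is sent)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_fragment(html, start_marker, end_marker):
    """Return the text between two marker strings (hypothetical way to select the watched part)."""
    start = html.find(start_marker)
    if start == -1:
        raise ValueError("start marker not found: " + start_marker)
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        raise ValueError("end marker not found: " + end_marker)
    return html[start:end]


def fingerprint(fragment):
    """SHA-256 of the fragment with whitespace collapsed, so pure re-indentation is ignored."""
    normalized = " ".join(fragment.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def check_for_change(html, start_marker, end_marker, state_file):
    """Compare with the fingerprint saved by the previous run, then store the new one.

    Returns False on the very first run, when there is nothing to compare against yet.
    """
    state_file = Path(state_file)
    current = fingerprint(extract_fragment(html, start_marker, end_marker))
    previous = state_file.read_text().strip() if state_file.exists() else None
    state_file.write_text(current)
    return previous is not None and previous != current
```

A daily run could call something like `check_for_change(fetch("https://example.com/page.html"), '<div id="price">', "</div>", "last_hash.txt")` (placeholder URL and markers) from cron or Windows Task Scheduler, and send the email with the standard `smtplib` module whenever it returns `True`. Storing only a hash keeps the state file tiny; store the fragment itself instead if you want the message to show what changed.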

Categories
Miscellaneous

Scraping with import.io Magic – The Future?

Over the last year or two, visual web scrapers have matured considerably. New companies like ParseHub, Scrapinghub and Kimono are bringing new tools to the market, while industry veterans like OutWit Hub, Visual Web Ripper and Mozenda continue to update their great tooling to annotate/train scrapers and extract web data.

Interestingly, something has now changed. Import.io has created a new tool that is a little bit different on the surface and, from what they told us, a LOT different under the hood.

Categories
Miscellaneous Web Scraping Software

7 Ways to Protect a Website from Scraping and How to Bypass This Protection

In this article, I’d like to review a few well-known methods of protecting website content from automated scraping. Each one has its advantages and disadvantages, so you need to choose based on your particular situation. None of these methods is foolproof: each one has its own workarounds, which I will also describe.

Categories
Miscellaneous

An Independent Test of 7 Hosting Providers

Choosing a provider is not an easy task: you always want to find something “cheap and cheerful”. However, it is often hard to find a golden mean, and you have to choose between computing power, speed and cost, not to mention additional features such as DNS servers, a control panel, etc. In this article, I will present test results for several providers of various sizes, and I hope they will help you choose a hosting provider.

Categories
Miscellaneous

What is import•io from the user’s point of view?

Import•io is a big data cloud platform with the ambitious goal of turning the web into a database. It was founded in March 2012, and a year later it received $1.3M in seed funding from Wellington Partners, Louis Monier and Emmanuel Javal.

Categories
Miscellaneous

Free Website Backup

For simple web scraping jobs, I often prefer a PHP + MySQL bundle, putting the project right on the web and working online. But working online raises a problem: how do you back up your results?

Categories
Miscellaneous

Ethical issues of using employee monitoring software

Employee monitoring software has become commonplace. Many apps take screenshots, capture keystrokes and mouse movements, monitor active applications and visited sites and, in extreme cases, can even take pictures with the webcam. Tracking what employees do while they are being paid for their time seems fair: if they exchange their time for money, the employer should know what it is paying for. So why does it still feel morally inappropriate in some cases? The question is far from theoretical. If it makes the wrong decision, a company may face lawsuits, a backlash from employees, a drop in overall productivity (the opposite of what was intended) or damage to its image. Let’s review in more detail which employee monitoring practices can be considered valid and which should be avoided.

Categories
Miscellaneous

An easy way to backup your MS SQL Server database

If you use Microsoft SQL Server to store and process your data, you are probably in the market for a convenient backup and restore tool. In this post, I’d like to share a very nice tool for backing up and restoring your MS SQL database in “three clicks”.

Categories
Miscellaneous

Data extraction using iMacros plugin for IE

In this video, I show how you can automate data extraction using the iMacros plugin for Internet Explorer.

The iMacros plugin for IE has a more visual interface than the equivalent iMacros plugins for Firefox and Chrome. Still, the same macro can be run in the iMacros plugin for any of these browsers. Data extraction is only one of the plugin's uses; see the short description of all its uses here. Here is the code of the macro from the video:

VERSION BUILD=8021970
TAB T=1
TAB CLOSEALLOTHERS
' Do not show a popup for every extracted value
SET !EXTRACT_TEST_POPUP NO
' Open the AIM index constituents page (the URL must stay on one line)
URL GOTO=http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/summary/summary-indices-constituents.html?index=AIM1
' Page 1: extract the table text and save it to c:\iMacros\table.csv
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
' Click "Next" and give the page two seconds to load
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
' Page 2
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
' Page 3
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
' Page 4
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
' Page 5
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
' Click the pagination link labelled "1"
TAG POS=1 TYPE=A ATTR=TXT:1
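For comparison, here is a rough Python sketch of the same loop outside the browser: parse the table with class `table_dati`, collect its rows, follow the link whose text is `Next`, and stop after five pages, as the macro does. It uses only the standard library (`html.parser`, `urllib.parse`, `csv`); the class name and link text are taken from the macro, while the function names are hypothetical. It assumes reasonably well-formed markup, and the London Stock Exchange page may no longer use this layout. Page downloading is passed in as a `fetch` function so the parsing can be tried on any HTML.

```python
import csv
from html.parser import HTMLParser
from urllib.parse import urljoin


class TableAndNextParser(HTMLParser):
    """Collects the rows of the first <table> whose class includes `table_class`
    and the href of the first <a> whose text equals `next_text`."""

    def __init__(self, table_class="table_dati", next_text="Next"):
        super().__init__()
        self.table_class = table_class
        self.next_text = next_text
        self.rows = []
        self.next_href = None
        self._depth = 0          # > 0 while inside the target table; counts nested tables
        self._table_done = False
        self._row = None
        self._cell = None
        self._href = None
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            if self._depth:
                self._depth += 1
            elif not self._table_done and self.table_class in (attrs.get("class") or "").split():
                self._depth = 1
        elif self._depth == 1 and tag == "tr":
            self._row = []
        elif self._depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif tag == "a":
            self._href = attrs.get("href")
            self._link_text = []

    def handle_endtag(self, tag):
        if self._depth == 1 and tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace inside the cell, like a plain-text extraction
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif self._depth == 1 and tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
        elif tag == "table" and self._depth:
            self._depth -= 1
            self._table_done = self._depth == 0
        elif tag == "a" and self._href is not None:
            text = " ".join("".join(self._link_text).split())
            if self.next_href is None and text == self.next_text:
                self.next_href = self._href
            self._href = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)
        if self._href is not None:
            self._link_text.append(data)


def scrape_pages(start_url, fetch, max_pages=5):
    """Extract the table on each page, then follow the 'Next' link, like the macro.

    `fetch` is any callable that takes a URL and returns HTML text.
    """
    rows, url = [], start_url
    for _ in range(max_pages):
        parser = TableAndNextParser()
        parser.feed(fetch(url))
        parser.close()
        rows.extend(parser.rows)
        if not parser.next_href:
            break
        url = urljoin(url, parser.next_href)
    return rows


def save_csv(rows, path):
    """Write all collected rows to one CSV file (overwrites `path` on every run)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

To run it against a live site, pass a downloader such as `lambda u: urllib.request.urlopen(u).read().decode("utf-8")` as `fetch` and then call `save_csv(rows, "table.csv")`. Note that header rows are collected from every page, so you may want to drop the repeats.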