Categories
Web Scraping Software

Import.io Enters the Enterprise DaaS Market

Import.io Enterprise
Recently, import.io (a free online scraping tool) announced that they are adding another way to get data from the web: they’ll build it for you. This new “Data as a Service” program is targeted at businesses and organizations that need data but don’t have the time or resources to build it themselves with the import.io tool. For these clients, import.io will curate custom datasets based on their specific requirements and develop custom data implementation solutions that fit the organization’s in-house software.

Categories
Monetize

My Experience in Choosing a Web Scraping Service

Recently I decided to outsource a web scraping project to another company. I typed “web scraping service” into Google, chose six services from the first two pages of search results, and sent the project specifications to all of them to get quotes. Eventually I decided to go another way and did not order any of the services, but my experience may be useful for others who want to entrust web scraping jobs to third-party services.

Categories
Development

An Independent Review of RegViz (Regex Online Tester)

Recently I was asked to look at a brand-new online regex tester, regviz.org, developed as a collaboration of VISUS, University of Stuttgart and University of Trier. Though there are a lot of regex online testers on the market today, and many of them are quite good, let’s look at what is special about regviz.org and what it lacks.

Categories
Development

Where is NoSQL practically used?

For over four decades now, Relational Database Management Systems (RDBMS) have dominated the enterprise market. However, the trend seems to be changing with the introduction of NoSQL databases. In this article, we are going to highlight practical examples where NoSQL systems have been deployed. We will also go further and point out other applications where implementation of such systems might be necessary.

Categories
Development

How to change WebDriver’s IP address

I have already written several articles on how to use WebDriver for web scraping, but I have never touched on the topic of changing WebDriver’s IP address. Nevertheless, this topic is quite crucial when it comes to web scraping, and here I’d like to show you an example of using proxies with WebDriver in Python (you can easily convert it to your language’s API).
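
As a rough illustration of the idea (not necessarily the approach the full article takes), one common way to route a WebDriver session through a proxy is to pass Chrome’s `--proxy-server` switch via Selenium’s `ChromeOptions`. The helper names `proxy_argument` and `make_driver`, and the proxy address shown, are placeholders chosen for this sketch; it assumes Selenium 4 and a local Chrome installation.

```python
def proxy_argument(host, port, scheme="http"):
    """Build the Chrome command-line switch that sends traffic through a proxy."""
    return f"--proxy-server={scheme}://{host}:{port}"


def make_driver(host, port):
    """Start Chrome through Selenium with all traffic routed via host:port.

    Selenium is imported here so that the helper above can be used
    (and checked) without a browser or Selenium installed.
    """
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(host, port))
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    # Placeholder proxy address; replace it with a proxy you are allowed to use.
    print(proxy_argument("127.0.0.1", 8080))
```

Calling `make_driver("127.0.0.1", 8080)` would then return a driver whose requests appear to come from the proxy’s IP address rather than your own.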

Categories
Development

What is Cassandra Database?

Apache Cassandra is a data management system designed and developed to handle huge amounts of data across multiple servers. It is open source, meaning its source code is freely available for anyone to study, modify and use.

Categories
Development

What is MongoDB?

MongoDB, an open-source document database written in C++, is classified as a NoSQL database. Because it avoids the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), it facilitates quick-and-easy data integration in various applications.

Categories
Development

What is NoSQL?

The term “database” was long synonymous with SQL, and for a while there seemed to be no viable alternative. Recently, however, the realm of data storage has welcomed a new option: NoSQL. This article offers you a brief overview of what NoSQL is and when it may be applied.

Categories
Development

MongoDB in a minute

Have you ever heard of MongoDB? It’s a document-oriented NoSQL database. Instead of keeping data in familiar SQL tables, MongoDB keeps it as collections of JSON-like documents.

Intrigued? Read this tutorial and you will get a general impression of this database in just a minute.
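
To make “collections of JSON-like documents” a little more concrete, here is a minimal sketch in plain Python. It does not use MongoDB or its driver; the `users` list and the `find` helper are hypothetical stand-ins that only mimic the idea that documents in one collection may have different fields.

```python
import json

# A "collection" modelled as a list of JSON-like documents.
# Unlike rows in an SQL table, the documents need not share the same fields.
users = [
    {"_id": 1, "name": "Ann", "email": "ann@example.com"},
    {"_id": 2, "name": "Bob", "tags": ["admin", "dev"], "address": {"city": "Oslo"}},
]


def find(collection, **criteria):
    """Return the documents whose top-level fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    print(json.dumps(find(users, name="Bob"), indent=2))
```

A real MongoDB collection offers far richer queries, but the shape of the data is the same: each record is a self-contained document rather than a row in a fixed table.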

Categories
Miscellaneous Web Scraping Software

7 Ways to Protect a Website from Scraping and How to Bypass This Protection

In this article I’d like to review a few well-known methods of protecting website content from automated scraping. Each one has its advantages and disadvantages, so you need to make your choice based on your particular situation. None of these methods is foolproof, and each one has workarounds, which I will mention along the way.