Big Data Basics

Power of Big Data: capabilities and perspectives

Technology is developing at lightning speed, and software requirements, approaches and algorithms are growing just as quickly. In particular, developers have relatively recently faced the problem of processing huge volumes of data, which made it necessary to create a new, effective approach, a new paradigm of data storage. The solution was not long in coming: in 2011 large companies all over the world started using the Big Data concept. In this article we will talk about this engaging approach.

Big Data, Data Analytics, Data Analysis, Data Mining, Data Science & Machine Learning

In this post, we’d like to share some of the most interesting terms that are used in today’s science and IT world. We think you will benefit from getting familiar with these modern tech-age expressions.

Testing the Filter by TheWebMiner for advanced web content filtering

Recently I came across an interesting new tool from TheWebMiner called Filter. The Filter is an attempt by TheWebMiner to sort (categorize) indexed websites and deliver them to users as a content filtering service.

Audio Captcha Solving Algorithm for XBox

I want to share how I built an audio captcha recognizer. It was designed to solve captchas on xbox.com back in 2012.

Easy Data Visualisation with Silk.co

This is a guest post by Daniel Cave.

With the rise of social media sharing, collaboration and an increasingly interested market for data, more and more people want to ‘play with data’ and learn using some basic free tools. So recently I’ve been trying to find a technically advanced and interesting combination of free tools to collect and visualise web data that will allow enthusiasts and students to get those all-important initial quick and easy wins.

Distributed File System Implementations and MapReduce strategy

We have already mentioned the MapReduce distributed computation style in data analysis for computing clusters in the previous post. Here we want to look more closely at how this strategy is implemented on distributed hardware.

Implementing frequent itemsets algorithm thru MapReduce

The problem of finding frequent itemsets in data analysis is described in this post, and here I lay out the practical steps for finding frequent itemsets through MapReduce.

Data Mining: The AdWords Problem Review

This post is a continuation of the previous post on Advertising on the Web and Data mining. Here we conclude by reviewing some basic algorithms for placing ads on the web.

Advertising on the Web and Data mining

The challenge of effective web advertisement primarily involves placing relevant ads on user-requested web pages. Those ads must be relevant to the page's recipient, that is, relevant to the page context and/or directly to the user. What algorithms are being used for this? What trends are there now in business intelligence and data mining for digital advertisement solutions?

Clustering in a Parallel Environment and MapReduce

Having touched on some basics of Clusters in Data Mining, we now want to consider the computation techniques applied to clusters. Those techniques are in line with data mining for web traffic analysis.