Categories
Data Mining

Classification vs Clustering in Machine Learning

In this post we share some basics of classification and clustering in machine learning. We also review some cluster analysis methods and algorithms.
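In classification, every training example comes with a known class label. In clustering, no labels are given and the algorithm has to discover the groups on its own. As a minimal illustration of the clustering side, here is a plain k-means sketch in Python with NumPy. The function name `kmeans` and its parameters are our own illustrative choices, not taken from the post, and k-means is only one of many cluster analysis algorithms.

```python
import numpy as np

def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means sketch (hypothetical helper): returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Start from k distinct points chosen at random as initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it in place if its cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

labels, centroids = kmeans([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], k=2)
print(labels, centroids)
```

Note that nothing in the input tells the algorithm which point belongs where; a classifier would instead be trained on points paired with their correct labels.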

Categories
Data Mining

Weibull distribution & sample averages approximation using Python and scipy

In this post we show how to plot a histogram of the Weibull distribution and of the distribution of sample averages, which is approximated by the normal (Gaussian) distribution. We'll show how the accuracy of this approximation improves as the sample size grows.

One may get the full .ipynb file here.
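By the central limit theorem, the mean of n independent Weibull draws is approximately normal with mean μ and standard deviation σ/√n, where for scale 1 and shape k, μ = Γ(1 + 1/k) and σ² = Γ(1 + 2/k) − μ². The sketch below is not the notebook's code; it uses NumPy's `Generator.weibull` instead of scipy, and the shape value 1.5, the trial count and the helper names are arbitrary assumptions. It compares the empirical mean and spread of the sample averages with the normal approximation, and shows the skewness shrinking toward 0 (the normal value) as n grows.

```python
import math
import numpy as np

def weibull_sample_means(shape, n, trials, seed=0):
    """Hypothetical helper: means of `trials` samples of size `n` from Weibull(shape, scale=1)."""
    rng = np.random.default_rng(seed)
    samples = rng.weibull(shape, size=(trials, n))
    return samples.mean(axis=1)

def clt_parameters(shape, n):
    """Mean and standard deviation of the normal approximation to the sample mean."""
    mu = math.gamma(1 + 1 / shape)
    var = math.gamma(1 + 2 / shape) - mu ** 2
    return mu, math.sqrt(var / n)

def skewness(x):
    """Sample skewness; 0 for a perfectly symmetric (e.g. normal) distribution."""
    d = np.asarray(x) - np.mean(x)
    return float((d ** 3).mean() / (d ** 2).mean() ** 1.5)

for n in (1, 5, 30, 200):
    means = weibull_sample_means(shape=1.5, n=n, trials=20_000)
    mu, sigma = clt_parameters(1.5, n)
    print(f"n={n:4d}  mean={means.mean():.4f} (normal {mu:.4f})  "
          f"sd={means.std():.4f} (normal {sigma:.4f})  skew={skewness(means):.3f}")
```

To reproduce the histograms, the `means` array can be passed to matplotlib's `plt.hist(means, bins=50, density=True)` and overlaid with the normal density for the corresponding `mu` and `sigma`.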

Categories
Data Mining

Simple text analysis with Python

Finding the sentence(s) in a text most similar to a given sentence, in fewer than 40 lines of code 🙂
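One common way to do this, sketched here as an assumption rather than the post's actual code, is to split the text into sentences, turn each sentence into a bag-of-words count vector, and rank the sentences by cosine similarity to the query. The naive regex sentence splitter and the helper names are illustrative choices.

```python
import math
import re
from collections import Counter

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def bag_of_words(sentence):
    # Lowercased word counts; punctuation is ignored.
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(query, text, top_n=1):
    """Return the top_n (score, sentence) pairs, best match first."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(s)), s) for s in split_sentences(text)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

text = "The cat sat on the mat. Dogs love to play fetch. A cat was sitting on a mat."
print(most_similar("cat on the mat", text, top_n=2))
```

Bag-of-words ignores word order and treats "sat" and "sitting" as unrelated words; stemming or TF-IDF weighting are natural refinements.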

Categories
Data Mining

Big Data Basics

Power of Big Data: capabilities and perspectives

As everyone knows, technology is evolving at lightning speed. At the same time, software requirements, approaches and algorithms are growing just as fast. In particular, developers relatively recently faced the problem of processing huge volumes of data, which made it necessary to create a new, effective approach: a new paradigm of data storage. The solution was not long in coming: in 2011 large companies all over the world started using the Big Data concept. In this article we will talk about this engaging approach.

Categories
Data Mining

Big Data, Data Analytics, Data Analysis, Data Mining, Data Science & Machine Learning

In this post, we’d like to share some of the most interesting terms used in today’s science and IT world. We think you will benefit from getting familiar with these modern tech-age expressions.

Categories
Data Mining

Testing the Filter by TheWebMiner for advanced web content filtering

Recently I came across an interesting new tool from TheWebMiner called Filter. Filter is TheWebMiner's attempt to sort (categorize) indexed websites and deliver them to users as a content-filtering service.

Categories
Data Mining Development Guest posting

Audio Captcha Solving Algorithm for XBox

I want to share how I built the audio captcha recognizer. It was designed to solve the audio captcha on xbox.com back in 2012.

Categories
Data Mining Web Scraping Software

Easy Data Visualisation with Silk.co

This post is outdated. The silk.co service is no longer available.

This is a guest post by Daniel Cave.

With the rise of social media sharing, collaboration and an increasingly interested market for data, more and more people want to ‘play with data’ and learn using some basic free tools. So recently I’ve been trying to find a technically advanced and interesting combination of free tools to collect and visualise web data that will allow enthusiasts and students to get those all-important initial quick and easy wins.

Categories
Data Mining

Distributed File System Implementations and MapReduce strategy

In the previous post we introduced MapReduce, a style of distributed computation used for data analysis on computing clusters. Here we look more closely at how this strategy is implemented on distributed hardware.
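To make the three phases concrete, here is a single-process simulation of a MapReduce word count in Python. It only mimics the data flow (map, shuffle/group by key, reduce); in a real framework such as Hadoop each phase runs in parallel on many machines, with input chunks read from a distributed file system. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    # Map: emit a (word, 1) pair for every word in one input chunk.
    for word in chunk.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values that share a key.
    return key, sum(values)

def map_reduce(chunks):
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())

# Each string stands in for one chunk of a file stored across the cluster.
print(map_reduce(["a b a", "B c"]))
```

Because each `map_phase` call depends only on its own chunk and each `reduce_phase` call only on one key's values, both phases can be distributed without coordination; only the shuffle moves data between machines.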

Categories
Data Mining

Implementing the frequent itemsets algorithm through MapReduce

The problem of finding frequent itemsets in data analysis is described in this post, and here I state the practical steps for finding frequent itemsets through MapReduce.
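As a simplified illustration, not the steps from the post itself, here is a single-machine simulation of one MapReduce job that counts item pairs across shopping baskets and keeps the pairs whose count reaches a support threshold. Full approaches such as the two-pass SON algorithm first find candidate itemsets in each data chunk and then count those candidates over the whole dataset; this sketch skips that and counts every pair directly. The function names and sample baskets are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def map_basket(basket):
    # Map: emit every unordered item pair in one basket with a count of 1.
    for pair in combinations(sorted(set(basket)), 2):
        yield pair, 1

def reduce_counts(mapped_pairs):
    # Reduce: sum the counts for each pair.
    counts = defaultdict(int)
    for pair, one in mapped_pairs:
        counts[pair] += one
    return counts

def frequent_pairs(baskets, support):
    mapped = (kv for basket in baskets for kv in map_basket(basket))
    counts = reduce_counts(mapped)
    # Keep only pairs that appear in at least `support` baskets.
    return {pair: c for pair, c in counts.items() if c >= support}

baskets = [["milk", "bread"], ["milk", "bread", "eggs"], ["bread", "eggs"], ["beer"]]
print(frequent_pairs(baskets, support=2))
```

Counting every pair is fine for small item catalogs, but the number of candidate pairs grows quadratically with basket size, which is why practical implementations prune candidates first.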