In the post we share some basics of classification and clustering in Machine learning. We also review some of the cluster analysis methods and algorithms.
Tag: data mining
In this post we share how to plot distribution histogram for the Weibull ditribution and the distribution of sample averages as approximated by the Normal (Gaussian) distribution. We’ll show how the approximation accuracy changes with samples volume increase.
One may get the full .ipynb file here.
Invalid data, what it is?
Often we see “invalid data”, “clean data”, “normalize data”. What does it mean as to practical data extraction and how does one deal with that? One shot is better than 1000 words though:

Simple text analysis with Python
Finding the most similar sentence(s) to a given sentence in a text in less than 40 lines of code 🙂
In this post, we’d like to share some of the most interesting terms that are used in today’s science and IT world. We think you will benefit from getting familiar with these modern tech-age expressions.
We have already mentioned the MapReduce distributed computation style in data analysis for computing clusters in the previous post. Here we want to touch more on the matter of implementation of this strategy for distributed hardware.
The problem of finding frequent itemsets in data analysis is described in this post, and here i state the practical steps for finding the frequent itemsets thru MapReduce.
This post is a continuation of the previous post on Advertising on the Web and Data mining. Here we conclude by reviewing some basic algorithms for placing ads on the web.
The challenge of effective web advertisement primarily involves placing relevant ads on user requested web pages. Those ads must be relevant to a page receiver, that is relevant to the page context and/or directly to the user. What algorithms are being used for this? What trends are there now in business intelligence and data mining for digital advertisement solutions?
As we have touched on some basics on Clusters in Data Mining, we want to consider the computation techniques applied for clusters. Those techniques stand in line with the data mining for web traffic analysis.