In this post, we’d like to share some of the most interesting terms that are used in today’s science and IT world. We think you will benefit from getting familiar with these modern tech-age expressions.
|Big Data||Data Analytics|
|Data Mining||Data Analysis|
|Data Science||Machine Learning|
Big Data (in our age) is mostly unstructured digital data that today’s society tries to structure, unify, and gain insights from. The amount of unstructured data grows exponentially, and the means to process it need to be more sophisticated than data analytics tools designed for small data sets. Big Data implies data sets that are too large to fit in a single computer’s memory and must therefore be both stored and processed in a distributed manner. Distributed processing, in turn, calls for new algorithmic models designed for distribution.
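To make the idea of distributed processing more concrete, here is a minimal single-machine sketch of the split, process, and combine pattern (often called map-reduce). It only simulates distribution: the chunks stand in for data held on separate machines, and the function names, the chunk size, and the sample lines are illustrative assumptions, not part of any particular framework.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Yield consecutive slices of `records`; each slice stands in for one machine's data."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def map_chunk(chunk):
    """Count words within a single chunk only (the 'map' step)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Merge two partial results into one (the 'reduce' step)."""
    return left + right


def distributed_word_count(records, chunk_size=2):
    """Process every chunk independently, then combine the partial counts."""
    partials = [map_chunk(chunk) for chunk in split_into_chunks(records, chunk_size)]
    return reduce(reduce_counts, partials, Counter())


# Hypothetical sample data, small enough to check by hand.
lines = [
    "big data is big",
    "data is stored in chunks",
    "chunks are processed apart",
]
total_counts = distributed_word_count(lines)
print(total_counts.most_common(3))
```

No chunk ever needs to see the whole data set. In a real system, each call to `map_chunk` could run on a different machine, with only the small partial counts sent over the network.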
Another problem with Big Data is that most of it is not openly accessible. A big chunk of today’s data is kept secured. See the open info index (related to government data only). Even so, the remaining public data is still huge and falls under the Big Data concept.
Data Analysis is a heuristic activity in which the analyst scans through the data to gain insight, turning the data into useful information. Data Analysis leverages statistical methods to analyze aggregated or non-aggregated data.
Analytics is about applying a mechanical or algorithmic process to find insights, for example running through various data sets to find meaningful correlations between them. This requires statistics and data science tools. Analytics is also the result of analysis and the form in which the analysis results are presented; it may involve an interest in prediction.
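As a small illustration of "finding meaningful correlation" between two data sets, the sketch below computes the Pearson correlation coefficient in plain Python. The `ad_spend` and `sales` figures are made-up sample values, and Pearson correlation is just one common choice of measure, picked here for brevity.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical data sets: monthly ad spend and monthly sales.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.0, 6.0, 8.0, 10.0]
correlation = pearson_correlation(ad_spend, sales)
print(round(correlation, 3))
```

A value close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests none. Either way, correlation alone does not establish cause.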
Data mining (a term coined in the business world) is the analysis of data for the purpose of discovering unforeseen patterns or properties. It turns messy, unstructured data into useful information.
It is the computational process of discovering patterns in large data sets (which brings in the Big Data abstraction), using methods at the intersection of artificial intelligence, machine learning, and database systems.
Data mining is closely related to data analysis. One can say that data mining is data analytics operating on big data sets, because small data sets rarely yield meaningful analytical insights. In short, data mining is the process of transforming data into useful information.
Data mining is rooted more in the database point of view (static, already stored data), whereas machine learning originated from the desire to build Artificial Intelligence (AI). Classical algorithms that I would classify as data mining ones include Apriori (finding associations), DBSCAN (finding clusters), and decision trees.
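To give a feel for how Apriori finds associations, here is a simplified sketch of its frequent-itemset step: it grows itemsets level by level and discards any candidate that has an infrequent subset. The shopping baskets and the `min_support` threshold are invented for illustration, and the sketch leaves out the rule-generation stage and the optimizations of production implementations.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(candidate):
        # Number of transactions that contain every item of the candidate.
        return sum(1 for t in transactions if candidate <= t)

    # Level 1: frequent single items.
    all_items = {item for t in transactions for item in t}
    current = {}
    for item in all_items:
        candidate = frozenset([item])
        count = support(candidate)
        if count >= min_support:
            current[candidate] = count

    result = dict(current)
    size = 2
    while current:
        # Join step: combine frequent itemsets of the previous level.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        }
        current = {}
        for candidate in candidates:
            count = support(candidate)
            if count >= min_support:
                current[candidate] = count
        result.update(current)
        size += 1
    return result


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
]
frequent = frequent_itemsets(baskets, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is the heart of Apriori: an itemset can only be frequent if all of its subsets are frequent, so most candidates are ruled out without ever being counted against the data.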
Data Science is a field of science that covers methods and processes for working with data. It is a blend of mathematics, statistics, programming, and ingenious ways of capturing data that may not be captured yet. Data Science includes Machine Learning as well as other methods:
- problem formulation
- exploratory data analysis
- data model compiling
- data visualization
- data extraction
The following picture relates the above-mentioned terms to Data Science:
Machine Learning finds patterns in [big] data that are useful to researchers but not visible from a human point of view. Machine Learning implies an algorithmic model that describes a certain process and its issues, makes predictions about the subject of the model [real-world systems], and adjusts itself. The derived model provides recommendations/insights and monitors the results once those recommendations are implemented. The acquired results feed back to improve the model. See the following picture, in which the dotted Feedback branch shows how drastically Machine Learning differs from plain data analysis:
Machine Learning is not a static, hard-coded model but a self-learning, self-adjusting one (the machine learns and changes itself). An example of Machine Learning is a weather prediction model that accumulates information from year to year, compares it to previous years’ data, and recalculates average (mean) values to provide better weather forecasts.
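The feedback idea in the weather example can be sketched as a tiny model that predicts with its current mean and then adjusts that mean whenever a new observation arrives. This is a deliberately simplistic toy: the class name and the temperature values are invented for illustration, and real forecasting models are far richer.

```python
class RunningMeanForecaster:
    """Toy self-adjusting model: predicts the mean of everything observed so far."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def predict(self):
        """Return the current forecast; needs at least one observation."""
        if self.count == 0:
            raise ValueError("no observations yet")
        return self.mean

    def update(self, observed):
        """Feedback step: fold a new observation into the stored mean."""
        self.count += 1
        # Incremental mean update, so past observations need not be kept.
        self.mean += (observed - self.mean) / self.count


# Hypothetical average July temperatures (degrees Celsius) for three years.
forecaster = RunningMeanForecaster()
for yearly_temperature in [10.0, 12.0, 14.0]:
    forecaster.update(yearly_temperature)

forecast = forecaster.predict()
print(forecast)
```

The model is never rewritten by hand. Every call to `update` changes what `predict` returns next, which is the self-adjusting behavior described above.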
Compared to Data Mining, Machine Learning (or Artificial Intelligence) is more about incorporating acquired knowledge into the framework for further (i.e., future) use in analysis.
The difference between Analytics and Data Mining
- When you know the questions and where to find the data, you are using Data Analytics.
- When you don’t know the exact questions or where to look for answers, you have to rely on Data Mining.
In statistics, aggregate data is data combined from several measurements. When data is aggregated, groups of observations are replaced with summary statistics based on those observations. In a data warehouse, the use of aggregated data dramatically reduces the time needed to query large data sets.
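Here is a minimal sketch of aggregation, assuming the observations are rows with a grouping field and a numeric field. Each group of observations is replaced with a few summary statistics. The field names and temperature readings are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_key(rows, key, value):
    """Replace each group of observations with summary statistics."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        group: {
            "count": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }
        for group, values in groups.items()
    }


# Hypothetical raw observations: one row per temperature reading.
readings = [
    {"city": "Oslo", "temperature": 10.0},
    {"city": "Oslo", "temperature": 20.0},
    {"city": "Rome", "temperature": 25.0},
    {"city": "Rome", "temperature": 27.0},
    {"city": "Rome", "temperature": 29.0},
]
summary = aggregate_by_key(readings, key="city", value="temperature")
print(summary["Oslo"])
```

A query such as "what is the mean temperature in Rome?" can then be answered from the small summary rather than by rescanning every raw reading, which is why aggregates speed up queries in a data warehouse.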
A heuristic, or heuristic technique, is a strategy derived from experience with similar problems that uses readily accessible information to guide problem solving in human beings, machines, and abstract issues. The most fundamental heuristic is trial and error.