If you shop at Zalando, you leave behind data. Data is created when cars are tested, whenever you use a lift, and when somebody checks incoming goods at a warehouse. This data contains knowledge that can be valuable for your success.
You do not need luck to strike gold. You need data mining: a pattern discovered in large quantities of data can be worth more than its weight in gold. Such information can help small and medium-sized enterprises serve their customers better, make their production more efficient, streamline their supply chains, improve product quality and reduce downtime.
Amazon, for instance, uses data mining to suggest products: customers who bought a certain book bought this one too. Suggestions like these boost the online retailer’s sales by around a third.
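To make the idea behind such suggestions concrete, here is a minimal sketch that counts which items turn up together in the same order. It is not Amazon's actual method, which is not described here; the order data and the function names `co_purchase_counts` and `also_bought` are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order data: each set is one customer's basket.
orders = [
    {"book_a", "book_b"},
    {"book_a", "book_b", "book_c"},
    {"book_a", "book_b"},
    {"book_b", "book_d"},
]

def co_purchase_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a fixed order, so (a, b) and (b, a) are one key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def also_bought(item, baskets, top_n=3):
    """Return the items most often bought together with `item`, most frequent first."""
    related = Counter()
    for (first, second), n in co_purchase_counts(baskets).items():
        if first == item:
            related[second] += n
        elif second == item:
            related[first] += n
    return [other for other, _ in related.most_common(top_n)]

print(also_bought("book_a", orders))
```

A production recommender would weigh many more signals, but even this simple pair count already surfaces the 'bought this one too' pattern.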
Lift manufacturer Otis analyses data in conjunction with machine learning to perform ‘predictive maintenance’. This new service improves lift life cycles and increases customer satisfaction.
Data mining definition
Data mining is a computer-aided method which utilises concepts from information technology, statistics and mathematics to analyse data. Data mining algorithms reveal logical links as patterns or trends in data. This helps you identify and work on correlations, regularities, problems and weak points.
Classical statistics tests hypotheses on small random samples, whereas data mining automatically generates new hypotheses from very large quantities of data. Artificial intelligence (AI) and machine learning are also used to analyse data.
‘Mining’, therefore, is not about accumulating data; it is about extracting knowledge from data. That goes well beyond processes like evaluating KPIs in controlling.
Text mining is a related method that extracts information from long text documents. It works with unstructured data, whereas data mining usually works with structured data from databases.
The kind of text that might be analysed includes e-mails, memos of discussions, news feeds, Web forms, online discussions and open-ended responses in surveys.
These can be recorded and made useful by means of text mining, for things like research and development, marketing and customer services. Some data mining services include a text mining feature.
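As a rough illustration of a first text mining step, the sketch below counts how often each term occurs in a handful of open-ended survey responses. The responses, the stop-word list and the function name `term_frequencies` are made up for this example; real text mining tools go much further.

```python
import re
from collections import Counter

# Hypothetical open-ended survey responses (unstructured data).
responses = [
    "Delivery was late and the parcel was damaged.",
    "Late delivery again, but support was friendly.",
    "Friendly support, fast delivery.",
]

# A tiny hand-picked stop-word list; real projects use a fuller one.
STOP_WORDS = {"was", "and", "the", "but", "again"}

def term_frequencies(texts):
    """Lower-case the texts, split them into words and count the non-stop-words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts

frequencies = term_frequencies(responses)
print(frequencies.most_common(3))
```

Terms that appear often, such as 'delivery' and 'late' in this sample, point to topics that customer services or marketing may want to look at more closely.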
Discovering knowledge in databases
Computer-aided mining is part of a complex process. Database specialists introduced a name for it in 1989: ‘Knowledge Discovery in Databases’, or KDD for short.
This model aims to avoid mining ‘primitive data sets’, that is, data containing no correlations. The phases of KDD constitute a ‘non-trivial process’, as specialists point out. They can be repeated to improve the quality of the analysis.
KDD produces valid, new, potentially useful and easy-to-follow patterns that are derived from the data.
The knowledge discovery process
No data mining without Big Data
If you want to use data mining, you need ‘Big Data’, which means a large quantity of relevant data. A simplified definition of Big Data is: an amount of data that no longer fits in an Excel table. An Excel worksheet reaches its limit at 1,048,576 rows and 16,384 columns.
Data is created in so many places nowadays that Excel can be outgrown in mere minutes in certain businesses.
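A back-of-the-envelope calculation shows how quickly that can happen. The sketch below uses the Excel limits quoted above; the function names and the rate of 10,000 records per second are assumptions chosen purely for illustration.

```python
# Worksheet limits of Excel as quoted in the text.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384

def fits_in_excel(n_rows, n_cols):
    """Return True if a table of this size fits on a single Excel worksheet."""
    return n_rows <= EXCEL_MAX_ROWS and n_cols <= EXCEL_MAX_COLS

def minutes_until_full(rows_per_second):
    """Minutes until a steady stream of records has filled every row of a worksheet."""
    return EXCEL_MAX_ROWS / rows_per_second / 60

# Assumed example: a system that writes 10,000 records per second.
print(round(minutes_until_full(10_000), 2))
print(fits_in_excel(2_000_000, 12))
```

At the assumed rate, the worksheet would be full in under two minutes, and a table with two million rows would not fit at all.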
Strictly speaking, data mining does not require any specific amount of data; it requires relevant data. But it can deal with very large volumes. That is why we can safely say that Big Data is the right place for data mining.
The technical definition of Big Data is the systematic collection and storage of large, complex, fast-changing quantities of data.
These 6 Vs characterise Big Data:
- Velocity – the speed of collecting, processing and evaluating
- Volume – the quantity of data
- Variety – the diversity of complex data sets
- Veracity – truthfulness and credibility of data
- Value – how valuable data is to business
- Validity – securing data quality
A regular data server is not really big enough for storing and processing these quantities of data. It is worth using a data warehouse to process Big Data quickly and obtain real-time analyses.
CRM – a good source for data mining
If you document your customer relationships comprehensively and carefully in a Customer Relationship Management (CRM) system, you have the best starting point for data mining.
You can search for patterns in the data, and these can help you acquire new customers or reactivate customers who have not been active for a long time. You may even find ideas in the data about how to win back customers you have lost.
Data mining also helps you make better strategic decisions. The new knowledge influences campaigns and customer programmes as well as production processes and security concepts – not just once, but over and over again. If you analyse data in real time, you will respond much quicker to warning signs and successes.
Directly or indirectly, new knowledge derived from the data will boost sales, and therefore profits. It will help create value. The insights gained will help you develop new products and services and even new business models.
That is why data mining software is so useful for small and medium-sized enterprises. It can even allow them to overtake large businesses and corporations.
Check first, then analyse
Before you can begin data mining, you have to inspect and check the data material. Data often comes from a wide variety of sources such as databases, sensors and tracking.
This is the phase in which original data is gathered into data sets, making it more suitable for data mining. The key thing is to eliminate sources of error from the data collected.
Such errors include missing values and wrong information. Data of that kind is called ‘noisy’. Inconsistent data also harms evaluations. It may include contradictory figures, such as an age that contradicts a date of birth.
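Checks like these can be automated. The following minimal sketch flags missing values and ages that contradict the date of birth; the customer records, the field names and the fixed reference date are assumptions made for the example.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "birth_date": date(1990, 5, 17), "age": 34},
    {"id": 2, "birth_date": date(1985, 1, 2), "age": 25},  # contradicts birth date
    {"id": 3, "birth_date": None, "age": 41},              # missing value
]

def age_on(birth_date, today):
    """Age in full years on the given day."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)

def check_record(record, today):
    """Return a list of problems found in one record (empty if it looks clean)."""
    problems = [f"missing {key}" for key, value in record.items() if value is None]
    if record["birth_date"] is not None and record["age"] is not None:
        if age_on(record["birth_date"], today) != record["age"]:
            problems.append("age contradicts birth_date")
    return problems

reference_day = date(2024, 6, 1)  # fixed date so the result is reproducible
report = {r["id"]: check_record(r, reference_day) for r in records}
print(report)
```

Records with an empty problem list can go straight into the analysis, while flagged ones need to be corrected or excluded first.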
Preparing data takes more time than the data mining itself. Practitioners often speak of a ratio of 80:20: 80 percent of the time is spent on preparation, 20 percent on analysis. How the data is prepared depends very much on the question that is being investigated using data mining.