Data Mining and Data Warehousing

Data mining is a highly valuable tool for assessing consumer behaviour. It is applied to develop products, to price them and to promote them. Banking, finance and insurance (BFI) have traditionally used data mining. The next most extensive use is in the retail sector. The technique can also be applied to the management of the supply chain. A supply chain has to integrate supply and demand end-to-end. In this integration, there is always an element of uncertainty, leading to a mismatch between demand and supply. Organisations take recourse to software to balance the two. Uncertainty occurs in demand, in supply and in the process matching the two. We have to predict this uncertainty accurately.

In the simplest form, the supply chain looks like this:

Supplier -> Supplier -> Producer -> Distributor -> Retailer -> Customer

Flows along the chain: marketing flow and information flow.

There are two components: the suppliers at one end, and the distributors and customers at the other. In between lies the production process. Thus both supply and demand require forecasting to lessen the uncertainty. The forecasting methods conventionally used have limitations. A conventional model extrapolates past data into the future, based on known parameters and using statistical techniques. Its limitations are inaccuracies in forecasting, the number of parameters used and the coefficients of these parameters. If data mining is used, these limitations can be overcome. Here the model is built and rebuilt repeatedly to approximate reality. Traditional methods rely on the modeller's imagination to choose the parameters. Data mining can also detect the minor effects of some parameters, which imagination alone might miss.
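As a rough illustration of the conventional approach, extrapolating past data with a statistical technique, here is a minimal sketch of a least-squares trend line projected forward. The function names and the demand figures are hypothetical, chosen only for the example; a real forecast would involve more parameters than a single time trend.

```python
def fit_linear_trend(values):
    """Least-squares fit of values[t] ~ intercept + slope * t, for t = 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    variance = sum((t - mean_t) ** 2 for t in range(n))
    slope = covariance / variance
    intercept = mean_y - slope * mean_t
    return intercept, slope


def forecast(values, steps_ahead):
    """Extrapolate the fitted trend for the next `steps_ahead` periods."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps_ahead)]


# Hypothetical monthly demand figures, for illustration only.
monthly_demand = [100, 110, 120, 130]
print(forecast(monthly_demand, 2))
```

The sketch also shows the limitation discussed above: the result depends entirely on the one parameter (time) and the coefficients fitted to it.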

Data mining is an interdisciplinary subject, and many definitions are possible. As we know, gold is mined from rocks or sand; this is called gold mining, not rock mining. As data mining results in knowledge, it could have been called knowledge mining from data, but this is too long. And if we shorten it to knowledge mining, it understates the importance of the large volume of data. Even so, the word mining retains the flavour of the process of extracting knowledge from a large amount of data. Other terms used to denote data mining include knowledge mining from data, knowledge extraction, data analysis, data archaeology and data dredging. Data mining is also used synonymously with knowledge discovery from data (KDD).

Data mining ferrets out patterns or relationships in the data. The data is subjected to statistical analysis and modelling techniques. Data mining is thus associated with knowledge discovery. There is a difference between data mining and online analytical processing (OLAP). In OLAP, the user proposes a hypothesis and tests it. Data mining, however, generates the hypothesis itself. The financial implications are considered before the results of a detected pattern are applied. OLAP answers a myriad of questions. In the initial stages, OLAP is useful for understanding the data: the important variables are identified, the exceptions are noted and the interactions are discovered. These operations enhance our understanding of the data. The data mining process has four stages:

1. Data Warehousing

Here data is managed for decision support. Data is collected from operational systems and third-party sources, cleaned and converted. This constitutes the data warehouse, which is the foundation for data mining. The quality of the data matters for good results. Data from legacy systems is transferred and cleaned, stored in a central repository and made available for analysis. The conversion must free the data from irregularities.
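The cleaning and conversion step can be pictured with a small sketch. It assumes, purely for illustration, that each source record is a dictionary with hypothetical fields `record_id`, `customer_id` and `amount`; real warehouse loading involves far more rules than the three shown here.

```python
def clean_records(records, required=("record_id", "customer_id", "amount")):
    """Trim text fields, drop incomplete or malformed rows, and remove duplicates."""
    cleaned = []
    seen_ids = set()
    for record in records:
        # Strip stray whitespace from every string value.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        # Skip rows where a required field is missing or empty.
        if any(row.get(field) in (None, "") for field in required):
            continue
        # Convert the amount to a number; skip rows where that is impossible.
        try:
            row["amount"] = float(row["amount"])
        except (TypeError, ValueError):
            continue
        # Keep only the first occurrence of each record_id.
        if row["record_id"] in seen_ids:
            continue
        seen_ids.add(row["record_id"])
        cleaned.append(row)
    return cleaned


# Hypothetical rows from a legacy system, for illustration only.
raw = [
    {"record_id": "1", "customer_id": " C001 ", "amount": "250.00"},
    {"record_id": "1", "customer_id": "C001", "amount": "250.00"},  # duplicate
    {"record_id": "2", "customer_id": "", "amount": "90"},          # missing customer
    {"record_id": "3", "customer_id": "C002", "amount": "n/a"},     # bad amount
    {"record_id": "4", "customer_id": "C003", "amount": 75},
]
print(clean_records(raw))
```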

As an alternative, a data mart is used. It is a functional or departmental repository. Its data is sourced from the systems critical to the unit that owns the data mart and from select external agencies. Data marts could be individual components.

The problem is inconsistency between the architectures of different data marts and the data warehouse.

Though a data warehouse is not a precondition for data mining, it is better to use one for larger databases. A data warehouse separates an organisation's online transaction processing (OLTP) systems from its decision support systems (DSS). The data in the warehouse is analysed by executives. The data then becomes information that aids decision making. The data is classified subject-wise and is used for comparisons, trends and forecasting.

2. Data Mining Tools

Data extracted from a data warehouse is used to generate a predictive model or rule set. Algorithms are used to do this. Customers can be classified on the basis of their characteristics using neural networks or decision trees. Segmentation analysis, or clustering, can also be carried out with neural networks or decision trees. The probability that a customer who prefers one product also prefers another is a problem of association and sequencing. Here statistical techniques or rule induction can be applied. In forecasting, regression can be used. There should be a team effort.
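The association problem mentioned above, the probability that a customer who prefers one product also prefers another, is commonly measured with support and confidence. The following is a minimal sketch using those standard measures; the basket data and function names are hypothetical, and it is not a full rule-induction algorithm.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= set(basket)) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0
    both = support(baskets, set(antecedent) | set(consequent))
    return both / base


# Hypothetical purchase baskets, for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]
print(support(baskets, {"bread", "butter"}))
print(confidence(baskets, {"bread"}, {"butter"}))
```

In this toy data, two of the three baskets containing bread also contain butter, so the confidence of the rule "bread implies butter" is two thirds.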

3. Predictive Modelling

We have to optimise by selecting either one predictive model or a combination of them. These models could be developed statistically or derived from data mining. They could be built by in-house modellers or sourced from external agencies. The predictive models are aggregated. The techniques can be blended sequentially or concurrently.
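One possible reading of blending is sketched below. It assumes that "concurrently" means running the models side by side and averaging their predictions with weights, and that "sequentially" means letting a second model adjust the output of the first. Both interpretations, and the toy models themselves, are assumptions made for illustration.

```python
def blend_concurrently(models, weights, x):
    """Weighted average of the predictions of several models for input x."""
    total_weight = sum(weights)
    return sum(w * model(x) for model, w in zip(models, weights)) / total_weight


def blend_sequentially(first_model, correction_model, x):
    """Run the first model, then let a second model adjust its prediction."""
    base = first_model(x)
    return base + correction_model(x, base)


# Hypothetical models, for illustration only.
def statistical_model(x):
    return 2 * x


def mined_model(x):
    return 2 * x + 4


def correction(x, base):
    # Assumed adjustment: raise the base prediction by 10 percent.
    return 0.1 * base


print(blend_concurrently([statistical_model, mined_model], [1, 1], 10))
print(blend_sequentially(statistical_model, correction, 10))
```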

In the last few decades, statisticians and computer scientists have produced a dazzling arsenal of extremely powerful tools to help managers translate data into business decisions. Managers have to pick a golden model, one that is neither too complicated nor too simple. The simplest model can be run in Excel. The most complicated is a full-blown Hidden Markov Model (HMM), which generally requires the use of specialised programming languages and takes much longer to run.

4. Predictive Scoring

Here the scoring is done on operational data. For example, banking customers who could not maintain the minimum balance in their accounts could be identified through data mining. These customer profiles could then be used for business purposes.
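The minimum-balance example can be sketched as a simple scoring pass over operational data. The score used here, the share of months in which the balance fell below the minimum, is an assumption chosen for illustration, as are the account figures, the minimum and the threshold.

```python
def shortfall_score(balances, minimum):
    """Share of periods in which the balance was below the required minimum."""
    if not balances:
        return 0.0
    return sum(1 for balance in balances if balance < minimum) / len(balances)


def flag_customers(accounts, minimum, threshold):
    """Return the customer ids whose shortfall score meets the threshold."""
    scores = {cid: shortfall_score(b, minimum) for cid, b in accounts.items()}
    return sorted(cid for cid, score in scores.items() if score >= threshold)


# Hypothetical month-end balances per customer, for illustration only.
accounts = {
    "C001": [1200, 900, 800, 1500],
    "C002": [5000, 5200, 4800, 5100],
    "C003": [400, 300, 1200, 200],
}
print(flag_customers(accounts, minimum=1000, threshold=0.5))
```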
