Methods

Modeling

Modeling is an integral part of data mining. A model can be simple, involving only a few variables, or complex, involving many.

There are two broad approaches to modeling, and each examines a data set differently to uncover hidden value.

    1. Theory-driven modeling (or hypothesis testing)
Theory-driven modeling tests a predetermined theory. The theory, built from prior knowledge, is applied to the new data set to draw out information that confirms or refutes it.

    2. Data-driven modeling
Data-driven modeling works in the opposite direction: tools build a model from patterns found in the new data, and the resulting model is then tested for accuracy and completeness against the data set.
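
As a minimal sketch of theory-driven modeling, the code below tests a theory stated before looking at the data (here, a hypothetical theory that larger discounts raise sales) using a one-sided permutation test on the correlation coefficient. The data, the function names, and the choice of a permutation test are illustrative assumptions, not a method prescribed by this page.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def permutation_test(xs, ys, n_perm=5000, seed=0):
    """One-sided test of the prior theory 'y increases with x'.

    Shuffling ys breaks any real link to xs, so the p-value is the share of
    shuffles whose correlation is at least as large as the observed one
    (with a +1 correction so the p-value is never exactly zero).
    """
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(xs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical data: discount percentage vs. units sold.
discount = [0, 5, 10, 15, 20, 25, 30, 35]
units = [100, 108, 113, 125, 131, 138, 149, 152]
r, p = permutation_test(discount, units)
print(f"r = {r:.3f}, p = {p:.4f}")
```

A small p-value means the data are unlikely under "no relationship", which supports the theory; a large one means the data give no support for it.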

The final model for a given data set often combines both strategies, bringing together prior knowledge and new information from the data.


Aspects of Modeling

Classification

    *    Widely used to assign data items to predefined categories based on their attributes.
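
Classification can be illustrated with a k-nearest-neighbors classifier, which is only one of many classification techniques. This sketch assigns a new record to the majority class among its k closest labeled records by Euclidean distance; the customer features, segment labels, and k=3 are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical records: (age, purchases per year) -> customer segment.
train = [
    ((25, 2), "occasional"),
    ((30, 3), "occasional"),
    ((28, 1), "occasional"),
    ((45, 20), "loyal"),
    ((50, 25), "loyal"),
    ((48, 22), "loyal"),
]
print(knn_classify(train, (47, 21)))  # → loyal
print(knn_classify(train, (27, 2)))   # → occasional
```

Because distances use raw values, a feature with a large numeric range dominates the vote; features are commonly standardized first.
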
Regression

A theory-driven tool used in two ways:

    *    Linear - finds relationships between variables by fitting a line of best fit to the data; the steeper the slope, the larger the effect that a change in the independent variable has on the dependent variable.
    *    Logistic - estimates the probability that a particular event will occur, based on observed factors and past occurrences.
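
Both regression variants can be sketched in plain Python. The linear fit uses the closed-form least-squares formulas for a single independent variable, so the slope is the expected change in y per unit change in x. The logistic model is fitted by batch gradient descent on the log-loss, a simple optimizer chosen here for readability rather than what a statistics package would typically use. The data sets, learning rate, and epoch count are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w0 + w1*x) by gradient descent; ys are 0/1."""
    w0 = w1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss with respect to w0 and w1.
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w0 + w1 * x) - y
            g0 += err
            g1 += err * x
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
    return w0, w1

# Linear: hypothetical points lying exactly on y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # → 1.0 2.0

# Logistic: hypothetical months as a customer vs. churned (1) or stayed (0).
months = [1, 2, 3, 4, 5, 6, 7, 8]
churned = [1, 1, 1, 0, 1, 0, 0, 0]
w0, w1 = fit_logistic(months, churned)
print(f"P(churn | 2 months) = {sigmoid(w0 + w1 * 2):.2f}")
print(f"P(churn | 7 months) = {sigmoid(w0 + w1 * 7):.2f}")
```
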
Clustering

    *    A data-driven tool that divides a data set into groups of items sharing similar characteristics.
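
A widely used clustering algorithm is k-means (Lloyd's algorithm), shown here as one possible choice among several. The sketch alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, stopping when the centroids stop moving; the 2-D points and k=2 are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels

# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

k-means needs k chosen in advance and can reach different groupings from different starting centroids, which is why the seed is fixed here.
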
Summarization

    *    Describes the relationships between the data and the desired outcome, including the strength of each correlation.
Change & Deviation Detection

    *    Examines data over time to identify patterns of change and detect unexpected events.
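
A simple way to flag unexpected events in data over time is a trailing z-score: compare each value with the mean and standard deviation of the values just before it, and report values that fall more than a threshold number of standard deviations away. The window of 5, the threshold of 3, and the daily transaction counts are hypothetical; real systems often use more robust or seasonality-aware methods.

```python
import statistics

def detect_deviations(series, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard deviations
    from the mean of the preceding `window` values (a trailing z-score)."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.fmean(history)
        std = statistics.stdev(history)  # sample standard deviation of the window
        if std == 0:
            # Perfectly flat history: treat any change at all as a deviation.
            if series[i] != mean:
                anomalies.append(i)
            continue
        if abs(series[i] - mean) / std > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical daily transaction counts with a spike on day 8.
counts = [102, 98, 101, 99, 103, 100, 97, 101, 180, 102, 99]
print(detect_deviations(counts))  # → [8]
```

After a spike, it stays in the trailing window for the next few steps and inflates the standard deviation there, which can mask a second event close behind it.
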
Decision Trees/Rules

    *    A data-driven tool that splits data into groups using sets of rules likely to have different effects on the targeted outcome; particularly good for analyzing attrition, promotions, and credit risk, and for detecting fraud.
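
The rule-based splitting can be sketched as a small CART-style tree: at each node it picks the feature threshold that minimizes Gini impurity, then prints every root-to-leaf path as an IF ... THEN rule. Gini impurity is one common split criterion among several (information gain is another). The attrition data, feature names, and max_depth=3 are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: probability that two randomly drawn labels differ."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow the tree as nested dicts; leaves hold the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[f] <= t]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [lab for _, lab in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [lab for _, lab in right], depth + 1, max_depth),
    }

def print_rules(node, names, conditions=()):
    """Print each root-to-leaf path as an IF ... THEN rule."""
    if not isinstance(node, dict):
        print("IF " + (" AND ".join(conditions) or "always") + f" THEN {node}")
        return
    name, t = names[node["feature"]], node["threshold"]
    print_rules(node["left"], names, conditions + (f"{name} <= {t}",))
    print_rules(node["right"], names, conditions + (f"{name} > {t}",))

def predict(node, row):
    """Follow the rules from the root down to a leaf label."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node

# Hypothetical attrition data: (years_at_company, satisfaction 1-10) -> outcome.
names = ["years_at_company", "satisfaction"]
rows = [(1, 3), (2, 2), (3, 4), (1, 8), (2, 9), (7, 3), (8, 2), (6, 8)]
labels = ["leaves", "leaves", "leaves", "stays", "stays", "stays", "stays", "stays"]
tree = build_tree(rows, labels)
print_rules(tree, names)
print(predict(tree, (1, 1)))  # → leaves
```
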
