**Data is everywhere**. We generate data when using an ATM, browsing the Internet, calling our friends, buying shoes in our favourite e-shop or posting on Facebook. Companies **collect this data en masse in order to make more informed business decisions**, such as:

- Which customers should participate in our promotional campaign for a given product in order to **maximize response**?
- Which customers should be paid special attention to, as they **might be considering resigning from using our services**?
- Is a particular customer trustworthy and **does he/she qualify for a mortgage loan**?

It is not always easy to obtain answers to questions like the ones above. In such situations it is worthwhile to resort to **predictive analytics, which provides valuable information that can help make the right decisions**. In simple terms, **predictive analytics lets us predict the future on the basis of historical data**.

For example, **if we know which customers stopped using our products in the past, we can build an analytical model describing the patterns of their behavior and characteristics**. If we observe similar behavior in other customers, especially those who are the most valuable to us (as they generate the largest sales), we may try to prevent them from departing. **Predictive analytics will provide us with a ranking of our customers according to their risk of departure** (this is the so-called score – in our example, the higher the score, the higher the departure risk).
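A minimal Python sketch of such a ranking follows. The customer identifiers, scores and sales figures are invented for illustration and do not come from a real model; the only idea taken from the text is that a higher score means a higher departure risk.

```python
# Hypothetical customers with made-up departure scores (higher = riskier)
# and made-up yearly sales. In practice the scores would come from a model.
customers = [
    {"id": "C1", "score": 0.15, "yearly_sales": 1200},
    {"id": "C2", "score": 0.82, "yearly_sales": 5400},
    {"id": "C3", "score": 0.47, "yearly_sales": 300},
    {"id": "C4", "score": 0.91, "yearly_sales": 150},
]


def rank_by_departure_risk(customers):
    """Return the customers ordered from the highest to the lowest score."""
    return sorted(customers, key=lambda c: c["score"], reverse=True)


ranking = rank_by_departure_risk(customers)
for position, customer in enumerate(ranking, start=1):
    print(position, customer["id"], customer["score"], customer["yearly_sales"])
```

In this toy data the customer at the top of the ranking generates little sales, while the second one is both risky and valuable, which is the kind of customer a retention effort would probably target first.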

However, to build such an analytical model we need historical data …

Most often the data required for modeling are obtained from databases or flat files. Below, we discuss an example table with source data and ways of interpreting it.

Table columns are referred to as **variables or attributes**, while table rows are called **records, observations or objects**.
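To make the terminology concrete, here is a small Python sketch of a source table. The column names and values are invented for illustration; each dictionary plays the role of one record and each key plays the role of one variable.

```python
# A tiny, made-up source table: every dict is one record (row),
# every key is one variable (column).
records = [
    {"age": 34, "income": 52000.0, "occupation": "teacher", "departed": "no"},
    {"age": 51, "income": 31000.0, "occupation": "driver", "departed": "yes"},
    {"age": 27, "income": 78000.0, "occupation": "engineer", "departed": "no"},
]

# The variables (attributes) are the column names.
variables = list(records[0].keys())

# Reading one variable across all records gives a single column.
income_column = [record["income"] for record in records]

print("variables:", variables)
print("number of records:", len(records))
print("income column:", income_column)
```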

Variables can be:

- **numerical (also called quantitative or continuous)** – for example age, income, temperature,
- **categorical (also called discrete, qualitative, nominal)** – such as gender, occupation, eye color.

We distinguish two basic roles a variable can have:

- **independent (also called predictor, explanatory, feature)** – these variables describe the properties of objects which we want to use as the basis for making inferences,
- **dependent (also called response, explained, target)** – these variables describe the features of the object which we want to make inferences about.
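The two distinctions above (variable type and variable role) can be sketched in a few lines of Python. The table, the choice of `departed` as the target and the type-detection rule are all assumptions made for this illustration.

```python
# Made-up records; "departed" is assumed to be the dependent (target) variable.
records = [
    {"age": 34, "income": 52000.0, "occupation": "teacher", "departed": "no"},
    {"age": 51, "income": 31000.0, "occupation": "driver", "departed": "yes"},
    {"age": 27, "income": 78000.0, "occupation": "engineer", "departed": "no"},
]
TARGET = "departed"


def variable_type(values):
    """Simple heuristic: numerical if every value is an int or float."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numerical"
    return "categorical"


# Type of every variable in the table.
types = {name: variable_type([r[name] for r in records]) for name in records[0]}

# Role of every variable: explanatory variables (X) versus the target (y).
X = [{k: v for k, v in r.items() if k != TARGET} for r in records]
y = [r[TARGET] for r in records]

print(types)
print(y)
```

The heuristic is deliberately naive: a category stored as a number (a postal code, for instance) would be reported as numerical, so in practice the types are usually confirmed against a description of the data.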

It is worth remembering that **information about the target variable should not be used when calculating the values of explanatory variables**.
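One common way to respect this rule is to split the history at a cut-off date, computing explanatory variables only from what happened before it and the target only from what happened afterwards. The sketch below assumes such a time-based setup; the purchase history, the cut-off date and the function names are hypothetical.

```python
from datetime import date

# Hypothetical purchase history of one customer: (day, amount).
purchases = [
    (date(2019, 1, 10), 40.0),
    (date(2019, 2, 5), 25.0),
    (date(2019, 4, 20), 60.0),
]
OBSERVATION_DATE = date(2019, 3, 1)  # assumed cut-off date


def total_spent_before(purchases, cutoff):
    """Explanatory variable: uses only purchases made before the cut-off."""
    return sum(amount for day, amount in purchases if day < cutoff)


def bought_after(purchases, cutoff):
    """Target: did the customer buy anything on or after the cut-off?"""
    return any(day >= cutoff for day, _ in purchases)


feature = total_spent_before(purchases, OBSERVATION_DATE)
target = bought_after(purchases, OBSERVATION_DATE)
print(feature, target)
```

Had the feature summed all purchases, the April purchase would have leaked information about the target into an explanatory variable.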

Depending on the industry and the task, there can be a lot of variables available for analysis. **We have worked on databases that had tens of thousands of variables**. In addition, **variable names are not always understandable**. For example, who would guess that POP901, MARR1 and IC10 mean, respectively, the number of persons, the percentage of married people, and the percentage of households with an income of $50,000 – $74,999? Therefore, during data analysis one should have a data dictionary with variable descriptions.
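In its simplest form, a data dictionary can be a lookup from variable name to description. The sketch below uses the three names mentioned above; the `describe` helper and its fallback text are assumptions made for this illustration.

```python
# Minimal data dictionary built from the three variable names in the text.
DATA_DICTIONARY = {
    "POP901": "number of persons",
    "MARR1": "percentage of married people",
    "IC10": "percentage of households with an income of $50,000 - $74,999",
}


def describe(variable_name):
    """Return the description of a variable, or a fallback if it is unknown."""
    return DATA_DICTIONARY.get(variable_name, "no description available")


for name in ["POP901", "MARR1", "IC10", "XYZ"]:
    print(name, "->", describe(name))
```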

**A predictive model describes the dependencies between explanatory variables and the target**. It lets us predict the target value on the basis of explanatory variables. There are many types of models. The most popular ones include:

- **regression** (with the dependency expressed using a mathematical formula),
- **decision tree** (where the dependency is encoded using a tree-resembling graph).
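The two model types can be contrasted with a short Python sketch. Both functions below estimate a departure score from age and income; the coefficients, split points and leaf values are made up for illustration and were not fitted to any data.

```python
import math


def regression_score(age, income):
    """Regression: the dependency is a formula (here a logistic one)."""
    z = 1.5 - 0.03 * age - 0.00002 * income  # made-up coefficients
    return 1.0 / (1.0 + math.exp(-z))


def tree_score(age, income):
    """Decision tree: the dependency is a sequence of nested splits."""
    if income < 30000:      # root split (made up)
        if age < 30:        # second-level split (made up)
            return 0.8
        return 0.5
    return 0.2


print(round(regression_score(40, 50000), 3))
print(tree_score(25, 20000), tree_score(45, 20000), tree_score(45, 60000))
```

The formula changes the score smoothly as age or income changes, whereas the tree returns one of a few fixed values depending on which branch a customer falls into.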

Models can have the following roles:

- **classification** – the target variable is discrete (e.g. decision trees, logistic regression),
- **approximation** – the target is continuous (e.g. linear regression, neural networks),
- **association** – co-occurrence of values (e.g. the Apriori algorithm, associative networks),
- **segmentation** – division into subgroups (e.g. the k-means algorithm, Kohonen networks).

In our next post we are going to focus on the model building process, using a classification task as an example. Classification is the process of assigning every object from a collection to exactly one class from a known set of classes. Examples of classification tasks include assigning a patient (the object) to the group of healthy or ill people (the classes) on the basis of his or her medical record, or determining a customer’s (the object’s) credibility during a credit application using, for example, demographic and financial data; in this case the classes are “credible” and “not credible”.

The third part will be devoted to the notions of scoring and cut-off point.
