
Since the emergence of Big Data, we have witnessed the rise of the **Key & Value pair**. In Data Science, we can explore the relationship between two such variables in terms of X & Y. Regression, in its most basic form, likewise describes the relationship between two variables, X & Y. These variables are:

**Independent** Variables & **Dependent** Variables

Let us take the behavior of users of a financial institution. We take hypothetical data (a random sample) of 6 users visiting one specific website of a Line of Business (LOB) during one specific hour.

**User**: 1, 2, 3, 4, 5, 6 **Visits**: 5, 17, 11, 8, 14, 5

To predict how often another user, say User # 7, will visit the same website, we will use the statistical measure called the "Mean": add up all visits by the six randomly selected users and divide that total by the number of users, that is, Mean (Visits) = 60/6 = 10. With no other information available, this is our best estimate of the number of visits User # 7 will make to the same site. This can also be considered Internal LOB Forensics of User Behavior. The mean is a measure of central tendency; to gauge variability, we look at how far each observation lies from it. Let us now find the distance between each data point and the fitted value of 10, that is, each user's deviation from the mean (Visit - Mean):

**Residuals (Errors)**: -5, 7, 1, -2, 4, -5 {Adding the negatives and the positives separately: -5 - 2 - 5 = -12 and 7 + 1 + 4 = 12}
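The mean-based estimate and the residuals listed above can be reproduced with a few lines of Python. This is only a minimal sketch using the six visit counts from the sample; the variable names are chosen here for illustration and are not part of the original article.

```python
from statistics import mean

# Visits recorded for the six sampled users (values from the article).
visits = [5, 17, 11, 8, 14, 5]

# With no other variable available, the sample mean serves as the estimate
# for the next user's number of visits.
predicted_visits = mean(visits)

# Residual for each user, computed as Visit - Mean.
residuals = [v - predicted_visits for v in visits]

# Totals of the negative and positive residuals, added separately.
negative_total = sum(r for r in residuals if r < 0)
positive_total = sum(r for r in residuals if r > 0)

print("Estimate for User # 7:", predicted_visits)
print("Residuals:", residuals)
print("Negative total:", negative_total, "Positive total:", positive_total)
```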

This means -12 + 12 = 0: the residuals cancel out, which confirms that the mean of 10 sits at the center of the sample and is our best single estimate of the next user's visits to the website we sampled. Let us now compute the Sum of Squared Residuals (or Errors): 25 + 49 + 1 + 4 + 16 + 25 = 120. This entire example is based on one dependent variable only, namely the visits to one specific website by some users within a Line of Business. This predictive analytics discussion has used website usage to introduce the idea behind **Simple Linear Regression**. We can explore further if we also know the time users spent on that website or the number of pages they visited; in that case, we have both Independent and Dependent Variables to work with, which lets us improve the prediction.
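As a rough check of the figures above, the sketch below recomputes the sum of the residuals and the Sum of Squared Errors, and then compares the mean against two nearby guesses (9 and 11, picked arbitrarily for illustration) to show why the mean is treated as the best single estimate. The helper name `sum_squared_errors` is an assumption made for this example.

```python
# Visits recorded for the six sampled users (values from the article).
visits = [5, 17, 11, 8, 14, 5]
mean_visits = sum(visits) / len(visits)


def sum_squared_errors(values, guess):
    """Sum of squared differences between each value and a single guess."""
    return sum((v - guess) ** 2 for v in values)


# Residuals around the mean cancel each other out.
sum_of_residuals = sum(v - mean_visits for v in visits)

# Sum of Squared Errors when the mean is used as the prediction.
sse_at_mean = sum_squared_errors(visits, mean_visits)

# Any other single guess produces a larger Sum of Squared Errors.
sse_at_9 = sum_squared_errors(visits, 9)
sse_at_11 = sum_squared_errors(visits, 11)

print("Sum of residuals:", sum_of_residuals)
print("SSE at the mean:", sse_at_mean)
print("SSE at 9:", sse_at_9, "SSE at 11:", sse_at_11)
```

Squaring the residuals keeps the positive and negative deviations from cancelling, which is why the squared sum, rather than the plain sum, is used to judge the fit.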

Linear Regression builds on **Correlation** and **ANOVA**. When working with Correlation, we work with two variables, X & Y as discussed in this article, and plot the observations as points on a graph. We then explore the relationship between these plotted points. We can also say that the value of one variable is a function of the other variable. It can also be shown as:

**y = f(x)** {the value of y is a function of x}

The value of the dependent variable **y** depends on the value of the independent variable **x**.
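To make y = f(x) concrete, here is a minimal sketch of a least-squares straight line, y = intercept + slope * x, fitted in plain Python. The article gives no values for an independent variable, so the `minutes_on_site` list below is purely hypothetical and exists only to make the example runnable; the function names `fit_line` and `predict` are likewise assumptions for illustration.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# HYPOTHETICAL independent variable: minutes each user spent on the site.
# These numbers are NOT from the article; they are invented for illustration.
minutes_on_site = [2, 9, 6, 4, 7, 2]

# Dependent variable: visits recorded for the six sampled users (from the article).
visits = [5, 17, 11, 8, 14, 5]

intercept, slope = fit_line(minutes_on_site, visits)


def predict(minutes):
    """Predicted visits for a given (hypothetical) number of minutes on the site."""
    return intercept + slope * minutes


print("Intercept:", intercept, "Slope:", slope)
print("Predicted visits for 5 minutes:", predict(5))
```

With an independent variable available, the prediction for a new user is no longer the same flat mean for everyone but varies with that user's value of x.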

It is hoped that this article sheds some light on the basic use of investigative forensics within a department or a Line of Business of an organization, which may be looking at the behavior of internal users who serve clients through one single resource.

*Originally posted on LinkedIn*


Posted 7 June 2021
