Since the emergence of Big Data, we have witnessed the rise of the key-value pair. We can explore the relationship between two such variables, X and Y, using Data Science. In its most basic form, Regression also describes a relationship between two variables, X and Y. These variables are:

Independent Variables & Dependent Variables

Let us take the behavior of users of a financial institution. We take hypothetical data (a random sample) of 6 users visiting one specific website of a Line of Business (LOB) during one specific hour.

User:   1,  2,  3, 4,  5, 6
Visits: 5, 17, 11, 8, 14, 5

Suppose we need to predict how often another user, User # 7, will visit the same website. We will use the statistical technique called the "Mean": add up the visits of the randomly selected users and divide the total by the number of users, so Mean (Visits) = 60/6 = 10. This is our best estimate of the number of visits User # 7 will make to the same site. This can also be considered Internal LOB Forensics of User Behavior. The mean is a measure of central tendency; how far the data spread around it is a measure of variability. Let us now find the distance between each data point and the fit we got by calculating the mean, which is 10. These are the deviations of each user's usage (Visit - Mean):

Residuals (Errors): -5, 7, 1, -2, 4, -5 {Adding the negatives and the positives separately: -5 - 2 - 5 = -12 and 7 + 1 + 4 = 12}

The residuals sum to -12 + 12 = 0, as they always do around the mean, so the mean is a balanced estimate of the next user's visits to the website we sampled. Let us now compute the Sum of Squared Residuals (Errors): 25 + 49 + 1 + 4 + 16 + 25 = 120. This entire example is based on one dependent variable only, which is the visits to one specific website by some users within a Line of Business. This predictive analytics discussion has introduced the starting point of Simple Linear Regression: with no independent variable available, the mean is the best prediction. We can explore further if we know the time users have spent on that website or the number of pages they visited. In that case we have both Independent and Dependent Variables to work with, and we can improve our prediction.
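The arithmetic in this example can be reproduced with a short Python sketch. It uses only the six visit counts given in the article; the helper name `sum_squared_residuals` is an illustrative choice, not something from the original. The last loop tries a few alternative guesses to suggest why the mean is treated as the best single estimate: any other guess gives a larger sum of squared residuals.

```python
# Visits of the six sampled users, taken from the example in the article.
visits = [5, 17, 11, 8, 14, 5]

# Mean: total visits divided by the number of users.
mean_visits = sum(visits) / len(visits)

# Residual = observed visits minus the mean prediction (Visit - Mean).
residuals = [v - mean_visits for v in visits]


def sum_squared_residuals(data, guess):
    """Sum of squared differences between each value and a single guess."""
    return sum((v - guess) ** 2 for v in data)


ssr = sum_squared_residuals(visits, mean_visits)

print("mean:", mean_visits)                  # 10.0
print("residuals:", residuals)               # [-5.0, 7.0, 1.0, -2.0, 4.0, -5.0]
print("sum of residuals:", sum(residuals))   # 0.0
print("sum of squared residuals:", ssr)      # 120.0

# Compare the mean against nearby guesses: 9 and 11 both give 126.
for guess in (9, 10, 11):
    print(guess, sum_squared_residuals(visits, guess))
```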

Linear Regression is a continuation of Correlation and ANOVA. With Correlation we work with two variables, the X and Y discussed in this article, and points are plotted against X and Y on a graph. We then explore the relationship between these plotted points. We can also say that the value of one variable is a function of the other variable. It can be shown as:

y = f(x) { the value of y is a function of x }

The value of the dependent variable y depends on the value of the independent variable x.
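To make y = f(x) concrete, here is a minimal sketch of a straight-line fit, y = a + b * x, using the ordinary least-squares formulas. The article does not provide any values for an independent variable, so the `minutes` list below is entirely hypothetical and exists only to make the code runnable; the function name `fit_line` is likewise an illustrative choice.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Dependent variable y: visits per user, from the article.
visits = [5, 17, 11, 8, 14, 5]
# Independent variable x: HYPOTHETICAL minutes spent on the site.
# These numbers are made up for illustration and are not from the article.
minutes = [4, 15, 10, 7, 12, 6]

intercept, slope = fit_line(minutes, visits)


def predict(x):
    """Predicted visits for a user who spent x minutes on the site."""
    return intercept + slope * x


# Sum of squared residuals around the fitted line.
ssr_line = sum((y - predict(x)) ** 2 for x, y in zip(minutes, visits))

print("intercept:", intercept, "slope:", slope)
print("SSR around the line:", ssr_line)
```

Whatever values are used for x, the sum of squared residuals around a least-squares line is never larger than the 120 obtained around the mean, because a horizontal line at the mean is one of the lines the fit can choose.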

It is hoped that this article sheds some light on the basic use of investigative forensics within a department or Line of Business of an organization that may be looking at internal users' behavior in serving clients through a single resource.

Originally posted on LinkedIn

Tags: Analytics, Big, Data, Regression