The linear model, better known as linear regression, is one of the most common and flexible analysis frameworks for identifying relationships between two or more variables. In its most widely used form, it is represented by drawing the best-fit line through a series of data points on a scatter plot.

For any budding business analyst, this should be the starting point for understanding how a model works at the very core of its design.
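As a minimal sketch of what "best-fit line" means, the snippet below computes the least-squares intercept and slope for a handful of points in plain Python. The function name `fit_line` and the data are made up for illustration; they are not part of Deducer, which runs on R.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustrative data (not from the article)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The slope is the expected change in the outcome for a one-unit change in the predictor, which is the quantity the model summary reports for each numeric term.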

Selecting the Variables in Deducer GUI:

- **Outcome Variable:** Y, or the dependent variable, should be put in this list.
- **As Numeric:** Independent variables that should be treated as covariates should be put in this section. Deducer automatically converts a factor into a numeric variable, so make sure that the order of the factor levels is correct.
- **As Factor:** Categorical independent variables (Language, Ethnicity, etc.).
- **Weights:** This option allows users to apply sampling weights to the regression model.
- **Subset:** Defines whether the analysis should be done within a subset of the whole data set.

Note: Only one outcome variable is allowed. It can be transformed by double-clicking on it. For example, to log-transform weight for the analysis, change it to log(weight).

**Model Tab**

Users can add terms to the model by selecting one or more variables from the variable list.

- **2-way:** Add all two-way and lower interactions between the selected variables.
- **3-way:** Add all three-way and lower interactions between the selected variables.
- **+** Add main effects for all the selected variables.
- **:** Add the interaction between the selected variables.
- **\*** Add the interaction between the selected terms, as well as any lower-order interactions with them.
- **-** Remove term.
- **In:** Add nested terms.
- **Poly:** Add orthogonal polynomial terms to the model.
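To make "all two-way and lower" concrete, here is a small sketch that lists the terms such an expansion would produce, written in R-style `a:b` notation. The helper `interaction_terms` and the variable names are hypothetical, and the assumption that "and lower" includes the main effects is an interpretation of the description above rather than a statement about Deducer's internals.

```python
from itertools import combinations


def interaction_terms(variables, max_order):
    """List main effects and all interactions up to max_order, R-style."""
    terms = []
    for order in range(1, max_order + 1):
        # Every unordered combination of `order` variables is one term
        for combo in combinations(variables, order):
            terms.append(":".join(combo))
    return terms


# Hypothetical variable names, for illustration only
selected = ["age", "sex", "dose"]

two_way = interaction_terms(selected, 2)
three_way = interaction_terms(selected, 3)

print(two_way)
print(three_way)
```

With three selected variables, the 3-way expansion adds exactly one term beyond the 2-way expansion: the single interaction of all three.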

**Exploring the Model**

After the model is created, this tab can be used to explore its features. The preview panel displays a preview of what will be displayed in the console when the model is run. In the upper left-hand portion of the dialog, there are icons representing the assumptions that are being made by the model.

**Option:** This controls the main tests and diagnostic summaries of the model:

- ANOVA Table
- Summary Table
- Unequal Variance
- Diagnostics – VIF (Variance Inflation Factors), Influence Summary
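For readers unfamiliar with the VIF, the following sketch shows the idea for the simplest case of exactly two predictors, where the VIF of each predictor equals 1 / (1 - r²) and r is the Pearson correlation between them. The function names and data are illustrative assumptions; with more predictors, r² is replaced by the R² from regressing one predictor on all the others.

```python
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Made-up predictor values (correlation is 0.8 by construction)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

vif = vif_two_predictors(x1, x2)
print(f"VIF = {vif:.2f}")
```

A VIF of 1 means the predictors are uncorrelated; the value grows without bound as the correlation approaches 1.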

**Post Hoc:** Helps to compare the levels of factors:

- Post Hoc: The factors for which comparisons should be calculated
- Type: Comparison type. For example, Tukey performs all pairwise comparisons
- Estimate CI: Whether confidence intervals should be calculated
- Corrections: Correct the p-values and CIs if the factor has more than 2 levels
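The reason for correcting p-values is that a factor with more than two levels yields several comparisons, and each one is another chance of a false positive. As an illustration only, the sketch below applies the Bonferroni correction, one common choice; the article does not say which corrections Deducer offers, so the method, the function name, and the p-values here are all assumptions.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from three pairwise comparisons
raw = [0.01, 0.04, 0.30]

adjusted = bonferroni(raw)
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.2f}, adjusted p = {q:.2f}")
```

In this made-up example, the second comparison is below 0.05 before correction but not after it.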

**Tests:** Custom hypothesis tests based on the model parameters.

**Plots:** Visualize the marginal effects of the model:

- Pointwise intervals: Plot pointwise CIs
- Y-axis labels: Labels for the y-axis of the plots
- Multiple Lines Per Panel: If the effect is an interaction effect, this option decides whether the interaction should be plotted as multiple lines within the same panel or as separate panels
- Rug: Small lines on the x-axis denoting the distribution of the data
- # of Levels: Number of levels for which the effect should be calculated

**Means (Marginal Means):** Just like the effects plots, the marginal means are the estimated means of the model's outcome variable across the levels of a term, with the other terms held static or at their typical levels.

**Export:** Linear model export allows users to export a number of relevant variables related to the model.

**Diagnostic Tab** (Top of the preview window)

This panel contains 6 plots for evaluating outliers, influence, and equality of variance.

Two of the plots show the distribution of the residuals; ideally, these should be normal.

**Residual vs. Fitted:** Shows the residuals of the model plotted against the predicted values. If the red line is not flat, the model may have significant non-linearity.

**Scale Location:** Plots the predicted values against the square root of the absolute standardized residuals. Also known as Spread vs. Level.

**Cook's Distance:** A linear model is sensitive to outliers that can unduly influence its results. Cook's distance helps analysts identify such observations, in particular those with Cook's values greater than 1.
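The sketch below shows how Cook's distance could be computed by hand for a simple one-predictor regression, using the standard formula D_i = e_i² / (p · MSE) · h_i / (1 - h_i)², where e_i is the residual, h_i the leverage, and p the number of fitted coefficients. The function name and the data are made up for illustration; the last point is deliberately placed far from the trend of the others.

```python
from statistics import mean


def cooks_distance(xs, ys):
    """Cook's distance for each observation in a one-predictor linear fit."""
    n = len(xs)
    p = 2  # fitted coefficients: intercept and slope
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar

    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage of each point in simple linear regression
    leverage = [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]

    return [
        (e ** 2 / (p * mse)) * (h / (1.0 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]


# Made-up data: the last observation sits far from the others
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]

distances = cooks_distance(xs, ys)
# Indices of observations exceeding the threshold of 1 mentioned above
flagged = [i for i, d in enumerate(distances) if d > 1.0]
print(flagged)
```

A point needs both a sizeable residual and high leverage to get a large Cook's distance, which is why the isolated last observation is the one flagged here.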

**Residuals vs. Leverage:** Another plot to examine outliers and influence

**Term Plots:** Also known as Component or Partial Residual Plots

For models without interactions, component residual plots are given. These can be used to examine the linearity of the relationship between the predictor and outcome variables.

- For numeric variables a scatter plot is produced
- For factors a box plot is generated

**Added Variable Plots**

Just like the term plots, added variable plots are used to examine the linearity of covariates. They are highly recommended when no term plots are available.

In a nutshell, Deducer is one of the most functional GUIs, with the potential for mass appeal, and its ease of use is a major strength. Deducer also accepts the file formats of leading statistical software, such as:

- Minitab
- SPSS
- SAS
- Dbase
- Excel

As a Java-based GUI, it competes with rivals such as SAS and SPSS without compromising on the quality of output. For businesses and individuals with tight budgets in particular, Deducer can be deployed without spending hundreds or thousands of dollars.


© 2020 Data Science Central ®
