# Deducer Tutorial: Creating a Linear Model Using the R Deducer Package

The linear model, better known as linear regression, is one of the most common and flexible analysis frameworks for identifying relationships between two or more variables. In its simplest and most widely used form, it is represented by drawing the best-fit line through a series of data points on a scatter plot.

For any budding business analyst, this should be the starting point for understanding how a model works at the very core of its design.

Selecting the Variables in Deducer GUI:

1. Outcome Variable: The Y, or dependent, variable should be put in this list.
2. As Numeric: Independent variables that should be treated as covariates should be put in this section. Deducer automatically converts a factor into a numeric variable, so make sure that the order of the factor levels is correct.
3. As Factor: Categorical independent variables (Language, Ethnicity, etc.).
4. Weights: This option allows users to apply sampling weights to the regression model.
5. Subset: Defines whether the analysis should be restricted to a subset of the whole data set.

Note: Only one outcome is allowed. It can also be transformed by double-clicking on it. For example, to log-transform weight for the analysis, change it to log(weight).
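The dialog choices above map closely onto the arguments of base R's `lm()`. The following is a minimal sketch, not the code Deducer itself generates: it uses the built-in `mtcars` data as a stand-in, and the weight vector `w`, the subset condition, and the column name `cyl_f` are assumptions made purely for illustration.

```r
# Built-in mtcars stands in for a data set loaded into Deducer.
cars <- mtcars
cars$cyl_f <- factor(cars$cyl)   # "As Factor": categorical predictor
cars$w <- rep(1, nrow(cars))     # hypothetical sampling weights (all equal here)

fit <- lm(
  log(mpg) ~ wt + cyl_f,         # transformed outcome, one covariate, one factor
  data    = cars,
  weights = w,                   # "Weights"
  subset  = hp < 200             # "Subset": fit only the rows meeting the condition
)
print(coef(fit))

# Why level order matters when a factor is treated as numeric:
# the numeric codes follow the order of the levels, not the labels.
f <- factor(c("low", "high", "medium"), levels = c("low", "medium", "high"))
print(as.numeric(f))             # 1 3 2
```

If the levels had been left in their default alphabetical order (high, low, medium), the same conversion would give different codes, which is why the level order should be checked before a factor is placed in the As Numeric list.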

Model Tab

Users can add terms to the model by selecting one or more variables from the variable list.

• 2-way Add all two-way and lower-order interactions between the selected variables
• 3-way Add all three-way and lower-order interactions between the selected variables
• + Add main effects for all the selected variables
• : Add the interaction between the selected variables
• * Add the interaction between the selected terms, as well as any lower-order interactions between them
• Remove Term
• Poly Add orthogonal polynomial terms to the model
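These buttons correspond to operators in R's model formula syntax. As a rough illustration, the sketch below expands a few formulas into their term labels; the variable names `y`, `a`, `b`, `c` and the helper `expand()` are hypothetical and exist only for this example.

```r
# Helper (hypothetical): list the terms a formula expands to. No data needed.
expand <- function(f) attr(terms(f), "term.labels")

print(expand(y ~ a + b))           # +     : main effects only
print(expand(y ~ a:b))             # :     : the interaction alone
print(expand(y ~ a * b))           # *     : interaction plus lower-order terms
print(expand(y ~ (a + b + c)^2))   # 2-way : all main effects and two-way interactions
print(expand(y ~ (a + b + c)^3))   # 3-way : adds the three-way interaction a:b:c

# Removing a term from an existing formula
reduced <- update(y ~ a * b, . ~ . - a:b)
print(expand(reduced))             # only a and b remain

# Poly: orthogonal polynomial terms, here degree 2 in wt on the mtcars data
poly_fit <- lm(mpg ~ poly(wt, 2), data = mtcars)
print(coef(poly_fit))
```

Reading the expanded term labels is a quick way to confirm that a model contains exactly the interactions intended before it is fitted.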

Exploring the Model

Once the model has been created, its features can be explored using this tab. The preview panel shows what will be displayed in the console when the model is run. In the upper left-hand portion of the dialog, icons represent the assumptions that the model makes.

1. Option: Controls the main tests and diagnostic summaries of the model
   1. ANOVA Table
   2. Summary Table
   3. Unequal Variance
   4. Diagnostics: VIF (Variance Inflation Factors), Influence Summary
2. Post Hoc: Compares the levels of factors
   1. Post Hoc: The factors for which comparisons should be calculated
   2. Type: The comparison type. For example, Tukey performs all pairwise comparisons
   3. Estimate CI: Whether confidence intervals should be calculated
   4. Corrections: Adjusts the p-values and CIs when the factor has more than two levels
3. Tests: Custom hypothesis tests based on the model parameters
4. Plots: Visualize the marginal effects of the model
   1. Pointwise intervals: Plot pointwise CIs
   2. Y-axis labels: Labels for the y-axes of the plots
   3. Multiple Lines Per Panel: If the effect is an interaction effect, this option decides whether the interaction is plotted as multiple lines within the same panel or as separate panels
   4. Rug: Small lines on the x-axis denoting the distribution of the data
   5. # of Levels: Number of levels for which the effect should be calculated
5. Means (Marginal Means): Like the effects plots, marginal means are the model's estimated means of the outcome variable across the levels of a term, with the other terms held fixed at a typical level.
6. Export: Allows users to export a number of relevant variables related to the model
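Several of these options have rough base R counterparts. The sketch below is an approximation under stated assumptions, not a reproduction of Deducer's output: it uses `mtcars` as stand-in data, computes the VIF by hand instead of calling a package, uses `TukeyHSD()` on a one-way `aov()` fit as a stand-in for the Post Hoc panel, and approximates marginal means with `predict()` at the mean of the numeric predictors.

```r
cars <- mtcars
cars$cyl_f <- factor(cars$cyl)
fit <- lm(mpg ~ wt + hp + cyl_f, data = cars)

print(anova(fit))     # ANOVA table (base R uses sequential sums of squares)
print(summary(fit))   # coefficient table, R-squared, overall F test

# VIF for wt by hand: regress wt on the other predictors, then 1 / (1 - R^2)
r2_wt  <- summary(lm(wt ~ hp + cyl_f, data = cars))$r.squared
vif_wt <- 1 / (1 - r2_wt)
print(vif_wt)

# Post hoc: all pairwise comparisons between cylinder groups, with 95% CIs
tuk <- TukeyHSD(aov(mpg ~ cyl_f, data = cars), which = "cyl_f", conf.level = 0.95)
print(tuk)

# Marginal means: predicted mpg per cylinder level, other terms at their means
grid <- data.frame(
  wt    = mean(cars$wt),
  hp    = mean(cars$hp),
  cyl_f = factor(levels(cars$cyl_f), levels = levels(cars$cyl_f))
)
mm <- predict(fit, newdata = grid)
print(setNames(mm, levels(cars$cyl_f)))
```

A VIF well above 1 for a predictor indicates that it is largely explained by the other predictors, which inflates the standard error of its coefficient.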

Diagnostic Tab (Top of the preview window)

This panel contains six plots for evaluating outliers, influence, and equality of variance. The first two plots show the distribution of the residuals, which should ideally be normal.

Residuals vs. Fitted: Shows the residuals of the model plotted against the predicted values. If the red line is not flat, the model may have significant non-linearity.

Scale Location: Plots the square root of the absolute standardized residuals against the predicted values. Also known as a Spread vs. Level plot.

Cook's Distance: A linear model is sensitive to outliers, which can unduly influence its results. Cook's distance helps the analyst identify such observations: those with a Cook's distance greater than 1 deserve a closer look.

Residuals vs. Leverage: Another plot to examine outliers and influence
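The quantities behind these four plots can also be computed directly in base R. The following is a minimal sketch on the built-in `mtcars` data (an assumption for illustration); the plots drawn by `plot()` on an `lm` fit are base R's versions and may look different from the ones Deducer shows.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

res       <- residuals(fit)              # y-axis of Residuals vs. Fitted
pred      <- fitted(fit)                 # x-axis of Residuals vs. Fitted and Scale Location
scale_loc <- sqrt(abs(rstandard(fit)))   # y-axis of Scale Location
cd        <- cooks.distance(fit)         # Cook's distance per observation
lev       <- hatvalues(fit)              # leverage per observation

# Observations exceeding the Cook's distance threshold of 1 (may be none)
print(names(cd)[cd > 1])

# Base R versions of the four plots:
# 1 = Residuals vs. Fitted, 3 = Scale Location,
# 4 = Cook's distance,      5 = Residuals vs. Leverage
plot(fit, which = c(1, 3, 4, 5))
```

The leverages always sum to the number of estimated coefficients (here three), so an observation whose leverage is several times the average is worth inspecting alongside its Cook's distance.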

Term Plots: Also known as Component or Partial Residual Plots. For models without interactions, component residual plots are given. These can be used to examine the linearity of the relationship between the predictor and outcome variables.

• For numeric variables a scatter plot is produced
• For factors a box plot is generated
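Partial residuals can be inspected in base R as well. The sketch below assumes the built-in `mtcars` data and a model without interactions; note that base R's `termplot()` draws factor terms differently from the box plots described above, so it is only an approximation of Deducer's display.

```r
cars <- mtcars
cars$cyl_f <- factor(cars$cyl)
fit <- lm(mpg ~ wt + cyl_f, data = cars)   # no interaction terms

# One column of partial residuals per model term
pr <- residuals(fit, type = "partial")
print(head(pr))

# One panel per term, with partial residuals overlaid on the fitted term
termplot(fit, partial.resid = TRUE, se = TRUE)
```

A clear curve in the partial residuals of a numeric term suggests that the relationship with the outcome is not linear and that a transformation or polynomial term may be worth trying.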