price = -55089.98 + 87.34 * engineSize + 60.93 * horsePower + 770.42 * width
The model predicts or estimates price (target) as a function of engine size, horsepower, and width (predictors). All the predictors are numeric.
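To make the equation concrete, the sketch below evaluates it for a single car. The function name `predict_price` and the input values (engine size 130, horsepower 111, width 64.1) are assumptions chosen for illustration; they are not taken from Fernando's data set.

```python
def predict_price(engine_size, horse_power, width):
    """Evaluate the original numeric-only model for one car."""
    return (
        -55089.98
        + 87.34 * engine_size
        + 60.93 * horse_power
        + 770.42 * width
    )


# Hypothetical car, used only to show how the equation is applied.
example_price = predict_price(engine_size=130, horse_power=111, width=64.1)
print(round(example_price, 2))
```

Each coefficient is the change in predicted price for a one-unit increase in that predictor while the other two are held fixed.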
What if there are qualitative variables? How can the qualitative variables be used in enhancing the models? How are the qualitative variables interpreted?
These are the questions this blog post will answer.
Fernando gets two such qualitative variables: fuel type and drive wheels.
The data set looks like this.
Fernando wants to find out the impact these qualitative variables have on the price of the car.
Qualitative variables are variables that are not numerical. They fit the data into categories, and they are also called categorical variables or factors.
Factors have levels. Levels are the unique values of a specific qualitative variable.
Let us look at an example. The sample data has 5 cars, and each car has a diesel or gas fuel type.
The fuel type is a qualitative variable with two levels (diesel and gas). The statistical package creates one dummy variable named fuelTypegas, which takes the value 0 or 1. If the fuel type is gas, the dummy variable is 1; otherwise it is 0.
Mathematically, it can be written as: fuelTypegas = 1 if the fuel type is gas, and fuelTypegas = 0 if the fuel type is diesel.
The number of dummy variables created by the regression model is one less than the number of levels in the qualitative variable.
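The encoding described above can be sketched in a few lines of plain Python. The five fuel types below are made-up sample values (the original sample table is not reproduced here), and the variable names are illustrative rather than anything a particular statistical package produces.

```python
# Hypothetical sample: five cars and their fuel types (illustrative values only).
fuel_types = ["gas", "diesel", "gas", "gas", "diesel"]

# Levels are the unique values of the factor.
levels = sorted(set(fuel_types))

# One dummy for a two-level factor: 1 when the fuel type is gas, else 0.
# Diesel is the baseline level and gets no dummy variable of its own.
fuel_type_gas = [1 if fuel == "gas" else 0 for fuel in fuel_types]

# A factor with k levels needs k - 1 dummy variables.
num_dummies = len(levels) - 1

print(levels)
print(fuel_type_gas)
print(num_dummies)
```

A second dummy for diesel would add no information, because it would always equal 1 minus fuelTypegas.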
Let us examine how this manifests in a regression model. A simple regression model with price as the target and fuel type as the only input provides the following coefficients:
It says the following:
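For a model with a single 0/1 dummy variable, the fitted coefficients have a simple reading: the intercept is the average price of the baseline group (diesel), and the fuelTypegas coefficient is the difference between the average gas price and the average diesel price. The sketch below checks this with a hand-written least-squares fit. The prices and the helper `fit_simple_ols` are made up for illustration; they are not Fernando's data or the output of his statistical package.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical sample of five cars (illustrative values only).
fuel_types = ["gas", "diesel", "gas", "gas", "diesel"]
prices = [13000.0, 18000.0, 12000.0, 14000.0, 20000.0]

# Dummy variable: 1 for gas, 0 for diesel (the baseline).
fuel_type_gas = [1 if fuel == "gas" else 0 for fuel in fuel_types]

intercept, slope = fit_simple_ols(fuel_type_gas, prices)

# Group means, for comparison with the fitted coefficients.
diesel_mean = mean([p for f, p in zip(fuel_types, prices) if f == "diesel"])
gas_mean = mean([p for f, p in zip(fuel_types, prices) if f == "gas"])

print(intercept, diesel_mean)           # intercept matches the diesel mean
print(slope, gas_mean - diesel_mean)    # slope matches the gas-minus-diesel gap
```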
It is now clear how a qualitative variable with two levels is treated. What about variables with more than two levels? Let us examine another example to understand it.
The drive wheel is a qualitative variable with three levels. In this case, the regression model creates two dummy variables. Let us look at an example. The sample data has 4 cars.
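Two dummy variables are created: driveWheelsfwd and driveWheelsrwd.
Mathematically, it can be written as: driveWheelsfwd = 1 if the drive wheel is fwd, and 0 otherwise; driveWheelsrwd = 1 if the drive wheel is rwd, and 0 otherwise.
Note that there is no dummy variable for 4WD. It is the baseline level, represented by both dummy variables being 0.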
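The same idea extends to any number of levels. The sketch below is a minimal, general encoder under stated assumptions: the helper `dummy_encode`, the lowercase level labels, and the four sample cars are hypothetical, and the baseline level is passed in explicitly rather than chosen by a statistical package.

```python
def dummy_encode(values, name, baseline):
    """Return one 0/1 column per level of the factor, except the baseline."""
    levels = sorted(set(values))
    if baseline not in levels:
        raise ValueError(f"baseline {baseline!r} is not a level of {name}")
    return {
        f"{name}{level}": [1 if value == level else 0 for value in values]
        for level in levels
        if level != baseline
    }


# Hypothetical sample of four cars (illustrative values only).
drive_wheels = ["4wd", "fwd", "rwd", "fwd"]

dummies = dummy_encode(drive_wheels, name="driveWheels", baseline="4wd")

for column, values in dummies.items():
    print(column, values)
```

The first car has 0 in both columns, which is how the baseline level 4WD appears in the encoded data.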
How do they manifest in the regression model? The way the regression model treats them is as follows:
All qualitative variables with more than two levels are treated similarly.
Now that the mechanics of treating qualitative variables are clear, let us see how Fernando applies them to his model. His original model was the following:
price = -55089.98 + 87.34 * engineSize + 60.93 * horsePower + 770.42 * width
He adds two qualitative variables to the mix: fuel type and drive wheels. The general form of the model is written as:
price = β0 + β1 * engineSize + β2 * horsePower + β3 * width + β4 * fuelTypegas + β5 * driveWheelsfwd + β6 * driveWheelsrwd
Fernando trains the model in his statistical package and gets the following coefficients.
The equation of the model is:
price = -76404.83 + 57.20 * engineSize + 23.72 * horsePower + 1214.42 * width - 1381.47 * fuelTypegas - 344.62 * driveWheelsfwd + 2189.16 * driveWheelsrwd
Here there is a mix of quantitative and qualitative variables. The variables are treated as independent of each other. Let us now interpret the coefficients:
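One way to read the dummy coefficients is to compare predictions for two cars that differ in a single category while everything else is held fixed. The sketch below does this with the coefficients from the equation above. The function name `predict_price`, the lowercase level labels, and the example car (engine size 130, horsepower 111, width 64.1) are assumptions for illustration.

```python
def predict_price(engine_size, horse_power, width, fuel_type, drive_wheels):
    """Evaluate the model with fuel type and drive wheels added."""
    # Dummy variables: diesel and 4wd are the baseline levels.
    fuel_type_gas = 1 if fuel_type == "gas" else 0
    drive_wheels_fwd = 1 if drive_wheels == "fwd" else 0
    drive_wheels_rwd = 1 if drive_wheels == "rwd" else 0
    return (
        -76404.83
        + 57.20 * engine_size
        + 23.72 * horse_power
        + 1214.42 * width
        - 1381.47 * fuel_type_gas
        - 344.62 * drive_wheels_fwd
        + 2189.16 * drive_wheels_rwd
    )


# Hypothetical car; only the categories change between the comparisons.
car = dict(engine_size=130, horse_power=111, width=64.1)

baseline = predict_price(**car, fuel_type="diesel", drive_wheels="4wd")
gas_effect = predict_price(**car, fuel_type="gas", drive_wheels="4wd") - baseline
fwd_effect = predict_price(**car, fuel_type="diesel", drive_wheels="fwd") - baseline
rwd_effect = predict_price(**car, fuel_type="diesel", drive_wheels="rwd") - baseline

print(round(gas_effect, 2))  # gas vs. diesel, all else equal
print(round(fwd_effect, 2))  # fwd vs. 4wd, all else equal
print(round(rwd_effect, 2))  # rwd vs. 4wd, all else equal
```

Each difference equals the corresponding dummy coefficient: relative to the baseline, the model prices a gas car 1381.47 lower than a diesel car, a fwd car 344.62 lower than a 4WD car, and a rwd car 2189.16 higher than a 4WD car.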
This model is not better than the original model. However, it has done its job: we now understand how qualitative variables are interpreted in a regression model. The original model with horsepower, engine size, and width remains the better one. Still, Fernando wonders: horsepower, engine size, and width are treated independently.
What if there are relations between horsepower, engine size and width? Can these relationships be modeled?
The next blog post of this series will address these questions. It will explain the concept of interactions.
Posted 1 March 2021
© 2021 TechTarget, Inc.