Nonlinear Regression and Generalized Linear Models:
A regression model is nonlinear when at least one of its parameters enters the model nonlinearly. Nonlinear regression is widely used to analyze data in industries such as retail and banking, and it helps to draw conclusions and predict future trends from users' online activity.
Nonlinear regression analysis is the process of building a nonlinear function that predicts the outcome of a dependent variable from one or more independent variables. The model parameters capture the degree of relationship among the variables.
Generalized linear models (GLMs) extend regression to cases where the variance in the sample data is not constant or where the errors are not normally distributed.
Generalized Linear Model commonly applies to the following types of regressions when:
In statistics, logistic regression is one of the most commonly used forms of nonlinear regression. It is used to estimate the probability of an event based on one or more independent variables. Logistic regression uses probability theory to identify the relationship between an enumerated dependent variable and the independent variables.
A variable is said to be enumerated if it can possess only one value from a given set of values.
Logistic regression models are generally used in cases where the rate of growth does not remain constant over a period of time. For example, when a new technology is introduced in the market, its demand first increases rapidly and then gradually slows down.
Logistic Regression Types:
Logistic regression is defined using the logit() function:
f(x) = logit(x) = log(x/(1 − x))
Suppose P(x) represents the probability of the occurrence of an event, such as diabetes, on the basis of an independent variable, such as the age of a person. The probability P(x) is given as follows:
P(x) = exp(β0 + β1x1)/(1 + exp(β0 + β1x1))
Here β0 and β1 are the regression coefficients.
Taking the logit of the above equation, we get:
logit(P(x)) = log(P(x)/(1 − P(x)))
Since P(x)/(1 − P(x)) = exp(β0 + β1x1), solving the above equation gives:
logit(P(x)) = β0 + β1x1
The logistic function, which is the inverse of the logit and is represented by an S-shaped curve, is known as the sigmoid function.
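The relationship between the logit and the logistic (sigmoid) function can be checked numerically. The following is a minimal sketch; the coefficient values and ages are hypothetical and chosen only for illustration.

```r
# Logit and its inverse (the logistic, or sigmoid, function)
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(z) exp(z) / (1 + exp(z))

# Hypothetical coefficients and ages, for illustration only
beta0 <- -5
beta1 <- 0.1
age   <- c(20, 50, 80)

p           <- inv_logit(beta0 + beta1 * age)  # probabilities between 0 and 1
linear_part <- logit(p)                        # recovers beta0 + beta1 * age

print(p)
print(linear_part)

# Base R provides the same pair of functions as plogis() and qlogis()
print(all.equal(p, plogis(beta0 + beta1 * age)))
```

Applying logit() to the probabilities returns the linear predictor, which is the property the derivation above relies on.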
When a new technology comes to market, its demand usually increases quickly in the first few months and then gradually slows down over time. Growth of this kind follows an S-shaped logistic curve, which is why logistic models are generally used in cases where the rate of growth does not remain constant over a period of time.
Multivariate logit() Function
In the case of multiple predictor variables, the following equation represents the logistic function:
p = exp(β0 + β1x1 + β2x2 + … + βnxn)/(1 + exp(β0 + β1x1 + β2x2 + … + βnxn))
Here, p is the expected probability; x1, x2, x3, …, xn are the independent variables; and β0, β1, β2, …, βn are the regression coefficients.
Estimating the β coefficients manually is an error-prone and time-consuming process, as it involves many complex and lengthy calculations. Therefore, such estimates are generally made using statistical software.
β coefficients need to be calculated in statistical analysis. For this, use the following steps:
Interaction is a relationship among three or more variables in which two or more independent variables have a simultaneous, joint effect on a dependent variable. Logistic regression can be fitted with such interacting variables.
In logistic regression, an enumerated variable can have an order but it cannot have magnitude. This makes arrays unsuitable for storing enumerated variables because arrays possess both order and magnitude. Thus, enumerated variables are stored by using dummy or indicator variables. These dummy or indicator variables can have two values: 0 or 1.
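As a small illustration of dummy (indicator) variables, base R's model.matrix() expands a factor into 0/1 columns. The colour variable below is hypothetical and is not part of the original example.

```r
# Hypothetical enumerated predictor with three possible values
colour <- factor(c("red", "green", "blue", "green", "red"))

# model.matrix() expands the factor into 0/1 indicator columns.
# The first level (alphabetically "blue") is the baseline and gets no column.
dummies <- model.matrix(~ colour)
print(dummies)
```

Model-fitting functions such as glm() perform this expansion automatically when a factor appears in the formula.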
After developing a logistic regression model, you have to check its accuracy for predictions. Some useful techniques for checking the adequacy of a logistic regression model are given below:
Fitting the Regression Model
model <- glm(occupied ~ resources, family = binomial)
It fits a logistic regression model to the given data. Here the binomial argument specifies the type of error distribution, which is associated with situations involving 2 outcomes.
Drawing Logistic Regression Line
lines(xv, yv)  # xv and yv are assumed to hold a grid of resources values and the fitted probabilities at those values
To check how well the regression line matches the data, cut the ranked values on the x-axis into 5 categories and then work out the mean and standard error of the proportions in each group.
Dividing the Range of Given Values
cutr <- cut(resources, 5)
It divides the range of values into 5 sections.
Calculating the Probabilities for Divided Range
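Calculate the actual probabilities for the divided range by using the following commands:
probs <- tapply(occupied, cutr, mean)  # assumed step, not shown in the source: proportion occupied in each section
probs <- as.vector(probs)
resmeans <- tapply(resources, cutr, mean)
resmeans <- as.vector(resmeans)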
Plotting the Generated Points
points(resmeans, probs, pch = 16, cex = 2)
It plots the generated points so that they can be compared with the logistic regression line.
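The checking steps above can be run end to end on simulated data. This is a sketch under stated assumptions: the variable names follow the text, but the data values, the grid xv and the standard-error formula are illustrative choices and not taken from the original dataset.

```r
set.seed(1)
# Simulated stand-in data: names follow the text, values are made up
resources <- runif(200, 0, 1000)
occupied  <- rbinom(200, 1, plogis(-3 + 0.006 * resources))

model <- glm(occupied ~ resources, family = binomial)

# Fitted probability curve on a grid of resource values
xv <- seq(0, 1000, length.out = 200)
yv <- predict(model, list(resources = xv), type = "response")
plot(resources, occupied)
lines(xv, yv)

# Observed proportion and mean resource level in each of 5 sections
cutr     <- cut(resources, 5)
probs    <- as.vector(tapply(occupied, cutr, mean))
resmeans <- as.vector(tapply(resources, cutr, mean))
points(resmeans, probs, pch = 16, cex = 2)

# Binomial standard error of each binned proportion
se <- sqrt(probs * (1 - probs) / as.vector(table(cutr)))
print(cbind(resmeans, probs, se))
```

If the model is adequate, the binned points should lie close to the fitted curve, within roughly a couple of standard errors.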
Statistical interpretation is performed by government agencies and business organizations to draw inferences and derive conclusions on the basis of research data. Such conclusions help organizations analyze the efficacy of current measures as well as decide future trends for further growth and profit.
Regression lines for models are generated on the basis of the parameter values that appear in the regression model. So first you need to estimate the parameters for the regression model. Parameter estimation is used to improve the accuracy of linear and nonlinear statistical models.
One widely used method for estimating the parameters of a regression model is Maximum Likelihood Estimation (MLE).
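To make maximum likelihood estimation concrete, the sketch below maximizes the logistic log-likelihood directly with optim() and compares the result with glm(). The data are simulated and the true coefficient values (0.5 and 1.2) are assumptions made for the illustration.

```r
set.seed(2)
x <- rnorm(300)
y <- rbinom(300, 1, plogis(0.5 + 1.2 * x))  # simulated data, true values 0.5 and 1.2

# Negative log-likelihood of the logistic model for coefficients beta
negloglik <- function(beta) {
  eta <- beta[1] + beta[2] * x
  -sum(dbinom(y, size = 1, prob = plogis(eta), log = TRUE))
}

# Minimizing the negative log-likelihood is the same as maximizing the likelihood
fit_optim <- optim(c(0, 0), negloglik, method = "BFGS")
fit_glm   <- glm(y ~ x, family = binomial)

print(fit_optim$par)
print(coef(fit_glm))
```

The two sets of estimates should agree closely, because glm() with a binomial family also produces maximum likelihood estimates.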
We can estimate the parameters in any of the following ways:
The presence of bias while collecting data for parameter estimation might lead to uneven and misleading results. Bias can occur while selecting the sample or collecting the data.
The linear least squares method fits a straight line to the data points. However, in many cases, the data points form a curve.
Because linear models are easy to use, nonlinear models are sometimes transformed into linear ones. Consider the following nonlinear equation for exponential growth:
y = c e^(bx) u
Here b is the growth rate, u is the random error term and c is a constant.
We can fit the above equation by using the linear regression method. Use the following steps to transform the above nonlinear equation into a linear equation:
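The individual steps are not listed here. A common approach, sketched under that assumption, is to take natural logarithms of both sides, which gives log(y) = log(c) + bx + log(u). This is a straight line in x with intercept log(c) and slope b. The data and the values c = 2 and b = 0.3 below are simulated for illustration.

```r
set.seed(3)
# Simulated data from y = c * exp(b * x) * u with hypothetical c = 2 and b = 0.3
x <- seq(1, 10, length.out = 50)
u <- exp(rnorm(50, sd = 0.1))   # multiplicative error, so log(u) is normal
y <- 2 * exp(0.3 * x) * u

# After taking logs the model is linear in x, so lm() can fit it
fit <- lm(log(y) ~ x)

c_hat <- exp(coef(fit)[["(Intercept)"]])  # back-transform the intercept
b_hat <- coef(fit)[["x"]]                 # the slope is the growth rate
print(c(c = c_hat, b = b_hat))
```

This works because the error term is multiplicative. With an additive error, the log transformation would not give a linear model with well-behaved errors, and nls() would be the more natural choice.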
Several models are available for specifying the relationship between y and x. For a specific nonlinear equation, the parameters and their standard errors can be estimated from data.
Some of the most frequently appearing nonlinear regression models are:
a) Michaelis-Menten
y = ax/(1 + bx)
b) 2-parameter asymptotic exponential
y = a(1 − e^(−bx))
c) 3-parameter asymptotic exponential
y = a − b e^(−cx)
Below are a few S-shaped functions:
d) 2-parameter logistic
y = e^(a+bx)/(1 + e^(a+bx))
e) 3-parameter logistic
y = a/(1 + b e^(−cx))
f) 4-parameter logistic
y = a + (b − a)/(1 + e^((c − x)/d))
g) Weibull
y = a − b e^(−c x^d)
Below are a few humped curves:
h) Ricker curve
y = a x e^(−bx)
i) First-order compartment
y = k (exp(−exp(a)x) − exp(−exp(b)x))
j) Bell-shaped
y = a exp(−b x^2)
k) Biexponential
y = a e^(bx) − c e^(−dx)
The accuracy of a statistical interpretation largely depends on the correctness of the statistical model on which it depends.
The following are the most common statistical models:
An example of nonlinear regression: This example is based on the relationship between jaw bone length and age in deer.
a) Reading the dataset from the jaws.txt file. The path of the file is the argument.
deer <- read.table("c:\\temp\\jaws.txt", header = TRUE)
b) Fitting the model. The nonlinear equation is an argument to the nls() command, together with starting values for the parameters a, b and c. The result is stored in the model object.
model <- nls(bone ~ a - b * exp(-c * age), data = deer, start = list(a = 120, b = 110, c = 0.064))
c) Displaying information about the model object using the summary() command. The model object is the argument to summary(), as shown below:
summary(model)
d) Fitting a simpler model
y = a(1 − e^(−cx))
e) Applying the nls() command to the new, simpler regression model. The result is stored in the model2 object.
model2 <- nls(bone ~ a * (1 - exp(-c * age)), data = deer, start = list(a = 120, c = 0.064))
f) Comparing the models. Use the anova() command to compare the result objects model and model2, which are passed to it as arguments.
anova(model, model2)
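g) Drawing the fitted regression curve. Generate the curve by passing the av and bv objects to the lines() command. Here av is assumed to hold a sequence of age values and bv the corresponding predictions from model2.
lines(av, bv)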
h) Viewing the components of the new model2 object, as below:
summary(model2)
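Since the jaws.txt file is not reproduced here, the workflow above can be tried on simulated data. This is a sketch under stated assumptions: the simulated values, the noise level and the age grid av are illustrative choices, not the original dataset.

```r
set.seed(4)
# Simulated stand-in for the jaws data (made-up values, same variable names)
age  <- runif(60, 0, 50)
bone <- 115 * (1 - exp(-0.12 * age)) + rnorm(60, sd = 8)
deer <- data.frame(age, bone)

# Three-parameter and two-parameter asymptotic exponentials
model  <- nls(bone ~ a - b * exp(-c * age), data = deer,
              start = list(a = 120, b = 110, c = 0.064))
model2 <- nls(bone ~ a * (1 - exp(-c * age)), data = deer,
              start = list(a = 120, c = 0.064))

# F test: does dropping the parameter b make the fit significantly worse?
print(anova(model, model2))

# Fitted curve from the simpler model, drawn over the data
av <- seq(0, 50, by = 0.1)
bv <- predict(model2, list(age = av))
plot(deer$age, deer$bone)
lines(av, bv)
```

A large p-value in the anova() table suggests that the simpler two-parameter model describes the data about as well as the three-parameter one.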
Sometimes we can see that the relationship between y and x is nonlinear but we don’t have any theory or any mechanistic model to suggest a particular functional form (mathematical equation) to describe the relationship. In such circumstances, Generalized Additive Models (GAMs) are particularly useful because they fit a nonparametric curve to the data without requiring us to specify any particular mathematical model to describe the nonlinearity.
GAMs are useful because they allow you to identify the relationship between y and x without choosing a particular parametric form. Generalized additive models are implemented in R by the gam() function.
The gam() function has many of the attributes of both glm() and lm(), and the output can be modified using the update() command. You can use all of the familiar methods such as print, plot, summary, anova, predict, and fitted after a GAM has been fitted to data. The gam() function is available in the mgcv package.
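A minimal sketch of fitting a GAM with mgcv follows. The data are simulated from a sine curve purely for illustration, and the sketch assumes the mgcv package is installed (it ships with standard R distributions).

```r
library(mgcv)   # provides gam() and the smooth-term constructor s()

set.seed(5)
# Simulated data with a nonlinear signal whose form we pretend not to know
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

# s(x) requests a smooth function of x; no parametric form is specified
fit <- gam(y ~ s(x))

print(summary(fit))
plot(fit)   # estimated smooth with a confidence band

# Predictions at new x values work as with lm() and glm()
newdata <- data.frame(x = c(2, 5, 8))
pred    <- predict(fit, newdata)
print(pred)
```

The summary reports the effective degrees of freedom of the smooth term, which indicates how wiggly the fitted curve is.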
In nonlinear regression analysis, the nonlinear least squares method can fail when the user's initial guesses for the starting parameter values are poor. The simplest solution is to use R's self-starting models.
Self-starting models work out the starting values automatically, which removes the risk of the user's initial guesses being wrong.
Some of the most frequently used self-starting functions are:
a) Michaelis-Menten model (SSmicmen)
R has a self-starting version called SSmicmen, which is as follows:
y = ax/(b + x)
Here, a and b are the two parameters: a is the asymptotic value of y, and b is the value of x at which the response reaches half of its maximum, a/2.
b) Asymptotic regression model (SSasymp)
This is the self-starting version of the asymptotic regression model. The 3-parameter asymptotic exponential equation can be written as:
y = a − b e^(−cx)
Here, a is the horizontal asymptote, b = a − R0, where R0 is the intercept (the response when x is 0), and c is the rate constant.
c) Four-parameter logistic model (SSfpl)
y = A + (B − A)/(1 + e^((D − x)/c))
Here, A is the horizontal asymptote on the left (for low values of x), B is the horizontal asymptote on the right (for large values of x), D is the value of x at the point of inflection of the curve, and c is a numeric scale parameter on the x-axis. It gives the self-starting version of four-parameter logistic regression.
d) Self-starting first-order compartment function (SSfol)
This function is given as follows:
y = k (exp(−exp(a)x) − exp(−exp(b)x))
Here, k = Dose*exp(a + b − c)/(exp(b) − exp(a)) and Dose is a vector of identical values provided to the fit. It gives the self-starting version of the first-order compartment function.
e) Self-starting Weibull growth function (SSweibull)
R's parameterization of the Weibull growth function is as follows:
Asym − Drop*exp(−exp(lrc)*x^pwr)
It gives the self-starting version of the Weibull growth function.
Here, Asym is the horizontal asymptote on the right,
Drop is the difference between the asymptote and the intercept (the value of y at x = 0),
lrc is the natural logarithm of the rate constant, and
pwr is the power to which x is raised.
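As an illustration of self-starting models, the sketch below fits SSmicmen and SSasymp to two datasets that ship with R (Puromycin and Loblolly). These datasets are not part of the original example, and the parameter names Vm, K, Asym, R0 and lrc are labels chosen for this sketch. Note that no start argument is supplied to nls().

```r
# Michaelis-Menten fit to the built-in Puromycin data, with no starting values
treated <- subset(Puromycin, state == "treated")
fit_mm  <- nls(rate ~ SSmicmen(conc, Vm, K), data = treated)
print(coef(fit_mm))

# Asymptotic regression fit to a single tree from the built-in Loblolly data
tree    <- subset(Loblolly, Seed == "329")
fit_asy <- nls(height ~ SSasymp(age, Asym, R0, lrc), data = tree)
print(coef(fit_asy))
```

In SSasymp the rate constant is estimated on the log scale, so exp(lrc) corresponds to the rate constant c in the equation given earlier.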
When performing nonlinear regression analysis, we often have only a single sample of data, which may not be sufficient. In this case, we need to create new samples from the existing one.
Bootstrapping is the method of creating new samples by resampling from the existing sample dataset. It has the following two broad applications to parameter estimation in nonlinear models:
Code for Bootstrapping Nonlinear Regression
bv <- numeric(1000)
It creates the vector bv with room for 1000 values.
cv <- numeric(1000)
It creates the vector cv with room for 1000 values.
for (i in 1:1000) {                      # loop that executes 1000 times
  ss <- sample(1:23, replace = TRUE)     # sample the indices of the 23 cases at random, with replacement
  y <- Time[ss]                          # values of the Time vector at the sampled indices
  x1 <- Viscosity[ss]                    # values of the Viscosity vector at the sampled indices
  x2 <- Wt[ss]                           # values of the Wt vector at the sampled indices
  model <- nls(y ~ b * x1 / (x2 - c), start = list(b = 29, c = 2))  # fit the model with starting values for b and c
  bv[i] <- coef(model)[1]                # store the estimate of b at position i of bv
  cv[i] <- coef(model)[2]                # store the estimate of c at position i of cv
}
By using the code above, you can generate 1000 different samples of the given data. The idea is very simple. You have a single sample of n measurements, but you can resample it in many ways so long as you allow some values to appear more than once and other values to be left out.
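The bootstrap loop can be tried on simulated data, since the original Time, Viscosity and Wt values are not reproduced here. This is a sketch under stated assumptions: the simulated values, the tryCatch() guard against occasional non-convergence, and the percentile intervals at the end are illustrative additions.

```r
set.seed(6)
# Simulated stand-in for the 23 cases (made-up values, same variable names)
n         <- 23
Viscosity <- runif(n, 15, 150)
Wt        <- runif(n, 20, 100)
Time      <- 29 * Viscosity / (Wt - 2) * exp(rnorm(n, sd = 0.03))

B  <- 1000
bv <- numeric(B)
cv <- numeric(B)
for (i in 1:B) {
  ss <- sample(1:n, replace = TRUE)   # resample case indices with replacement
  d  <- data.frame(y = Time[ss], x1 = Viscosity[ss], x2 = Wt[ss])
  # Record NA instead of stopping if a resample fails to converge
  fit <- tryCatch(nls(y ~ b * x1 / (x2 - c), data = d, start = list(b = 29, c = 2)),
                  error = function(e) NULL)
  bv[i] <- if (is.null(fit)) NA else coef(fit)[["b"]]
  cv[i] <- if (is.null(fit)) NA else coef(fit)[["c"]]
}

# Percentile bootstrap intervals for b and c
print(quantile(bv, c(0.025, 0.975), na.rm = TRUE))
print(quantile(cv, c(0.025, 0.975), na.rm = TRUE))
```

The spread of the 1000 estimates gives an empirical picture of the uncertainty in b and c without relying on the asymptotic standard errors reported by summary().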
Logistic regression is one of the most commonly used forms of regression analysis in practice. It is particularly useful for classifying new cases into one of two outcome categories.
A few applications, for example, are as follows:
© 2020 Data Science Central ® Powered by