Newton's method is obtained by replacing the direction matrix in the steepest-descent update equation with the inverse of the Hessian. The steepest-descent algorithm is

$$\theta_{k+1} = \theta_k - D\, g(\theta_k),$$

where θ is the vector of independent parameters, D is the direction matrix (for plain gradient descent, D = αI with a step size α > 0), and g is the gradient of the cost functional I(θ), which does not appear explicitly in the equation.
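As an illustration, here is a minimal Python sketch of the steepest-descent update with the assumed choice D = αI, applied to a hypothetical quadratic cost I(θ) = ½θᵀAθ − bᵀθ whose gradient is g = Aθ − b. The matrix A, the vector b, the step size `alpha` and the helper names are illustrative assumptions, not taken from the text.

```python
def grad(A, b, theta):
    """Gradient of the hypothetical quadratic cost: g = A theta - b."""
    return [sum(A[i][j] * theta[j] for j in range(len(theta))) - b[i]
            for i in range(len(theta))]

def steepest_descent(A, b, theta0, alpha, tol=1e-8, max_iter=10_000):
    """Iterate theta <- theta - D g with D = alpha * I (assumed) until ||g|| < tol.

    Returns (theta, number_of_iterations)."""
    theta = list(theta0)
    for k in range(max_iter):
        g = grad(A, b, theta)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            return theta, k
        # With D = alpha * I, the product D g is just alpha times the gradient.
        theta = [t - alpha * gi for t, gi in zip(theta, g)]
    return theta, max_iter

# Hypothetical example: eigenvalues 1 and 10, minimizer A^{-1} b = (1, 0.1).
A = [[1.0, 0.0], [0.0, 10.0]]
b = [1.0, 1.0]
theta, iters = steepest_descent(A, b, [0.0, 0.0], alpha=0.1)
```

For this A the step size must stay below 2/10 (twice the reciprocal of the largest eigenvalue) for the iteration to converge, and the slow direction (eigenvalue 1) shrinks its error only by a factor 0.9 per step, so the loop needs about 175 iterations.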
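Gradient descent can be very slow, especially when the cost functional is poorly conditioned. For convex cost functionals, a faster method is Newton's method, which uses the inverse of the Hessian as the direction matrix:

$$D = H^{-1}(\theta_k).$$

With this choice, the steepest-descent update becomes

$$\theta_{k+1} = \theta_k - H^{-1}(\theta_k)\, g(\theta_k),$$

where H is the Hessian of the cost functional, $H_{ij} = \partial^2 I / \partial\theta_i\,\partial\theta_j$. Near a minimum at which H is positive definite, Newton's method converges quadratically, whereas gradient descent converges only linearly.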
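The Newton update can be sketched in the same way. The two-parameter helper below is a hypothetical illustration: rather than forming H⁻¹ explicitly, it solves H p = g for the step p, which is the usual practice. Two assumed example costs are used: a quadratic ½θᵀAθ − bᵀθ with A = diag(1, 10), where the Hessian is constant and a single Newton step lands on the minimizer, and the non-quadratic convex cost I(θ) = e^{θ₁} − θ₁ + (θ₁ − θ₂)², whose minimum is at θ = (0, 0).

```python
import math

def solve2(H, g):
    """Solve the 2x2 linear system H p = g by Cramer's rule."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    if det == 0.0:
        raise ZeroDivisionError("Hessian is singular; Newton step undefined")
    return [(g[0] * H[1][1] - H[0][1] * g[1]) / det,
            (H[0][0] * g[1] - g[0] * H[1][0]) / det]

def newton(grad, hess, theta0, tol=1e-10, max_iter=50):
    """Iterate theta <- theta - H^{-1} g until ||g|| < tol; return (theta, iterations)."""
    theta = list(theta0)
    for k in range(max_iter):
        g = grad(theta)
        if math.hypot(g[0], g[1]) < tol:
            return theta, k
        p = solve2(hess(theta), g)  # p = H^{-1} g, computed without forming H^{-1}
        theta = [theta[0] - p[0], theta[1] - p[1]]
    return theta, max_iter

# Hypothetical example 1: quadratic cost 0.5 theta^T A theta - b^T theta.
A = [[1.0, 0.0], [0.0, 10.0]]
b = [1.0, 1.0]

def quad_grad(t):
    return [A[0][0] * t[0] + A[0][1] * t[1] - b[0],
            A[1][0] * t[0] + A[1][1] * t[1] - b[1]]

def quad_hess(t):
    return A  # constant Hessian for a quadratic cost

theta_quad, it_quad = newton(quad_grad, quad_hess, [0.0, 0.0])

# Hypothetical example 2: I(theta) = exp(t1) - t1 + (t1 - t2)^2, minimum at (0, 0).
def exp_grad(t):
    u = t[0] - t[1]
    return [math.exp(t[0]) - 1.0 + 2.0 * u, -2.0 * u]

def exp_hess(t):
    return [[math.exp(t[0]) + 2.0, -2.0], [-2.0, 2.0]]

theta_exp, it_exp = newton(exp_grad, exp_hess, [1.0, -1.0])
```

On the second cost the iterates approach the minimum in a handful of steps, with the error roughly squaring at each step once they are close, which is the quadratic convergence mentioned above.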
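If I(θ) is twice differentiable on an open convex domain, it is convex if and only if its Hessian is positive semidefinite everywhere on that domain. A Hessian that is positive definite everywhere implies strict convexity, but the converse does not hold: I(θ) = θ⁴ is strictly convex, yet its second derivative 12θ² vanishes at θ = 0.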
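Whether a symmetric Hessian is positive definite can be checked numerically by attempting a Cholesky factorization H = LLᵀ, which succeeds with strictly positive pivots exactly when H is positive definite. The pure-Python function below is a minimal sketch of that test (the name `is_positive_definite` is hypothetical); in practice a library routine such as NumPy's `numpy.linalg.cholesky`, which raises `LinAlgError` when the factorization fails, would normally be used.

```python
import math

def is_positive_definite(H):
    """Return True if the symmetric matrix H (a list of rows) is positive definite.

    Attempts a Cholesky factorization H = L L^T; a pivot <= 0 means H is not
    positive definite. A pivot of exactly zero (a semidefinite or singular H) is
    reported as not positive definite; nearly singular matrices may be
    misclassified because of rounding."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = H[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            return False
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            L[i][j] = (H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return True

print(is_positive_definite([[2.0, -1.0], [-1.0, 2.0]]))  # positive definite
print(is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))    # indefinite (eigenvalues 3 and -1)
print(is_positive_definite([[0.0]]))                     # singular, only semidefinite
```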
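Drawbacks of Newton's method: the Hessian has to be computed and inverted (or a linear system in H solved) at every iteration, which is expensive when θ has many components; and when the cost functional is not convex, or the iterate is far from the minimum, H may be indefinite or singular, so the Newton step may not be a descent direction or may not exist.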
To avoid these problems, several modifications that approximate the Hessian or its inverse have been developed; these are known as quasi-Newton methods.
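One widely used quasi-Newton method is BFGS. Starting from B₀ = I, it updates an approximation B of the inverse Hessian using the step s = θ_{k+1} − θ_k and the gradient change y = g_{k+1} − g_k:

$$B \leftarrow (I - \rho\, s y^T)\, B\, (I - \rho\, y s^T) + \rho\, s s^T, \qquad \rho = \frac{1}{y^T s}.$$

The sketch below is an illustration under strong assumptions, not a general-purpose implementation: it runs BFGS on a hypothetical quadratic cost ½θᵀAθ − bᵀθ, where an exact line search has the closed form α = −gᵀp / (pᵀAp). For general cost functionals a line search satisfying the Wolfe conditions would be used instead. The helper names are hypothetical.

```python
def matvec(M, v):
    """Matrix-vector product for matrices stored as lists of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(P, Q):
    """Matrix-matrix product for matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def bfgs_quadratic(A, b, theta0, tol=1e-10, max_iter=100):
    """BFGS on the hypothetical cost 0.5 theta^T A theta - b^T theta, with A
    symmetric positive definite. B approximates the inverse Hessian A^{-1}.
    Returns (theta, number_of_iterations)."""
    n = len(theta0)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    theta = list(theta0)
    B = [row[:] for row in eye]                      # B_0 = I
    g = [ai - bi for ai, bi in zip(matvec(A, theta), b)]
    for k in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            return theta, k
        p = [-v for v in matvec(B, g)]               # quasi-Newton direction p = -B g
        alpha = -dot(g, p) / dot(p, matvec(A, p))    # exact line search (quadratic only)
        s = [alpha * pi for pi in p]                 # s = theta_{k+1} - theta_k
        theta = [t + si for t, si in zip(theta, s)]
        g_new = [ai - bi for ai, bi in zip(matvec(A, theta), b)]
        y = [gn - go for gn, go in zip(g_new, g)]    # y = g_{k+1} - g_k
        rho = 1.0 / dot(y, s)                        # y^T s = s^T A s > 0 for PD A
        # Inverse-Hessian update: B <- (I - rho s y^T) B (I - rho y s^T) + rho s s^T
        left = [[eye[i][j] - rho * s[i] * y[j] for j in range(n)] for i in range(n)]
        right = [[eye[i][j] - rho * y[i] * s[j] for j in range(n)] for i in range(n)]
        B = matmul(matmul(left, B), right)
        B = [[B[i][j] + rho * s[i] * s[j] for j in range(n)] for i in range(n)]
        g = g_new
    return theta, max_iter

# Hypothetical example: the minimizer is A^{-1} b = (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
theta, iters = bfgs_quadratic(A, b, [0.0, 0.0])
```

Only gradients are evaluated; the Hessian A enters the loop solely through the closed-form line search, which is the part that would be replaced for a non-quadratic cost. With an exact line search on an n-parameter quadratic, BFGS reaches the minimizer in at most n steps in exact arithmetic, so this 2-parameter example stops after two updates.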
