.

*This article was written by Alberto Quesada.*

* *

The procedure used to carry out the learning process in a neural network is called the optimization algorithm (or optimizer). There are many different optimization algorithms. All have different characteristics and performance in terms of memory requirements, speed and precision.

**Problem formulation**

The learning problem is formulated in terms of the minimization of a loss index, f. This is a function which measures the performance of a neural network on a data set.

The loss index is, in general, composed of an error and a regularization terms. The error term evaluates how a neural network fits the data set. The regularization term is used to prevent overfitting, by controlling the effective complexity of the neural network.

The loss function depends on the adaptive parameters (biases and synaptic weights) in the neural network. We can conveniently group them together into a single n-dimensional weight vector w.

The problem of minimizing continuous and differentiable functions of many variables has been widely studied. Many of the conventional approaches to this problem are directly applicable to that of training neural networks.

**One-dimensional optimization**

Although the loss function depends on many parameters, one-dimensional optimization methods are of great importance here. Indeed, they are very often used in the training process of a neural network.

Certainly, many training algorithms first compute a training direction d and then a training rate η; that minimizes the loss in that direction, f(η).

In this regard, one-dimensional optimization methods search for the minimum of a given one-dimensional function. Some of the algorithms which are widely used are the golden section method and Brent's method. Both reduce the bracket of a minimum until the distance between the two outer points in the bracket is less than a defined tolerance.

** **

**Multidimensional optimization**

The learning problem for neural networks is formulated as searching of a parameter vector w∗ at which the loss function f takes a minimum value. The necessary condition states that if the neural network is at a minimum of the loss function, then the gradient is the zero vector.

The loss function is, in general, a non-linear function of the parameters. As a consequence, it is not possible to find closed training algorithms for the minima. Instead, we consider a search through the parameter space consisting of a succession of steps. At each step, the loss will decrease by adjusting the neural network parameters.

In this way, to train a neural network we start with some parameter vector (often chosen at random). Then, we generate a sequence of parameters, so that the loss function is reduced at each iteration of the algorithm. The change of loss between two steps is called the loss decrement. The training algorithm stops when a specified condition, or stopping criterion, is satisfied.

These are the main training algorithms for neural networks:

- Gradient descent
- Newton method
- Conjugate gradient
- Quasi-Newton method
- Levenberg-Marquardt algorithm

* *

*To read the whole article, with formulas and illustrations, click here.*

- 10 Scrum benefits and how to realize them
- 5 proven patterns for resilient software architecture design
- Microsoft's acquisition of Nuance clears antitrust hurdle
- The right to disconnect vs. America's always-on culture
- The enterprise advantages of automated data collection
- Today's blockchain use cases and industry applications
- Salesforce Industries rolls out banking AI compliance tools
- DataRobot integrates AI modeling tools with Snowflake
- M&A wave rising in hot AI market
- AWS analytics helps nonprofit fight climate change

Posted 17 June 2021

© 2021 TechTarget, Inc. Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

- Book: Applied Stochastic Processes
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- How to Automatically Determine the Number of Clusters in your Data
- New Machine Learning Cheat Sheet | Old one
- Confidence Intervals Without Pain - With Resampling
- Advanced Machine Learning with Basic Excel
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Fast Combinatorial Feature Selection

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives:** 2008-2014 |
2015-2016 |
2017-2019 |
Book 1 |
Book 2 |
More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions

## You need to be a member of Data Science Central to add comments!

Join Data Science Central