This was posted as a question on StackExchange. The state of the art of non-linearity is to use rectified linear units (ReLU) instead of the sigmoid function in deep neural networks. What are the advantages? I know that training a network is faster when ReLU is used, and that it is more biologically inspired. What are the other advantages? (That is, are there any disadvantages of using sigmoid?)
Below is the best answer.
Sigmoid: tends to produce vanishing gradients, because there is a mechanism that drives the gradient toward zero as "a" increases, where "a" is the input to the sigmoid function. The gradient of the sigmoid is S′(a) = S(a)(1 − S(a)). As "a" grows infinitely large, S′(a) = S(a)(1 − S(a)) = 1 × (1 − 1) = 0.
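To make the vanishing-gradient point concrete, here is a minimal numerical sketch (plain Python; the function names are illustrative, not from the original discussion) that evaluates S′(a) at a few inputs and shows how quickly it shrinks:

```python
import math

def sigmoid(a):
    # S(a) = 1 / (1 + e^(-a))
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_grad(a):
    # S'(a) = S(a) * (1 - S(a)), as derived above
    s = sigmoid(a)
    return s * (1.0 - s)

for a in [0.0, 2.0, 5.0, 10.0, 20.0]:
    print(f"a = {a:5.1f}   S(a) = {sigmoid(a):.6f}   S'(a) = {sigmoid_grad(a):.2e}")
```

Running this shows S′(0) = 0.25 falling to roughly 2e-9 by a = 20, which is the mechanism the answer describes: once a neuron saturates, almost no gradient flows back through it.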
ReLU: tends to blow up the activation, since there is no mechanism to constrain the output of the neuron ("a" itself is the output whenever a > 0).
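A matching sketch for ReLU (same caveats as above) shows the flip side: the gradient is a constant 1 for positive inputs, so it never vanishes, but the activation itself grows without bound:

```python
def relu(a):
    # ReLU(a) = max(0, a): positive inputs pass through unchanged
    return max(0.0, a)

def relu_grad(a):
    # Constant gradient of 1 for a > 0, 0 otherwise
    return 1.0 if a > 0 else 0.0

for a in [0.0, 2.0, 5.0, 10.0, 20.0]:
    print(f"a = {a:5.1f}   ReLU(a) = {relu(a):5.1f}   ReLU'(a) = {relu_grad(a):.1f}")
```

This is the trade-off the answer is pointing at: sigmoid bounds the output but kills the gradient in saturation, while ReLU preserves the gradient but leaves the activation unconstrained.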
Read the full discussion on StackExchange.