Deep Learning Networks: Advantages of ReLU over Sigmoid Function

This was posted as a question on StackExchange. The current state of the art for non-linearities is to use rectified linear units (ReLU) instead of the sigmoid function in deep neural networks. What are the advantages? I know that training a network is faster when ReLU is used, and that it is more biologically inspired. What are the other advantages? (That is, are there any disadvantages of using sigmoid?)

• Sigmoid: activations are bounded in (0, 1), so they cannot blow up.
• ReLU: does not suffer from the vanishing-gradient problem. Its gradient is 1 for all positive inputs, whereas the sigmoid's gradient is at most 0.25 and approaches 0 as the unit saturates, so gradients shrink multiplicatively through deep stacks of sigmoid layers.
• ReLU: more computationally efficient than sigmoid-like functions, since ReLU only needs to compute max(0, x) rather than the expensive exponentials involved in sigmoids.
• ReLU: in practice, networks with ReLU tend to converge faster than sigmoid networks (Krizhevsky et al., 2012).
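The gradient claims above can be checked numerically. Below is a minimal NumPy sketch (function names are my own, not from any particular framework): the sigmoid derivative peaks at 0.25 and decays toward zero for large |x|, while the ReLU derivative is exactly 1 wherever the unit is active.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)),
    # which is at most 0.25 (at x = 0) and vanishes as |x| grows
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise: no shrinkage for active units
    return (x > 0).astype(float)

xs = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print("sigmoid grad:", sigmoid_grad(xs))  # shrinks toward 0 away from the origin
print("relu grad:   ", relu_grad(xs))     # stays exactly 1 where x > 0
```

Multiplying many sigmoid gradients (each ≤ 0.25) through backpropagation is what drives gradients toward zero in deep sigmoid networks; ReLU's unit gradient on the active path avoids that shrinkage.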