Neural networks use activation functions to determine whether each node in the network should be activated, that is, whether it returns a non-zero value as part of the layer's output. Given a set of inputs, a neuron is activated or not depending on the type of activation function and on whether the input meets that function's criterion for returning a non-zero value. This lets the neural network find the combination of weights that reduces the error in the predicted values of the final output.
Non-linear activation functions are often better suited to neural networks than linear ones, because a stack of purely linear layers collapses into a single linear transformation. However, ridge activation functions such as the linear (identity) function and ReLU have their uses. ReLU is piecewise linear rather than linear: its output stays at 0 for negative inputs and rises with the input only once the input exceeds 0, so the neuron activates only past a threshold.
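To make the threshold behaviour concrete, here is a minimal sketch of the linear and ReLU functions in plain Python. The function names and the sample inputs are illustrative assumptions, not taken from any particular library.

```python
def linear(x: float) -> float:
    """Identity activation: the output equals the input."""
    return x


def relu(x: float) -> float:
    """Rectified linear unit: 0 for negative inputs, x otherwise."""
    return max(0.0, x)


# Hypothetical sample inputs spanning both sides of the threshold at 0.
inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]

linear_outputs = [linear(x) for x in inputs]
relu_outputs = [relu(x) for x in inputs]

for x, lin, rel in zip(inputs, linear_outputs, relu_outputs):
    print(f"x={x:+.1f}  linear={lin:+.1f}  relu={rel:+.1f}")
```

The linear function passes negative values straight through, while ReLU holds them at 0 and only starts to respond once the input is positive.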
Two of the classic activation functions used in neural networks are the sigmoid and tanh functions. Both are S-shaped, but tanh is steeper around zero and is centered at 0: it maps inputs to the range (-1, 1), while the sigmoid maps inputs to (0, 1). The sigmoid therefore squashes the output of a linear model into a value between 0 and 1. Not every activation function is bounded this way; ReLU, for example, has no upper limit. Both curves saturate, flattening near the upper bound for large positive inputs and near the lower bound for large negative inputs. With the sigmoid, strongly negative inputs produce outputs near 0, so those neurons are effectively left unactivated. With tanh, the same inputs produce outputs near -1 rather than 0. The sigmoid function is also used in logistic regression, where it converts the linear combination of the inputs into the probability of a binary outcome.
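The two curves can be compared numerically with a short sketch using only Python's standard `math` module. The helper name `sigmoid` and the sample inputs are assumptions made for illustration. The last column checks the standard identity tanh(x) = 2 * sigmoid(2x) - 1, which shows that tanh is a rescaled and shifted sigmoid.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, mapping any real input into (0, 1).

    The two branches avoid overflow in math.exp for large |x|.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical sample inputs, from strongly negative to strongly positive.
inputs = [-6.0, -1.0, 0.0, 1.0, 6.0]

for x in inputs:
    s = sigmoid(x)
    t = math.tanh(x)
    rescaled = 2.0 * sigmoid(2.0 * x) - 1.0  # should match tanh(x)
    print(f"x={x:+.1f}  sigmoid={s:.4f}  tanh={t:+.4f}  2*sigmoid(2x)-1={rescaled:+.4f}")
```

At the extremes the sigmoid flattens toward 0 and 1, while tanh flattens toward -1 and 1. At x = 0 the sigmoid returns 0.5 and tanh returns 0.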
The softmax activation function is an interesting one, as it generalizes the sigmoid to multiple classes. It is used in multi-class classification problems, where there are more than two possible outcomes or classes. The setup is similar to logistic regression, but instead of the sigmoid activation function you take a soft version of the max. Where a hard argmax assigns 1 to the largest score and 0 to all the others, softmax maps each score to a value in [0, 1] that grows with the score's size relative to the others, and the resulting values sum to 1.
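The contrast between the soft and hard versions can be sketched in a few lines of plain Python. The function names `softmax` and `hard_argmax` and the example scores are illustrative assumptions. Subtracting the maximum score before exponentiating is a common numerical-stability step and does not change the result.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Map raw scores to values in [0, 1] that sum to 1."""
    shift = max(scores)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_argmax(scores: list[float]) -> list[int]:
    """One-hot vector: 1 at the first largest score, 0 everywhere else."""
    winner = scores.index(max(scores))
    return [1 if i == winner else 0 for i in range(len(scores))]


# Hypothetical raw scores for a three-class problem.
scores = [2.0, 1.0, 0.1]

probs = softmax(scores)
one_hot = hard_argmax(scores)

print("softmax:    ", [round(p, 3) for p in probs])
print("hard argmax:", one_hot)
```

The hard version discards everything except the winning class, whereas softmax keeps the ranking of all classes and expresses it as a set of weights that add up to 1.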