Softmax
Softmax is an activation function that converts a vector of logits into a probability-like output. It is generally given in the form

p_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}

A common variant introduces a temperature parameter T, which scales the logits before exponentiation:

p_i = \frac{e^{z_i/T}}{\sum_{j=1}^{K} e^{z_j/T}}
Where:
- p_i is the probability for the i-th class
- z_i is the i-th logit (input to softmax)
- T is the temperature parameter
- K is the number of classes
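As a minimal sketch, the temperature softmax can be written in a few lines of NumPy. The function name and the max-subtraction trick for numerical stability are choices made here, not part of the definition above:

```python
import numpy as np

def softmax_with_temperature(z, T=1.0):
    """Return softmax probabilities for logits z at temperature T."""
    z = np.asarray(z, dtype=float) / T
    # Subtracting the max logit before exponentiating avoids overflow;
    # the shift cancels in the ratio, so the result is unchanged.
    z = z - z.max()
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()
```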
Key points about this equation:
- The temperature parameter T appears in both the numerator and denominator, dividing each logit z_i.
- As T approaches 0, the distribution becomes more peaked (harder), concentrating most of the probability mass on the largest logit.
- As T increases, the distribution becomes more uniform (softer), spreading probability mass more evenly across all classes.
- When T = 1, this reduces to the standard softmax function.
- The term e^{z_i/T} is equivalent to (e^{z_i})^{1/T}, which shows how the temperature acts as an exponent on the exponential term.
By tuning the temperature parameter, you can control the "peakiness" or "softness" of the output probability distribution, which is useful in applications such as knowledge distillation and sampling from generative models.
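To illustrate, a quick comparison using the softmax_with_temperature sketch above (the specific logits are arbitrary):

```python
logits = [2.0, 1.0, 0.1]

for T in (0.5, 1.0, 5.0):
    probs = softmax_with_temperature(logits, T=T)
    print(f"T={T}: {np.round(probs, 3)}")

# Lower T concentrates mass on the largest logit; higher T flattens the
# distribution toward uniform.
```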
Key properties of the softmax function include:
- Normalization: The output values sum to 1, making them interpretable as probabilities.
- Exponentiation: The use of e^{z_i} ensures all outputs are positive.
- Relative scale: Larger input values result in larger probabilities, while preserving the relative ordering of the inputs.
- Non-linear transformation: The softmax function introduces non-linearity, which is crucial for modeling complex relationships in neural networks.
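These properties can be checked numerically, again assuming the softmax_with_temperature sketch above is in scope:

```python
logits = [3.0, -1.0, 0.5]
probs = softmax_with_temperature(logits)

# Normalization: outputs sum to 1 (up to floating-point error).
assert np.isclose(probs.sum(), 1.0)

# Exponentiation: every output is strictly positive.
assert np.all(probs > 0)

# Relative scale: the probabilities preserve the ordering of the logits.
assert np.array_equal(np.argsort(probs), np.argsort(logits))
```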
But is softmax in its present form always the right choice? Here is some research indicating that some modifications may be preferred.