In the last post, *Aliens and Self-Information*, we derived the formula for self-information.
The self-information $I(x)$ of an event $x$ with probability $p(x)$ is defined as:

$$I(x) = -\log p(x)$$
It quantifies how surprising or informative an event is. Less probable events carry more information.
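As a quick sanity check, here is what the formula looks like in Python. This is a minimal sketch; the helper name `self_information` is mine, not a standard library function.

```python
import math

def self_information(p: float, base: float = 2) -> float:
    """Self-information I(x) = -log(p) of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log(p, base)

print(self_information(0.5))   # 1.0 bit: a fair coin flip
print(self_information(0.01))  # ~6.64 bits: a rare event carries more information
```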
Entropy $H(X)$ measures the average uncertainty or information content of a probability distribution $p(x)$ over all possible outcomes of a random variable $X$. It is defined as the expected value of self-information:

$$H(X) = \mathbb{E}[I(x)]$$

Writing the expectation out as a probability-weighted sum:

$$H(X) = \sum_{x \in X} p(x) \cdot I(x)$$

Substituting $I(x) = -\log p(x)$:

$$H(X) = -\sum_{x \in X} p(x) \log p(x)$$
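This sum translates directly into Python. The `entropy` helper below is a minimal sketch of the formula, not a library function (SciPy's `scipy.stats.entropy` is a vetted alternative):

```python
import math

def entropy(probs: list[float], base: float = 2) -> float:
    """Shannon entropy H(X) = -sum(p * log(p)) of a discrete distribution."""
    # Outcomes with p == 0 contribute nothing: lim p->0 of p*log(p) is 0.
    return sum(-p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: a fair coin
print(entropy([0.8, 0.2]))  # ~0.722 bits: a biased coin is less uncertain
```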
## Interpretation
The units depend on the base of the logarithm:
- Base 2: bits (the usual convention in information theory).
- Base $e$: nats (common in the natural sciences).
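The two units differ only by a constant factor of $\ln 2 \approx 0.693$, so converting is just a change of logarithm base. Reusing the `entropy` sketch from above:

```python
import math

h_bits = entropy([0.5, 0.5], base=2)       # 1.0 bit (fair coin)
h_nats = entropy([0.5, 0.5], base=math.e)  # ~0.693 nats
print(h_nats / h_bits)                      # ln(2) ≈ 0.693 nats per bit
```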
$H(X)$ is maximized when all outcomes are equally likely, i.e. $p(x) = \frac{1}{|X|}$, in which case $H(X) = \log |X|$.
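To see the maximization numerically, compare a uniform distribution with a skewed one over the same four outcomes (again reusing the `entropy` sketch; the probabilities are arbitrary illustration values):

```python
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.70, 0.10, 0.10, 0.10]

print(entropy(uniform))  # 2.0 bits = log2(4), the maximum for four outcomes
print(entropy(skewed))   # ~1.357 bits, strictly less
```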
## Intuition
- Self-information measures the surprise of an individual event.
- Entropy aggregates this measure, weighting each event's surprise by its probability. A near-certain event contributes almost nothing to entropy because it carries almost no surprise, as the small numeric probe below shows.
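A small numeric probe (the probability values are arbitrary) makes the weighting concrete. Note that the per-event contribution $-p \log p$ is not monotonic in $p$: it peaks at $p = 1/e$ and vanishes at both extremes.

```python
import math

# Per-event contribution p * I(p) = -p * log2(p) to the entropy.
# It peaks at p = 1/e and approaches zero as p -> 0 and as p -> 1.
for p in [0.99, 0.5, 1 / math.e, 0.1, 0.01]:
    info = -math.log2(p)
    print(f"p={p:.3f}  I={info:6.3f} bits  contribution={p * info:.3f} bits")
```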
Examples:
- If $p(x) = 1$ (certainty), then $I(x) = 0$ and $H(X) = 0$.
- If $p(x)$ is uniform, $H(X)$ is maximized because uncertainty is highest.
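Both cases show up if we sweep a coin's bias from certainty toward fairness (again with the `entropy` sketch from above):

```python
# Entropy of a coin with P(heads) = p: zero at certainty, maximal when fair.
for p in [1.0, 0.9, 0.5]:
    print(f"p={p}: H={entropy([p, 1 - p]):.3f} bits")
# p=1.0: H=0.000 bits  (certainty)
# p=0.9: H=0.469 bits
# p=0.5: H=1.000 bits  (uniform over two outcomes: the maximum)
```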
This derivation links the concept of individual surprise (self-information) to the broader idea of total uncertainty in a system (entropy).