Aliens and Self-Information

I wrote a little story for myself while reading up on Information Content. What follows was written to give myself an intuitive understanding of textbook information theory.

I landed on a friendly alien planet. They sent a delegation to come inside my large spaceship. I knew my spaceship well, so I was familiar with the height of the doorway.

The first ten aliens entered my ship, passing through the door without needing to duck, and their heads came close to the top of the doorframe.

“Fine,” I thought to myself. “Adult aliens seem to be about one doorway tall.”
We know human height follows a normal distribution. I dared not assume the same for these aliens, but I started building a mental model of their height.

Then, a tall alien entered and needed to duck under the doorframe. I was surprised and updated my mental model with this new information:
“Aliens are, on average, as tall as my doorway, but they can be taller.”

More aliens came in, most around one doorway tall. I was surprised again when a shorter, yet clearly adult, alien entered, standing at half a doorway tall. Once more, I revised my understanding and updated my mental model:
“Aliens are, on average, as tall as my doorway, but they can be shorter or taller.”

As more aliens continued to enter my ship, I kept cataloging their heights, updating my mental model with each surprise. Each new alien entering through the doorway provided me with information about the distribution of alien heights.

How much information does each one offer?

Information and Surprise

Initially, most aliens seemed to be about a doorway tall. When their heights matched my expectations, they provided no surprises. These encounters felt predictable, and predictable events didn’t reveal much I didn’t already know.

But when the taller alien ducked under the door, I was surprised. That surprise carried information that my model needed updating.

I realized: information is tied to surprise. The more surprising an event, the more information it provides.

Linking Information to Probability

The taller alien was surprising because, based on my observations so far, tall aliens were rare. Conversely, when an alien was about one doorway tall, I wasn’t surprised at all, because most aliens I’d seen matched this height.

I reasoned: the information I gain from an alien’s height is linked to how likely or unlikely that height is. If an alien’s height is common, it carries less information about the distribution of alien heights. If it’s rare, it carries more information.
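
To make "how likely" concrete, here is a minimal sketch of how a tally of observations turns into estimated probabilities. The counts are made up for illustration (the story gives no exact numbers), and heights are rounded to the nearest half doorway.

```python
from collections import Counter

# Hypothetical tally of heights, measured in doorways (not data from the story).
observed_heights = [1.0] * 18 + [1.5] * 1 + [0.5] * 1


def empirical_probabilities(observations):
    """Estimate P(height) as the fraction of observations with that height."""
    counts = Counter(observations)
    total = len(observations)
    return {height: count / total for height, count in counts.items()}


probabilities = empirical_probabilities(observed_heights)
for height, p in sorted(probabilities.items()):
    print(f"{height} doorways: P = {p:.2f}")
```

With this tally, a one-doorway alien has an estimated probability of 0.90, while the tall and short aliens each sit at 0.05. Those are the rare, surprising ones.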

Quantifying Information

To go further, I wondered if there was a way to quantify this “surprise.” How could I relate the probability of an event to the information it conveys?

I made some observations:

Information decreases as probability increases.
Rare events convey more information.
If an event is highly probable, it provides less new information.
If an event is certain, it provides no new information.
When two independent events occur together, their probabilities multiply, and their information should add up.

Logarithms and Information

Logarithms have the properties I needed:

They convert multiplication into addition.
The negative of a logarithm decreases as its argument increases, reflecting less information for more probable events.
The logarithm of 1 is 0, so a certain event provides no information.
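
A quick numerical check of these properties, as a sketch rather than a proof. It uses the negative natural logarithm as the candidate measure of surprise. The helper name `neg_log` and the two probabilities are my own choices for illustration.

```python
import math


def neg_log(p):
    """Candidate measure of surprise: the negative logarithm of a probability."""
    return -math.log(p)


# Hypothetical probabilities of two independent events.
p_tall, p_short = 0.05, 0.1

# Multiplying probabilities corresponds to adding information.
joint = neg_log(p_tall * p_short)
separate = neg_log(p_tall) + neg_log(p_short)
assert math.isclose(joint, separate)

# More probable events score lower.
assert neg_log(0.9) < neg_log(0.5) < neg_log(0.05)

# A certain event scores zero.
assert neg_log(1.0) == 0.0
```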

I decided on a rule: the information gained from observing an event is tied to how unlikely the event is. Higher probabilities, meaning events that are more likely to happen, result in less information, while lower probabilities, indicating rarer events, provide more information.

My definition of self-information became:

$$I = -\log(P)$$

where:

$I$ is the information conveyed by an event.
$P$ is the probability of the event.
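
Here is the formula as a small Python function, applied to the aliens. This is only a sketch: the function name, the default of base 2, and the three probabilities are assumptions of mine, not values from the story.

```python
import math


def self_information(p, base=2.0):
    """Return I = -log(P) for an event with probability p, where 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log(p, base)


# Hypothetical probabilities for the three kinds of alien in the story.
height_probabilities = {
    "one doorway": 0.90,
    "taller": 0.05,
    "half a doorway": 0.05,
}

for label, p in height_probabilities.items():
    print(f"{label}: {self_information(p):.2f} bits")
```

With these made-up numbers, an alien of the common height carries about 0.15 bits, while each of the rare heights carries about 4.32 bits.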

Conclusion

When the logarithm is base 2, information is measured in bits. For base e, it is measured in nats. For base 10, it is measured in hartleys (see the Definition section of the Wikipedia page on information content).
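
The choice of base only rescales the result. A minimal sketch, assuming an event with a hypothetical probability of 1/8 (the function name is mine):

```python
import math


def information_in_units(p):
    """Self-information of an event with probability p, in three common units."""
    return {
        "bits": -math.log2(p),       # base 2
        "nats": -math.log(p),        # base e
        "hartleys": -math.log10(p),  # base 10
    }


info = information_in_units(1 / 8)  # hypothetical one-in-eight event
for unit, value in info.items():
    print(f"{value:.2f} {unit}")

# The units differ only by a constant factor.
assert math.isclose(info["bits"], info["nats"] / math.log(2))
assert math.isclose(info["bits"], info["hartleys"] * math.log2(10))
```

For a one-in-eight event this gives 3 bits, about 2.08 nats, and about 0.90 hartleys.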

Observing these aliens gave me a quick, intuitive understanding of the link between probability and information, and it helped me derive the formula for self-information.