Proof that Mutual Information Is Symmetric

The mutual information (MI) of two random variables quantifies how much information (in bits or nats) is obtained about one random variable by observing the other. I discuss MI and show it is symmetric.

The mutual information (MI) between two random variables captures how much information, in the sense of entropy, is obtained about one random variable by observing the other. Since that definition does not specify which is the observed random variable, we might suspect this is a symmetric quantity. In fact, it is, and I claimed as much without proof in my previous post. The goal of this post is to show why this definition is indeed symmetric. The proof will highlight a useful interpretation of MI.

Let $X$ and $Y$ be continuous random variables with densities $p(x)$ and $p(y)$, respectively, and joint density $p(x, y)$. The MI of $X$ and $Y$ is

$$
\begin{aligned}
I(X; Y) &\triangleq H(X) - H(X \mid Y)
\\
&= -\int p(x) \log p(x) \,\text{d}x + \int \int p(x, y) \log p(x \mid y) \,\text{d}x \,\text{d}y
\\
&= \int \int p(x, y) \log p(x \mid y) \,\text{d}x \,\text{d}y - \int \left[ \int p(x, y) \,\text{d}y \right] \log p(x) \,\text{d}x
\\
&= \int \int p(x, y) \log \frac{p(x \mid y)}{p(x)} \,\text{d}x \,\text{d}y
\\
&= \int \int p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \,\text{d}x \,\text{d}y
\\
&= \text{KL}\big[\, p(x, y) \,\|\, p(x)\, p(y) \,\big]. \tag{1}
\end{aligned}
$$
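
Eq. $1$ also suggests an easy numerical check. Below is a minimal sketch (assuming NumPy, and using a discrete analogue of the formula with sums in place of integrals) that computes $H(X) - H(X \mid Y)$ and $\text{KL}[\, p(x, y) \,\|\, p(x)\, p(y) \,]$ for a made-up $2 \times 3$ joint distribution and confirms they agree.

```python
import numpy as np

# Hypothetical joint distribution p(x, y) over a 2 x 3 grid (rows: x, columns: y),
# chosen arbitrarily for illustration.
p_xy = np.array([[0.10, 0.25, 0.15],
                 [0.20, 0.05, 0.25]])
assert np.isclose(p_xy.sum(), 1.0)

p_x = p_xy.sum(axis=1)  # marginal p(x)
p_y = p_xy.sum(axis=0)  # marginal p(y)

# Entropy form: I(X; Y) = H(X) - H(X | Y).
H_x = -np.sum(p_x * np.log(p_x))
H_x_given_y = -np.sum(p_xy * np.log(p_xy / p_y))  # uses p(x | y) = p(x, y) / p(y)
mi_entropy = H_x - H_x_given_y

# KL form: I(X; Y) = KL[p(x, y) || p(x) p(y)].
mi_kl = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))

print(mi_entropy, mi_kl)  # both ≈ 0.102 nats
assert np.isclose(mi_entropy, mi_kl)
```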
In the last line of Eq. $1$, “KL” refers to the KL divergence. Since the KL divergence is non-negative, mutual information is also non-negative. Furthermore, the mutual information is zero if and only if $X$ and $Y$ are independent. This makes intuitive sense: if two random variables are independent, observing one tells you nothing about the other.
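
To spell out the forward direction of that claim: if $X$ and $Y$ are independent, then $p(x, y) = p(x)\, p(y)$, and the integrand in Eq. $1$ vanishes,

$$
I(X; Y) = \int \int p(x)\, p(y) \log \frac{p(x)\, p(y)}{p(x)\, p(y)} \,\text{d}x \,\text{d}y = \int \int p(x)\, p(y) \log(1) \,\text{d}x \,\text{d}y = 0.
$$

The converse follows from the fact that the KL divergence is zero only when its two arguments are equal (almost everywhere), which here means $p(x, y) = p(x)\, p(y)$.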

Finally, it’s pretty easy to see that we can simply reverse our calculations to get the mutual information of $Y$ and $X$:

$$
\begin{aligned}
I(Y; X) &\triangleq H(Y) - H(Y \mid X)
\\
&= \int \int p(x, y) \log \frac{p(y \mid x)}{p(y)} \,\text{d}y \,\text{d}x
\\
&= \int \int p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \,\text{d}y \,\text{d}x
\\
&= \text{KL}\big[\, p(x, y) \,\|\, p(x)\, p(y) \,\big]
\\
&= I(X; Y). \tag{2}
\end{aligned}
$$
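
As a sanity check on the symmetry itself, here is another small sketch (again a discrete analogue, this time with an arbitrary random joint) that computes both entropy differences, $H(X) - H(X \mid Y)$ and $H(Y) - H(Y \mid X)$, directly from the conditional distributions and confirms they coincide.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random (hypothetical) joint distribution p(x, y) over a 4 x 5 grid.
p_xy = rng.random((4, 5))
p_xy /= p_xy.sum()

p_x = p_xy.sum(axis=1)  # marginal p(x)
p_y = p_xy.sum(axis=0)  # marginal p(y)

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -np.sum(p * np.log(p))

# Conditional entropies, computed directly from the conditional distributions.
H_x_given_y = -np.sum(p_xy * np.log(p_xy / p_y))           # H(X | Y), via p(x | y)
H_y_given_x = -np.sum(p_xy * np.log(p_xy / p_x[:, None]))  # H(Y | X), via p(y | x)

mi_xy = entropy(p_x) - H_x_given_y  # H(X) - H(X | Y)
mi_yx = entropy(p_y) - H_y_given_x  # H(Y) - H(Y | X)

assert np.isclose(mi_xy, mi_yx)     # symmetric, as claimed
```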
It’s easy to see that these derivations hold, with sums in place of integrals, if $X$ and $Y$ are both discrete. The tricky case is if $X$ is discrete and $Y$ is continuous. Certainly, the derivations still work if we can interchange integrals and sums, which is true for finite sums. However, when the sums are infinite, we are effectively interchanging limits and integration. I don’t know enough measure theory to know when this is possible, but my very loose understanding is that the main value of Lebesgue integration is that it is easier to know when such regularity conditions hold.