
All About Entropy

Severin Perez
22 min read · Feb 1, 2025


In this article, we’re going to do a deep dive into the idea of entropy, one of the most important concepts in information theory and a critical tool in text analytics. We’ll review how to calculate entropy, discuss some nuances related to measuring entropy in texts, and of course, work through lots of code along the way. And if you’re interested in a quick and easy way to calculate the entropy of a text, you can always head over to Lemmalytica and try out our text analytics tools.

What is Entropy?

Put simply, entropy is a measure of uncertainty. Imagine picking numbers from a lottery machine. If every ball in the machine had the number 4 written on it, then you would know with 100% certainty that no matter which ball you picked, it would be a 4. In other words, the uncertainty would be zero, and the entropy would also be zero. As you add more balls, with different numbers and in different quantities, the uncertainty goes up. And so does the entropy. To use slightly more technical terms, entropy measures the uncertainty in a probability distribution. If you draw a random observation from the distribution, entropy tells you how hard it is, on average, to predict the outcome ahead of time. With low entropy, prediction is very easy (as with the lottery machine that only has 4s). With high entropy, it's very difficult.
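To make that concrete, here's a minimal sketch of how Shannon entropy might be computed from a sequence of observations. The `shannon_entropy` function below is just an illustration (not a fixed API): it estimates each outcome's probability from its frequency and applies the standard formula H(X) = -Σ p(x) · log₂ p(x), summed over each distinct outcome x.

```python
import math
from collections import Counter

def shannon_entropy(observations):
    """Estimate Shannon entropy (in bits) from a sequence of observations.

    Implements H(X) = -sum(p(x) * log2(p(x))) over each distinct outcome x,
    where p(x) is estimated from the observed frequencies.
    """
    counts = Counter(observations)
    total = len(observations)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy

# A lottery machine where every ball is a 4: zero uncertainty, zero entropy.
print(shannon_entropy([4, 4, 4, 4, 4]))  # 0.0

# Five different balls, all equally likely: maximum uncertainty for five
# outcomes, which works out to log2(5), about 2.32 bits.
print(shannon_entropy([1, 2, 3, 4, 5]))  # ~2.3219
```

Using log base 2 means the result is measured in bits: roughly, the average number of yes/no questions you'd need to identify an outcome. A different base just rescales the units (base e gives "nats"), without changing which distributions count as more or less uncertain.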
