The Probability of a Head

Karl Pearson. Also, blood.

My father-in-law recently sent me a quote he thought I’d enjoy:

“Mathematicians are like Frenchmen: whatever you say to them, they translate it into their own language, and forthwith it means something entirely different.”

- Johann Wolfgang von Goethe

And Goethe, as per usual, was spot on.

If a statistics degree has taught me anything, it’s that most of the numbers we see day to day are being used against us – oftentimes incorrectly, or at least with a heavy dose of lying by omission. Sometimes they are misused so radically that it makes me want to bash my head in.

Speaking of bashing in heads, let us discuss The Deadliest Warrior. Check out that segue.

The Deadliest Warrior, for those unacquainted, is a television series that attempts to fill in the blanks and then answer the question “Who would win in a fight between ______ and ______ ?”

Two types of warriors (for example: Apache and Gladiator or Spartan and Ninja) are chosen, and a group of men take sharp and/or heavy objects used by those types of warriors and proceed to cause as much damage as possible to a battery of unnecessarily-blood-filled target dummies made of materials that match, as closely as possible, human tissue and bone. That or pig carcasses. Numbers regarding their speed, force, accuracy, and kills per second are tallied and entered into a computer program – along with the subjective opinions of three unqualified individuals – which then simulates one thousand matches between the two fighters to see who wins.

After a season and a half, I have yet to see a woman on the program.

The show is acceptable as an entertaining waste of time, or gratifying if you really have it out for target dummies, but the problem with the show is that after 45 minutes of arguing over who is deadliest, and pretending to be as scientific and objective as possible, they sometimes don’t prove anything.

Goethe is right. When they say, “Check it out! The samurai beat the viking by winning 522 matches out of a thousand,” I translate it into my own language and instead hear: “At the alpha = .05 level, we do not have enough statistical evidence to prove that one warrior is deadlier than the other.”

They don’t maintain statistical accuracy because, as is assuredly obvious, it makes for quite an anti-climactic ending. I have the luxury, however, of not having entertainment as a goal.

The following is a bit mathy, but hopefully not frighteningly so.

If you flip a fair coin (meaning a coin that is balanced correctly such that you have a 50% chance of getting heads and a 50% chance of getting tails) 1000 times, you have a mere 2.5% chance of getting exactly 500 heads and 500 tails. This is intuitive to most people; randomness is a part of everyday life.
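You can check the 2.5% figure directly with the binomial formula. Count the sequences that have exactly 500 heads, then divide by the 2^1000 equally likely sequences. Here is a minimal Python sketch that uses only the standard library. The function name `prob_exact_heads` is my own, not anything from the show:

```python
import math

def prob_exact_heads(heads: int, flips: int) -> float:
    """Probability of exactly `heads` heads in `flips` tosses of a fair coin."""
    # All 2**flips sequences are equally likely; math.comb counts the ones
    # with exactly `heads` heads. Integer arithmetic stays exact until the division.
    return math.comb(flips, heads) / 2 ** flips

p = prob_exact_heads(500, 1000)
print(f"P(exactly 500 heads in 1000 flips) = {p:.4f}")  # 0.0252
```

So even a perfectly fair coin lands on an exact 500/500 split only about one time in forty.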

One thing you can do with statistics is use the number of heads or tails you flipped to determine whether the coin is fair. That is, you can figure out whether the chance of getting heads (or tails) is 50%. This is akin to what they are attempting to do in The Deadliest Warrior: they have 1000 battles, record the victor of each, and then try to determine whether one warrior or the other is more likely to win.

It is not possible to be exact. If you flip a coin 1000 times and get 510 heads, then the coin is probably a fair coin, but there’s always a chance that the coin is weighted in such a way that the true probability of getting heads is actually 51%. The trick is in determining, given that you got 510 heads, in what range of numbers the true probability of heads could be. If 50% is within that range, then we (double-negatively) can’t say that the true probability is not 50%; that is, it is plausible to get 510 heads out of 1000 flips when flipping a fair coin.
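One common way to build that range is a 95% Wald confidence interval: the observed proportion plus or minus 1.96 standard errors. The sketch below assumes the normal approximation is good enough, which is reasonable for 1000 flips. `wald_interval` is a hypothetical helper name:

```python
import math

def wald_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a success probability (Wald method)."""
    p_hat = successes / trials
    # The standard error is estimated from the observed proportion itself.
    se = math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - z * se, p_hat + z * se

low, high = wald_interval(510, 1000)
print(f"95% interval for P(heads): {low:.3f} to {high:.3f}")  # 0.479 to 0.541
print("50% is inside the range:", low <= 0.5 <= high)  # True
```

Since 50% sits comfortably inside the range, 510 heads gives us no reason to doubt the coin.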

Let’s take a look at the episode of The Deadliest Warrior that pitted the Green Berets versus the Spetsnaz. The Spetsnaz were declared the victors with a total of 519 kills out of 1000. So the question is: is 519 far enough away from 500 to say that the Spetsnaz have a better than 50% chance of winning? What is the true probability that the Spetsnaz get a head? Ahem.

In order to decide how likely it is to get 519 kills when the true probability of winning is 50%, we have to find a statistic that follows a pattern we know. This pattern is called a distribution, and the one we use here is called the chi-squared distribution. Once we have that statistic, we can use it to determine how likely we were to get 519 kills even if the warriors were evenly matched.

One statistic is called the Wald statistic. We take the probability of winning that we observed (.519) and subtract from it the probability we’re testing it against (.5). We divide that by the standard error, which we calculate assuming that the true probability of winning is exactly what we observed (.519). That number approximately follows a pattern we call the normal distribution, and if we square the whole thing, it approximately follows a chi-squared distribution with one degree of freedom. It looks like this:

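$$W = \left( \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1 - \hat{p})/n}} \right)^2 = \left( \frac{.519 - .5}{\sqrt{(.519)(.481)/1000}} \right)^2 \approx 1.446$$

Here $\hat{p} = .519$ is the observed proportion of wins, $p_0 = .5$ is the probability we are testing against, and $n = 1000$ is the number of battles.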
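The Wald calculation takes only a few lines of Python. This is a sketch, and `wald_statistic` is a hypothetical name of my own:

```python
import math

def wald_statistic(wins: int, n: int, p0: float = 0.5) -> float:
    """Squared Wald statistic; approximately chi-squared (1 df) if p = p0."""
    p_hat = wins / n
    # Standard error computed at the observed proportion p_hat.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return ((p_hat - p0) / se) ** 2

print(round(wald_statistic(519, 1000), 3))  # 1.446
```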

The score statistic looks almost exactly the same, except that we calculate the standard error assuming the warriors are evenly matched, so that the probability of winning is 50%. It looks like this:

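$$S = \left( \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \right)^2 = \left( \frac{.519 - .5}{\sqrt{(.5)(.5)/1000}} \right)^2 \approx 1.444$$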
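Here is the same kind of sketch for the score statistic. The only change from the Wald version is where the standard error is computed. `score_statistic` is, again, a hypothetical name:

```python
import math

def score_statistic(wins: int, n: int, p0: float = 0.5) -> float:
    """Squared score statistic; approximately chi-squared (1 df) if p = p0."""
    p_hat = wins / n
    # Standard error computed under the null hypothesis p = p0, not at p_hat.
    se0 = math.sqrt(p0 * (1 - p0) / n)
    return ((p_hat - p0) / se0) ** 2

print(round(score_statistic(519, 1000), 3))  # 1.444
```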

The last method of getting this number is called the likelihood ratio. It compares how likely we were to get 519 kills when the true probability of winning is 50% against how likely we were when it is 51.9%, by dividing the first by the second. If we take the natural logarithm of that ratio and multiply it by -2, we get yet another number that approximately follows the chi-squared distribution. The likelihood ratio statistic looks like this:

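$$G^2 = -2 \ln \frac{L(p_0)}{L(\hat{p})} = 2 \left[ y \ln \frac{\hat{p}}{p_0} + (n - y) \ln \frac{1 - \hat{p}}{1 - p_0} \right] = 2 \left[ 519 \ln \frac{.519}{.5} + 481 \ln \frac{.481}{.5} \right] \approx 1.444$$

Here $y = 519$ is the number of Spetsnaz wins, and $L(p)$ is the binomial likelihood of seeing 519 wins in 1000 battles when the true win probability is $p$.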
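Here is a sketch of the likelihood ratio statistic in the same style. The binomial coefficient cancels in the ratio, so only the p^wins × (1-p)^losses parts matter. `likelihood_ratio_statistic` is a hypothetical name, and the code assumes 0 < wins < n so that both logarithms are defined:

```python
import math

def likelihood_ratio_statistic(wins: int, n: int, p0: float = 0.5) -> float:
    """-2 * ln(L(p0) / L(p_hat)) for binomial data; assumes 0 < wins < n."""
    p_hat = wins / n
    losses = n - wins
    # Log of L(p_hat) / L(p0), written out term by term, then doubled.
    log_ratio = wins * math.log(p_hat / p0) + losses * math.log((1 - p_hat) / (1 - p0))
    return 2 * log_ratio

print(round(likelihood_ratio_statistic(519, 1000), 3))  # 1.444
```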

So all three methods agree that the relevant statistic is about 1.44. Because this number follows the chi-squared distribution, we can calculate that, if the warriors were evenly matched, there would be a 23% chance of getting this number or an even higher one. This number (.23) is called the p-value. Suppose the p-value were lower, say, 5% or less. That would mean the Spetsnaz would have had only a very small chance of winning as often as they did if the two sides were evenly matched, so we could conclude that they have a better than 50% chance of winning. Since our p-value is much greater than .05, winning 519 times is quite plausible even when their true chance of winning is 50%. In fact, the 95% confidence interval for their chance of winning goes as low as 48.8% (which would make them less deadly than the Green Berets), so they could have had that chance and still won 519 out of 1000 battles.
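Both numbers in that paragraph can be reproduced with the standard library. A chi-squared variable with one degree of freedom is the square of a standard normal, so its upper tail is `math.erfc(sqrt(x / 2))`. The 48.8% comes from the 95% Wald interval around .519. This is a sketch; `chi2_1df_upper_tail` is my own name:

```python
import math

def chi2_1df_upper_tail(x: float) -> float:
    """P(X >= x) for a chi-squared variable with 1 degree of freedom."""
    # X = Z**2 for standard normal Z, so P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2))

p_value = chi2_1df_upper_tail(1.444)
print(f"p-value: {p_value:.2f}")  # p-value: 0.23

# 95% Wald interval for the Spetsnaz win probability (519 wins out of 1000).
p_hat = 519 / 1000
se = math.sqrt(p_hat * (1 - p_hat) / 1000)
print(f"95% interval: {p_hat - 1.96 * se:.3f} to {p_hat + 1.96 * se:.3f}")  # 0.488 to 0.550
```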

Put another way: you start with the idea that both warriors have an equal chance of winning (.5). And then you observe what actually happens. If what happens seems incredibly unlikely (say only a 5% chance of happening), you would give credence to the idea that, in fact, there isn’t an equal chance of winning. In this case, you start with the idea that both warriors have a 50% chance of winning, and observe that one warrior won 519 out of 1000 battles. Using the above calculations, we can see that this particular outcome isn’t a rare one; you can get outcomes like this (or more extreme than this) 23% of the time. This is not enough to convince us that there isn’t actually a 50% chance of winning.
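You can also check the “23% of the time” without any approximation by adding up exact binomial probabilities. The question is this: if every battle is a 50/50 coin flip, how often does either side win 519 or more out of 1000? The sketch below counts both tails, which is my assumption about what “more extreme” means here. The exact answer comes out to about 24%, close to the chi-squared approximation. `two_sided_binomial_tail` is a hypothetical name:

```python
import math

def two_sided_binomial_tail(wins: int, n: int) -> float:
    """Exact chance that either side wins at least `wins` of `n` 50/50 battles.

    Assumes wins > n / 2, so the two tails do not overlap.
    """
    # Number of outcomes in which one particular side wins `wins` or more battles.
    upper = sum(math.comb(n, k) for k in range(wins, n + 1))
    # With p = 0.5 the lower tail (n - wins or fewer) is exactly the same size.
    return 2 * upper / 2 ** n

print(f"{two_sided_binomial_tail(519, 1000):.2f}")  # about 0.24
```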

So how many wins does a team need in order to show that their chance of winning was actually greater than the other team’s? Well, it depends on which of the three statistics you choose to use. We have to set up the formula to see at what point the p-value dips below .05. I won’t do the math, but, using the score statistic, it happens when one team wins at least 531 of the matches. Win 531 or more, and there’s enough evidence that your probability of winning is greater than 50%.
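If you would rather let a computer do the math, a brute-force search works. Step through win counts until the score test’s p-value first drops below .05. This sketch assumes 1000 battles, α = .05, and the same chi-squared tail as before. `score_p_value` and `wins_needed` are hypothetical names:

```python
import math

def score_p_value(wins: int, n: int, p0: float = 0.5) -> float:
    """p-value of the score test, using the chi-squared (1 df) approximation."""
    z = (wins / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    # z**2 is approximately chi-squared(1); its upper tail equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def wins_needed(n: int = 1000, alpha: float = 0.05) -> int:
    """Smallest number of wins out of n whose score-test p-value is below alpha."""
    for wins in range(n // 2, n + 1):
        if score_p_value(wins, n) < alpha:
            return wins
    raise ValueError("no win count reaches significance")

print(wins_needed())  # 531
```

At 530 wins the p-value is still about .058, and at 531 it drops to just under .05.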

In the first two seasons, the show named a total of seven warriors the deadliest without significant statistical evidence to back up the verdict.

When I sit on the couch and watch UFC title-winning champions punch dead pigs with spiked brass knuckles, Yakuza gang member descendants throw grenades into mannequin-filled rooms, and former secret special forces operatives shoot knives through blood-filled glass balls (you heard me), this math is what I translate it all into.

That, and it makes me wonder who would win in a bloody fight to the death between Johann Wolfgang von Goethe and Karl Pearson.

Jack

Posted by Jack on Tue, 21 Jun 2011
tags: rant, non-boardgame