How to Crack a Cheating Scandal with 1 Line of Code
Cheating is as old as humanity itself. It manages to creep into every corner of life, even into something as simple as a coin flip.
Suppose your friend Dave has a coin, and he makes you a bet: if he flips the coin and it lands on tails, you win $5, but if it lands on heads, you lose $5. You take the bet. The coin lands on heads. Dave wins. You take the bet again. It lands on heads again. You keep taking the bet, and before long you get really suspicious of the coin. The coin doesn’t always land on heads, but it lands on heads way more than you think it should. Out of 20 flips, it lands on heads 18 times. Bullshit! You suspect Dave of cheating, but how do you know he’s not just really lucky?
To bystanders, this bet between you and Dave may not seem like a big deal. But cheating scandals are real, and the consequences are often much bigger. In 2009, the Atlanta Journal-Constitution accused the Atlanta Public School district of modifying students’ test scores on the Georgia state standardized tests (called the CRCT). Over the next 6 years, the district was engulfed in what the New York Times described as “The largest cheating scandal in the nation’s history,” finally ending with the conviction of 11 people in 2015. Throughout the scandal, many school officials and teachers insisted that they did not cheat. There was still that lurking question: how lucky is too lucky?
I’ll be honest with you, I don’t think Dave got lucky. I think Dave cheated. So I’m gonna help you get your money back. To help you understand what’s going on here, we will mainly go over three things:
- How to understand binomial distributions
- How cheating scandals and binomial distributions are related
- How to simulate a binomial distribution (and something called the CDF) in R
We’ll see that the technique used to help you can also be applied to cases like the Atlanta cheating scandal and elsewhere. You can’t see it, but I just put on a Sherlock Holmes hat¹. Let’s see how “lucky” Dave really is.
What’s a Binomial Distribution?
Okay, we’ll come back to your coin-flip fiasco in a moment, but first we need to talk about statistics.
If you flip a fair coin, the chance of the coin landing on heads is 50% (or maybe not?). Let’s call one flip a trial, and if the coin lands on heads let’s call that a success. If you flip a fair coin 10 times, the single most likely result is 5 heads and 5 tails, although nearby splits like 6 and 4 are common too. In other words, 5 successes is the most likely outcome of 10 trials with a 50% success rate.
But a coin doesn’t have to have a 50% chance of landing on heads; it could be a weighted coin. Maybe you have a coin with only a 10% chance of landing on heads. If you flip that coin 10 times, the most likely result is 1 head and 9 tails. In other words, 1 success is the most likely outcome of 10 trials with a 10% success rate.
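If you want to check those "most likely" claims yourself, here is a minimal sketch in R (the language used later in this article). It relies on R's built-in dbinom function, which gives the probability of an exact number of successes. The variable names are made up for this example.

```r
# Possible numbers of heads in 10 flips
heads <- 0:10

# Probability of each outcome for a fair coin and for a 10% weighted coin
fair_probs     <- dbinom(heads, size = 10, prob = 0.5)
weighted_probs <- dbinom(heads, size = 10, prob = 0.1)

# Most likely number of heads for each coin
fair_mode     <- heads[which.max(fair_probs)]
weighted_mode <- heads[which.max(weighted_probs)]

print(fair_mode)       # 5
print(weighted_mode)   # 1

# Even the most likely outcome is far from guaranteed
print(round(max(fair_probs), 3))      # 0.246
print(round(max(weighted_probs), 3))  # 0.387
```

Notice that even the most likely outcome happens well under half the time, which is why "most likely" is not the same as "probably".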
Counting the heads in a bunch of coin flips like this is an example of a binomial distribution. In statistics, a distribution is a way of modeling random outcomes. The idea is that there are some outcomes (in this case 0 heads, 1 head, 2 heads, and so on all the way up to 10 heads) and probabilities associated with each outcome. A binomial distribution has a chance of success (in decimal form) called p and a number of trials called n. It’s sometimes written as B(n,p). As we will see, binomial distributions can be applied in many places besides coin flipping.
These conditions also need to be true in binomial distributions:
- Each trial must have only two possible outcomes: success or failure. The point is that there are only 2 options in your trials. For example, asking 100 people if they’ve watched Kim’s Convenience works because the only answers are yes/no. Asking 100 people their age doesn’t work, because there are lots of different answers.
- Each trial has to have the same success rate. A coin does not have a 50% success rate on one flip, then suddenly switches to an 80% success rate on the next flip. The coin is consistent from trial to trial.
- Each trial has to be independent. This means that the result of one trial does not affect the outcome of another trial. If a coin lands on heads 5 times in a row, that does not change the probability of the coin landing on heads or tails on the 6th flip. A coin has no memory.
So B(10, 0.5) can be thought of as flipping a coin 10 times where the chance of heads is 50%. Below is a graph of this distribution: the x-axis represents the number of heads, and the y-axis represents probability. For example, the probability of getting exactly 7 heads out of 10 is about 11.7%.
Each bar represents the probability of getting a specific number of heads. We’re going to come back to this graph later, to talk about concepts called the PMF and CDF. But for now, what we want to remember is that binomial distributions help us model a series of random independent trials, where each trial has the same probability of success.
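In case the graph doesn't show up for you, the sketch below rebuilds B(10, 0.5) in R using the built-in dbinom function. It is only an illustration: the variable names are my own, and the chart is drawn only if you run it in an interactive session such as RStudio.

```r
heads <- 0:10
probs <- dbinom(heads, size = 10, prob = 0.5)

# One row per outcome: number of heads and its probability
pmf_table <- data.frame(heads = heads, probability = round(probs, 4))
print(pmf_table)

# The probabilities of all possible outcomes add up to 1
print(sum(probs))

# Exactly 7 heads out of 10
print(round(probs[heads == 7], 3))   # 0.117

# Draw the bar chart only in an interactive session such as RStudio
if (interactive()) {
  barplot(probs, names.arg = heads,
          xlab = "Number of heads", ylab = "Probability",
          main = "B(10, 0.5)")
}
```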
Cheating Scandals and Binomial Distributions
So that’s the binomial distribution, but is it useful in the real world?
Back to Atlanta. As this paper from Educational and Psychological Measurement (EPM) points out, there are mainly two ways to detect test tampering. The first is to see if schools made unusually large gains in their average test scores from year to year. This is what the AJC noticed: many schools in Atlanta had among the worst test scores in 2008, but suddenly got some of the best scores in 2009.
However, the second way to detect test tampering is to look at erasure markings. On a multiple-choice test done on a bubble sheet, it’s normal to occasionally erase your answer and bubble in a new one. Some of these changed answers are wrong-to-right answers (called WTR). Of course, erasing your answer will leave pencil markings, so if we had a copy of your bubble sheet and the answer sheet, we could tell how many WTR answers there were. If there’s a suspiciously large number of WTR answers, that’s a sign that someone tampered with it and fixed your answers.
It’s hard to find the AJC’s original analysis, because the story they published online is slim on details and their website is littered with broken links. But they mention that they used “linear regression,” a technique usually paired with normal distributions, which are different from binomial distributions². The AJC (probably) didn’t use a binomial distribution in their analysis, because data on WTR answers was not public.
The EPM paper mentions that when researchers ran the numbers on bubble sheets, they found that genuine WTR answers make up about 2% of all questions on a test. In other words, if a bubble sheet has 100 answers, it’s likely that about 2 of them were originally wrong, erased, and changed to the right answer. This follows a binomial distribution: the number of trials is the number of questions on a test, and the probability of success is 2%. On a test with 100 questions, the number of WTR answers is represented by B(100, 0.02), which looks like this:
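To get a feel for B(100, 0.02), here is a rough R sketch. The 2% rate and the 100 questions come from the example above; the cutoff of 10 WTR answers is a hypothetical red flag I picked for illustration, not a figure from the EPM paper. The 1 - pbinom(...) pattern it uses is explained later in this article.

```r
questions <- 100    # number of questions on the test (from the example above)
wtr_rate  <- 0.02   # genuine wrong-to-right rate quoted in the text

# Expected number of WTR answers on an untampered sheet
expected_wtr <- questions * wtr_rate
print(expected_wtr)   # 2

# Probability of each WTR count from 0 to 10
wtr_counts <- 0:10
wtr_probs  <- dbinom(wtr_counts, size = questions, prob = wtr_rate)
print(round(wtr_probs, 4))

# Hypothetical red flag: 10 or more WTR answers on one sheet
threshold <- 10
p_flag    <- 1 - pbinom(threshold - 1, size = questions, prob = wtr_rate)
print(p_flag)
```

Most of the probability sits on 0 to 4 WTR answers, so a sheet with 10 or more would be a very rare event under honest conditions.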
Binomial distributions also show up all the time in video games when there’s some sort of random-number generation (or RNG for short). RNG especially shows up in role-playing games and strategy games. For example, in the Pokemon series, a big part of the game is spent running through patches of grass fighting and collecting cute creatures called Pokemon. There are something like 20 trillion different Pokemon, but the specific Pokemon that appears in any given moment is randomized.
As esports and meta-games like speedrunning grow more popular, cheating by manipulating RNG is a real concern. In December 2020, a speedrunner for the game Minecraft was accused of manipulating the game’s random-number-generation to make himself luckier. As part of the game, he had to collect an in-game item called Ender Pearls, which normally have a 4.7% drop rate (like a coin flip with a 4.7% chance of landing on heads). Out of 262 attempts, he received 42 pearls, which is much more than expected.
This is also a binomial distribution! The distribution is B(262, 0.047), and it turns out the chance of getting 42 or more successes out of 262 is about 1 in 177 billion³, which is absurdly low. I recommend Matt Parker’s video on that controversy for a deeper look.
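Here is a rough R sketch of that calculation. It uses the rounded 4.7% drop rate quoted above, so treat the output as a ballpark: it won't exactly match the 1 in 177 billion figure, which was presumably computed with an unrounded rate. The variable names are my own.

```r
attempts  <- 262     # attempts reported in the text
pearls    <- 42      # pearls received, as reported in the text
drop_rate <- 0.047   # rounded rate from the text, so results are approximate

# Expected pearls with ordinary luck
print(attempts * drop_rate)

# Probability of more than 41 pearls, i.e. 42 or more
p_lucky <- pbinom(pearls - 1, size = attempts, prob = drop_rate,
                  lower.tail = FALSE)
print(p_lucky)

# Same number expressed as "1 in N"
print(1 / p_lucky)
```

I used lower.tail = FALSE here instead of subtracting from 1 because it tends to be more accurate when the answer is this tiny.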
Once you know about the binomial distribution, you start to see it everywhere. It shows up in schools, in video games, and in mere flips of a coin. Knowing how binomial distributions work can help us shed light on lies in the real world.
PMFs and CDFs
Okay, so we know what a binomial distribution is, and we know that binomial distributions can help unravel real cheating scandals, but how are we supposed to get your money back from Dave, the coin-flipping fiend? How do we actually show that he is suspiciously lucky?
Remember, Dave got 18 heads out of 20 flips. Since there are 20 flips, we can write this out as B(20, 0.5), and plot it like this:
At this point, I should introduce you to my two friends, Paolo Marcelo Fernandes and Carlos Danilo Fernandes (they’re brothers). They’re investigators, and they’re going to help us get you your money back.
Paolo suggests that we find the probability of Dave getting exactly 18 out of 20 heads:
When we calculate the probability of an exact outcome like this, it’s called the PMF (it stands for probability mass function, not Paolo Marcelo Fernandes’s initials, just a coincidence).
Carlos doesn’t like this idea. He points out that we’re trying to find out the chance of someone getting as lucky or luckier than Dave, not the chance of getting specifically 18 heads. Carlos says that we should add up the probabilities of getting 18 heads, 19 heads, and 20 heads; the chance of getting 18 heads or better:
Carlos is right: we want to find the probability of someone getting 18 or more heads out of 20. When we calculate the probability of getting some outcome or better/worse, it’s called the CDF (it stands for cumulative distribution function, not Carlos Danilo Fernandes’s initials, just a coincidence).
- PMF (probability mass function): gives the probability of getting an exact outcome (exactly 18 heads out of 20)⁴
- CDF (cumulative distribution function): gives the probability of getting an exact outcome or better/worse (18 or more heads out of 20)
I’m not going to go over how to calculate the PMF and CDF by hand here, but the formula for a binomial distribution’s PMF is explained here. To find the CDF, you would find the PMF of a bunch of different outcomes and add all the results up.
Instead of doing that by hand, we’re going to calculate the CDF with R.
Cracking the Cheating Scandal: The 1 Line of Code
Open RStudio (or whatever IDE you use) and punch in this line of code:
print(1-pbinom(q=17, size=20, prob=0.5))
This is all we need to find out how “lucky” Dave got. Let’s break down this line of code:
print(): Prints something to your console.
pbinom(q, size, prob): Calculates the CDF of a binomial distribution. Specifically, pbinom calculates the probability that the number of successes is less than or equal to q.
q: the cutoff number of successes
size: number of trials in the binomial distribution (the n in B(n,p))
prob: probability of success in the binomial distribution (the p in B(n,p))
Note that pbinom counts q or fewer successes. However, we want to find the probability of getting 18 successes or more, not less.
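If you're wondering what pbinom is doing under the hood, the sketch below compares it against adding up exact-outcome probabilities with dbinom, R's binomial PMF function. This is just a sanity check with names I made up, not a description of how R computes it internally.

```r
# P(17 or fewer heads) computed two ways for B(20, 0.5)
by_hand  <- sum(dbinom(0:17, size = 20, prob = 0.5))  # add up exact outcomes
with_cdf <- pbinom(17, size = 20, prob = 0.5)         # let pbinom do the sum

print(by_hand)
print(with_cdf)
print(all.equal(by_hand, with_cdf))   # TRUE
```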
Take a look at Carlos’s plot again. Notice that we can split the graph into two sections: outcomes where heads ≤ 17 and those where heads ≥ 18:
Call the probability of getting 17 or fewer heads S, and the probability of 18 or more heads T. Every outcome falls into exactly one of the two groups, so S and T add up to 100% or 1: S+T=1, which also means T=1-S. The probability S can be found with pbinom:
pbinom(q=17, size=20, prob=0.5)
And since T=1-S, we can find T like this:
1-pbinom(q=17, size=20, prob=0.5)
This is why in our line of R code, we write
1-pbinom(q=17, size=20, prob=0.5). We are finding 1-S: we first find S, the probability of getting 17 or fewer heads, and then subtract that probability from 1 to get the probability of 18 or more heads⁵.
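To convince yourself that the subtraction trick matches what Carlos proposed, here is a small sketch that computes Paolo's exact-outcome probability and then Carlos's "18 or more" probability three different ways. The variable names are hypothetical; dbinom and pbinom are R's built-in binomial functions.

```r
# Exactly 18 heads (Paolo's PMF)
p_exact <- dbinom(18, size = 20, prob = 0.5)

# 18 or more heads, three equivalent ways (Carlos's tail probability)
p_sum        <- sum(dbinom(18:20, size = 20, prob = 0.5))
p_complement <- 1 - pbinom(17, size = 20, prob = 0.5)
p_upper      <- pbinom(17, size = 20, prob = 0.5, lower.tail = FALSE)

print(signif(p_exact, 4))   # 0.0001812
print(signif(p_sum, 4))     # 0.0002012
print(all.equal(p_sum, p_complement))  # TRUE
print(all.equal(p_sum, p_upper))       # TRUE
```

The exact-outcome number is a bit smaller than the "18 or more" number, because it leaves out the 19-head and 20-head cases.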
If you run the line of code for Dave’s coin-flipping:
print(1-pbinom(q=17, size=20, prob=0.5))
Then you get this answer: 0.0002012253. There is roughly a 0.02% chance, 1 in 5000, that Dave could get that lucky with a fair coin.
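If you don't trust the formula, you can also brute-force it. The sketch below simulates a million 20-flip sessions with a fair coin using R's rbinom and counts how often someone gets 18 or more heads. The seed and the number of sessions are arbitrary choices of mine, and the simulated share will wobble a little around the exact answer.

```r
set.seed(42)  # arbitrary seed so the run is repeatable

n_sessions <- 1e6   # number of simulated 20-flip sessions (an arbitrary choice)

# Each entry is the number of heads in one session of 20 fair flips
heads_per_session <- rbinom(n_sessions, size = 20, prob = 0.5)

# Share of sessions that were at least as lucky as Dave
simulated <- mean(heads_per_session >= 18)
exact     <- 1 - pbinom(17, size = 20, prob = 0.5)

print(simulated)
print(exact)
```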
This isn’t actually that improbable compared to events like, say, a lightning strike. If you live in the U.S., the odds of getting struck by lightning in your lifetime are about 1 in 15,300. Still, it looks suspicious. You ask Dave to flip the coin 10 more times. He nervously agrees, and gets 8 more heads. Now the total is 26 out of 30 heads.
Run the 1 line of code again, updating the number of trials and successes:
print(1-pbinom(q=25, size=30, prob=0.5))
Then you get this answer: 0.000029738, or 0.00297%. About 1 in 33,600. Less than the chance of getting struck by lightning.
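If Dave keeps flipping and you keep updating the numbers, it may be handy to wrap the line in a small function. The helper below, luck_check, is a hypothetical name I made up for this sketch; it just repackages the same 1 - pbinom(...) calculation and also prints the odds as "1 in N".

```r
# Hypothetical helper: chance of at least `heads` heads in `flips` flips
# of a coin whose probability of heads is `prob` (fair by default)
luck_check <- function(heads, flips, prob = 0.5) {
  1 - pbinom(heads - 1, size = flips, prob = prob)
}

first_round  <- luck_check(18, 20)
second_round <- luck_check(26, 30)

print(first_round)       # about 0.000201
print(second_round)      # about 0.0000297

# The same probabilities expressed as "1 in N"
print(round(1 / first_round))    # 4970
print(round(1 / second_round))   # 33627
```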
At this point, Dave does what anyone with a conscience would do: he gives up. He admits that the coin is weighted. You get your money back.
Where there is pressure, there is unfortunately the urge to cheat. That can be pressure to perform, as in Atlanta, or just the pressure not to lose a bet, as with our fake-coin-flipping foe.
I should warn you: getting extremely lucky does not automatically mean you cheated. Extremely unlikely events can make people suspicious, and they merit investigation, but they do not make you guilty. Unlikely events still happen (like lightning strikes). I should note that the Atlanta school officials were not arrested just because the AJC pointed out their unlikely test gains. It was because after the AJC pointed it out, more investigations were opened which uncovered a cutthroat workplace culture, orders to destroy documents, silencing of dissenters, and so on.
But statistical concepts like the binomial distribution help us draw the line between “lucky” and “WTF suspiciously lucky”. We can even get that help with just 1 line of code. I hope you enjoy your money, and make sure to thank Carlos and Paolo for me on your way out.
¹: It’s actually 2 baseball caps with one of them turned backwards. Close enough.
²: Normal and binomial distributions are similar, but normal deals with continuous variables and binomial deals with discrete variables. Coin flips are discrete: there is no such thing as “0.5 flips”. However, the percentage that a school’s score increased is continuous: 5% is a valid outcome, along with 0.436%, 33.79%, and every number in between. So it makes sense that AJC used linear regression: they were dealing with a continuous variable.
³: This is (roughly) the raw probability of getting 42 or more successes out of 262 trials with a 4.7% success rate. However, the accusers wanted to make sure they accounted for possible bias in their data, so they estimated the speedrunner’s odds more generously at 1 in 85 billion, which is still absurdly low. You can read their original paper here, or watch this video explanation they made for the paper.
⁴: PMFs only apply to discrete variables, not continuous variables, but continuous variables have a similar concept called the PDF (probability density function, not to be confused with the document type or the initials of my other friend, Phillip Dominic Fantini. Just a coincidence.).
⁵: Another way to write this line of code is
print(pbinom(q=17, size=20, prob=0.5, lower.tail=FALSE)).