Statistically Hating Everyone
One reason the 2016 U.S. Presidential election was… erm, remarkable?… was that it was the first election where both major-party candidates had an unfavorability rating of over 50% (more accurately, “first election” as in the first held after pollsters started recording unfavorability).
The other day someone told me this: “It was the first election where a majority of Americans disliked both candidates.”
No. No! That’s not what this statistic means. It could be the case that a majority of Americans disliked both candidates, but it’s much more likely that most Americans disliked one candidate and had a neutral/favorable rating of the other, while a small but significant number of people disliked both candidates. Here’s why it’s wrong to interpret the statistic as “a majority of people disliked both candidates.”
(Sidenote: of course, people also pointed out this mistake in 2016. The percentage of people who disliked both major-party candidates in 2016 was more like 24 percent, not a majority.)
A Trip to Examplia
Let’s take a flight to the fictitious but quite beautiful nation of Examplia. Examplia has two major-party candidates: Johnny Johnsperdon of the Purple Party and Dave “The Dave” Davidson of the Orange Party.
Suppose citizens of Examplia (Examplians) are divided like this:
- 50% love Johnny and hate Dave
- 50% love Dave and hate Johnny
Dave has an unfavorability rating of 50%. Johnny also has an unfavorability rating of 50%. However, the percentage of Examplians who think unfavorably of both candidates is 0%. There is not a single person in this Examplian election who hates both candidates.
In the next election, Johnny’s son Jimmy Johnsperdon and Dave’s daughter Davette “Not Dave” Davidson are the two major-party candidates. This time, Examplians are split like this:
- 40% love Jimmy and hate Davette
- 40% love Davette and hate Jimmy
- 20% hate both candidates
Davette and Jimmy both have an unfavorability rating of 60%. Whoa! But the percentage of Examplians who actively hate both candidates is 20%, definitely not a majority.
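If you want to check the arithmetic yourself, here is a small Python sketch. The way I encode the groups (a share plus two true/false flags) and the function name `unfavorability` are just my own choices for illustration, not anything official:

```python
# Hypothetical encoding of the second Examplian election.
# Each group is (share in percent, hates_jimmy, hates_davette).
groups = [
    (40, False, True),   # love Jimmy, hate Davette
    (40, True, False),   # love Davette, hate Jimmy
    (20, True, True),    # hate both candidates
]

def unfavorability(groups):
    """Return (percent hating Jimmy, percent hating Davette, percent hating both)."""
    hates_jimmy = sum(share for share, j, d in groups if j)
    hates_davette = sum(share for share, j, d in groups if d)
    hates_both = sum(share for share, j, d in groups if j and d)
    return hates_jimmy, hates_davette, hates_both

jimmy, davette, both = unfavorability(groups)
print(jimmy, davette, both)  # 60 60 20
```

The two 60s each count the "hate both" group once, which is why neither of them tells you how big that group is.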
Here are three statements that you would probably hear Examplians say after looking at the polls:
1. Davette is disliked by a majority of Examplians. True
2. Jimmy is disliked by a majority of Examplians. True
3. A majority of Examplians dislike both Davette and Jimmy. False
A bar graph showing the unfavorability of two political candidates is showing two separate statistics side by side: the percentage of people who dislike candidate A, and the percentage of people who dislike candidate B. These are two separate results, and we must be careful not to jump to conclusions about both statistics at the same time.
This is not the same as saying that the two statistics are independent of each other. In statistics, two events are independent if knowing the outcome of one event does not help you predict the outcome of the other. If I buy boba on a Sunday, that does not make it more or less likely that I will win the lottery that same day. So “buying boba” and “winning the lottery” are independent events (and I didn’t win, by the way).
The unfavorability ratings of Jimmy and Davette, however, are dependent, not independent. Knowing one statistic does help you predict the other. If we choose a random Examplian, what’s the chance that they hate Jimmy? That would be 60%. Now let’s choose a different Examplian, but this time we already know that they hate Davette. What’s the chance of them also hating Jimmy given that they hate Davette? It would be 33.33%, or 1 in 3. This is because 60% of all Examplians hate Davette and 20% hate both, so among people who hate Davette, 1 in 3 (20/60) also hate Jimmy.
(To put this in stats notation: event J is “hates Jimmy,” event D is “hates Davette.” P(J)=6/10, but P(J|D)=1/3. If the two events were independent, then P(J) and P(J|D) would equal each other.)
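Here is the same calculation as a quick Python sketch, using exact fractions so nothing gets rounded. The variable names are mine, and the numbers are the Examplian ones from above:

```python
from fractions import Fraction

# Percentages from the second Examplian election.
p_j = Fraction(60, 100)      # P(J): hates Jimmy
p_d = Fraction(60, 100)      # P(D): hates Davette
p_both = Fraction(20, 100)   # P(J and D): hates both

# Conditional probability: P(J|D) = P(J and D) / P(D)
p_j_given_d = p_both / p_d
print(p_j_given_d)  # 1/3

# If J and D were independent, P(J and D) would equal P(J) * P(D).
independent = (p_both == p_j * p_d)
print(p_j * p_d, independent)  # 9/25 False
```

If the two events were independent, 36% (9/25) of Examplians would hate both candidates. The actual figure is 20%, so hating one candidate makes you less likely to hate the other.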
The key takeaway is that even though two events can be dependent on each other, we still need to be careful not to draw false conclusions about several things at once. Just because 60% of people dislike candidate A, and 60% of people dislike candidate B, that does not tell us much about the percentage of people who dislike both candidates.
One final example: Examplia also has its own baseball league. Two of those baseball teams are the Hyenas and the Wolves. If I ask Examplians if they like either of those two teams, the results could look like this:
So each team has a “favorability” rating of 20%. However:
- Does this mean that 20% of people like both the Hyenas and Wolves, and the other 80% of people like neither team?
- Does this mean 20% of people like the Wolves, a separate 20% like the Hyenas, and the other 60% like neither team?
- Or is it somewhere in the middle, where maybe 10% of people like only the Wolves, 10% like only the Hyenas, 10% like both, and 70% like neither?
We don’t know for sure without getting more data. Any of these three options could be true. But given how bitter sports rivalries get, I wouldn’t bet money on that first option.
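What the two separate percentages do give us is a range for the overlap. Here is a minimal sketch of that, assuming everyone answered about both groups; the function name `overlap_bounds` is something I made up for this post:

```python
def overlap_bounds(a, b):
    """Given that a% of people are in one group and b% are in another,
    return the (smallest, largest) possible percent who are in both."""
    # The two groups can't cover more than 100% of people,
    # so anything beyond that has to be overlap.
    lowest = max(0, a + b - 100)
    # The overlap can't be bigger than the smaller group.
    highest = min(a, b)
    return lowest, highest

print(overlap_bounds(20, 20))  # (0, 20)  Hyenas and Wolves fans
print(overlap_bounds(60, 60))  # (20, 60) Jimmy and Davette haters
```

So for the baseball teams, anywhere from 0% to 20% of people could like both. For Jimmy and Davette, the "hates both" group has to be somewhere between 20% and 60%, and the Examplia example sits right at the bottom of that range.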