Monday, February 23, 2009

Letter to the Editor: Wired Magazine

It's not every day that I get to correct a major publication on some of the finer points of probability, so when the chance comes up, I take it. The full article is here, and the erroneous passage is contained in the fourth paragraph from the bottom of this page. I wrote:
"A Formula for Disaster" is a relatively informative article on one of the causes of the current financial mess, but it bears pointing out that one of Mr. Salmon's illustrations of the concept of mathematical correlation is misleading. In particular, Salmon writes, "And if Britney wins the class spelling bee, the chance of Alice winning it is zero, which means the correlation is negative: -1." It's true that this correlation is always negative, but the numerical value of the correlation is dependent upon the number of students in the class with Alice and Britney. In fact, the correlation of A and B will only be equal to -1 in the degenerate case where Alice and Britney are the only two students in the class. In general, assuming that every student in the class has an equal probability of winning the spelling bee (and that these probabilities can be accurately modeled as Bernoulli random variables), the correlation between A and B will be equal to 1/(1-n), where n is the total number of students in the class.
Wired is one publication where writing in geekspeak might actually increase your chances of getting your letter published, so I went all out. Sean, would you mind checking my math?
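
For anyone who wants to check the arithmetic, here is a minimal sketch under the same assumptions the letter makes: exactly one winner, and every student equally likely to win. The function name `winner_correlation` and the choice to label Alice and Britney as students 0 and 1 are illustrative assumptions, not anything from the article. It enumerates the possible winners and computes the correlation of the two win indicators as an exact fraction, then prints it next to 1/(1-n).

```python
from fractions import Fraction


def winner_correlation(n: int) -> Fraction:
    """Exact correlation between the win indicators of two particular
    students in a class of n, assuming exactly one winner chosen uniformly."""
    if n < 2:
        raise ValueError("need at least two students (Alice and Britney)")
    p = Fraction(1, n)  # probability of each possible winner
    # Student 0 plays Alice, student 1 plays Britney (arbitrary labels).
    e_a = e_b = e_ab = Fraction(0)
    for winner in range(n):
        a = 1 if winner == 0 else 0
        b = 1 if winner == 1 else 0
        e_a += p * a
        e_b += p * b
        e_ab += p * a * b
    cov = e_ab - e_a * e_b
    var_a = e_a * (1 - e_a)  # variance of a 0/1 (Bernoulli) variable
    var_b = e_b * (1 - e_b)
    # The two variances are equal here, so sqrt(var_a * var_b) == var_a
    # and the correlation stays an exact fraction.
    assert var_a == var_b
    return cov / var_a


# Compare the enumerated value with the closed form 1/(1-n).
for n in (2, 3, 10, 30):
    print(n, winner_correlation(n), Fraction(1, 1 - n))
```

Under these assumptions the two values agree for each class size tried, and only the two-student class gives a correlation of -1.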

10 comments:

  1. I agree, but I think I would have called Mr. Salmon's example an oversimplification, rather than misleading, since it could be argued that he defined his context clearly enough. I think it's interesting how you've presented a case not entirely dissimilar to your last post about comic racism. How much is it the responsibility of the author to ensure his works are not misinterpreted? It's always so much harder than it looks.

  2. I'm not sure I understand the connection you're making between the Wired article and the Post comic. Like you say, the Post comic could be a classic example of interpretive variability. But I don't think there was anything much to interpret about Salmon's claim on this point: he made a factual claim about the correlation between two random variables, and his claim was just plain false except in the degenerate case. If I was going to use any word other than "misleading" to characterize his explanation, it wouldn't be "oversimplification"; it would be "wrong."

    As far as I can tell, the only thing left up to interpretation about this claim is whether Salmon intends for the word "correlation" to have its technical meaning. But in context, I don't think the passage works unless you understand the word technically. First of all, the article as a whole is mathematically non-naive, so I'd assume technical terms are used in their technical senses. Secondly, the non-technical meaning of "correlation" wouldn't allow you to place a numerical value on the relationship between two variables at all, but Salmon repeatedly uses numbers to describe the relationship between his variables.

    Since I can only assume that the technical definition is the one he had in mind, I think it's appropriate at least to demand that the numbers used to illustrate the point be accurate, and in this case, they weren't. If he wanted to illustrate a case in which the correlation actually is equal to -1, then the Alice and Britney example was just a poor choice. He could have used the correlation between heads and tails on a single flip of a coin, or between odds and evens on a single roll of a die.

    If I've missed your point entirely though, please explain.

  3. When I read the paragraph, it didn't occur to me that he was trying to make a claim about any other case than the one he explicitly defined. In other words, I didn't read the paragraph to mean that the correlation would still be -1 regardless of the number of students in the classroom. I didn't even think it was implied that it would still be -1. When I read the sentence in question, I still had the context of only two students in mind, so it didn't strike me as wrong at all. To me, the sentence might as well have started, "In this case..."

    I did think it was a poor job of explaining correlation, but that had more to do with Salmon's offering of simple examples in lieu of anything resembling a definition.

  4. But I think the case that has been explicitly defined is one wherein Alice and Britney are two students in the same class, and that the class is participating in a spelling bee. The natural assumption is that the class contains more than two students.

    This is important because in the case of Bernoulli random variables, the correlation of A and B is only -1 if the following statement is true in general: A if and only if not-B. "A only if not-B" is true no matter how few children are in the class. But, because there is a non-negligible probability that neither Alice nor Britney will win the spelling bee, "A if not-B" is not generally true. It's true only when Alice and Britney are the only students in the class.

  5. Also, the illustration could definitely mislead a lay reader who's never been exposed to the technical concept of probability. On the strength of Salmon's explanation, someone would be likely to determine that any time two events are related in such a way that the occurrence of one excludes the occurrence of the other, their correlation is -1. But this would only be true if, in addition, the non-occurrence of the first always entails the occurrence of the second.

  6. I think you've convinced me.

    "But something important happens when we start looking at two kids rather than one—not just Alice but also the girl she sits next to, Britney."

    I wasn't looking at this from a realistic perspective. I'm used to reading math stuff where examples are built from scratch, so in my head, there were only two kids in the class. I'm further interested. It seems like in general, I'd assume more than a two-student classroom, but in the context of the math discussion, it appears that assuming a class of more than two students wasn't natural to me at all.

  7. Heh. One of those 2-is-a-sufficiently-close-approximation-to-infinity moments. I hear you.

  8. http://funquotes.witticism.org/quotes-f.html

    It's a reference to an old quote from the UNIX fortune file: "Five is a sufficiently close approximation to infinity." I've always thought of it as referring to the fact that mathematicians are frequently able to get away with assumptions that sound dubious to laypeople. Of course the unspoken corollary to this notion is that you can only get away with those dubious assumptions in certain cases. So I was just sort of riffing on how your mathematical background led you to interpret the article in a way that a layperson would have probably found implausible.

    The quote is attributed to Robert Firth, who also apparently said, "... one of the main causes of the fall of the Roman Empire was that, lacking zero, they had no way to indicate successful termination of their C programs."

  9. I remember the quote, though I was sure it was six instead of five. I just missed the part about the dubious assumptions. It just didn't occur to me that he was trying to explain anything more than the case for N=2. Before I even got to the spelling bee, I had pictured a classroom with only two girls in two desks, laughing at their teacher tripping over a banana.
