Saturday, January 03, 2009

The question is wrong

On "Coding Horror", Jeff Atwood asked this question:
Let's say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl. What are the odds that person has a boy and a girl?

He then argues that our intuition leads us to the "wrong" answer (50%) instead of the "correct" answer (2/3 or 67%).

However, the question does not include enough information to determine which of these answers is actually correct, so the only truly correct answer is, "I don't know" or "it depends". I skimmed though the comments on the post (there are about a million), and didn't see anyone addressing this issue (though someone probably did). They mostly argued about BG vs GB for some reason.

The reason that this question is wrong is because it doesn't specify the "algorithm" for posing the question.

If we assume that boys and girls are born with equal probability (50/50, like flipping a coin), then families with two children will have two girls 25% of the time, two boys 25% of the time, and a boy and a girl 50% of the time.

If the algorithm for posing the question is:
  1. Choose a random parent that has exactly two children
  2. If the parent has two boys, eliminate him and choose another random parent
  3. Ask about the odds that the parent has both a boy and a girl
Then we can see that step two eliminated the "two boy" possibility, which leaves the 25% probability of two girls and the 50% probability of both a boy and a girl. Of course probabilities should add up to 100%, so the final probabilities are  25/75 (1/3) for two girls and 50/75 (2/3) for both a boy and girl. This is the "correct" answer described by Jeff, and it occurs because of the elimination performed at step two.

However, if the algorithm for posing the question was instead:
  1. Choose a random parent that has exactly two children
  2. Arbitrarily announce the gender of one of the children
  3. Ask about the odds that the parent has both a boy and a girl
Now we're back to having a 50% probability of there being both a boy and a girl. The difference is that there was no elimination at step two, and simply announcing the gender of one of the children does not affect the gender of the other child or change the probability distribution.

The problem with the question as originally posed was that it didn't specify which of these algorithms was being used. Were we arbitrarily told about the girl, or was a selective process applied?

By the way, if we're applying a selective process, then 100% is also a possibly correct answer, because at step two we could have eliminated all parents that don't have both a boy and a girl. Likewise, all other probabilities are also potentially correct depending on the algorithm applied.

Update: Surprisingly, some people are still thinking that my second algorithm yields 2/3 instead of 1/2 (see the confused discussion on news.yc). I think part of the reason is that I was somewhat imprecise with the concept of "elimination". The second algorithm does not eliminate any of the families, but if I announce that there is a boy, that does eliminate the possibility of two girls. This is where some people are getting lost and thinking that the boy+girl probability has become 2/3. The catch is that announcing the boy also reduced the boy+girl probability by an equal amount, so the result is still the same (it eliminated either BG or GB, I don't know which, but it doesn't matter).