The paradox lies in the fact that a conditional distribution with respect to such an event is ambiguous unless it is viewed as an observation from a continuous random variable. Moreover, the result depends on how this random variable is defined.
The canonical example of the paradox is as follows:
Consider a random point distributed uniformly over the surface of an assumed spherical "earth", and compare the following two conditional distributions:
1. The longitude (X) given that the observed point is on the great circle along the equator (latitude Y = 0).
2. The latitude (Y) given that the observed point is on the prime meridian (longitude X = 0).
Intuitively, the conditional distribution in (1) is uniform on the interval (−π, π). The paradox arises with (2).
One could appeal to symmetry to argue that the great circle along the prime meridian should behave like the great circle along the equator, and therefore that the distribution in (2) should also be uniform, this time on the interval (−π/2, π/2).
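However, the longitude and latitude are independent random variables: the surface area element is proportional to cos(y), so the joint density factors into a constant in x times a function of y. The conditional distribution in (2) therefore equals the marginal distribution of the latitude, with density: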
![p_Y(y) =\frac{1}{2} \cos(y) ;\quad -\tfrac{\pi}{2} < y < \tfrac{\pi}{2},]()
which is clearly not uniform.
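The density above can be checked by simulation. The following is a minimal sketch, not part of the original argument: the function names, sample size, seed and the threshold π/6 are all illustrative choices. Under the density cos(y)/2 the probability that |Y| < π/6 is sin(π/6) = 1/2, whereas a uniform latitude would give 1/3. The sketch also estimates the same probability among points whose longitude lies in (0, π/2), which should agree if longitude and latitude are independent.

```python
import math
import random


def sample_lon_lat(rng):
    """Draw one point uniformly on the unit sphere; return (longitude, latitude)."""
    # Normalising three independent standard normals gives a uniform direction.
    x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    r = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(y, x)                        # longitude X
    lat = math.asin(max(-1.0, min(1.0, z / r)))   # latitude Y, clamped for safety
    return lon, lat


def latitude_fractions(n=200_000, seed=0):
    """Estimate P(|Y| < pi/6) overall and among points with 0 < X < pi/2."""
    rng = random.Random(seed)  # fixed seed (illustrative) for reproducibility
    all_low = 0
    quarter_total = 0
    quarter_low = 0
    for _ in range(n):
        lon, lat = sample_lon_lat(rng)
        low = abs(lat) < math.pi / 6
        all_low += low
        if 0 < lon < math.pi / 2:
            quarter_total += 1
            quarter_low += low
    return all_low / n, quarter_low / quarter_total


if __name__ == "__main__":
    overall, in_quarter = latitude_fractions()
    # Density cos(y)/2 predicts sin(pi/6) = 0.5; a uniform latitude would predict 1/3.
    print(f"P(|Y| < pi/6)               ~ {overall:.3f}")
    print(f"P(|Y| < pi/6 | 0 < X < pi/2) ~ {in_quarter:.3f}")
```

Both estimates should come out close to 0.5 rather than 1/3, up to Monte Carlo error.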
In case (1) above, the conditional probability that the longitude X lies in a set E given that Y = 0 can be written P(X ∈ E | Y = 0). Elementary probability theory suggests this can be computed as P(X ∈ E and Y = 0)/P(Y = 0), but that expression is not well-defined since P(Y = 0) = 0. Measure theory provides a way to define a conditional probability, using the family of events R_ab = {Y : a < Y < b}, which are horizontal rings consisting of all points with latitude between a and b. The rings R_ab can be used to construct a function f_E(y) = P(X ∈ E | Y = y), which can then be evaluated at y = 0 to give P(X ∈ E | Y = 0). See conditional expectation for more information.
The resolution of the paradox is to notice that in case (2), P(Y ∈ F | X = x) is defined using the events L_ab = {X : a < X < b}, which are vertical wedges (more precisely lunes), consisting of all points whose longitude lies between a and b. So although P(X | Y = 0) and P(Y | X = 0) each provide a probability distribution on a great circle, one of them is defined using rings, and the other using lunes. Thus it is not surprising after all that P(X | Y = 0) and P(Y | X = 0) have different distributions.
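The dependence on the family of shrinking events can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the original argument: it approximates the prime meridian in two ways, by a thin lune |X| < eps and by a band of constant width around the same great circle (points whose Cartesian coordinate perpendicular to the meridian plane is smaller than eps in absolute value). The band is introduced here only for comparison, and the names, eps, sample size and seed are hypothetical. Only the half of the circle through longitude 0 is kept.

```python
import math
import random


def sample_unit_vector(rng):
    """Draw a point uniformly on the unit sphere as Cartesian (x, y, z)."""
    x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    r = math.sqrt(x * x + y * y + z * z)
    return x / r, y / r, z / r


def compare_limits(n=400_000, eps=0.1, seed=1):
    """Estimate P(|angle along the meridian| < pi/6) under two conditionings."""
    rng = random.Random(seed)
    lune_total = lune_low = 0
    band_total = band_low = 0
    for _ in range(n):
        x, y, z = sample_unit_vector(rng)
        if x <= 0:
            continue  # keep only the half of the great circle through longitude 0
        # Lune: longitude within eps of 0; the angle along the circle is the latitude.
        if abs(math.atan2(y, x)) < eps:
            lune_total += 1
            lune_low += abs(math.asin(max(-1.0, min(1.0, z)))) < math.pi / 6
        # Band: within eps of the plane y = 0, i.e. constant width around the circle.
        if abs(y) < eps:
            band_total += 1
            band_low += abs(math.atan2(z, x)) < math.pi / 6
    return lune_low / lune_total, band_low / band_total


if __name__ == "__main__":
    lune, band = compare_limits()
    # Lune conditioning follows the cos(y)/2 density (0.5 expected);
    # band conditioning is rotationally symmetric, hence uniform (1/3 expected).
    print(f"lune: {lune:.3f}   band: {band:.3f}")
```

The lune estimate should be near 1/2, matching the cosine density, while the band estimate should be near 1/3, matching the uniform distribution suggested by the symmetry argument. Both events shrink to the same great circle, yet they produce different conditional distributions.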
Andrey Kolmogorov described the situation as follows: "The concept of a conditional probability with regard to an isolated hypothesis whose probability equals 0 is inadmissible. For we can obtain a probability distribution for [the latitude] on the meridian circle only if we regard this circle as an element of the decomposition of the entire spherical surface onto meridian circles with the given poles."
E. T. Jaynes made a related observation: "… the term 'great circle' is ambiguous until we specify what limiting operation is to produce it. The intuitive symmetry argument presupposes the equatorial limit; yet one eating slices of an orange might presuppose the other."
An implication is that conditional density functions are not invariant under coordinate transformation of the conditioning variable.
Consider two continuous random variables (U, V) with joint density p_UV. Now, let W = V / g(U) for some positive-valued, continuous function g. By change of variables, the joint density of (U, W) is:
![p_{UW}(u,w) = p_{UV} \big(u,w\, g(u)\big) \left|\frac{\partial (u,v)}{\partial (u,w)}\right| = p_{UV} \big(u,w\, g(u)\big) \,g(u)]()
Note that W = 0 if and only if V = 0, so it would appear that the conditional distribution of U should be the same under each of these events. However:
![p_{U|W=0}(u) \propto p_{UV}(u,0) \, g(u)]()
whereas
![p_{U|V=0}(u) \propto p_{UV}(u,0)]()
which are not equal unless g is constant.
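A small simulation makes the difference concrete. This is a sketch under illustrative assumptions that do not come from the text above: U is uniform on (0, 1), V is standard normal and independent of U, and g(u) = 1 + u. The events V = 0 and W = 0 are approximated by |V| < eps and |W| < eps. With these choices the formulas above give a uniform conditional density for U given V = 0 (mean 1/2) and a density proportional to 1 + u given W = 0 (mean 5/9).

```python
import random


def g(u):
    """Illustrative positive, continuous, non-constant rescaling function."""
    return 1.0 + u


def conditional_means(n=400_000, eps=0.05, seed=2):
    """Estimate E[U | |V| < eps] and E[U | |W| < eps] with W = V / g(U)."""
    rng = random.Random(seed)
    sum_v = count_v = 0
    sum_w = count_w = 0
    for _ in range(n):
        u = rng.random()       # U ~ Uniform(0, 1)   (assumed for illustration)
        v = rng.gauss(0, 1)    # V ~ Normal(0, 1), independent of U (assumed)
        w = v / g(u)
        if abs(v) < eps:       # thin slab around V = 0
            sum_v += u
            count_v += 1
        if abs(w) < eps:       # thin slab around W = 0, wider where g(u) is larger
            sum_w += u
            count_w += 1
    return sum_v / count_v, sum_w / count_w


if __name__ == "__main__":
    mean_given_v, mean_given_w = conditional_means()
    # Theory for this example: 1/2 given V = 0, and 5/9 given W = 0.
    print(f"E[U | V ~ 0] ~ {mean_given_v:.3f}")
    print(f"E[U | W ~ 0] ~ {mean_given_w:.3f}")
```

The slab |W| < eps corresponds to |V| < eps g(U), which is wider where g is larger; this is the source of the extra factor g(u) in the conditional density given W = 0.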