2009-12-10

The improperness of improper priors

Over the past ten days or so I've once again been diving into Bayesian statistics; let us say I'm experiencing a serious, personal paradigm shift wrt statistical inference. My main starting point as a practicing database guy is the easy correspondence between embedded multivalued dependency (EMVD) on the one hand and conditional independence (CI) in Bayesian networks on the other. EMVD has for the first time made the basic ideas of BNs accessible to me at the intuitive level, and of course it's pretty nice in other ways, since we're now only talking about the easy, finite cases where everything is trivially continuous.
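To make the correspondence concrete, here's a toy check of the unembedded case. The relation and the helper names are mine, but the fact illustrated is the standard one: under the uniform distribution over a relation's tuples, the MVD X ->> Y holds exactly when Y is conditionally independent of Z given X.

    from collections import Counter

    # Toy relation R(X, Y, Z) as a set of tuples.
    R_good = {('a', 1, 'p'), ('a', 1, 'q'),
              ('a', 2, 'p'), ('a', 2, 'q'),
              ('b', 1, 'p')}
    R_bad = R_good - {('a', 2, 'q')}   # break the product structure under x='a'

    def satisfies_mvd(rel):
        """X ->> Y via the lossless-join test: rel == rel[XY] join rel[XZ]."""
        xy = {(x, y) for x, y, _ in rel}
        xz = {(x, z) for x, _, z in rel}
        joined = {(x, y, z) for (x, y) in xy for (x2, z) in xz if x2 == x}
        return joined == rel

    def satisfies_ci(rel):
        """Y ⟂ Z | X under the uniform distribution on rel's tuples:
        p(x,y,z) p(x) == p(x,y) p(x,z); the 1/n factors cancel, so raw
        tuple counts suffice."""
        pxyz = Counter(rel)
        pxy = Counter((x, y) for x, y, _ in rel)
        pxz = Counter((x, z) for x, _, z in rel)
        px = Counter(x for x, _, _ in rel)
        return all(pxyz[(x, y, z)] * px[x] == pxy[(x, y)] * pxz[(x, z)]
                   for (x, y) in pxy for (x2, z) in pxz if x2 == x)

    print(satisfies_mvd(R_good), satisfies_ci(R_good))  # True True
    print(satisfies_mvd(R_bad), satisfies_ci(R_bad))    # False False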

Still, you have to mind the infinite case as well. And since I happen to think the way I do, I'm from the start strongly attached to the objective Bayesian viewpoint, in preference to the more conventional subjective one. In fact, my revival is intimately tied to my finally having learned that the Bayesian framework can also be described in purely objective, information-theoretic, measurement-of-ignorance terms instead of vague references to "degrees of belief".

Here, the nastiest counter-example seems to be the problem with unnormalized (improper) priors, and the resulting marginalization paradox, which is absent from the theory dealing purely with proper priors, i.e. normalized probability measures. At least to me it implies that there might not be a coherent description of complete uncertainty within the Bayesian framework. That is rather bad, because even as we already know that Bayes's theorem is just about the neatest framework for consolidating uncertain information in a provably coherent way (no Dutch book! eventual stabilization among a group of Bayesian learners with shared priors, modulo common-knowledge concerns and the like! a neat generalization of Popperian falsificationism!), we do always need a coherent starting point which has an objective, not a purely subjective, interpretation.
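To see why improper priors are so tempting in the first place, recall the textbook example: a flat prior on a Gaussian location parameter cannot be normalized, yet formally turning the Bayesian crank still produces a perfectly proper posterior:

    \pi(\mu) \propto 1, \qquad x \mid \mu \sim N(\mu, \sigma^2)
    \quad\Longrightarrow\quad
    p(\mu \mid x) = \frac{p(x \mid \mu)\,\pi(\mu)}{\int p(x \mid \mu')\,\pi(\mu')\,d\mu'} = N(\mu \mid x, \sigma^2).

The trouble is that this only works because the normalizing integral happens to converge, and the marginalization paradox shows that such formal manipulations don't always cohere: different, individually legitimate routes to the same marginal posterior can disagree.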

Now, I had a little bit of an intuitive flash there, as I'm prone to. While it is true that we cannot normalize many of the most natural "distributions" that would go along with common improper priors, perhaps that has less to do with the impossibility of bringing formal rigor to bear on them than we might think. I mean, sure, we cannot make even the simplest of such priors, the flat one, normalize into a probability measure when the base set is infinite; not even using the theory of generalized functions. But sure enough we can make it work if we let the prior become a general linear functional, restricted to a class of arguments on which it is continuous.
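One concrete way to read that (just a sketch, and my own gloss): the flat prior on the real line is hopeless as a density, since the integral of any constant c > 0 over all of R diverges, but as a functional it is perfectly tame:

    \Lambda(f) = \int_{\mathbb{R}} f(x) \cdot 1 \, dx = \langle f, 1 \rangle,
    \qquad |\Lambda(f)| \le \|1\|_{\infty} \, \|f\|_{L^1} = \|f\|_{L^1},

so Lambda is a bounded, hence continuous, linear functional on L^1(R). The constant 1 is not an element of L^1, where proper densities live, but it is a perfectly respectable element of the dual space L^\infty = (L^1)^*. On that reading, a noninformative prior is not a degenerate density; it is an object of a different type.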

So, maybe the marginalization paradox, and the trouble with improper priors more generally, wasn't so much about the impossibility of representing noninformative priors "cleanly" after all. Maybe it was just about our choice of representation?

I haven't gone through the particularities, but I have a strong sense that this sort of approach could lead to a natural, structural encoding of the special nature of not-knowing-shit versus sorta-probabilistically-knowing-something. Structural restrictions wrt the resulting operator algebra would simply make certain kinds of calculations inadmissible; those restrictions would propagate in the intuitively proper manner through things like Bayes's rule; and things that manifest themselves as the marginalization paradox (or others like it?) would probably be forced out into the open as topological limitations on the compatibility of operators which act not only on distributions proper, but on the dual space as well. In particular, Bayes's rule with the usual pointwise interpretation would only apply on the density-function side of things, and would have to be qualified in the case of functional/function, and certainly functional/functional, interactions. For example, you can't reasonably take the product of a function and a functional; you always have to go to the inner product, i.e. in conventional terms marginalize, whenever both are present. And in the case of functionals, the topology of the dual function space is rather different from the conventional one, which seriously affects the theory of integration.
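To sketch what that qualified Bayes's rule might look like (my own guess at a formalization, nothing more): given a prior functional Lambda and a likelihood l_D(theta) = p(D | theta), the posterior would itself be a functional,

    \Lambda_D(f) = \frac{\Lambda(\ell_D \cdot f)}{\Lambda(\ell_D)},

defined only when Lambda(l_D) is finite and nonzero. With the flat prior Lambda(f) = integral of f, this reproduces the usual improper-prior posterior expectation, the ratio of the integrals of l_D f and l_D; and the go-to-the-inner-product rule is built in, because Lambda only ever sees whole products, never pointwise values, so the inadmissible functional/functional manipulations simply never typecheck.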