On MRF factorization
Recently, I figured out I didn't understand two simple things about MRFs:
- should we consider all the cliques in the network or only maximal ones?When we are talking about MRF factorization, in the formulation of the likelihood
- Generally, why do we have a right to factorize the likelihood this way?
So, let's start with the second question. For those of you who like names, it is exactly the necessity part of the Hammersley-Clifford theorem. As Bishop put it:
If we consider two nodes xi and xj that are not connected by a link, then these variables must be conditionally independent given all other nodes in the graph. This follows from the fact that there is no direct path between the two nodes, and all other paths pass through nodes that are observed, and hence those paths are blocked. This conditional independence property can be expressed aswhere x\{i,j} denotes the set x of all variables with xi and xj removed. The factorization of the joint distribution must therefore be such that xi and xj do not appear in the same factor in order for the conditional independence property to hold for all possible distributions belonging to the graph.
One could find the answer to the first question in Bishop's book as well. Normally, we should include all the cliques to the product. Actually, every function of variables X is also a function of any Y that is a superset of X. Based on that, a potential for a non-maximal clique could be merged into a potential for any maximal clique such that it is a supergraph of the first clique! Thus, both formulations (with all cliques and only maximal cliques) are equally expressive.
29 January 2010 at 10:24
Hell yeah, one more theorem with more than one name in its title!
Nothing to add about the remaining contents of your post. You are correct =)