RJMCMC: What if you have an unknown number of parameters in your models? What if you don’t know which model is actually your model?
Heyo, in this post I'm going to describe how you can explore parameter spaces using MCMC in cases where you have a set of models with different numbers of parameters, or, more simply, where you have a single model with an unknown number of parameters, using Reversible Jump Markov Chain Monte Carlo (RJMCMC).
UNDER CONSTRUCTION
Resources
- “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination” - Peter Green (1995)
- “Advanced Simulation Methods, Chapter 7: Reversible Jump MCMC” - Patrick Rebeschini (~2004, not sure)
- “Lecture 22. Reversible Jump MCMC” - N Zabaras, University of Notre Dame (2017)
- “Reversible jump Markov chain Monte Carlo and multi-model samplers” - Fan, Sisson and Davies (v1 2010/ v2 2024), arXiv:1001.2055v2
Table of Contents
- Statement of the problem
- Describing the mathematics of jumping between different dimensional spaces
- Examples
- Conclusion
Statement of the problem
It is quite common that, when performing inference, a statistician doesn’t know a priori how many parameters are needed to describe the data, or which one of a collection of models actually describes it. This falls outside the typical setting in which general sampling techniques (such as MCMC) apply, and thus requires some specialised treatment in the form of ‘trans-dimensional probability theory’ and ‘reversible-jump MCMC’.
Describing the mathematics of jumping between different dimensional spaces
Following a similar introduction to Fan et al. (2024), let’s say we have a set of data \(\mathcal{D}\) that comes from some model within a countable set of models \(\{\mathcal{M}_1, \mathcal{M}_2, \mathcal{M}_3, \dots, \mathcal{M}_k, \dots\}\), where the index \(k \in \mathcal{K}\) indicates which model is currently being investigated and can be thought of as a new variable. Each model then has an \(n_k\)-dimensional set of parameters \(\theta_k \in \mathbb{R}^{n_k}\), where it is not necessarily true that \(n_k = n_{k'}\) for \(k \neq k'\).
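To make this concrete, here is a minimal Python sketch (my own illustrative setup, not taken from any of the references) where model \(\mathcal{M}_k\) is a polynomial of degree \(k-1\), so \(n_k = k\) and the parameter dimension genuinely changes with \(k\):

```python
import numpy as np

# Hypothetical example: model M_k is a polynomial of degree k - 1,
# so theta_k has n_k = k coefficients and dimensions differ across models.
def n_params(k):
    """Dimension n_k of the parameter vector theta_k for model M_k."""
    return k

def sample_prior(k, rng):
    """Draw theta_k ~ p(theta_k | k); a standard-normal prior is assumed here."""
    return rng.standard_normal(n_params(k))

rng = np.random.default_rng(0)
theta_2 = sample_prior(2, rng)  # a draw from model M_2: 2-dimensional
theta_5 = sample_prior(5, rng)  # a draw from model M_5: 5-dimensional, n_2 != n_5
```

This is exactly the kind of state an RJMCMC sampler has to carry around: the pair \((k, \theta_k)\), where the length of \(\theta_k\) depends on \(k\).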
Treating the variable \(k\) as an indicator for the (discrete) model distribution, the probability of a given model \(\mathcal{M}_k\) and its set of parameters \(\theta_k\) given the data \(\mathcal{D}\) is,
\[\begin{align} \pi(k, \theta_k \vert \mathcal{D}) = \frac{\mathcal{L}(\mathcal{D}|k, \theta_k)\, p(\theta_k, k)}{\sum_{k' \in \mathcal{K}} \int \mathcal{L}(\mathcal{D}|k', \theta_{k'})\, p(\theta_{k'}, k')\, \mathrm{d}\theta_{k'}} = \frac{\mathcal{L}(\mathcal{D}|k, \theta_k)\, p(\theta_k \vert k)\, p(k)}{\sum_{k' \in \mathcal{K}} \int \mathcal{L}(\mathcal{D}|k', \theta_{k'})\, p(\theta_{k'}\vert k')\, p(k')\, \mathrm{d}\theta_{k'}}, \end{align}\]where the second equality expands \(p(\theta_k, k)\) into \(p(\theta_k \vert k)\, p(k)\), which is typically easier to encode, and the denominator sums and integrates over every model and its parameters (i.e. it is the total evidence for the data). In essence, \(p(\theta_k \vert k)\) is just the prior on the set of parameters for the given model \(\mathcal{M}_k\), and \(p(k)\) is your prior, or set of assumptions, on which models describe your data a priori.
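In practice a sampler only ever needs the numerator \(\mathcal{L}(\mathcal{D}\vert k, \theta_k)\, p(\theta_k \vert k)\, p(k)\), since the denominator is a constant. Here is a hedged sketch of that unnormalised log-posterior for the polynomial setting; the simulated quadratic data, the Gaussian noise level \(\sigma\), the standard-normal prior on coefficients, and the uniform prior over \(K = 4\) models are all assumptions I've made purely for illustration:

```python
import numpy as np

# Hypothetical toy data: a noisy quadratic, so M_3 (degree 2) is the "true" model.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.1 * rng.standard_normal(x.size)

def log_prior_k(k, K=4):
    """log p(k): uniform prior over the models k = 1, ..., K."""
    return -np.log(K)

def log_prior_theta(theta):
    """log p(theta_k | k): independent standard normals on each coefficient."""
    return -0.5 * np.sum(theta**2) - 0.5 * theta.size * np.log(2.0 * np.pi)

def log_likelihood(theta, sigma=0.1):
    """log L(D | k, theta_k): Gaussian noise around a degree-(k-1) polynomial.

    theta is ordered lowest degree first, so it is reversed for np.polyval,
    which expects the highest-degree coefficient first.
    """
    mean = np.polyval(theta[::-1], x)
    return np.sum(-0.5 * ((y - mean) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi)))

def log_post_unnorm(k, theta):
    """log pi(k, theta_k | D) up to the normalising constant (the k-sum and integral)."""
    return log_likelihood(theta) + log_prior_theta(theta) + log_prior_k(k)
```

With this in hand, `log_post_unnorm(3, np.array([1.0, -2.0, 0.5]))` (the quadratic with roughly the true coefficients) comes out far larger than, say, the constant model `log_post_unnorm(1, np.array([0.0]))`, which is exactly the comparison across dimensions that RJMCMC is built to perform.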