SuPReMo
0.1.1
The theoretical background is explained in the paper by Jamie McClelland et al. Some specifics relevant to the implementation of SuPReMo are provided on this page.
Symbol | Description |
---|---|
\( \mathbf{I}_{t},\;t\in [1,\ldots, N_i] \) | motion images (or dynamic images, in image space) |
\( t \) | time point |
\( N_i \) | total number of motion images |
\( \mathbf{I}_0 \) | reference state image |
\( \mathbf{M}_t = [m_{t,1}, \ldots, m_{t,N_m}]^\mathrm{T} \) | motion parameters |
\( N_m \) | number of motion parameters |
\( \mathbf{S}_t = [s_{t,1}, \ldots, s_{t,N_s} ]^\mathrm{T} \) | surrogate signal |
\( N_s \) | number of surrogate signals per time point |
\( \mathbf{R}=[\mathbf{R}_1,\ldots,\mathbf{R}_{N_r}]^\mathrm{T} \) | model parameters |
\( N_r \) | number of model parameters per motion parameter |
\( \mathbf{P}_{\negthickspace A_t} \) | simulated motion image (in acquisition space) |
\( \mathbf{P}_{\negthickspace t}\) | motion image (in acquisition space) |
\( N_{p} \) | number of voxels of the simulated dynamic image |
The unified framework directly optimises the correspondence model parameters \(\mathbf{R}\) on all motion images simultaneously.
\[ \mathcal{C}_t = (1-\lambda) \mathcal{S}( \mathbf{P}_{\negthickspace t}, A_t(\mathbf{T}(\mathbf{I}_0, \mathbf{M}_t))) + \lambda\mathcal{R}(\mathbf{M}_t) \]
The image similarity and regularisation terms are weighted with a parameter \(\lambda \in\; ]0,1]\), which controls the balance between data fidelity and smoothness of the solution.
In order to make use of gradient-based optimisation methods, the partial derivative of the cost function \(\mathcal{C}_t\) with respect to the model parameters \(\mathbf{R}\) needs to be known.
\[ \frac{\partial\mathcal{C}_t}{\partial \mathbf{R}} = \frac{\partial \mathbf{M}_t}{\partial\mathbf{R}} \frac{\partial\mathcal{C}_t}{\partial\mathbf{M}_t} \]
The correspondence model relates the surrogate signal and the model parameters with the motion parameters, i.e.
\[ \mathbf{M}_t = F(\mathbf{S}_t,\mathbf{R}). \]
For instance a 2nd order polynomial model can be expressed as
\[ \mathbf{M}_t = \mathbf{R}_1 s_t^2 +\mathbf{R}_2 s_t + \mathbf{R}_3 \]
with the corresponding partial derivatives
\begin{eqnarray*} \frac{\partial \mathbf{M}_t}{\partial \mathbf{R}_1} = s_t^2, \;\; \frac{\partial \mathbf{M}_t}{\partial \mathbf{R}_2} = s_t, \;\; \frac{\partial \mathbf{M}_t}{\partial \mathbf{R}_3} = 1. \end{eqnarray*}
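As a numerical illustration of the quadratic model and its derivatives, the following is a minimal NumPy sketch; the function names and the example coefficient values are hypothetical, not part of the SuPReMo API:

```python
import numpy as np

def quadratic_model(s_t, R1, R2, R3):
    """Evaluate M_t = R1 * s_t^2 + R2 * s_t + R3 for a scalar surrogate signal s_t."""
    return R1 * s_t**2 + R2 * s_t + R3

def quadratic_model_derivatives(s_t):
    """Partial derivatives of M_t w.r.t. R1, R2, R3 -- independent of R itself."""
    return s_t**2, s_t, 1.0

# Two motion parameters (N_m = 2), one coefficient vector per polynomial term:
R1 = np.array([0.5, -0.2])
R2 = np.array([1.0, 0.3])
R3 = np.array([0.1, 0.0])

M_t = quadratic_model(2.0, R1, R2, R3)           # surrogate value s_t = 2.0
dR1, dR2, dR3 = quadratic_model_derivatives(2.0) # model-parameter derivatives
```

Note that the derivatives depend only on the surrogate signal, so they can be precomputed once per time point.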
The objective function value for a single motion-image time point is computed with the cost function \(\mathcal{C}_t\) defined above. The total objective function value is the sum over all time points
\[ \mathcal{C}_\text{total} = \sum_{t=1}^{N_i} \mathcal{C}_t. \]
A procedural description of how the objective function value is calculated is presented in listing...
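Such a procedure can be sketched in a few lines. The helper functions below (`correspondence_model`, `transform`, `acquire`, `similarity`, `regularisation`) are simplified stand-ins for the actual SuPReMo components, chosen only to make the sketch self-contained:

```python
import numpy as np

# Simplified stand-ins (hypothetical; the real components are far richer):
def correspondence_model(S_t, R):   # linear model M_t = R @ S_t
    return R @ S_t
def transform(I0, M_t):             # toy "warp": shift intensities uniformly
    return I0 + M_t.mean()
def acquire(I_Tt):                  # identity acquisition (full image sampled)
    return I_Tt
def similarity(P_t, P_At):          # sum of squared differences
    return float(np.sum((P_t - P_At) ** 2))
def regularisation(M_t):            # quadratic penalty on motion parameters
    return float(np.sum(M_t ** 2))

def total_objective(R, surrogates, motion_images, I0, lam):
    """C_total = sum_t (1-lam)*S(P_t, A_t(T(I0, M_t))) + lam*R(M_t)."""
    total = 0.0
    for S_t, P_t in zip(surrogates, motion_images):
        M_t = correspondence_model(S_t, R)   # M_t = F(S_t, R)
        I_Tt = transform(I0, M_t)            # warp the reference state image
        P_At = acquire(I_Tt)                 # simulate the acquisition A_t
        total += (1 - lam) * similarity(P_t, P_At) + lam * regularisation(M_t)
    return total
```

The loop structure mirrors the equations above: one cost evaluation per motion image, accumulated over all time points.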
Here, the gradient of the total cost function is calculated by expanding the derivative of the cost function with respect to the model parameters and summing the result over all time points. The total gradient is given by
\begin{align} \frac{\partial \mathcal{C}_\text{total}}{\partial \mathbf{R}} & = \sum_{t=1}^{N_i} \frac{\partial \mathcal{C}_t}{\partial \mathbf{R}} \\ & = \sum_{t=1}^{N_i} \frac{\partial \mathbf{M}_t}{\partial\mathbf{R}} \frac{\partial\mathcal{C}_t}{\partial\mathbf{M}_t}. \end{align}
Expanding \(\frac{\partial\mathcal{C}_t}{\partial \mathbf{M}_t}\) into its similarity and regularisation terms yields
\begin{align} \frac{\partial \mathcal{C}_\text{total}}{\partial \mathbf{R}} & = \sum_{t=1}^{N_i} \frac{\partial \mathbf{M}_t}{\partial\mathbf{R}} \left( (1-\lambda) \frac{\partial \mathcal{S}_t}{\partial\mathbf{M}_t} + \lambda \frac{\partial \mathcal{R}_t}{\partial\mathbf{M}_t} \right). \end{align}
Again, using the chain-rule to substitute \(\frac{\partial \mathcal{S}_t}{\partial\mathbf{M}_t}\),
\begin{align} \frac{\partial \mathcal{C}_\text{total}}{\partial \mathbf{R}} & = \sum_{t=1}^{N_i} \frac{ \partial \mathbf{M}_t}{\partial\mathbf{R} } \left( (1-\lambda) \frac{\partial \mathbf{I}_{T_t}}{\partial\mathbf{M}_t}\frac{\partial \mathcal{S}_t}{\partial\mathbf{I}_{T_t}} + \lambda \frac{\partial \mathcal{R}_t}{\partial\mathbf{M}_t} \right)\\ & = \sum_{t=1}^{N_i} \frac{ \partial \mathbf{M}_t}{\partial\mathbf{R} } \left( (1-\lambda) \frac{\partial \mathbf{I}_{T_t}}{\partial\mathbf{M}_t} \frac{\partial \mathbf{P}_{\negthickspace A_t}}{\partial \mathbf{I}_{T_t}} \frac{\partial \mathcal{S}_t}{\partial\mathbf{P}_{\negthickspace A_t}} + \lambda \frac{\partial \mathcal{R}_t}{\partial\mathbf{M}_t} \right)\\ & = \sum_{t=1}^{N_i} \frac{ \partial \mathbf{M}_t}{\partial\mathbf{R} } \left( (1-\lambda) \frac{\partial \mathbf{I}_{T_t}}{\partial\mathbf{M}_t} A^*_t \left( \frac{\partial \mathcal{S}_t}{\partial\mathbf{P}_{\negthickspace A_t}} \right) +\lambda \frac{\partial \mathcal{R}_t}{\partial\mathbf{M}_t} \right). \end{align}
In the next section the different elements of the derivative required for gradient-based optimisation are described in more detail.
The regularisation gradient \(\frac{\partial \mathcal{R}_t}{\partial\mathbf{M}_t}\) is well known from image registration, and any existing implementation can be utilised here. Popular regularisation methods include, for instance, the bending energy, the linear elastic potential, and penalties on the determinant of the Jacobian.
\[ \frac{\partial \mathcal{S}_t}{\partial\mathbf{P}_{\negthickspace A_t}} = \left[ \frac{\partial \mathcal{S}_t}{\partial p_{A_t,1}}, \ldots, \frac{\partial \mathcal{S}_t}{\partial p_{A_t,N_p}} \right]^\mathrm{T} \]
is a column vector of size \(N_p \times 1\), where \(N_p\) is the number of voxels in the simulated motion image \(\mathbf{P}_{\negthickspace A_t}\). Each element of this vector is the derivative of the similarity measure with respect to the intensity of the corresponding voxel.
If \(\mathcal{S}\) measures similarity between the motion image and the simulated motion image in terms of the sum of squared differences as
\[ \mathcal{S}_\text{SSD} = \sum_{\mathbf{x\in\Omega}}\left(\mathbf{P}_{\negthickspace t}(\mathbf{x}) - \mathbf{P}_{\negthickspace A_t}(\mathbf{x})\right)^2, \]
then the derivative with respect to the simulated motion image intensity for each voxel is given by
\[ \frac{\partial \mathcal{S}_\text{SSD}}{\partial p_{A_t,y}} = 2\left(p_{A_t,y} - p_{t,y}\right). \]
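A minimal NumPy sketch of the SSD measure and its voxel-wise derivative (function names hypothetical):

```python
import numpy as np

def ssd(P_t, P_At):
    """Sum of squared differences between motion image and simulated motion image."""
    return float(np.sum((P_t - P_At) ** 2))

def ssd_gradient(P_t, P_At):
    """Derivative of SSD w.r.t. each simulated voxel intensity: 2*(p_At - p_t)."""
    return 2.0 * (P_At - P_t)

P_t  = np.array([1.0, 2.0, 3.0])   # toy motion image (flattened)
P_At = np.array([1.5, 2.0, 2.0])   # toy simulated motion image
grad = ssd_gradient(P_t, P_At)     # one gradient entry per voxel
```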
The derivative of the image similarity with respect to the partial image data measures the gradient direction in the space of the raw (or partial, or unsorted) image data. In order to transform this into the space of the full, reconstructed image, the adjoint of the image acquisition function \(A_t\) is used.
The image acquisition process can be described by a time-dependent function \(A_t\), which maps the unknown underlying image \(\mathbf{I}_t\) (or the transformed reference state image \(\mathbf{I}_{T_t}\)) into the acquisition space.
\[ \mathbf{P}_{\negthickspace t} = A_t(\mathbf{I}_t) + \varepsilon_t \]
Here \(\varepsilon_t\) denotes acquisition noise. Assuming the acquisition process is linear, \(A_t\) can be expressed as an \(N_p \times N_v\) matrix, where \(N_v\) is the number of voxels of the full image,
\[ A_t = \begin{pmatrix} a_{t,1,1} & \ldots & a_{t,1,N_v} \\ \vdots & \ddots & \vdots \\ a_{t,N_p,1} & \ldots & a_{t,N_p,N_v} \end{pmatrix} \]
where each element \(a_{t,i,j}\) is the contribution of voxel \(j\) in the transformed reference state image to the partial image data voxel at position \(i\).
\[ a_{t,i,j} = \frac{\partial p_{A_t,i}}{\partial i_{T_t,j}} \]
Using the Hermitian adjoint, i.e. the conjugate transpose, \(A^*_t = \left(\bar{A_t}\right)^\mathrm{T}\), the similarity gradient measured in the partial image space can be mapped back to the full image space. In matrix form, \(A_t^*\) is of size \(N_v \times N_p\).
\[ A_t^* = \frac{\partial \mathbf{P}_{\negthickspace A_t}}{\partial \mathbf{I}_{T_t}} \]
Note that appropriate masking is required in the implementation of this mapping, since voxels of the full image that do not contribute to the partial image data receive no gradient information.
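As a toy example of this forward/adjoint pair, consider an acquisition that extracts a single row of a small image; for a real-valued \(A_t\) the Hermitian adjoint reduces to the transpose. The setup is hypothetical, chosen only to illustrate the mapping:

```python
import numpy as np

# Toy acquisition: A_t extracts row k of a 3x3 image (N_p = 3, N_v = 9).
N, k = 3, 1
A_t = np.zeros((N, N * N))
for j in range(N):
    A_t[j, k * N + j] = 1.0           # a_{t,i,j}: contribution of full-image voxel j

I_Tt = np.arange(9, dtype=float)       # flattened transformed reference state image
P_At = A_t @ I_Tt                      # simulated partial image (row k only)

grad_partial = np.array([0.5, -1.0, 2.0])  # dS/dP_At, in acquisition space
grad_full = A_t.T @ grad_partial           # adjoint A_t^*: back to full image space
# Voxels never touched by the acquisition receive zero gradient -- this is
# where the masking mentioned above matters in practice.
```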
This derivative, \(\frac{\partial \mathbf{I}_{T_t}}{\partial\mathbf{M}_t}\), gives the change of the intensity of the transformed reference state image with respect to the motion parameters and thus depends on the transformation model used. In the special case of non-parametric registration, it is given by the warped spatial image gradient.
Hence, in matrix notation this derivative is of size \(N_m \times N_v\):
\[ \frac{\partial \mathbf{I}_{T_t}}{\partial \mathbf{M}_t} = \begin{pmatrix} \frac{\partial i_{T_t,1}}{\partial m_{t,1}} & \ldots & \frac{\partial i_{T_t,N_v}}{\partial m_{t,1}}\\ \vdots & \ddots & \vdots \\ \frac{\partial i_{T_t,1}}{\partial m_{t,N_m}} & \ldots & \frac{\partial i_{T_t,N_v}}{\partial m_{t,N_m}} \end{pmatrix} \]
Note that \(N_m\) is the total number of motion parameters: for a parametric transformation it is the product of the number of transformation parameters (e.g. B-spline control points) and the number of dimensions, and for a non-parametric transformation the product of the number of deformation vectors and the number of dimensions.
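For the non-parametric case, the warped spatial image gradient can be obtained with central differences; a minimal sketch using a toy 2D warped image (values hypothetical):

```python
import numpy as np

# Toy warped reference state image (already resampled into motion-image space):
I_Tt = np.array([[0.0, 1.0, 4.0],
                 [1.0, 2.0, 5.0],
                 [4.0, 5.0, 8.0]])

# Central differences per dimension (np.gradient returns axis-0 then axis-1):
gy, gx = np.gradient(I_Tt)

# Each displacement component of a voxel only affects that voxel's intensity,
# so these stacked per-dimension gradients are the nonzero (diagonal) entries
# of the N_m x N_v matrix above (N_m = 2 * N_v in 2D).
dI_dM = np.stack([gx.ravel(), gy.ravel()])
```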
In case of a parametric transformation, such as a B-spline transformation, the chain rule is applied once more
\[ \frac{\partial \mathbf{I_{T_t}}}{\partial \mathbf{M}_t} = \frac{\partial \textbf{DVF}_t}{\partial \mathbf{M_t}} \frac{\partial \mathbf{I}_{T_t}}{\partial \textbf{DVF}_t}. \]
For each matrix entry, this reads
\[ \frac{\partial i_{T_t,j}}{\partial m_{t,i}} = \frac{\partial i_{T_t,j}}{\partial \textit{dvf}_{t,x}} \frac{\partial \textit{dvf}_{t,x}}{\partial m_{t,i}}. \]
This is the multiplication of the warped image gradient with the contribution of each control point to the transformation at a given position \(x\). The latter can be calculated by convolving the control point grid with the B-spline basis function and sampling accordingly.
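The contribution of each control point at a given position is the value of the cubic B-spline basis function there. A minimal 1D sketch of those basis values (the standard cubic B-spline pieces, evaluated at a normalised offset between control points):

```python
def cubic_bspline_weights(u):
    """Cubic B-spline basis values for a normalised offset u in [0, 1):
    the weights of the four neighbouring control points at that position."""
    return (
        (1 - u) ** 3 / 6.0,
        (3 * u**3 - 6 * u**2 + 4) / 6.0,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6.0,
        u**3 / 6.0,
    )

w = cubic_bspline_weights(0.25)
# d(dvf)/dm_i at this position is w[i]; the chain-rule entry is then the
# warped image gradient multiplied by this weight.
```

The four weights always sum to one (partition of unity), which is a convenient sanity check for any implementation.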
The quadratic correspondence model can be generalised to any model that is linear in the model parameters (powers and products of the surrogate signals are simply treated as additional surrogate signals):
\[ \mathbf{M}_t = \sum_{i=1}^{N_s} \mathbf{R}_i s_{i,t} \]
with the corresponding partial derivative with respect to the model parameters
\[ \frac{\partial\mathbf{M}_t}{\partial\mathbf{R}_i} = s_{i,t}. \]
In matrix notation the above equation can be written as a vertical stack of diagonal matrices:
\[ \frac{\partial \mathbf{M}_t}{\partial\mathbf{R}} = \begin{pmatrix} \frac{\partial\mathbf{M}_t}{\partial\mathbf{R}_1}\\ \frac{\partial\mathbf{M}_t}{\partial\mathbf{R}_2}\\ \vdots\\ \frac{\partial\mathbf{M}_t}{\partial\mathbf{R}_{N_s}} \end{pmatrix} = \begin{pmatrix} s_{1,t} & & \\ & \ddots & \\ & & s_{1,t} \\ & \vdots & \\ s_{N_s,t} & & \\ & \ddots & \\ & & s_{N_s,t} \end{pmatrix} \]
This matrix is of size \((N_r \cdot N_m) \times N_m\) and completes the collection of partial derivatives required for the gradient calculation. A summary of the complete procedure for computing the gradient is presented in listing ...