Constrained Local Models

Constrained Local Models (CLMs) are a class of methods for locating sets of points (constrained by a statistical shape model) on a target image.
The general approach is to:

Sample a region from the image around the current estimate, projecting it into a reference frame
Generate a "response image" for each point, giving the cost of having the point at each pixel
Search for a combination of points which optimises the total cost, by manipulating the shape model parameters
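The search step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear shape model (mean shape plus orthonormal modes), works entirely in the reference frame so pose is ignored, and alternates a local best-cost search with a projection onto the shape model. The function name fit_shape and all parameter names are hypothetical.

```python
def fit_shape(costs, mean_shape, modes, b_limits, radius=3, n_iters=5):
    """Alternate a local best-cost search with a shape-model constraint.

    costs      : one 2D list per point, costs[i][y][x] (lower is better)
    mean_shape : list of (x, y) mean positions in the reference frame
    modes      : list of shape modes, each a flat list [dx0, dy0, dx1, dy1, ...],
                 assumed orthonormal (an assumption of this sketch)
    b_limits   : maximum absolute value allowed for each shape parameter
    """
    pts = [tuple(p) for p in mean_shape]
    b = [0.0] * len(modes)
    for _ in range(n_iters):
        # Step 1: move each point to the lowest-cost pixel near its current estimate.
        found = []
        for R, (px, py) in zip(costs, pts):
            h, w = len(R), len(R[0])
            cx = min(max(int(round(px)), 0), w - 1)
            cy = min(max(int(round(py)), 0), h - 1)
            candidates = [(R[y][x], x, y)
                          for y in range(max(cy - radius, 0), min(cy + radius + 1, h))
                          for x in range(max(cx - radius, 0), min(cx + radius + 1, w))]
            _, bx, by = min(candidates)
            found.append((bx, by))
        # Step 2: project the displacements from the mean onto the shape modes,
        # clamp the parameters, and regenerate the points from the model.
        d = [v for (fx, fy), (mx, my) in zip(found, mean_shape)
             for v in (fx - mx, fy - my)]
        b = [sum(m_j * d_j for m_j, d_j in zip(mode, d)) for mode in modes]
        b = [min(max(bk, -lim), lim) for bk, lim in zip(b, b_limits)]
        flat = [float(v) for p in mean_shape for v in p]
        for bk, mode in zip(b, modes):
            flat = [f + bk * m for f, m in zip(flat, mode)]
        pts = [(flat[2 * i], flat[2 * i + 1]) for i in range(len(mean_shape))]
    return pts, b
```

With a single mode that shifts both points horizontally, a response whose minimum lies off that mode is pulled back onto a shape the model allows, which is the sense in which the local models are "constrained".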

Fig. 1: Sampling into the reference frame, then applying local models to compute response images R(x)

The best fit is found by choosing the shape and pose parameters that minimise:
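The equation itself is not reproduced in this text. One common way of writing an objective of this kind is sketched below. All symbols other than R are assumptions introduced for illustration: a linear shape model with mean point positions \bar{x}_i, mode sub-matrices P_i and shape parameters b, and a pose transform T_t with parameters t.

```latex
Q(b, t) = \sum_{i=1}^{n} R_i\!\left( T_t\!\left( \bar{x}_i + P_i\, b \right) \right)
```

Here R_i is the response (cost) image for point i, so Q is the total cost of placing every point at the position the shape model predicts for the given b and t.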

The term "Constrained Local Model" originally referred to a particular type of model, in which the response images were generated by applying normalised correlation with a local patch, where the model patches are modified to fit the current face but constrained by a global texture model [1,2].
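As an illustration of the normalised correlation step only, the sketch below slides a fixed patch over a sampled region and records the correlation score at each position. It does not adapt the patches to the current face or apply the global texture model described above. The function name ncc_response is hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
import math

def ncc_response(region, template):
    """Slide `template` over `region`; return normalised correlation scores in [-1, 1]."""
    th, tw = len(template), len(template[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(v * v for v in t))
    response = []
    for y in range(len(region) - th + 1):
        row = []
        for x in range(len(region[0]) - tw + 1):
            # Extract the patch in the same row-major order as the template.
            p_vals = [region[y + j][x + i] for j in range(th) for i in range(tw)]
            p_mean = sum(p_vals) / len(p_vals)
            p = [v - p_mean for v in p_vals]
            denom = t_norm * math.sqrt(sum(v * v for v in p))
            # Flat patches (zero variance) carry no information; score them 0.
            row.append(sum(a * c for a, c in zip(t, p)) / denom if denom > 0 else 0.0)
        response.append(row)
    return response
```

Higher scores indicate a better match, so a cost suitable for minimisation could be taken as, for example, one minus the score. Because the mean is subtracted and the result is divided by the norms, the score is unchanged by a gain or offset applied to the patch intensities.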

However, the term has come to mean any method in which a set of local models is used to generate response images and a shape model is then used to search for the best combined response. Earlier work by Cristinacce therefore also falls under this revised definition [3,4].
The approach has been adopted by many others, most notably Saragih et al., who used more sophisticated local models and a mean-shift shape matching strategy to get good results [5].

Random Forest Regression Voting

We have recently demonstrated that impressive performance can be achieved by using Random Forests to vote for the best position for each point [6,7,8].
In particular, Random Forest regressors are trained independently for each model point.
Each tree is trained on patches sampled at many random displacements:

Haar wavelets are used as features (fast to compute using integral images)
Each tree predicts the position of the target point given an image patch
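The two ingredients above can be sketched as follows: an integral image that makes any box sum a four-lookup operation (which is why Haar-like features are cheap), and a helper that generates training samples at random displacements together with the offset each tree would be asked to predict. This is a sketch under stated assumptions, not the published training code: the two-rectangle feature is just one Haar-like pattern, and the names integral_image, box_sum, haar_two_rect and sample_displacements are hypothetical.

```python
import random

def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (padded with a zero row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x, y, w, h):
    """Sum of the w-by-h box with top-left pixel (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half (w assumed even)."""
    half = w // 2
    return box_sum(ii, x, y, half, h) - box_sum(ii, x + half, y, half, h)

def sample_displacements(true_pt, n, max_disp, rng):
    """Return (patch_centre, target_offset) pairs for training.

    Each patch centre is the true point plus a random displacement; the
    regression target is the offset from the patch centre back to the point.
    """
    samples = []
    for _ in range(n):
        dx = rng.randint(-max_disp, max_disp)
        dy = rng.randint(-max_disp, max_disp)
        samples.append(((true_pt[0] + dx, true_pt[1] + dy), (-dx, -dy)))
    return samples
```

In a full system, features such as haar_two_rect would be evaluated on the patch around each sampled centre and paired with the target offset to train the regression trees.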

During model matching, each regressor is scanned across the region of interest:

Each patch gives multiple votes (one per tree) for the point position
The votes are accumulated to create a response image, R(x)
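Vote accumulation can be sketched as below. It assumes each vote is a single unit-weight prediction of the point position, expressed as a patch position plus a predicted offset. The function name accumulate_votes and the vote format are hypothetical, and weighting or smoothing of votes is not shown.

```python
def accumulate_votes(shape, votes):
    """Build a response image by counting votes.

    shape : (height, width) of the response image
    votes : iterable of (patch_x, patch_y, dx, dy); each vote says the point
            lies at (patch_x + dx, patch_y + dy)
    """
    h, w = shape
    R = [[0] * w for _ in range(h)]
    for px, py, dx, dy in votes:
        x, y = int(round(px + dx)), int(round(py + dy))
        if 0 <= x < w and 0 <= y < h:  # discard votes that land outside the image
            R[y][x] += 1
    return R
```

In this form a larger value of R(x) means more support for the point being at x, so it would be maximised during the shape search, or negated if a cost to be minimised is required.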

Example from BioID, using models trained on AFLW:

References

[1] D. Cristinacce and T.F. Cootes, "Feature Detection and Tracking with Constrained Local Models", Proc. British Machine Vision Conference, Vol. 3, pp. 929-938, 2006.

[2] D. Cristinacce and T.F. Cootes, "Automatic Feature Localisation with Constrained Local Models", Pattern Recognition, Vol. 41, No. 10, pp. 3054-3067, 2008.

[3] D. Cristinacce and T.F. Cootes, "A comparison of shape constrained facial feature detectors", Proc. Int. Conf. on Face and Gesture Recognition, pp. 375-380, 2004.

[4] D. Cristinacce and T.F. Cootes, "Facial Feature Detection and Tracking with Automatic Template Selection", Proc. 7th IEEE International Conference on Automatic Face and Gesture Recognition, pp. 429-434, 2006.

[5] J.M. Saragih, S. Lucey and J.F. Cohn, "Deformable Model Fitting by Regularized Mean-Shifts", International Journal of Computer Vision, pp. 200-215, 2011.

[6] T.F. Cootes, M. Ionita, C. Lindner and P. Sauer, "Robust and Accurate Shape Model Fitting using Random Forest Regression Voting", ECCV, 2012.

[7] C. Lindner, P.A. Bromiley, M.C. Ionita and T.F. Cootes, "Robust and Accurate Shape Model Matching using Random Forest Regression-Voting", IEEE Trans. PAMI, Vol. 37, No. 9, pp. 1862-1874, 2015.

[8] C. Lindner, S. Thiagarajah, J.M. Wilkinson, The arcOGEN Consortium, G.A. Wallis and T.F. Cootes, "Fully Automatic Segmentation of the Proximal Femur Using Random Forest Regression Voting", IEEE Trans. Medical Imaging, Vol. 32, No. 8, pp. 1462-1472, 2013.