Background Reading

C. E. Rasmussen and C. K. I. Williams. (2006) "Gaussian processes for machine learning", MIT Press, Cambridge, MA.


Gaussian Processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics.

The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines, and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.

J. Quiñonero Candela and C. E. Rasmussen. (2005) "A unifying view of sparse approximate Gaussian process regression" in Journal of Machine Learning Research 6, pp. 1939-1959. [PDF][Google Scholar Search]


We provide a new unifying view, including all existing proper probabilistic sparse approximations for Gaussian process regression. Our approach relies on expressing the effective prior which the methods are using. This allows new insights to be gained, and highlights the relationship between existing methods. It also allows for a clear theoretically justified ranking of the closeness of the known approximations to the corresponding full GPs. Finally, we point directly to designs of new better sparse approximations, combining the best of the existing strategies, within attractive computational constraints.

Y. Engel, P. Szabo and D. Volkinshtein. (2006) "Learning to control an Octopus arm with Gaussian process temporal difference methods" in Y. Weiss, B. Schölkopf and J. C. Platt (eds) Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA. [Postscript][PDF][Google Scholar Search]


The Octopus arm is a highly versatile and complex limb. How the Octopus controls such a hyper-redundant arm (not to mention eight of them!) is as yet unknown. Robotic arms based on the same mechanical principles may render present day robotic arms obsolete. In this paper, we tackle this control problem using an online reinforcement learning algorithm, based on a Bayesian approach to policy evaluation known as Gaussian process temporal difference (GPTD) learning. Our substitute for the real arm is a computer simulation of a 2-dimensional model of an Octopus arm. Even with the simplifications inherent to this model, the state space we face is a high-dimensional one. We apply a GPTD-based algorithm to this domain, and demonstrate its operation on several learning tasks of varying degrees of difficulty.

R. Urtasun, D. J. Fleet, A. Hertzmann and P. Fua. (2005) "Priors for people tracking from small training sets" in IEEE International Conference on Computer Vision (ICCV), IEEE Computer Society Press, Beijing, China. [IEEE Xplore][PDF][Google Scholar Search]


We advocate the use of Scaled Gaussian Process Latent Variable Models (SGPLVM) to learn prior models of 3D human pose for 3D people tracking. The SGPLVM simultaneously optimizes a low-dimensional embedding of the high dimensional pose data and a density function that both gives higher probability to points close to training data and provides a nonlinear probabilistic mapping from the low dimensional latent space to the full-dimensional pose space. The SGPLVM is a natural choice when only small amounts of training data are available. We demonstrate our approach with two distinct motions, golfing and walking. We show that the SGPLVM sufficiently constrains the problem such that tracking can be accomplished with straightforward deterministic optimization.

K. Grochow, S. L. Martin, A. Hertzmann and Z. Popovic. (2004) "Style-based inverse kinematics" in ACM Transactions on Graphics (SIGGRAPH 2004). [PDF][Web page][Google Scholar Search]


This paper presents an inverse kinematics system based on a learned model of human poses. Given a set of constraints, our system can produce the most likely pose satisfying those constraints, in real time. Training the model on different input data leads to different styles of IK. The model is represented as a probability distribution over the space of all possible poses. This means that our IK system can generate any pose, but prefers poses that are most similar to the space of poses in the training data. We represent the probability with a novel model called a Scaled Gaussian Process Latent Variable Model. The parameters of the model are all learned automatically; no manual tuning is required for the learning component of the system. We additionally describe a novel procedure for interpolating between styles.

Our style-based IK can replace conventional IK, wherever it is used in computer animation and computer vision. We demonstrate our system in the context of a number of applications: interactive character posing, trajectory keyframing, real-time motion capture with missing markers, and posing from a 2D image.

C. K. I. Williams. (1998) "Prediction with Gaussian processes: from linear regression to linear prediction and beyond" in M. I. Jordan (ed.) Learning in Graphical Models, Kluwer, Dordrecht, The Netherlands. [Gzipped Postscript][Google Scholar Search]


The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. This leads into a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models and the use of Gaussian processes in classification problems.
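The change of viewpoint described in this abstract can be checked numerically. The sketch below is not taken from the paper; it assumes a zero-mean isotropic Gaussian prior on the weights and Gaussian observation noise, and the function names, `prior_var` and `noise_var` are illustrative choices. Under those assumptions, Bayesian linear regression and a Gaussian process with the linear kernel k(x, x') = prior_var * x . x' should give the same predictive mean and latent variance.

```python
import numpy as np


def blr_predict(X, y, X_star, prior_var=1.0, noise_var=0.1):
    """Weight-space view: prior w ~ N(0, prior_var * I), Gaussian noise."""
    d = X.shape[1]
    # Posterior precision of the weights.
    A = X.T @ X / noise_var + np.eye(d) / prior_var
    # Posterior mean of the weights.
    w_mean = np.linalg.solve(A, X.T @ y) / noise_var
    mean = X_star @ w_mean
    # Latent predictive variance x*^T A^{-1} x* for each test point.
    var = np.sum(X_star * np.linalg.solve(A, X_star.T).T, axis=1)
    return mean, var


def gp_linear_predict(X, y, X_star, prior_var=1.0, noise_var=0.1):
    """Function-space view: GP prior with kernel prior_var * x . x'."""
    K = prior_var * X @ X.T            # train/train covariance
    K_s = prior_var * X_star @ X.T     # test/train covariance
    k_ss = prior_var * np.sum(X_star * X_star, axis=1)  # test variances
    C = K + noise_var * np.eye(len(X))
    mean = K_s @ np.linalg.solve(C, y)
    var = k_ss - np.sum(K_s * np.linalg.solve(C, K_s.T).T, axis=1)
    return mean, var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=30)
    X_star = rng.normal(size=(5, 3))
    m_w, v_w = blr_predict(X, y, X_star)
    m_f, v_f = gp_linear_predict(X, y, X_star)
    print("means agree:", np.allclose(m_w, m_f))
    print("variances agree:", np.allclose(v_w, v_f))
```

The weight-space version solves a d-by-d system and the function-space version an n-by-n system, so the two views differ in cost even though, for this kernel, they describe the same predictor.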

D. J. C. MacKay. (1998) "Introduction to Gaussian Processes" in C. M. Bishop (ed.) Neural Networks and Machine Learning, Springer-Verlag, Berlin. [Gzipped Postscript][Google Scholar Search]


Feedforward neural networks such as multilayer perceptrons are popular tools for nonlinear regression and classification problems. From a Bayesian perspective, a choice of a neural network model can be viewed as defining a prior probability distribution over non-linear functions, and the neural network's learning process can be interpreted in terms of the posterior probability distribution over the unknown function. (Some learning algorithms search for the function with maximum posterior probability and other Monte Carlo methods draw samples from this posterior probability). In the limit of large but otherwise standard networks, [Neal:book96] has shown that the prior distribution over non-linear functions implied by the Bayesian neural network falls in a class of probability distributions known as Gaussian processes. The hyperparameters of the neural network model determine the characteristic lengthscales of the Gaussian process. Neal's observation motivates the idea of discarding parameterized networks and working directly with Gaussian processes. Computations in which the parameters of the network are optimized are then replaced by simple matrix operations using the covariance matrix of the Gaussian process. In this chapter I will review work on this idea by [Williams:Gaussian96], [Neal:montecarlogp97], [Barber:Gaussian97] and [Gibbs:variational00], and will assess whether, for supervised regression and classification tasks, the feedforward network has been superseded.
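As a rough illustration of the "simple matrix operations" mentioned above, the following sketch performs Gaussian process regression with a squared-exponential covariance function. It is not code from the chapter: the kernel choice, the function names, and the `lengthscale`, `signal_var` and `noise_var` parameters are assumptions made for the example. Prediction reduces to one Cholesky factorization of the training covariance matrix and a few triangular solves.

```python
import numpy as np


def sq_exp_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    sq_dist = np.maximum(sq_dist, 0.0)
    return signal_var * np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_regress(X, y, X_star, lengthscale=1.0, signal_var=1.0, noise_var=1e-2):
    """Predictive mean and latent variance of a zero-mean GP at X_star."""
    K = sq_exp_kernel(X, X, lengthscale, signal_var) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)                        # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = sq_exp_kernel(X, X_star, lengthscale, signal_var)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v**2, axis=0)
    return mean, var


if __name__ == "__main__":
    X = np.linspace(0.0, 5.0, 20)[:, None]
    y = np.sin(X).ravel()
    X_star = np.linspace(0.0, 5.0, 7)[:, None]
    mean, var = gp_regress(X, y, X_star, lengthscale=1.0, noise_var=1e-4)
    for x, m, s in zip(X_star.ravel(), mean, var):
        print(f"x={x:.2f}  mean={m:.3f}  var={s:.2e}")
```

The `lengthscale` argument plays the role of the characteristic lengthscale discussed in the abstract: smaller values let the predicted function vary more quickly between training points.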