Workshop Schedule
Friday 05 September
16:30 - 17:30 | Local Walk |
19:30 - 21:00 | Dinner |
21:00 - 21:45 | Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? [slides] |
Zoubin Ghahramani, University of Cambridge, Cambridge, U.K. | |
I'll present some thoughts and research directions in Bayesian machine learning. I'll contrast black-box approaches to machine learning with model-based Bayesian statistics. Can we meaningfully create Bayesian black-boxes? If so what should the prior be? Is non-parametrics the only way to go? Since we often can't control the effect of using approximate inference, are coherence arguments meaningless? How can we convert the pagan majority of ML researchers to Bayesianism? If the audience gets bored of these philosophical musings, I will switch to talking about our latest technical work on Indian buffet processes. |
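For readers unfamiliar with the last topic, here is a minimal sketch of the Indian buffet process as a generative procedure for a binary feature matrix; the Python code and the parameter value are illustrative only and are not taken from the talk.

# Illustrative sketch: sample a binary matrix Z (customers x dishes) from an
# Indian buffet process with concentration alpha (not code from the talk).
import numpy as np

def sample_ibp(num_customers, alpha, rng=None):
    rng = np.random.default_rng(rng)
    dishes = []                      # dishes[k] = number of customers who chose dish k so far
    rows = []                        # rows[n] = set of dish indices chosen by customer n
    for n in range(1, num_customers + 1):
        chosen = set()
        # Existing dishes are taken with probability m_k / n.
        for k, m_k in enumerate(dishes):
            if rng.random() < m_k / n:
                chosen.add(k)
                dishes[k] += 1
        # A Poisson(alpha / n) number of brand-new dishes.
        for _ in range(rng.poisson(alpha / n)):
            dishes.append(1)
            chosen.add(len(dishes) - 1)
        rows.append(chosen)
    Z = np.zeros((num_customers, len(dishes)), dtype=int)
    for n, chosen in enumerate(rows):
        Z[n, list(chosen)] = 1
    return Z

print(sample_ibp(10, alpha=2.0, rng=0))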
Saturday 06 September
07:45 - 08:30 | Breakfast |
08:30 - 09:00 | Introduction [slides] |
Neil Lawrence, University of Manchester, Manchester, U.K. |
09:00 - 09:45 | On the relation between Bayesian inference and certain solvable problems of stochastic control [slides] |
Manfred Opper, Technical University of Berlin, Berlin, Germany | |
Optimal control for nonlinear stochastic dynamical systems requires the solution of a nonlinear PDE, the so-called Hamilton-Jacobi-Bellman equation. Recently, Bert Kappen [1] and Emanuel Todorov [2] have shown that for certain types of cost functions, this equation can be transformed into a linear problem which is mathematically related to a Bayesian estimation problem. This has led to novel efficient algorithms for optimal control of such systems.
I will show a simple proof for this surprising result and discuss some possible implications. [1] Bert Kappen, A linear theory for control of non-linear stochastic systems, Physical Review Letters, vol. 95, p. 200201 (2005). [2] Emanuel Todorov, General duality between optimal control and estimation (2008), accepted at the 47th IEEE Conference on Decision and Control, http://www.cogsci.ucsd.edu/~todorov/papers.htm. |
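As a deliberately simplified illustration of the Kappen/Todorov duality, the sketch below solves a tiny discrete-state control problem by iterating the linear equation satisfied by the desirability function z(x) = exp(-v(x)); the chain, costs and Python code are my own toy construction, not material from the talk.

# Illustrative sketch (not from the talk): Todorov-style linearly solvable control
# on a small discrete chain. With state cost q(x) and passive dynamics P, the
# desirability z(x) = exp(-v(x)) satisfies the *linear* equation z = exp(-q) * (P z),
# which we solve by simple fixed-point iteration.
import numpy as np

n = 5                                   # states 0..4, state 4 is the goal
P = np.zeros((n, n))                    # passive dynamics: unbiased random walk
for i in range(n):
    P[i, max(i - 1, 0)] += 0.5
    P[i, min(i + 1, n - 1)] += 0.5
q = np.full(n, 1.0)                     # state cost per step
q[n - 1] = 0.0                          # reaching the goal is free

z = np.ones(n)
for _ in range(500):                    # iterate the linear Bellman equation
    z = np.exp(-q) * (P @ z)
    z[n - 1] = 1.0                      # absorbing goal: z fixed at exp(0)
v = -np.log(z)                          # optimal cost-to-go
u = (P * z) / (P @ z)[:, None]          # optimal controlled transition probabilities
print(np.round(v, 3))
print(np.round(u, 3))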
09:45 - 10:00 | Questions and Discussion |
10:00 - 10:45 | Multi-task Learning with Gaussian Processes [slides] |
Chris Williams, University of Edinburgh, Edinburgh, U.K. | |
We consider the problem of multi-task learning, i.e. the setup where there are multiple related prediction problems (tasks), and we seek to improve predictive performance by sharing information across the different tasks. We address this problem using Gaussian process (GP) predictors, using a model that learns a shared covariance function on input-dependent features and a "free-form" covariance matrix that specifies inter-task similarity. We discuss the application of the method to a number of real-world problems such as compiler performance prediction and learning robot inverse dynamics. Joint work with Kian Ming Chai, Edwin Bonilla, Stefan Klanke and Sethu Vijayakumar (Edinburgh). |
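The shared structure described above is commonly written as a Kronecker product between a "free-form" task covariance matrix and an input covariance function; the toy numpy sketch below (my own illustration with made-up kernel and hyperparameter values, not the authors' code) shows a multi-task prediction under such a model.

# Toy sketch of a multi-task GP with covariance kron(K_f, K_x): K_f is a free-form
# inter-task matrix, K_x an input kernel. Not the authors' implementation.
import numpy as np

def rbf(a, b, lengthscale=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 20)                      # shared training inputs for two tasks
K_x = rbf(x, x)
K_f = np.array([[1.0, 0.9],                    # inter-task similarity matrix (made up)
                [0.9, 1.0]])
noise = 0.05

K = np.kron(K_f, K_x) + noise * np.eye(2 * len(x))
f = rng.multivariate_normal(np.zeros(2 * len(x)),
                            np.kron(K_f, K_x) + 1e-6 * np.eye(2 * len(x)))
y = f + np.sqrt(noise) * rng.standard_normal(2 * len(x))   # targets stacked: task 1 then task 2

# Predict task 2 at new inputs, borrowing strength from both tasks' data.
x_star = np.linspace(0, 5, 50)
K_star = np.kron(K_f[1:2, :], rbf(x_star, x))  # cross-covariance of (task 2, x_star) with training
mean_task2 = K_star @ np.linalg.solve(K, y)
print(np.round(mean_task2[:5], 3))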
10:45 - 11:00 | Questions and Discussion |
11:00 - 11:30 | Coffee |
11:30 - 12:15 | Latent Force Models with Gaussian Processes [slides] |
Neil Lawrence, University of Manchester, Manchester, U.K. | |
We are used to dealing with the situation where we have a latent variable. Often we assume this latent variable to be independently drawn from a distribution, e.g. probabilistic PCA or factor analysis. This simplification is often extended for temporal data, where tractable Markovian independence assumptions are used (e.g. Kalman filters or hidden Markov models). In this talk we will consider the more general case where the latent variable is a forcing function in a differential equation model. We will show how, for some simple ordinary differential equations, the latent variable can be dealt with analytically for particular Gaussian process priors over the latent force. We will introduce the general framework, present results in systems biology and preview extensions. Joint work with Magnus Rattray, Mauricio Alvarez, Pei Gao, Antti Honkela, David Luengo, Guido Sanguinetti and Michalis Titsias. |
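As a rough picture of the generative model being described, the sketch below draws a latent force from a Gaussian process and pushes it through a first-order ordinary differential equation numerically; the parameter values are invented, and the talk's contribution is to handle the latent force analytically rather than by simulation.

# Toy generative sketch of a first-order latent force model (illustration only):
# draw a latent force f(t) from an RBF Gaussian process, then integrate
# dx/dt = S * f(t) - D * x(t) numerically. The talk marginalises f analytically.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 200)
dt = t[1] - t[0]

# Latent force: one draw from a zero-mean GP with RBF covariance.
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 1.0 ** 2) + 1e-6 * np.eye(len(t))
f = rng.multivariate_normal(np.zeros(len(t)), K)

# Output process: simple Euler integration of the driven linear ODE.
S, D = 1.0, 0.5          # sensitivity and decay (made-up values)
x = np.zeros(len(t))
for i in range(1, len(t)):
    x[i] = x[i - 1] + dt * (S * f[i - 1] - D * x[i - 1])

print(np.round(x[::40], 3))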
12:15 - 12:30 | Questions and Discussion |
12:30 - 13:30 | Lunch |
13:30 - 16:00 | Local Walk |
16:00 - 16:45 | Bayesian learning of sparse factor loadings [slides] |
Magnus Rattray, University of Manchester, Manchester, U.K. | |
Learning sparse structure is useful in many applications. For example, gene regulatory networks are sparsely connected since each gene is typically only regulated by a small number of other genes. In this case factor analysis models with sparse loading matrices have been used to uncover the regulatory network from gene expression data. In this talk I will examine the performance of sparsity priors, such as mixture and L1 priors, by calculating learning curves for Bayesian PCA in the limit of large data dimension. This allows us to address a number of questions e.g. how well can we estimate sparsity using the marginal likelihood when the prior is not well-matched to the data generating process? |
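To make the priors under discussion concrete, the toy sketch below evaluates the log density of a Laplace (L1) prior and a spike-and-slab mixture prior on a loading vector; the parameter values are invented, and this illustrates only the priors themselves, not the learning-curve analysis of the talk.

# Toy illustration of two sparsity priors on a factor loading vector w
# (spike-and-slab mixture vs Laplace/L1). Values are made up; the talk's
# analysis computes learning curves analytically in the large-dimension limit.
import numpy as np

def log_laplace_prior(w, scale=1.0):
    # Independent Laplace (L1) prior on each loading.
    return np.sum(-np.abs(w) / scale - np.log(2 * scale))

def log_spike_slab_prior(w, p_slab=0.1, slab_var=1.0, spike_var=1e-4):
    # Mixture prior: broad "slab" with probability p_slab, narrow "spike" near zero otherwise.
    def log_norm(w, var):
        return -0.5 * w ** 2 / var - 0.5 * np.log(2 * np.pi * var)
    return np.sum(np.logaddexp(np.log(p_slab) + log_norm(w, slab_var),
                               np.log(1 - p_slab) + log_norm(w, spike_var)))

rng = np.random.default_rng(0)
w_sparse = np.where(rng.random(50) < 0.1, rng.standard_normal(50), 0.0)
w_dense = 0.3 * rng.standard_normal(50)
for name, w in [("sparse", w_sparse), ("dense", w_dense)]:
    print(name, round(log_laplace_prior(w), 1), round(log_spike_slab_prior(w), 1))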
16:45 - 17:00 | Questions and Discussion |
17:00 - 17:30 | Coffee |
17:30 - 18:15 | Covariance functions and Bayes errors for GP regression on random graphs [slides] |
Peter Sollich, King's College London, U.K. | |
We consider GP learning of functions defined on the nodes of a random graph. Covariance functions proposed for this scenario, based on diffusion processes on the graph, are shown to have some counter-intuitive properties. In particular, on graphs with tree-like structure where loops can be neglected (as is typically the case for randomly generated graphs), the "obvious" limit of a large correlation length scale does not produce a constant covariance function. In the second part, we look at Bayes errors for GP regression on graphs and study how the learning curves depend on the size of the graph, its connectivity, and the number of training examples. Joint work with Camille Coti. |
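For concreteness, here is a small sketch of the kind of covariance function in question: a graph diffusion kernel K = expm(-s L) built from the Laplacian of a randomly generated graph, normalised to unit variance and evaluated for a few length scales. This is an illustration with made-up parameters, not the talk's calculation.

# Illustration (not the talk's code): diffusion covariance K = expm(-s * L) on a
# random graph, normalised to correlations, for increasing length scale s.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n, p = 60, 0.05                                  # Erdos-Renyi random graph, mean degree ~3
A = (rng.random((n, n)) < p).astype(float)
A = np.triu(A, 1)
A = A + A.T                                      # symmetric adjacency, no self-loops
L = np.diag(A.sum(1)) - A                        # graph Laplacian

for s in [0.1, 1.0, 10.0]:
    K = expm(-s * L)
    K = K / np.sqrt(np.outer(np.diag(K), np.diag(K)))   # normalise to unit variance
    off_diag = K[~np.eye(n, dtype=bool)]
    print(s, round(off_diag.mean(), 3))          # mean correlation between distinct nodes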
18:15 - 18:30 | Questions and Discussion |
18:30 - 19:15 | Bayeswatch [slides] |
Amos Storkey, University of Edinburgh, Edinburgh, U.K. | |
Apart from the obvious problem of doing the sums, there are a number of theoretical difficulties associated with the practical business of using Bayesian methods. This talk will introduce a few of these, including the problem of model correctness (from a subjective Bayes standpoint), the irrelevant reward issue and the problem of monolithic probabilistic models. I will introduce the issues, to get them on the table, and describe a Bayesian framework within which they can be discussed. |
19:15 - 19:30 | Questions and Discussion |
19:30 - 21:00 | Dinner |
Sunday 07 September
07:45 - 08:30 | Breakfast |
08:30 - 08:50 | The role of mechanistic models in Bayesian inference [slides] |
Dan Cornford, Aston University, Birmingham, U.K. | |
I'll outline the role of mechanistic models, or simulators, in defining priors in a Bayesian inference setting. In particular I will focus on two main cases: 1) where process-based understanding of the system allows us to construct a stochastic simulator for the system, which translates to inference in stochastic processes; 2) where a (typically) deterministic mechanistic model already exists, which we can then emulate and treat 'correctly' in a Bayesian manner. I will pay special attention to the relation between the simulator and reality, since it is reality that is typically sampled to generate the observations used for inference in the model. I will outline ideas from emulation, and show the challenges I think remain to be solved.
This is joint work with lots of people: Alexis Boukouvalas, Yuan Shen, Michael Vrettas, Manfred Opper and many others in the MUCM project. |
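A minimal sketch of the emulation idea mentioned above: run a (here entirely made-up) deterministic simulator at a few design points and fit a GP that predicts, with uncertainty, what the simulator would return at untried inputs. This is only a toy illustration, not the MUCM methodology.

# Minimal emulation sketch (illustration only; the simulator is a made-up toy):
# run a deterministic code at a few design points, then fit a GP that predicts
# its output, with uncertainty, at untried inputs.
import numpy as np

def simulator(theta):                      # stand-in for an expensive mechanistic model
    return np.sin(3 * theta) + 0.5 * theta

def rbf(a, b, ls=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

design = np.linspace(0, 2, 8)              # a small designed set of simulator runs
runs = simulator(design)

K = rbf(design, design) + 1e-8 * np.eye(len(design))
alpha = np.linalg.solve(K, runs)

theta_new = np.linspace(0, 2, 5)
k_star = rbf(theta_new, design)
mean = k_star @ alpha                                                  # emulator mean
var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)    # emulator variance
print(np.round(mean, 3), np.round(var, 5))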
08:50 - 09:00 | Questions and Discussion |
09:00 - 09:45 | Probabilistic models for ranking and information extraction [slides] |
Ed Snelson, Microsoft Research, Cambridge, U.K. | |
I will summarize some current approaches to information extraction, which aims to obtain structured information from unstructured text sources such as the web. I will then discuss whether Bayesian modelling may be useful in this area and describe a first attempt at extracting class attributes from web search query logs. If time remains I will move on to discuss various models for probabilistic ranking and, where possible, appropriate Bayesian inference techniques. |
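One of the simplest probabilistic ranking models is a Bradley-Terry style pairwise-comparison likelihood; the sketch below fits it by maximum likelihood on simulated data (my own toy example with invented numbers; the talk is concerned with richer, Bayesian treatments).

# Toy sketch of a pairwise probabilistic ranking model (Bradley-Terry style):
# P(i beats j) = sigmoid(s_i - s_j), fitted by gradient ascent on simulated data.
import numpy as np

rng = np.random.default_rng(0)
true_skill = np.array([2.0, 1.0, 0.0, -1.0])
n_items = len(true_skill)

# Simulate pairwise comparison outcomes from the true skills.
pairs, wins = [], []
for _ in range(500):
    i, j = rng.choice(n_items, size=2, replace=False)
    p = 1.0 / (1.0 + np.exp(-(true_skill[i] - true_skill[j])))
    pairs.append((i, j))
    wins.append(float(rng.random() < p))

s = np.zeros(n_items)
for _ in range(200):                        # gradient ascent on the log-likelihood
    grad = np.zeros(n_items)
    for (i, j), w in zip(pairs, wins):
        p = 1.0 / (1.0 + np.exp(-(s[i] - s[j])))
        grad[i] += w - p
        grad[j] -= w - p
    s += 0.01 * grad
print(np.round(s - s.mean(), 2))            # recovered scores (identifiable up to a shift)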
09:45 - 10:00 | Questions and Discussion |
10:00 - 10:45 | Well-known shortcomings, advantages and computational challenges in Bayesian modelling: a few case studies [slides] |
Ole Winther, Technical University of Denmark | |
Bayesian inference can be used to judge the data fit quantitatively through the marginal likelihood. In many practical cases only one model is considered and parameter averaging is simply used to avoid overfitting. I show such an example for a large data set of genomic sequence tags where we want to predict how many new unique tags we will find if we perform new sequencing. The two-parameter Pitman-Yor process is used and the results illustrate a few well-known facts: parameter averaging can be crucial, and large data sets will expose the inadequacy of the model, as seen by unrealistically narrow error bars on (cross-validated) predictions. This indicates that we should come up with better models and be able to calculate the marginal likelihood for these models in order to perform model selection. In the second part of the talk I will discuss some of the computational challenges of calculating marginal likelihoods. Gaussian process classification is used as an example to illustrate that this is hard even for a uni-modal posterior. |
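The prediction task described above has a simple simulation analogue under the two-parameter (Pitman-Yor) Chinese restaurant process; the sketch below uses invented discount and concentration values rather than anything fitted in the talk.

# Illustrative sketch of the prediction task: under a Pitman-Yor (two-parameter CRP)
# model with discount d and concentration a, simulate how many *new* unique tags
# appear when a sample is extended from n to n + m draws. Parameter values are made up.
import numpy as np

def simulate_new_types(n, m, d, a, rng):
    counts = []                           # counts[k] = occurrences of type k so far
    total = 0
    types_at_n = 0
    for step in range(n + m):
        if step == n:
            types_at_n = len(counts)      # unique tags seen in the first n draws
        p_new = (a + d * len(counts)) / (a + total) if total > 0 else 1.0
        if rng.random() < p_new:
            counts.append(1)              # a previously unseen tag
        else:
            probs = (np.array(counts) - d) / (a + total)
            k = rng.choice(len(counts), p=probs / probs.sum())
            counts[k] += 1                # a repeat of an existing tag
        total += 1
    return len(counts) - types_at_n

rng = np.random.default_rng(0)
new = [simulate_new_types(n=5000, m=5000, d=0.6, a=10.0, rng=rng) for _ in range(20)]
print(np.mean(new), np.std(new))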
10:45 - 11:00 | Questions and Discussion |
11:00 - 11:30 | Coffee |
11:30 - 12:15 | Variational Model Selection for Sparse Gaussian Process Regression [slides] |
Michalis Titsias, University of Manchester, Manchester, U.K. | |
Model selection for sparse Gaussian process (GP) models is an important problem that involves the selection of both the inducing/active variables and the kernel parameters. We describe an auxiliary variational method for sparse GP regression that jointly learns the inducing variables and kernel parameters by minimizing the Kullback-Leibler divergence between an approximate distribution and the true posterior over the latent function values. The variational distribution is parametrized using an unconstrained distribution over inducing variables and a conditional GP prior. This framework allows us to compute a lower bound on the true log marginal likelihood which can be reliably maximized over the inducing inputs and the kernel parameters. We will show how we can reformulate several of the most advanced sparse GP methods, such as the subset of data (SD), DTC, FITC and PITC methods, within the above framework. |
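The variational lower bound referred to above has a compact closed form, F = log N(y | 0, Q_nn + sigma^2 I) - tr(K_nn - Q_nn) / (2 sigma^2) with Q_nn = K_nm K_mm^{-1} K_mn; the sketch below evaluates it for a fixed set of inducing inputs on toy data with made-up hyperparameters (the optimisation over inducing inputs and kernel parameters, which is the point of the talk, is left out).

# Sketch of evaluating the variational lower bound for sparse GP regression on toy data.
import numpy as np

def rbf(a, b, ls=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 100))            # training inputs
y = np.sin(x) + 0.1 * rng.standard_normal(100)  # training targets
z = np.linspace(0, 10, 10)                      # inducing inputs (fixed here)
sigma2 = 0.01

K_nn_diag = np.ones(len(x))                     # RBF kernel has unit diagonal
K_nm = rbf(x, z)
K_mm = rbf(z, z) + 1e-8 * np.eye(len(z))
Q_nn = K_nm @ np.linalg.solve(K_mm, K_nm.T)

C = Q_nn + sigma2 * np.eye(len(x))
sign, logdet = np.linalg.slogdet(C)
log_marg = -0.5 * (logdet + y @ np.linalg.solve(C, y) + len(x) * np.log(2 * np.pi))
trace_term = (K_nn_diag.sum() - np.trace(Q_nn)) / (2 * sigma2)
bound = log_marg - trace_term                   # the variational lower bound F
print(round(bound, 2))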
12:15 - 12:30 | Questions and Discussion |
12:30 - 13:30 | Lunch |
13:30 - 14:15 | Negotiated Interaction: Iterative Inference and Feedback of Intention in HCI [slides] |
Roderick Murray-Smith, University of Glasgow, U.K. | |
I will talk about an approach to human-computer interaction which makes the uncertainty in the computer's interpretation of the user's intentions tangible, supporting efficient and enjoyable interaction. I will present a 'liquid cursor' demonstration as an example of making Bayesian inference concrete and visible as evidence flows between the user and the computer. I will present some current research challenges which I hope the BARK audience can engage with, including the use of complex models to shape interaction dynamics and measures of interaction between agents. Application examples from mobile interaction and Brain-Computer Interaction will be used. |
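As a toy picture of the underlying inference, the sketch below sequentially updates a posterior over a handful of candidate on-screen targets as noisy pointer positions arrive, assuming a simple Gaussian observation model; this invented example is not the liquid cursor system itself.

# Invented minimal illustration (not the 'liquid cursor' system): as noisy pointer
# positions stream in, update a posterior over which on-screen target the user intends,
# assuming the pointer is the intended target position plus Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
targets = np.array([[0.2, 0.2], [0.8, 0.3], [0.5, 0.9]])   # candidate targets (made up)
true_target = targets[1]
noise = 0.15

log_post = np.log(np.ones(len(targets)) / len(targets))    # uniform prior over targets
for step in range(10):
    obs = true_target + noise * rng.standard_normal(2)      # noisy pointer sample
    log_lik = -0.5 * np.sum((obs - targets) ** 2, axis=1) / noise ** 2
    log_post = log_post + log_lik
    log_post -= np.max(log_post)                             # stabilise before normalising
    post = np.exp(log_post) / np.exp(log_post).sum()
    print(step, np.round(post, 3))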
14:15 - 14:30 | Questions and Discussion |
14:30 - 15:15 | Bayesian Inference and Learning to Control |
Carl Rasmussen, University of Cambridge, Cambridge, U.K. | |
Control is usually accomplished in two steps: first identifying the plant dynamics and then constructing a controller for these dynamics. However, usually there will be some discrepancy between the actual dynamics and the inferred model (due to approximations, limited range of validity, etc.), and the optimal controller for the inferred dynamics may not behave well on the actual plant. In this work, we sidestep these issues by learning a controller based on observations of the real plant: we don't need an explicit identification step, and uncertainties in the model and plant are properly integrated out in the Bayesian formalism. We show surprisingly fast learning in illustrative control problems with continuous states and discrete time.
Joint work with Marc P. Deisenroth. |
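A sketch of the first ingredient of such an approach: learn a GP model of unknown plant dynamics from observed (state, action, next state) transitions and predict the next state, with uncertainty, at a new state-action pair. The plant, data and hyperparameters below are invented, and the talk's method additionally propagates uncertainty through rollouts and optimises the controller.

# Toy sketch (illustration only): learn a GP model of unknown plant dynamics from
# observed transitions (x_t, u_t) -> x_{t+1}, then predict the next state with
# uncertainty at a new state-action pair.
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

rng = np.random.default_rng(0)

def true_dynamics(x, u):                       # the unknown plant (made up)
    return 0.9 * x + 0.3 * np.sin(u)

X = rng.uniform(-1, 1, size=(40, 2))           # columns: state, action
Y = true_dynamics(X[:, 0], X[:, 1]) + 0.01 * rng.standard_normal(40)

K = rbf(X, X) + 1e-4 * np.eye(len(X))
alpha = np.linalg.solve(K, Y)

x_test = np.array([[0.5, 0.2]])                # a new (state, action) pair
k_star = rbf(x_test, X)
mean = (k_star @ alpha).item()                           # predicted next state
var = (1.0 - k_star @ np.linalg.solve(K, k_star.T)).item()   # predictive variance
print(round(mean, 3), round(var, 5), round(true_dynamics(0.5, 0.2), 3))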
15:15 - 15:30 | Questions and Discussion |
15:30 - 16:30 | Washing Up --- Workshop Summary (with Coffee) |
Joaquin Quiñonero Candela, Microsoft Research, Cambridge, U.K. |