
Estimating a function from data is a central task in machine learning, and a host of different methods exist for it.

Fundamentally, a function is an object that associates each input value with a single output value.

However, for some tasks not all of the input variables are observed, meaning that multiple outputs are possible at each observed input location.

An example of this is a dynamical system that can be in one of multiple states, where the state itself is not observed.

Estimating a model in this scenario is often referred to as a ``data association'' problem \cite{Bar-Shalom:1987, Cox93areview}, where both the different functions and the association of the observations with these functions need to be estimated.

A simple example of this can be seen in \cref{fig:semi_bimodal}, where no single function could have generated the data.

A slightly different view of the same problem is to consider the data to have been generated by a mixture of processes, where we are interested in factorising the data into these components \cite{choi18_choic}.

An application of the latter is when we want to explain away a noise process from an underlying signal.

The data association problem is inherently ill-posed and requires assumptions about both the underlying functions and the association of the observations.

An early ... \todo[inline,color=green]{CARL: previous work here}


In this paper we formulate a Bayesian model for the data association problem.

Underpinning our approach is the use of Gaussian process priors to encode structure both in the functions and in the associations themselves.

This leads to a flexible yet interpretable model with a principled treatment of uncertainty.

Our model is non-stationary in the sense that a different number of modes can be ``activated'' in different regions of the input space.

Importantly, we describe this non-stationary structure using additional Gaussian process priors, which allows us to make full use of problem-specific knowledge.
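As a sketch of this construction (the notation here --- $K$ modes $f^{(k)}$, assignments $a_n$, and assignment processes $\alpha^{(k)}$ --- is illustrative only and is made precise in the model section), the data can be thought of as generated via
%
\begin{align}
    f^{(k)} &\sim \mathcal{GP}\bigl(0, k_f^{(k)}\bigr),
    \qquad
    \alpha^{(k)} \sim \mathcal{GP}\bigl(0, k_\alpha^{(k)}\bigr),
    \qquad k = 1, \dots, K, \\
    a_n &\sim \operatorname{Cat}\Bigl(\operatorname{softmax}\bigl(\alpha^{(1)}(\mathbf{x}_n), \dots, \alpha^{(K)}(\mathbf{x}_n)\bigr)\Bigr), \\
    y_n &= f^{(a_n)}(\mathbf{x}_n) + \varepsilon_n,
\end{align}
%
so that the latent functions $f^{(k)}$ carry the modes while the $\alpha^{(k)}$ modulate, as a function of the input, which modes are active where.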

\todo[inline,color=green]{CARL: this is a start, could you have a go at adding some of the related work into the middle and then I can wrap this up when the experiments are fully in}