
@@ -48,17 +48,18 @@ This setup emulates an industrial system in which, for example due to wear or de
 In this setting, we want to recover both joint predictions, marginalising the current state of operation, and informative models for these separate states.
 
 Estimating a model in this scenario is often referred to as a \emph{data association problem}~\parencite{BarShalom:1987, Cox93areview}, where both the different functions and the associations of the observations to a function need to be estimated.
-A simple example of this can be seen in \cref{fig:semi_bimodal:b}, where no single function could have generated the data.
+A simple example of this can be seen in \cref{fig:semi_bimodal}, where no single function could have generated the data.
 A slightly different view of the same problem is to consider the data to have been generated by a mixture of processes, in which case we are interested in factorising the data into these components~\parencite{choi_choicenet_2018}.
-The separation of underlying signal and a noise process is an application of the latter, where we consider certain observations to be noise and others to be signal\todo{noisy data reference?}.
+The separation of an underlying signal from a noise process is an application of the latter, where we consider certain observations to be noise and others to be signal~\parencite{rousseeuw_robust_2005, hodge_survey_2004}.
 
 Early approaches to explaining data using multiple generative processes are based on separating the input space and training local expert models that explain easier subtasks~\parencite{jacobs_adaptive_1991, tresp_mixtures_2001, rasmussen_infinite_2002}.
 The assignment of data points to local experts is handled by a gating network, which learns a function from the inputs to assignment probabilities.
 However, it is still a central assumption of these models that the underlying generative process is unimodal.
 That is, at every position in the input space, exactly one expert explains the data.
-On the input space as a whole, this induces nonstationary behaviour through the different experts.
+Another approach is presented in~\parencite{bishop_mixture_1994}, where multimodal regression tasks are interpreted as a density estimation problem.
+A high number of candidate distributions is reweighted to match the observed data without modelling the underlying generative process.
 
-In contrast, we are interested in a generative process where data at the same location in the input space could have been generated by multiple independent processes.
+In contrast, we are interested in a generative process where data at the same location in the input space could have been generated by a number of global, independent processes.
 Inherently, the data association problem is ill-posed and requires assumptions on both the underlying functions and the association of the observations.
 In~\parencite{lazarogredilla_overlapping_2011}, the authors place Gaussian process priors on the different generative processes, which are assumed to be relevant globally.
 The associations are modelled via a latent association matrix and the model is trained using an EM algorithm.