
@@ -39,7 +39,7 @@
 ]
 
 \begin{abstract}
- The data association problem is concearned with separating data coming from different generating processes, for example when data come from different data sources, contain significant noise, or exhibit multimodality.
+ The data association problem is concerned with separating data coming from different generating processes, for example when data come from different data sources, contain significant noise, or exhibit multimodality.
 We present a fully Bayesian approach to this problem.
 Our model is capable of simultaneously solving the data association problem and the induced supervised learning problems.
 Underpinning our approach is the use of Gaussian process priors to encode the structure of both the data and the data associations.


@@ -50,10 +50,10 @@
 \section{Introduction}
 \label{sec:introduction}
 Realworld data often include multiple operational regimes of the considered system, for example a wind turbine or gas turbine~\parencite{hein_benchmark_2017}.
-As an example, consider a model describing the lift resulting from airflow around a wing profile as a function of attack angle.
+As an example, consider a model describing the lift resulting from airflow around the wing profile of an airplane as a function of attack angle.
 At a low angle the lift increases linearly with attack angle until the wing stalls and the characteristic of the airflow fundamentally changes.
 Building a truthful model of such data requires learning two separate models and correctly associating the observed data to each of the dynamical regimes.
-A similar example would be if our sensors that measure the lift are faulty in a manner such that we either get a accurate reading or a noisy one.
+A similar example would be if our sensors that measure the lift are faulty in a manner such that we either get an accurate reading or a noisy one.
 Estimating a model in this scenario is often referred to as a \emph{data association problem}~\parencite{BarShalom:1987, Cox93areview}, where we consider the data to have been generated by a mixture of processes and we are interested in factorising the data into these components.
 
 \Cref{fig:choicenet_data} shows an example of faulty sensor data, where sensor readings are disturbed by uncorrelated and asymmetric noise.
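The faulty-sensor setting described in this hunk is easy to simulate. The sketch below is illustrative only (not the paper's code; the signal, noise scales, and 30 % fault rate are invented): each reading is drawn either from a smooth signal process or from a broad, uncorrelated noise process, and recovering the hidden per-point association together with the signal model is exactly the data association problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented stand-in for the signal, e.g. a smooth lift curve over attack angle.
def lift(angle):
    return np.sin(angle)

N = 200
x = rng.uniform(0.0, 3.0, size=N)

# Each reading comes from one of two generating processes:
# assoc == True  -> faulty sensor, broad uncorrelated noise
# assoc == False -> accurate reading of the signal
assoc = rng.random(N) < 0.3  # assumed 30% fault rate
y = np.where(assoc,
             rng.normal(0.0, 2.0, size=N),                  # noise process
             lift(x) + rng.normal(0.0, 0.05, size=N))       # signal process

# A data association model sees only (x, y) and must recover `assoc`
# and a model of `lift` simultaneously.
```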


@@ -316,7 +316,7 @@ We collect the latent multilayer function values as $\mat{F^\prime} = \Set{\mat
 \begin{align}
 \begin{split}
 \label{eq:deep_variational_distribution}
- \Variat*{\mat{F^\prime}, \mat{\alpha}, \mat{U^\prime}}
+ \MoveEqLeft\Variat*{\mat{F^\prime}, \mat{\alpha}, \mat{U^\prime}} \\
 = &\Variat*{\mat{\alpha}, \Set*{\mat{u_\alpha^{\pix{k}}}}_{k=1}^K, \Set*{\mat{F_l^{\prime\pix{k}}}, \mat{u_l^{\prime\pix{k}}}}_{k=1,l=1}^{K,L}} \\
 = &\prod_{k=1}^K\prod_{n=1}^N \Prob*{\mat{\alpha_n^{\pix{k}}} \given \mat{u_\alpha^{\pix{k}}}, \mat{x_n}}\Variat*{\mat{u_\alpha^{\pix{k}}}} \\
 \MoveEqLeft\prod_{k=1}^K \prod_{l=1}^L \prod_{n=1}^N \Prob*{\mat{f_{n,l}^{\prime\pix{k}}} \given \mat{u_l^{\prime\pix{k}}}, \mat{x_n}}\Variat*{\mat{u_l^{\prime\pix{k}}}},
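Because the variational distribution factorises over modes $k$, layers $l$, and data points $n$, its log-density is just a sum of per-factor terms. The toy sketch below (my own illustration, with standard normal densities standing in for every $q$ and $p$ factor, and invented toy sizes) checks that summing the two product blocks separately matches summing over all independent terms at once.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_normal_pdf(x):
    # Standard normal log-density, a stand-in for every factor of q.
    return -0.5 * (x**2 + np.log(2.0 * np.pi))

K, L, N = 4, 2, 10                    # modes, layers, data points (toy sizes)
alpha = rng.normal(size=(K, N))       # stand-in for the alpha_n^(k) terms
f = rng.normal(size=(K, L, N))        # stand-in for the f_{n,l}^(k) terms

# First product block: assignment process, summed over k and n.
log_q_alpha = log_normal_pdf(alpha).sum()
# Second product block: function values, summed over k, l and n.
log_q_f = log_normal_pdf(f).sum()

# Mean-field factorisation: the full log-density is the sum of both blocks.
log_q = log_q_alpha + log_q_f
```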


@@ -366,7 +366,8 @@ This extended bound thus has complexity $\Fun*{\Oh}{NM^2LK}$ to evaluate in the
 \newcolumntype{Z}{>{\columncolor{sStone!33}\centering\arraybackslash}X}%
 \begin{tabularx}{\linewidth}{rYYZZZZZZ}
 \toprule
- Outliers & DAGP (MLL) & DAGP (RMSE) & CN & MDN & MLP & GPR & LGPR & RGPR \\
+ Outliers & DAGP & DAGP & CN & MDN & MLP & GPR & LGPR & RGPR \\
+ & \scriptsize MLL & \scriptsize RMSE & \scriptsize RMSE & \scriptsize RMSE & \scriptsize RMSE & \scriptsize RMSE & \scriptsize RMSE & \scriptsize RMSE \\
 \midrule
 0\,\% & 2.86 & \textbf{0.008} & 0.034 & 0.028 & 0.039 & \textbf{0.008} & 0.022 & 0.017 \\
 20\,\% & 2.71 & \textbf{0.008} & 0.022 & 0.087 & 0.413 & 0.280 & 0.206 & 0.013 \\


@@ -453,7 +454,8 @@ This extended bound thus has complexity $\Fun*{\Oh}{NM^2LK}$ to evaluate in the
 \label{fig:semi_bimodal:c}
 Normalized samples from the assignment process $\mat{\alpha}$ of the model shown in \cref{fig:semi_bimodal}.
 The assignment process is used to weigh the predictive distributions of the different modes depending on the position in the input space.
- The model has learned that the mode $k = 2$ is irrelevant, that the mode $k = 1$ is only relevant around the interval $[0, 5]$ and the outside this interval, the mode $k = 3$ is twice as likely as the mode $k = 4$.
+ The model has learned that the mode $k = 2$ is irrelevant and that the mode $k = 1$ is only relevant around the interval $[0, 5]$.
+ Outside this interval, the mode $k = 3$ is twice as likely as the mode $k = 4$.
 }
 \end{figure}
 %
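The weighting this caption describes, normalised assignment probabilities gating the modes' predictions per input location, amounts to an input-dependent mixture. A minimal sketch (my own toy numbers; the per-mode predictors and logits are invented, not the paper's learned model) that reproduces the qualitative pattern from the caption:

```python
import numpy as np

def softmax(a, axis=0):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

x = np.linspace(-10.0, 15.0, 6)

# Toy per-mode predictive means, stand-ins for the K GP posteriors.
preds = np.stack([np.sin(x), np.cos(x), 0.5 * x, -0.5 * x])   # K = 4 modes

# Toy assignment-process logits: mode 1 dominates on [0, 5]; outside it,
# mode 3 gets twice the weight of mode 4 and mode 2 stays switched off.
inside = (x >= 0) & (x <= 5)
logits = np.stack([
    np.where(inside, 5.0, -5.0),            # k = 1
    np.full_like(x, -10.0),                 # k = 2 (irrelevant)
    np.where(inside, -5.0, np.log(2.0)),    # k = 3
    np.where(inside, -5.0, 0.0),            # k = 4
])

weights = softmax(logits, axis=0)            # normalised assignments alpha
mixture_mean = (weights * preds).sum(axis=0) # weighted predictive mean
```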


@@ -481,7 +483,7 @@ To avoid pathological solutions for high outlier ratios, we add a prior to the l
 The model proposed in~\parencite{choi_choicenet_2018}, called ChoiceNet (CN), is a specific neural network structure and inference algorithm to deal with corrupted data.
 In their work, they compare their approach to a standard multilayer perceptron (MLP), a mixture density network (MDN), standard Gaussian process regression (GPR), leveraged Gaussian process regression (LGPR)~\parencite{choi_robust_2016}, and infinite mixtures of Gaussian processes (RGPR)~\parencite{rasmussen_infinite_2002}.
 \Cref{tab:choicenet} shows results for outlier rates varied from 0\,\% to 80\,\%.
-Besides the root mean squared error (RMSE), we also report the mean test log likelihood (MLL) of the process representing the target function in our model.
+Besides the root mean squared error (RMSE), we also report the mean test log likelihood (MLL) of the process representing the signal in our model.
 
 Up to an outlier rate of 40\,\%, our model correctly identifies the outliers and ignores them, resulting in a predictive posterior of the signal equivalent to standard GP regression without outliers.
 In the special case of 0\,\% outliers, DAGP correctly identifies that the process modelling the noise is not necessary and disables it, thereby simplifying itself to standard GP regression.


@@ -553,7 +555,7 @@ Our third experiment is based on the cartpole benchmark for reinforcement learn
 In this benchmark, the objective is to apply forces to a cart moving on a frictionless track to keep a pole, which is attached to the cart via a joint, in an upright position.
 We consider the regression problem of predicting the change of the pole's angle given the current state of the cart and the action applied.
 The current state of the cart consists of the cart's position and velocity and the pole's angular position and velocity.
-To simulate a dynamical system with changing states of operation our experimental setup is to sample trajectories from two different cartpole systems and merging the resulting data into one training set.
+To simulate a dynamical system with changing system characteristics, our experimental setup is to sample trajectories from two different cartpole systems and merge the resulting data into one training set.
 The task is not only to learn a model which explains this data well, but to solve the association problem introduced by the different system configurations.
 This task is important in reinforcement learning settings where we study systems with multiple operational regimes.
 
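The merged-data setup in this hunk can be mimicked in a few lines. Below is a hedged sketch (the rollout, dynamics, and pole lengths are all invented, not the benchmark code) that tags each sample with its generating configuration so the associations a model recovers can later be compared against ground truth.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_trajectory(pole_length, n_steps=50):
    """Crude stand-in for a cartpole rollout: random states and actions,
    plus a made-up angle-change response that depends on the pole length."""
    state = rng.normal(size=(n_steps, 4))        # pos, vel, angle, ang. vel.
    action = rng.uniform(-1.0, 1.0, size=(n_steps, 1))
    # Invented dynamics: shorter poles react more strongly to the action.
    d_angle = (action[:, 0] + state[:, 3]) / pole_length
    return np.hstack([state, action]), d_angle

# Two system configurations (assumed pole lengths), merged into one set.
X_a, y_a = sample_trajectory(pole_length=0.5)
X_b, y_b = sample_trajectory(pole_length=1.0)
X = np.vstack([X_a, X_b])
y = np.concatenate([y_a, y_b])
source = np.concatenate([np.zeros(len(y_a)), np.ones(len(y_b))])

# A data association model is trained on (X, y) only; `source` is withheld
# and used solely to evaluate the recovered assignments.
```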

@@ -606,7 +608,7 @@ The data association problem is inherently illconstrained and requires signific
 In this paper, we make use of interpretable Gaussian process priors allowing global a priori information to be included into the model.
 Importantly, our model is able to exploit information both about the underlying functions and the association structure.
 We have derived a principled approximation to the marginal likelihood which allows us to perform inference for flexible hierarchical processes.
-In future work, we would like to incorporate the proposed model in a reinforcement learning scenario where we study a dynamical system with state changes.
+In future work, we would like to incorporate the proposed model in a reinforcement learning scenario where we study a dynamical system with different operational regimes.
 
 
 \printbibliography
