@ -280,7 +285,7 @@ This extended bound thus has complexity $\Fun*{\Oh}{NM^2LK}$ to evaluate in the

In this section we investigate the behavior of the DMGP model in multiple regression settings.

First, we apply the DMGP to an artificial data set and showcase how the different components of the model interact to identify unimodal and multi-modal parts of the input space.

Second, we show how different priors on the different modes can be used to separate a signal from unrelated noise.

\todo{Reformulate in accordance with the introduction}And third, we investigate a data set which contains observations of two independent dynamical systems mixed together and show how the DMGP can recover information about both systems.

\todo{Reformulate when intro is done}And third, we investigate a data set which contains observations of two independent dynamical systems mixed together and show how the DMGP can recover information about both systems.

We use an implementation of DMGP in TensorFlow \parencite{tensorflow2015-whitepaper} based on GPflow \parencite{matthews_gpflow_2017} and the implementation of doubly stochastic variational inference \parencite{salimbeni_doubly_2017}.

@ -430,7 +435,7 @@ For high outlier rates, strong prior knowledge about the signal would be require

While the function has still been identified well, some of the noise is also explained using this mode, thereby introducing slight errors in the predictions.

\subsection{Cartpole data}

\subsection{Mixed Cart-pole systems}

\label{subsec:cartpole}

\begin{table*}[t]

\centering

@ -475,7 +480,7 @@ We use the implementation provided by OpenAI Gym \parencite{brockman_openai_2016

In this benchmark, the objective is to apply forces to a cart moving on a frictionless track to keep a pole attached to the cart by a joint in an upright position.

We consider the regression problem of predicting the change of the pole's angle given the current state of the cart and the action applied.

The current state of the cart is given by the cart's position and velocity and the pole's angular position and velocity.

\todo{Reformulate in accordance with the introduction} To simulate a dynamical system with changing states of operation, we sample trajectories from two different cart-pole systems and merge the resulting data into one training set.

\todo{Reformulate when intro is done} To simulate a dynamical system with changing states of operation, we sample trajectories from two different cart-pole systems and merge the resulting data into one training set.

The task is not only to learn a model which explains this data well but also to recover the multi-modalities introduced by the different system configurations.

We sample trajectories from the system by initializing the pole in an almost upright position and then applying 10 uniform random actions.

@ -500,9 +505,9 @@ Additionally, the mean prediction is approximately the mean of the two modes and

Both the BNN+LV and DMGP models perform substantially better as they can model the bimodality.

BNN+LV assumes continuous latent variables and a bimodal distribution can be recovered by approximately marginalizing these latent variables via sampling.

The predictive posterior of unknown shape is approximated using a mixture of many Gaussians.

In contrast, a shallow DMGP has a stronger and interpretable prior and the predictive posterior is a mixture of two Gaussians representing the two modes in the data.

Adding more layers to the DMGP model leads to more expressive models whose prior becomes less informative.

\todo{Maybe fix this with later results?} For this cartpole data, two-layer deep GPs seem to be a good compromise between model expressiveness and the strength of the prior.

Compared to the shallow DMGP, the prior of BNN+LV is harder to interpret, as the DMGP's generative process produces a mixture of two Gaussians representing the two modes in the data.

Adding more layers to the DMGP model leads to more expressive models whose prior on the different modes becomes less informative.

\todo{Revisit with final results}For this cart-pole data, two-layer deep GPs seem to be a good compromise between model expressiveness and the strength of the prior.

On the \emph{mixed} test set, DMGP and BNN+LV show comparable likelihoods.

However, the DMGP is a more expressive model whose different modes can be evaluated further.