
Do not talk about modes in Cart-Pole experiment

icml
Markus Kaiser 3 years ago
parent commit c0661d861f
  1. dynamic_dirichlet_deep_gp.pdf (BIN)
  2. dynamic_dirichlet_deep_gp.tex (25 lines changed)

dynamic_dirichlet_deep_gp.pdf

Binary file not shown.

dynamic_dirichlet_deep_gp.tex

@@ -538,7 +538,6 @@ At $x = -10$ the inferred modes and assignment processes start reverting to their
 \bottomrule
 \end{tabular}
 \end{table*}
-\todo[inline]{Avoid talking about modes}
 Our third experiment is based on the cart-pole benchmark for reinforcement learning as described by~\textcite{barto_neuronlike_1983} and implemented in OpenAI Gym~\parencite{brockman_openai_2016}.
 In this benchmark, the objective is to apply forces to a cart moving on a frictionless track to keep a pole, which is attached to the cart via a joint, in an upright position.
 We consider the regression problem of predicting the change of the pole's angle given the current state of the cart and the action applied.
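As an illustration of the data-generation step described in the hunk above, here is a minimal sketch. It assumes the classic (pre-0.26) Gym step/reset API and uniformly random actions, and it is my reconstruction, not the authors' code; the commented-out pole-length change is only an assumption about how a short-pole variant could be built.

    import gym
    import numpy as np

    # Collect (state, action) -> change-of-pole-angle pairs from cart-pole.
    # obs = [cart position, cart velocity, pole angle, pole angular velocity].
    env = gym.make('CartPole-v1')
    # env.env.length /= 2  # assumption: one way to obtain a short-pole system
    X, y = [], []
    obs = env.reset()
    while len(X) < 5000:
        action = env.action_space.sample()
        next_obs, _, done, _ = env.step(action)
        X.append(np.append(obs, action))
        y.append(next_obs[2] - obs[2])  # obs[2] is the pole angle
        obs = env.reset() if done else next_obs
    X, y = np.array(X), np.array(y)[:, None]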
@@ -557,7 +556,7 @@ We consider three test sets, one sampled from the default system, one sampled fr
 They are generated by sampling trajectories with an aggregated size of 5000 points from each system for the first two sets and their concatenation for the mixed set.
 For this data set, we use squared exponential kernels for both the $f^{\pix{k}}$ and $\alpha^{\pix{k}}$ and 100 inducing points in every GP.
-We evaluate the performance of deep GPs with up to 5 layers and squared exponential kernels as models for the different modes.
+We evaluate the performance of deep GPs with up to 5 layers and squared exponential kernels as models for the different functions.
 As described in~\parencite{salimbeni_doubly_2017}, we use identity mean functions for all but the last layers and initialize the variational distributions with low covariances.
 We compare our models with three-layer ReLU-activated Bayesian neural networks with added latent variables (BNN+LV) as introduced by~\textcite{depeweg_learning_2016}.
 These latent variables can be used to effectively model multimodalities and stochasticity in dynamical systems for model-based reinforcement learning~\parencite{depeweg_decomposition_2018}.
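The kernel and inducing-point choices stated in the hunk above translate directly into a sparse GP baseline. A minimal sketch using GPflow 2 follows; it only shows the baseline configuration (the DAGP and deep GP models themselves are not reconstructed here), and X, y are the arrays from the previous sketch.

    import gpflow
    import numpy as np

    # Sparse variational GP with a squared exponential kernel and
    # 100 inducing points, matching the configuration stated in the text.
    Z = X[np.random.choice(len(X), 100, replace=False)].copy()
    model = gpflow.models.SVGP(
        kernel=gpflow.kernels.SquaredExponential(),
        likelihood=gpflow.likelihoods.Gaussian(),
        inducing_variable=Z,
    )
    gpflow.optimizers.Scipy().minimize(
        model.training_loss_closure((X, y)), model.trainable_variables
    )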
@@ -566,26 +565,26 @@ They are trained on the mixed data set, the default system and the short-pole sy
 \Cref{tab:cartpole} shows mean training and test log likelihoods and their standard error over ten runs for these models.
 The \emph{mixed}-column corresponds to training and test log likelihoods for a standard regression problem, which in this case is a bimodal one.
-The GPR model trained on the mixed data set shows the worst performance, since its predictions are single Gaussians spanning both modes.
-Additionally, the mean prediction is approximately the mean of the two modes and is physically implausible.
+The GPR model trained on the mixed data set shows the worst performance, since its predictions are single Gaussians spanning both system states.
+Additionally, the mean prediction is approximately the mean of the two states and is physically implausible.
 Both the BNN+LV and DAGP models perform substantially better as they can model the bimodality.
 BNN+LV assumes continuous latent variables, and a bimodal distribution can be recovered by approximately marginalizing these latent variables via sampling.
 The predictive posterior of unknown shape is approximated using a mixture of many Gaussians.
-Compared to the shallow DAGP, the prior of BNN+LV is harder to interpret, as the DAGP's generative process produces a mixture of two Gaussians representing the two modes in the data.
-Adding more layers to the DAGP model leads to more expressive models whose priors on the different modes become less informative.
+Compared to the shallow DAGP, the prior of BNN+LV is harder to interpret, as the DAGP's generative process produces a mixture of two Gaussians representing the two processes in the data.
+Adding more layers to the DAGP model leads to more expressive models whose priors on the different processes become less informative.
 For this cart-pole data, two-layer deep GPs seem to be a good compromise between model expressiveness and the strength of the prior, as they are best able to separate the data into the two separate dynamics.
 On the \emph{mixed} test set, DAGP and BNN+LV both show comparable likelihoods.
-However, the DAGP is a more expressive model, whose different modes can be evaluated further.
+However, the DAGP is a more expressive model, whose different components can be evaluated further.
 The results in the \emph{default only} and \emph{short-pole only} columns compare training and test likelihoods on the parts of the training and test sets corresponding to these systems respectively.
-We calculate these values by evaluating both modes separately on the data sets and reporting the higher likelihood.
+We calculate these values by evaluating both functions separately on the data sets and reporting the higher likelihood.
 We compare these results with sparse GP models trained only on the respective systems.
-The two modes of DAGP reliably separate the two different systems.
-In fact, the mode corresponding to the \emph{default} system in the two-layer DAGP shows equal test performance to the corresponding GPR model trained only on data of this mode.
+The two functions of DAGP reliably separate the two different systems.
+In fact, the function corresponding to the \emph{default} system in the two-layer DAGP shows equal test performance to the corresponding GPR model trained only on data of this system.
 The \emph{default} and \emph{short-pole} systems are sufficiently different that the sparse GPs trained on only one of the two sets perform very poorly.
-Out of these two systems, the \emph{short-pole} system is more complicated and harder to learn.
-The second mode of DAGP still recovers an adequate model.
-Given the fact that the two modes of DAGP model the two system dynamics in the original data, sampling trajectories from them results in physically plausible data, which is not possible with a sparse GP or BNN+LV model.
+Out of these two systems, the \emph{short-pole} system is more complicated and harder to learn due to higher instability.
+The second function of DAGP still recovers an adequate model.
+Given that the two functions of DAGP model the two system dynamics in the original data, sampling trajectories from them results in physically plausible data, which is not possible with a sparse GP or BNN+LV model.
 \section{Conclusion}

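On the evaluation convention in the hunk above (evaluating both functions separately and reporting the higher likelihood), a small hypothetical helper sketches one plausible reading; the (K, N) array shape and the per-point mean are my assumptions, not definitions taken from the paper.

    import numpy as np

    def best_function_log_lik(pointwise_log_liks):
        # pointwise_log_liks: (K, N) predictive log densities of the K
        # learned functions at the N test points of one system.
        per_function = pointwise_log_liks.mean(axis=1)
        k = int(per_function.argmax())
        return per_function[k], k  # higher likelihood and which function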