
Switch first and second experiment

icml
Markus Kaiser, 3 years ago · commit 8fffe81241

  1. dynamic_dirichlet_deep_gp.pdf (BIN)
  2. dynamic_dirichlet_deep_gp.tex (141 lines changed)

BIN dynamic_dirichlet_deep_gp.pdf
Binary file not shown.

dynamic_dirichlet_deep_gp.tex (141 lines changed)

@@ -43,7 +43,7 @@
Underpinning our approach is the use of Gaussian process priors which encode structure both on the functions and the associations themselves.
The association of samples and functions are determined by taking both inputs and outputs into account while also obtaining a posterior belief about the relevance of the global components throughout the input space.
We present an efficient learning scheme based on doubly stochastic variational inference and discuss how it can be applied to deep Gaussian process priors.
We show results for an artificial data set, a noise separation problem, and a multimodal regression problem based on the cart-pole benchmark.
We show results for a noise separation problem, an artificial multimodal data set, and a multimodal regression problem based on the cart-pole benchmark.
\end{abstract}
@@ -296,45 +296,6 @@ Samples from the predictive density over $\Variat*{\mat{a_\ast} \given \mat{x_\a
The distribution $\Variat*{\mat{a_\ast} \given \mat{x_\ast}}$ reflects the model's belief about how many and which of the $K$ modes are relevant at the test location $\mat{x_\ast}$.
%
\begin{figure*}[t]
\centering
\begin{subfigure}{.495\linewidth}
\centering
\includestandalone{figures/semi_bimodal_joint}
% \caption{
% \label{fig:semi_bimodal:a}
% Joint posterior.
% }
\end{subfigure}
\hfill
\begin{subfigure}{.495\linewidth}
\centering
\includestandalone{figures/semi_bimodal_attrib}
% \caption{
% \label{fig:semi_bimodal:b}
% }
\end{subfigure}
\caption{
\label{fig:semi_bimodal}
The DAGP posterior on an artificial data set with bimodal and trimodal parts.
The left plot shows the joint predictions, which are mixtures of four Gaussians weighted by the assignment probabilities shown in \cref{fig:semi_bimodal:c}.
The weights are represented via the opacity of the modes, which shows that the orange mode is completely disabled and the red mode is relevant only around the interval $[0, 5]$.
The right plot shows the posterior belief about the assignment of the training data to the respective modes.
}
\end{figure*}
%
\begin{figure}[t]
\centering
\includestandalone{figures/semi_bimodal_attrib_process}
\caption{
\label{fig:semi_bimodal:c}
Normalized samples from the assignment process $\mat{\alpha}$ of the model shown in \cref{fig:semi_bimodal}.
The assignment process is used to weight the predictive distributions of the different modes depending on the position in the input space.
The model has learned that mode $k = 2$ is irrelevant, that mode $k = 1$ is only relevant around the interval $[0, 5]$, and that outside this interval, mode $k = 3$ is twice as likely as mode $k = 4$.
}
\end{figure}
%
\subsection{Deep Gaussian Processes}
\label{subsec:deep_gp}
For clarity, we have described the variational bound in terms of a shallow GP.
@@ -388,35 +349,6 @@ This extended bound thus has complexity $\Fun*{\Oh}{NM^2LK}$ to evaluate in the
\section{Experiments}
\label{sec:experiments}
In this section we investigate the behavior of the DAGP model in multiple regression settings.
First, we apply the DAGP to an artificial data set and showcase how the different components of the model interact to identify unimodal and multimodal parts of the input space.
Second, we show how different priors on the different modes can be used to separate a signal from unrelated noise.
Finally, we investigate a data set which contains observations of two independent dynamical systems mixed together and show how the DAGP can recover information about both systems and infer the state variable separating the systems.
We use an implementation of DAGP in TensorFlow~\parencite{tensorflow2015-whitepaper} based on GPflow~\parencite{matthews_gpflow_2017} and the implementation of DSVI~\parencite{salimbeni_doubly_2017}.
\subsection{Artificial Data Set}
\label{subsec:semi_bimodal}
To demonstrate inference in our model, we begin with an experiment based on an artificial data set.
The data, together with recovered posterior attributions, can be seen in \cref{fig:semi_bimodal}.
We uniformly sample 350 data points in the interval $x \in [-2\pi, 2\pi]$ and obtain $y_1 = \Fun{\sin}{x} + \epsilon$, $y_2 = \Fun{\sin}{x} - 2 \Fun{\exp}{-\sfrac{1}{2} \cdot (x-2)^2} + \epsilon$ and $y_3 = -1 - \sfrac{3}{8\pi} \cdot x + \sfrac{3}{10} \cdot \Fun*{\sin}{2x} + \epsilon$ with additive independent noise $\epsilon \sim \Gaussian*{0, 0.005^2}$.
The resulting data set $\D = \Set{\left( x, y_1 \right), \left( x, y_2 \right), \left( x, y_3 \right)}$ is trimodal in the interval $[0, 5]$ and otherwise bimodal, with one mode containing twice as much data as the other.
We use squared exponential kernels as priors for both the $f^{\pix{k}}$ and the $\alpha^{\pix{k}}$, with $25$ inducing points in every GP.
\Cref{fig:semi_bimodal,fig:semi_bimodal:c} show the posterior of a DAGP with $K = 4$ modes applied to the data, which correctly identified the underlying functions.
\Cref{fig:semi_bimodal} shows the posterior belief about the assignments $\mat{A}$ and illustrates that DAGP recovered that it needs only three of the four available modes to explain the data.
One of the modes is only assigned points in the interval $[0, 5]$ where the data is actually trimodal.
This separation is explicitly represented in the model via the assignment processes $\mat{\alpha}$ shown in \cref{fig:semi_bimodal:c}.
The model has disabled the mode $k = 2$ in the complete input space and has learned that the mode $k = 1$ is only relevant in the interval $[0, 5]$ where the three enabled modes each explain about a third of the data.
Outside this interval, the model has learned that one of the modes has about twice the assignment probability of the other, thus correctly reconstructing the true generative process.
The DAGP is implicitly incentivized to explain the data using as few modes as possible through the likelihood term of the inferred $\mat{a_n}$ in \cref{eq:variational_bound}.
Away from the data, for example at $x = -10$, the inferred modes and assignment processes revert to their respective priors.
\subsection{Robust Regression}
\label{subsec:choicenet}
%
\begin{figure*}[t]
\centering
@@ -469,7 +401,57 @@ At $x = -10$ the inferred modes and assignment processes start reverting to thei
}
\end{figure*}
%
Our second experiment applies DAGP to a one-dimensional regression problem with uniformly distributed outliers in the training data.
%
\begin{figure*}[t]
\centering
\begin{subfigure}{.495\linewidth}
\centering
\includestandalone{figures/semi_bimodal_joint}
% \caption{
% \label{fig:semi_bimodal:a}
% Joint posterior.
% }
\end{subfigure}
\hfill
\begin{subfigure}{.495\linewidth}
\centering
\includestandalone{figures/semi_bimodal_attrib}
% \caption{
% \label{fig:semi_bimodal:b}
% }
\end{subfigure}
\caption{
\label{fig:semi_bimodal}
The DAGP posterior on an artificial data set with bimodal and trimodal parts.
The left plot shows the joint predictions, which are mixtures of four Gaussians weighted by the assignment probabilities shown in \cref{fig:semi_bimodal:c}.
The weights are represented via the opacity of the modes, which shows that the orange mode is completely disabled and the red mode is relevant only around the interval $[0, 5]$.
The right plot shows the posterior belief about the assignment of the training data to the respective modes.
}
\end{figure*}
%
\begin{figure}[t]
\centering
\includestandalone{figures/semi_bimodal_attrib_process}
\caption{
\label{fig:semi_bimodal:c}
Normalized samples from the assignment process $\mat{\alpha}$ of the model shown in \cref{fig:semi_bimodal}.
The assignment process is used to weight the predictive distributions of the different modes depending on the position in the input space.
The model has learned that mode $k = 2$ is irrelevant, that mode $k = 1$ is only relevant around the interval $[0, 5]$, and that outside this interval, mode $k = 3$ is twice as likely as mode $k = 4$.
}
\end{figure}
%
In this section we investigate the behavior of the DAGP model in multiple regression settings.
First, we show how prior knowledge about the different generative processes can be used to separate a signal from unrelated noise.
Second, we apply the DAGP to a multimodal data set and showcase how the different components of the model interact to identify how many modes are necessary to explain the data.
Finally, we investigate a data set which contains observations of two independent dynamical systems mixed together and show how the DAGP can recover information about both systems and infer the state variable separating the systems.
We use an implementation of DAGP in TensorFlow~\parencite{tensorflow2015-whitepaper} based on GPflow~\parencite{matthews_gpflow_2017} and the implementation of DSVI~\parencite{salimbeni_doubly_2017}.
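As a rough sketch of these building blocks (not the paper's actual DAGP model class, which is a custom construction on top of GPflow), the per-mode GPs with squared exponential kernels and $25$ inducing points could be assembled from stock parts. The GPflow 2 API is assumed for concreteness, and SVGP merely stands in for the custom per-mode GPs:

    import numpy as np
    import gpflow

    K, M = 4, 25  # number of modes, inducing points per GP
    Z = np.linspace(-2 * np.pi, 2 * np.pi, M).reshape(-1, 1)

    # One sparse variational GP per mode f^(k); the assignment
    # processes alpha^(k) would get the same kernel and inducing-point setup.
    mode_gps = [
        gpflow.models.SVGP(
            kernel=gpflow.kernels.SquaredExponential(),
            likelihood=gpflow.likelihoods.Gaussian(),
            inducing_variable=Z.copy(),
        )
        for _ in range(K)
    ]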
\subsection{Robust Regression}
\label{subsec:choicenet}
We begin with a noise separation problem and apply DAGP to a one-dimensional regression task with uniformly distributed outliers in the training data.
We use a task proposed by~\textcite{choi_choicenet_2018} where we sample inputs $x \in [-3, 3]$ uniformly and apply the function $\Fun{f}{x} = (1 - \delta)\Fun{\cos}{\sfrac{\pi}{2} \cdot x}\Fun{\exp}{-(\sfrac{x}{2})^2} + \delta \cdot \epsilon$, where $\delta \sim \Fun{\Ber}{\lambda}$ and $\epsilon \sim \Fun{\Uni}{-1, 3}$.
That is, a fraction $\lambda$ of the training data, the outliers, is replaced by uniform noise.
We sample a total of 1000 data points and use $25$ inducing points for every GP in our model.
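A minimal NumPy sketch of this data-generating process; the outlier rate lam is an illustrative value only:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1000
    lam = 0.4  # outlier rate lambda; illustrative value only
    x = rng.uniform(-3, 3, size=N)
    delta = rng.binomial(1, lam, size=N)  # delta ~ Ber(lambda)
    eps = rng.uniform(-1, 3, size=N)      # eps ~ Uni(-1, 3)
    y = (1 - delta) * np.cos(np.pi / 2 * x) * np.exp(-(x / 2) ** 2) + delta * eps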
@@ -490,6 +472,25 @@ For high outlier rates, strong prior knowledge about the signal would be require
While the function has still been identified well, some of the noise is also explained using this mode, thereby introducing slight errors in the predictions.
\subsection{Multimodal Data Set}
\label{subsec:semi_bimodal}
Our second experiment applies DAGP to a multimodal data set.
The data, together with recovered posterior attributions, can be seen in \cref{fig:semi_bimodal}.
We uniformly sample 350 data points in the interval $x \in [-2\pi, 2\pi]$ and obtain $y_1 = \Fun{\sin}{x} + \epsilon$, $y_2 = \Fun{\sin}{x} - 2 \Fun{\exp}{-\sfrac{1}{2} \cdot (x-2)^2} + \epsilon$ and $y_3 = -1 - \sfrac{3}{8\pi} \cdot x + \sfrac{3}{10} \cdot \Fun*{\sin}{2x} + \epsilon$ with additive independent noise $\epsilon \sim \Gaussian*{0, 0.005^2}$.
The resulting data set $\D = \Set{\left( x, y_1 \right), \left( x, y_2 \right), \left( x, y_3 \right)}$ is trimodal in the interval $[0, 5]$ and otherwise bimodal, with one mode containing twice as much data as the other.
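A minimal NumPy sketch of this generative process, assuming the three outputs are pooled into one unlabeled training set:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-2 * np.pi, 2 * np.pi, size=350)
    eps = lambda: rng.normal(0.0, 0.005, size=x.shape)  # eps ~ N(0, 0.005^2)
    y1 = np.sin(x) + eps()
    y2 = np.sin(x) - 2 * np.exp(-0.5 * (x - 2) ** 2) + eps()
    y3 = -1 - 3 / (8 * np.pi) * x + 0.3 * np.sin(2 * x) + eps()
    # Pool the three modes into one unlabeled data set D.
    X = np.tile(x, 3)
    Y = np.concatenate([y1, y2, y3])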
We use squared exponential kernels as priors for both the $f^{\pix{k}}$ and the $\alpha^{\pix{k}}$, with $25$ inducing points in every GP.
\Cref{fig:semi_bimodal,fig:semi_bimodal:c} show the posterior of a DAGP with $K = 4$ modes applied to the data, which correctly identified the underlying functions.
\Cref{fig:semi_bimodal} shows the posterior belief about the assignments $\mat{A}$ and illustrates that DAGP recovered that it needs only three of the four available modes to explain the data.
One of the modes is only assigned points in the interval $[0, 5]$ where the data is actually trimodal.
This separation is explicitly represented in the model via the assignment processes $\mat{\alpha}$ shown in \cref{fig:semi_bimodal:c}.
The model has disabled the mode $k = 2$ in the complete input space and has learned that the mode $k = 1$ is only relevant in the interval $[0, 5]$ where the three enabled modes each explain about a third of the data.
Outside this interval, the model has learned that one of the modes has about twice the assignment probability of the other, thus correctly reconstructing the true generative process.
The DAGP is implicitly incentivized to explain the data using as few modes as possible through the likelihood term of the inferred $\mat{a_n}$ in \cref{eq:variational_bound}.
Away from the data, for example at $x = -10$, the inferred modes and assignment processes revert to their respective priors.
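As the figure captions above describe, the joint predictive density is a mixture of the $K$ per-mode Gaussians weighted by the assignment probabilities. A minimal sketch of this weighting, assuming per-mode predictive means, variances, and normalized weights have already been computed (all names hypothetical):

    import numpy as np
    from scipy.stats import norm

    def joint_predictive_density(y, means, variances, weights):
        # means, variances: (K, N) per-mode predictive moments at N test points
        # weights: (K, N) assignment probabilities; each column sums to 1
        component_pdfs = norm.pdf(y, loc=means, scale=np.sqrt(variances))
        return np.sum(weights * component_pdfs, axis=0)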
\subsection{Mixed Cart-Pole Systems}
\label{subsec:cartpole}
\begin{table*}[t]
