+ 31
- 53

dynamic_dirichlet_deep_gp.tex
@@ -352,6 +352,16 @@ This extended bound thus has complexity $\Fun*{\Oh}{NM^2LK}$ to evaluate in the

\section{Experiments}
\label{sec:experiments}
In this section we investigate the behavior of the DAGP model in multiple regression settings.
First, we show how prior knowledge about the different generative processes can be used to separate a signal from unrelated noise.
Second, we apply the DAGP to a multimodal data set and show how the different components of the model interact to identify how many modes are necessary to explain the data.
Finally, we investigate a data set that contains observations of two independent dynamical systems mixed together and show how the DAGP can recover information about both systems and infer the state variable that separates them.
We use an implementation of DAGP in TensorFlow~\parencite{tensorflow2015-whitepaper} that builds on GPflow~\parencite{matthews_gpflow_2017} and on the implementation of DSVI~\parencite{salimbeni_doubly_2017}.
\subsection{Noise Separation}
\label{subsec:choicenet}
%
\begin{figure*}[t]
\centering

@@ -411,56 +421,6 @@ This extended bound thus has complexity $\Fun*{\Oh}{NM^2LK}$ to evaluate in the
}
\end{figure*}
%
%
\begin{figure*}[t]
\centering
\includestandalone{figures/semi_bimodal_joint}
\caption{
\label{fig:semi_bimodal}
The DAGP posterior on an artificial data set with bimodal and trimodal parts.
The left plot shows the joint predictions which are mixtures of four Gaussians weighed by the assignment probabilities shown in \cref{fig:semi_bimodal:c}.
The weights are represented via the opacity of the modes, which shows that the orange mode is completely disabled and the red mode only relevant around the interval $[0, 5]$.
The right plot shows the posterior belief about the assignment of the training data to the respective modes.
\todo[inline]{Fix captions}
}
\end{figure*}
%
\begin{figure}[t]
\centering
\begin{subfigure}[b]{.495\linewidth}
\centering
\includestandalone{figures/semi_bimodal_attrib_process}
% \caption{
% \label{fig:semi_bimodal:b}
% }
\end{subfigure}
\hfill
\begin{subfigure}[b]{.495\linewidth}
\centering
\includestandalone{figures/semi_bimodal_attrib}
% \caption{
% \label{fig:semi_bimodal:b}
% }
\end{subfigure}
\caption{
\label{fig:semi_bimodal:c}
Normalized samples from the assignment process $\mat{\alpha}$ of the model shown in \cref{fig:semi_bimodal}.
The assignment process is used to weigh the predictive distributions of the different modes depending on the position in the input space.
The model has learned that the mode $k = 2$ is irrelevant, that the mode $k = 1$ is only relevant around the interval $[0, 5]$.
Outside this interval, the mode $k = 3$ is twice as likely as the mode $k = 4$.
}
\end{figure}
%
In this section we investigate the behavior of the DAGP model in multiple regression settings.
First, we show how prior knowledge about the different generative processes can be used to separate a signal from unrelated noise.
Second, we apply the DAGP to a multimodal data set and showcase how the different components of the model interact to identify how many modes are necessary to explain the data.
Finally, we investigate a data set which contains observations of two independent dynamical systems mixed together and show how the DAGP can recover information about both systems and infer the state variable separating the systems.
We use an implementation of DAGP in TensorFlow~\parencite{tensorflow2015-whitepaper} based on GPflow~\parencite{matthews_gpflow_2017} and the implementation of DSVI~\parencite{salimbeni_doubly_2017}.
\subsection{Noise Separation}
\label{subsec:choicenet}
We begin with an experiment based on a noise separation problem.
We apply DAGP to a one-dimensional regression problem with uniformly distributed asymmetric outliers in the training data.
We use a task proposed by~\textcite{choi_choicenet_2018} where we sample $x \in [-3, 3]$ uniformly and apply the function $\Fun{f}{x} = (1 - \delta)(\Fun{\cos}{\sfrac{\pi}{2} \cdot x}\Fun{\exp}{-(\sfrac{x}{2})^2} + \gamma) + \delta \cdot \epsilon$, where $\delta \sim \Fun{\Ber}{\lambda}$, $\epsilon \sim \Fun{\Uni}{-1, 3}$ and $\gamma \sim \Gaussian{0, 0.15^2}$.
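The generative process of this task can be sketched in NumPy as follows. This is a minimal illustration, not the authors' data pipeline. The helper name `make_choicenet_data` and the seed are hypothetical. The sample size `n` and the outlier rate `outlier_rate` (the $\lambda$ above) are left as free parameters because the paragraph does not fix their values.

```python
import numpy as np


def make_choicenet_data(n, outlier_rate, seed=0):
    """Hypothetical helper: sample the outlier regression task described above.

    n and outlier_rate (lambda) are free parameters; the text does not fix them.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, size=n)             # x ~ U(-3, 3)
    delta = rng.binomial(1, outlier_rate, size=n)  # delta ~ Ber(lambda)
    eps = rng.uniform(-1.0, 3.0, size=n)           # outliers eps ~ U(-1, 3)
    gamma = rng.normal(0.0, 0.15, size=n)          # gamma ~ N(0, 0.15^2)
    signal = np.cos(np.pi / 2 * x) * np.exp(-(x / 2) ** 2)
    # With probability lambda the observation is pure outlier noise,
    # otherwise it is the signal plus small Gaussian noise.
    y = (1 - delta) * (signal + gamma) + delta * eps
    return x, y, delta
```

The function also returns `delta`, which holds the ground-truth outlier labels. These labels make it possible to check how well inferred assignments separate the signal from the noise.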

@@ -486,17 +446,35 @@ While the function has still been identified well, some of the noise is also exp
\subsection{Multimodal Data}
\label{subsec:semi_bimodal}
%
\begin{figure*}[t]
\centering
\includestandalone{figures/semi_bimodal_joint}
\includestandalone{figures/semi_bimodal_attrib}
\includestandalone{figures/semi_bimodal_attrib_process}
\caption{
\label{fig:semi_bimodal}
The DAGP posterior on an artificial data set with bimodal and trimodal parts.
The joint predictions (top) are mixtures of four Gaussians weighted by the assignment probabilities $\mat{\alpha}$ (bottom).
The weights are represented via the opacity of the modes.
The model has learned that mode $k = 2$ is irrelevant and that mode $k = 1$ is only relevant in the interval $[0, 5]$.
Outside this interval, mode $k = 3$ is about twice as likely as mode $k = 4$.
The concrete assignments $\mat{a}$ (middle) of the training data show that mode $k = 1$ is only used to explain observations where the training data is trimodal.
Mode $k = 2$ is never used.
}
\end{figure*}
%
Our second experiment applies DAGP to a multimodal data set.
\Cref{fig:semi_bimodal} shows the data together with the recovered posterior attributions.
We uniformly sample 350 data points in the interval $x \in [-2\pi, 2\pi]$ and obtain $y_1 = \Fun{\sin}{x} + \epsilon$, $y_2 = \Fun{\sin}{x} - 2 \Fun{\exp}{-\sfrac{1}{2} \cdot (x-2)^2} + \epsilon$ and $y_3 = -1 - \sfrac{3}{8\pi} \cdot x + \sfrac{3}{10} \cdot \Fun*{\sin}{2x} + \epsilon$ with additive independent noise $\epsilon \sim \Gaussian*{0, 0.005^2}$.
The resulting data set $\D = \Set{\left( x, y_1 \right), \left( x, y_2 \right), \left( x, y_3 \right)}$ is trimodal in the interval $[0, 5]$ and otherwise bimodal, with one mode containing twice as much data as the other.
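The following is a minimal NumPy sketch of this data set. It assumes that each of the 350 sampled inputs contributes one observation per function, so that $\D$ holds $3 \times 350$ pairs. The text could also be read as 350 pairs in total. The helper name `make_semi_bimodal`, the returned `labels`, and the seed are hypothetical.

```python
import numpy as np


def make_semi_bimodal(n=350, noise_std=0.005, seed=0):
    """Hypothetical helper: three noisy functions evaluated at the same inputs."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2 * np.pi, 2 * np.pi, size=n)

    def noise():
        # Independent additive noise eps ~ N(0, noise_std^2) per observation
        return rng.normal(0.0, noise_std, size=n)

    y1 = np.sin(x) + noise()
    # The Gaussian bump centred at x = 2 is what separates y2 from y1;
    # far from x = 2 the two functions coincide up to noise.
    y2 = np.sin(x) - 2 * np.exp(-0.5 * (x - 2) ** 2) + noise()
    y3 = -1 - 3 / (8 * np.pi) * x + 3 / 10 * np.sin(2 * x) + noise()
    # D = {(x, y1), (x, y2), (x, y3)}: stack the three functions into one data set
    X = np.concatenate([x, x, x])
    Y = np.concatenate([y1, y2, y3])
    labels = np.repeat(np.arange(3), n)  # index of the generating function
    return X, Y, labels
```

Away from $x = 2$, $y_1$ and $y_2$ nearly coincide. For that reason the sketch's outer region holds one mode with twice as much data as the other, as the paragraph describes.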

We use squared exponential kernels as priors for both the $f^{\pix{k}}$ and $\alpha^{\pix{k}}$ and $25$ inducing points in every GP.
\Cref{fig:semi_bimodal,fig:semi_bimodal:c} show the posterior of an DAGP with $K = 4$ modes applied to the data, which correctly identified the underlying functions.
\Cref{fig:semi_bimodal} shows the posterior belief about the assignments $\mat{A}$ and illustrates that DAGP recovered that it needs only three of the four available modes to explain the data.
\Cref{fig:semi_bimodal} shows the posterior of a DAGP with $K = 4$ modes applied to the data, which correctly identifies the underlying functions.
The figure also shows the posterior belief about the assignments $\mat{A}$ and illustrates that DAGP recovers that it needs only three of the four available modes to explain the data.
One of the modes is only assigned points in the interval $[0, 5]$, where the data is actually trimodal.
This separation is explicitly represented in the model via the assignment processes $\mat{\alpha}$ shown in \cref{fig:semi_bimodal:c}.
This separation is explicitly represented in the model via the assignment processes $\mat{\alpha}$.
The model has disabled mode $k = 2$ in the entire input space and has learned that mode $k = 1$ is only relevant in the interval $[0, 5]$, where the three enabled modes each explain about a third of the data.
Outside this interval, the model has learned that one of the modes has about twice the assignment probability of the other, thus correctly reconstructing the true generative process.
The DAGP is implicitly incentivized to explain the data using as few modes as possible through the likelihood term of the inferred $\mat{a_n}$ in \cref{eq:variational_bound}.
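The captions describe the joint prediction at an input as a mixture of Gaussian modes. The mixture is weighted by the normalized assignment process, which the plots show as $\Fun{\softmax}{\rv{\alpha}}$. The following NumPy sketch illustrates only that weighting step. It assumes that the per-mode predictive means and standard deviations and the value of $\mat{\alpha}$ at one input are already given. The function names and the numbers in `alpha` are hypothetical and are not learned values from the experiment.

```python
import numpy as np


def softmax(alpha, axis=-1):
    # Subtract the max before exponentiating for numerical stability
    shifted = alpha - np.max(alpha, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)


def gaussian_pdf(y, mean, std):
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))


def mixture_density(y, alpha, means, stds):
    """Density of y under K Gaussian modes weighted by softmax(alpha).

    alpha, means and stds share a trailing axis of length K; y must broadcast
    against them (e.g. a scalar for a single input location).
    """
    weights = softmax(alpha)
    return np.sum(weights * gaussian_pdf(y, means, stds), axis=-1)


# Illustrative (not learned) assignment values at one input outside [0, 5]:
# modes k = 1 and k = 2 (indices 0, 1) are switched off and mode k = 3 is
# twice as likely as mode k = 4.
alpha = np.array([-20.0, -20.0, np.log(2.0), 0.0])
weights = softmax(alpha)
```

Subtracting the maximum leaves the softmax unchanged but avoids overflow for large assignment values.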

BIN
figures/choicenet_attrib.pdf

BIN
figures/choicenet_attrib_40.pdf

BIN
figures/choicenet_data.pdf

BIN
figures/choicenet_data_40.pdf

BIN
figures/choicenet_data_intro.pdf

BIN
figures/choicenet_joint.pdf

BIN
figures/choicenet_joint_40.pdf

BIN
figures/dynamic_graphical_model.pdf

+ 1
- 1

figures/preamble/tikz_common.tex
@@ -65,7 +65,7 @@

}
\pgfplotsset{model plot/.style = {
grid=major,
height=100pt,
height=105pt,
% enlarge x limits=false,
ylabel style={rotate=-90},
}}

BIN
figures/semi_bimodal_attrib.pdf

+ 1
- 2

figures/semi_bimodal_attrib.tex
@@ -9,9 +9,8 @@

\def\datapath{\figurepath/data/semi_bimodal_fancy}
\begin{axis}[
model plot,
width=\plothalfwidth,
width=\plotfullwidth,
xlabel=$\rv{X}$, ylabel=$\rv{y}$,
ylabel=,
ymin=-4, ymax=3,
]

BIN
figures/semi_bimodal_attrib_process.pdf

+ 6
- 4

figures/semi_bimodal_attrib_process.tex
@@ -10,14 +10,16 @@

\begin{axis}[
model plot,
clip mode=individual,
width=\plothalfwidth,
width=.8\plotfullwidth,
xlabel=$\rv{X}$, ylabel=$\Fun{\softmax}{\rv{\alpha}}$,
ylabel style={rotate=90},
ytick={0, 0.33, 0.66},
legend columns=-1,
legend columns=1,
legend style={
at={(0.5, 1.05)},
anchor=south,
% at={(0.5, 1.05)},
% anchor=south,
at={(1.05, 0.5)},
anchor=west,
}
]

BIN
figures/semi_bimodal_data.pdf

BIN
figures/semi_bimodal_joint.pdf