
Remove subcaptions from choicenet figure; Save more space

Markus Kaiser, 3 months ago
commit 27bd6b2c1e

dynamic_dirichlet_deep_gp.tex  (+12, −27)

@@ -28,10 +28,10 @@
 \aistatsaddress{ Siemens AG \\ Technical University of Munich \And Siemens AG \And Siemens AG \\ Technical University of Munich \And University of Bristol }]
 
 \begin{abstract}
-    We propose a novel Bayesian approach to modelling multimodal data generated by multiple independent processes, simultaneously solving the data assoiciation and induced supervised learning problems.
+    We propose a novel Bayesian approach to modelling multimodal data generated by multiple independent processes, simultaneously solving the data association and induced supervised learning problems.
     Underpinning our approach is the use of Gaussian process priors used to encode structure both on the functions and the associations themselves.
     The association of samples and functions are determined taking both inputs and outputs into account while also obtaining a posterior belief about the relevance of the global components throughout the input space.
-    We present an efficient learning scheme based on doubly stochastic variational inference and discuss how the model can be extended to deep Gaussian process priors.
+    We present an efficient learning scheme based on doubly stochastic variational inference and discuss how it can be applied to deep Gaussian process priors.
     We show results for an artificial data set, a noise separation problem and a multimodal regression problem based on the cart-pole benchmark.
 \end{abstract}
 
@@ -42,7 +42,7 @@ Estimating a function from data is a central aspect of machine learning and a ho
 Fundamentally, a function is an object relating an input value to a single output value, often represented as elements of finite-dimensional vector spaces.
 However, for some tasks, not all relevant input dimensions can be observed, meaning that for each input location, multiple outputs are possible due to changing missing information.
 One class of problem with these characteristics are dynamic systems which often can be in multiple states of operation and where this state itself is not observed.
-Examples include faulty sensors which, at any time, might emit a correct reading or uninformative noise\todo{noisy data reference?}, or industrial systems with accumulating latent effects which can induce bifurcation or hysteresis \parencite{hein_benchmark_2017}.
+Examples include faulty sensors which, at any time, might emit a correct reading or uninformative noise, or industrial systems with accumulating latent effects which can induce bifurcation or hysteresis \parencite{hein_benchmark_2017}.
 In this work, we will investigate a data set derived from the cart-pole benchmark which contains trajectories of two instances of the benchmark with different pole lengths.
 This setup emulates an industrial system in which, for example due to wear or defective parts, the underlying dynamics change over time.
 In this setting, we want to recover both joint predictions marginalizing the current state of operation but also informative models for these separate states.
@@ -50,7 +50,7 @@ In this setting, we want to recover both joint predictions marginalizing the cur
 Estimating a model in this scenario is often referred to as a \emph{data association problem} \parencite{Bar-Shalom:1987, Cox93areview} where both the different functions and the associations of the observations to a function need to be estimated.
 A simple example of this can be seen in \cref{fig:semi_bimodal:b}, where no single function could have generated the data.
 A slightly different view of the same problem is to consider the data to have been generated by a mixture of processes where we are interested to factorise the data into these components~\parencite{choi_choicenet_2018}.
-The separation of underlying signal and a noise process is an application of the latter, where we consider certain observations to be noise and others to be signal.
+The separation of underlying signal and a noise process is an application of the latter, where we consider certain observations to be noise and others to be signal\todo{noisy data reference?}.
 
 Early approaches to explaining data using multiple generative processes is based on separating the input space and training local expert models explaining easier subtasks~\parencite{jacobs_adaptive_1991,tresp_mixtures_2001, rasmussen_infinite_2002}.
 The assignment of data points to local experts is handled by a gating network, which learns a function from the inputs to assignment probabilities.
@@ -147,7 +147,7 @@ The prior on the assignments $\mat{A}$ is given by marginalizing the $\mat{\alph
 Modelling the relationship between the input and the associations allows us to efficiently model data which, for example, is unimodal in some parts of the input space and bimodal in others.
 A simple smoothness prior will encode a belief for how quickly we believe the components switch across the input domain.
 
-    Since the GPs of the $\mat{\alpha^{\pix{k}}}$ use a zero mean function, our prior assumption is a uniform distribution between the different modes everywhere in the input space.
+    Since the GPs of the $\mat{\alpha^{\pix{k}}}$ use a zero mean function, our prior assumption is a uniform distribution of the different modes everywhere in the input space.
 If inference on the $\mat{a_n}$ reveals that, say, all data points at similar positions in the input space can be explained by the same $\nth{k}$ mode, the belief about $\mat{\alpha}$ can be adjusted to make a non-uniform mode distribution favorable at this position, thereby increasing the likelihood via $\Prob*{\mat{A} \given \mat{X}}$.
 This mechanism introduces an incentive for the model to use as few modes as possible to explain the data and allows us to predict a relative importance of the modes when calculating the posterior of new observations $\mat{x^\ast}$.
 
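For reference, the construction this hunk refers to (the hunk header states that the prior on the assignments $\mat{A}$ is obtained by marginalizing the $\mat{\alpha}$, which carry zero-mean GP priors) plausibly takes the following form. This is only a sketch in the paper's notation: the concrete link $\Prob*{\mat{A} \given \mat{\alpha}}$ is not shown in this diff and is left abstract, and $\mat{K_\alpha^{\pix{k}}}$ is our shorthand for the kernel matrix of the $\nth{k}$ assignment GP evaluated at $\mat{X}$.

\begin{align}
    \Prob*{\mat{A} \given \mat{X}}
        &= \int \Prob*{\mat{A} \given \mat{\alpha}}
           \prod_{k=1}^{K} \Prob*{\mat{\alpha^{\pix{k}}} \given \mat{X}}
           \,\mathrm{d}\mat{\alpha},
    \\
    \Prob*{\mat{\alpha^{\pix{k}}} \given \mat{X}}
        &= \Gaussian*{\mat{\alpha^{\pix{k}}} \given \mat{0}, \mat{K_\alpha^{\pix{k}}}}.
\end{align}

With the zero mean function, every mode is a priori equally likely at every input location, which is the uniform prior assumption described in the changed sentence.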
@@ -166,7 +166,7 @@ Exact inference is intractable in this model.
 Instead, we now formulate a variational approximation following ideas from~\parencite{hensman_gaussian_2013, salimbeni_doubly_2017}.
 Because of the rich structure in our model, finding a variational lower bound which is both faithful and can be evaluated analytically is hard.
 Instead, we formulate an approximation which factorizes along both the $K$ modes and $N$ data points.
-This bound can be sampled efficiently and allows us to optimize both the models for the different modes $\Set*{f^{\pix{k}}}_{k=1}^K$ and our belief about the data assignments $\Set*{\mat{a_n}}_{n=1}^N$ simultaneously using stochastic optimization methods.
+This bound can be sampled efficiently and allows us to optimize both the models for the different modes $\Set*{f^{\pix{k}}}_{k=1}^K$ and our belief about the data assignments $\Set*{\mat{a_n}}_{n=1}^N$ simultaneously using stochastic optimization.
 
 \subsection{Variational Lower Bound}
 \label{subsec:lower_bound}
@@ -175,11 +175,11 @@ We collect them as $\mat{Z} = \Set*{\mat{Z^{\pix{k}}}, \mat{Z_\alpha^{\pix{k}}}}
 Taking the function $f^{\pix{k}}$ and its corresponding GP as an example, the inducing variables $\mat{u^{\pix{k}}}$ are jointly Gaussian with the latent function values $\mat{F^{\pix{k}}}$ of the observed data by the definition of GPs.
 We follow \parencite{hensman_gaussian_2013} and choose the variational approximation $\Variat*{\mat{F^{\pix{k}}}, \mat{u^{\pix{k}}}} = \Prob*{\mat{F^{\pix{k}}} \given \mat{u^{\pix{k}}}, \mat{X}, \mat{Z^{\pix{k}}}}\Variat*{\mat{u^{\pix{k}}}}$ with $\Variat*{\mat{u^{\pix{k}}}} = \Gaussian*{\mat{u^{\pix{k}}} \given \mat{m^{\pix{k}}}, \mat{S^{\pix{k}}}}$.
 This formulation introduces the set $\Set*{\mat{Z^{\pix{k}}}, \mat{m^{\pix{k}}}, \mat{S^{\pix{k}}}}$ of variational parameters indicated in~\cref{fig:dynamic_graphical_model}.
-To simplify notation we will drop the dependency on the inducing inputs $\mat{Z}$ in the following.
+To simplify notation we drop the dependency on $\mat{Z}$ in the following.
 
 A central assumption of this approximation is that given enough well-placed inducing variables $\mat{u^{\pix{k}}}$, they are a sufficient statistic for the latent function values $\mat{F^{\pix{k}}}$.
 This implies conditional independence of the $\mat{f_n^{\pix{k}}}$ given $\mat{u^{\pix{k}}}$ and $\mat{X}$.
-With this assumption, the variational posterior of a single GP can be written as,
+The variational posterior of a single GP can then be written as,
 \begin{align}
 \begin{split}
     \Variat*{\mat{F^{\pix{k}}} \given \mat{X}}
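The hunk ends just before the body of this equation. For reference, the standard sparse-GP marginal implied by the choice $\Variat*{\mat{F^{\pix{k}}}, \mat{u^{\pix{k}}}} = \Prob*{\mat{F^{\pix{k}}} \given \mat{u^{\pix{k}}}, \mat{X}}\Variat*{\mat{u^{\pix{k}}}}$ of \parencite{hensman_gaussian_2013} is sketched below; the kernel matrices $\mat{K_{xx}}$, $\mat{K_{xz}}$, $\mat{K_{zz}}$ are our shorthand, and the paper's own (elided) equation may use different symbols.

\begin{align}
    \Variat*{\mat{F^{\pix{k}}} \given \mat{X}}
        &= \int \Prob*{\mat{F^{\pix{k}}} \given \mat{u^{\pix{k}}}, \mat{X}} \Variat*{\mat{u^{\pix{k}}}} \,\mathrm{d}\mat{u^{\pix{k}}}
         = \Gaussian*{\mat{F^{\pix{k}}} \given \tilde{\mat{\mu}}^{\pix{k}}, \tilde{\mat{\Sigma}}^{\pix{k}}}, \\
    \tilde{\mat{\mu}}^{\pix{k}}
        &= \mat{K_{xz}} \mat{K_{zz}}^{-1} \mat{m^{\pix{k}}}, \\
    \tilde{\mat{\Sigma}}^{\pix{k}}
        &= \mat{K_{xx}} - \mat{K_{xz}} \mat{K_{zz}}^{-1} \left( \mat{K_{zz}} - \mat{S^{\pix{k}}} \right) \mat{K_{zz}}^{-1} \mat{K_{zx}}.
\end{align}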
@@ -430,42 +430,27 @@ At $x = -10$ both the two modes and the assignment processes start reverting to
     \begin{subfigure}{.495\linewidth}
         \centering
         \includestandalone{figures/choicenet_joint_40}
-        \caption{
-            \label{fig:choicenet:a}
-            Joint posterior with 40\,\% outliers.
-        }
     \end{subfigure}
     \hfill
     \begin{subfigure}{.495\linewidth}
         \centering
         \includestandalone{figures/choicenet_attrib_40}
-        \caption{
-            \label{fig:choicenet:b}
-            Assignment probabilities with 40\,\% outliers.
-        }
     \end{subfigure}
-    \\[.5\baselineskip]
+    \\
     \begin{subfigure}{.495\linewidth}
         \centering
         \includestandalone{figures/choicenet_joint}
-        \caption{
-            \label{fig:choicenet:c}
-            Joint posterior with 60\,\% outliers.
-        }
     \end{subfigure}
     \hfill
     \begin{subfigure}{.495\linewidth}
         \centering
         \includestandalone{figures/choicenet_attrib}
-        \caption{
-            \label{fig:choicenet:d}
-            Assignment probabilities with 60\,\% outliers.
-        }
     \end{subfigure}
     \captionof{figure}{
         \label{fig:choicenet}
-        The MDGP posterior on the ChoiceNet data set with 40\,\% outliers (upper row) and 60\,\% outliers (lower row).
-        The bimodal MDGP identifies the underlying signal perfectly up to 40\,\% outliers.
+        MDGP on the ChoiceNet data set with 40\,\% outliers (upper row) and 60\,\% outliers (lower row).
+        We show the joint posterior (left) and assignment probabilities (right).
+        The bimodal MDGP identifies the signal perfectly up to 40\,\% outliers.
         For 60\,\% outliers, some of the noise is interpreted as signal, but the latent function is still recovered.
     }
 \end{figure*}

figures/choicenet_attrib_40.pdf  (binary file changed)


figures/choicenet_attrib_40.tex  (+1, −0)

@@ -12,6 +12,7 @@
     width=.9\plotlinewidth,
     attrib colorbar,
     xlabel=, ylabel=,
+    xticklabels={,,},
     ]
 
     \addplot[

figures/choicenet_joint_40.pdf  (binary file changed)


figures/choicenet_joint_40.tex  (+1, −0)

@@ -12,6 +12,7 @@
     clip mode=individual,
     width=\plotlinewidth,
     xlabel=,
+    xticklabels={,,},
     ]
 
     \addplot[
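Both .tex hunks above rely on the same pgfplots idiom: setting xticklabels={,,} replaces the automatically generated x tick labels with empty ones while keeping the tick marks, so the upper-row panels lose their redundant labels and the figure needs less vertical space, in line with the commit message. A minimal standalone sketch of the option follows; the preamble and the plotted curve are illustrative only and not taken from the paper's figure sources.

\documentclass{standalone}
\usepackage{pgfplots}
\pgfplotsset{compat=1.16}
\begin{document}
\begin{tikzpicture}
    \begin{axis}[
        xlabel=,            % drop the x axis label, as in the changed figures
        xticklabels={,,},   % keep the tick marks but print empty labels
    ]
        % placeholder curve, not part of the paper's figures
        \addplot {x^2};
    \end{axis}
\end{tikzpicture}
\end{document}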
