
Complete first draft of the rebuttal

Markus Kaiser 5 months ago
2 changed files with 28 additions and 6 deletions
rebuttal/nips_2018_author_response.pdf (binary file)

rebuttal/nips_2018_author_response.tex

@@ -17,7 +17,7 @@
 We would like to thank the reviewers for their insightful and constructive comments and suggestions for improvement.
 We will address their comments below.
-\todo[inline]{Things currently not part of the rebuttal: The triviality of the model and hardness of the derivation.}
+\todo{Currently not included: \\ - Significance of the Psi-Statistics \\ - Triviality of the model \\ - Hardness of the derivation.}
 \paragraph{R1 and R2: Modeling choices and the significance of the CP}
 As described in the paper, the generative process underlying the observations of power production of wind turbines is heavily influenced by the noise introduced through local and global turbulence, leading to a low signal-to-noise ratio.
@@ -28,15 +28,37 @@ This effect is included in our model via the convolution process.
 From a technical point of view, the CP introduces additional regularization: by allowing additional smoothing in the shared layer, it helps the model avoid local minima in which the shared layer overfits one of the time series before an informative alignment can be found.
 We performed preliminary experiments using a model close to the one described in the reference provided by Reviewer 2 and found it very hard to train.
 Because this model can be understood as a special case with very narrow convolutions and we found the comparisons provided in the paper to be more descriptive, we decided against including these results in the paper.
-We thank Reviewer 2 for providing this reference and will include it in the final version of this paper.
+We thank Reviewer 2 for providing this reference and will include it in the final version of this paper\todo{Explain differences more clearly?}.
 \paragraph{R2 and R3: Interpretation of results on the wind data set}
+Figures 4 and 5 showcase the results obtained from data recorded from a pair of neighbouring wind turbines in a wind farm.
+We agree that these results should be explained in more detail in the final version.
+The purple graph in Figure 4 shows the relative alignment between the two time series identified by our model, that is, the number of minutes the two time series are misaligned at a given time.
+Given the two time series, we are interested in recovering both the latent wind fronts and the propagation behaviour of these fronts and, critically, want to separate posterior uncertainties about the two.
-\paragraph{R1 and R2: Applicability and computational complexity of the inference scheme}
+This separation of uncertainties can be seen in Figure 5d, which shows samples drawn from our model.
+The three depicted samples have similar structure, which implies that the model is fairly certain about the latent wind fronts.
+The predictive uncertainty stems from the uncertain relative alignment (shown in Figure 4), which leads to the samples being displaced along the x-axis.
+This separation of uncertainties, which is implied by our model structure, gives rise to a much more informative model compared to the other models presented in the paper, which are not able to correctly identify the generative process and therefore yield uninformative samples.
+\paragraph{R2: Applicability and computational cost of the inference scheme}
+We agree that the computational cost and scalability of the inference scheme should be discussed in more detail.
+Our inference scheme is based on nested variational compression.
+One of the advantages of this scheme is that the variational lower bound factorizes along the data, which allows the use of stochastic optimization methods and, more specifically, of minibatches during training.
+This considerably speeds up training for larger data sets, as the required matrix decompositions can be calculated on smaller matrices.
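+Schematically, writing \(\mathcal{L}_n\) for the per-data-point term of the bound (notation introduced only for this sketch), the factorization and its minibatch estimator take the form
+\begin{equation*}
+    \mathcal{L} = \sum_{n=1}^{N} \mathcal{L}_n \approx \frac{N}{\lvert B \rvert} \sum_{n \in B} \mathcal{L}_n, \qquad B \subseteq \{1, \dots, N\},
+\end{equation*}
+which yields an unbiased estimate of the bound when the minibatch \(B\) is drawn uniformly at random.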
+It is correct that the computational cost increases with a larger number of signals because the shared convolutional layer increases in size.
+More specifically, in order to propagate a single point through this layer, the variational parameters of all output signals have to be considered to correctly represent the shared function.
+However, if the modelling assumption holds that the different outputs share a common function, increasing the number of signals should allow us to reduce the number of variational parameters per signal in f, because the shared function can still be represented faithfully, thereby reducing the computational cost.
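+As an illustration, with \(D\) output signals and \(M\) inducing points per signal (notation assumed only for this sketch), the number of variational parameters the shared layer has to consider per propagated point grows with \(DM\); if the shared-function assumption holds, \(M\) can be reduced as \(D\) grows such that \(DM\), and with it the per-point cost, stays roughly constant.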
+We agree that it is important to avoid local minima during training.
+Specifically, because the shared latent spaces and the alignments must be identified simultaneously, the model can collapse to trivial solutions.
+However, we model a real-world system which is itself inherently hierarchical, and every part of our model solves an interpretable sub-problem for which prior knowledge is available and can be encoded into the different GPs.
+For example, mean propagation times extracted from the locations of the turbines and the mean prevailing wind conditions in a wind farm can be used to construct an informative mean function for the alignment GPs, and physical models of wind propagation give us an idea of the length scales and variances of the different kernels.
+It is a strength of deep GPs that we are able to encode this prior knowledge easily.
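+As a hypothetical example, if \(d\) denotes the distance between two turbines along the prevailing wind direction and \(\bar{v}\) the mean wind speed (both assumed known), the alignment GP could be given the mean function
+\begin{equation*}
+    \mu(t) = t - \frac{d}{\bar{v}},
+\end{equation*}
+encoding a constant expected propagation delay of \(d / \bar{v}\) between the two time series.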
 \paragraph{R1, R2 and R3: Notational clarity}
+We thank all reviewers for their feedback on how the notation in the paper can be made clearer and will include proposed changes in the final paper.
+\todo{Do we want to rewrite Section 2 in terms of time series? Should we mention it explicitly?}
+We specifically thank Reviewer 3 for picking up on the error in Equations 6 and 7, where the inducing outputs u must indeed be replaced by the inducing inputs Z. This mistake did not exist in our code.