We thank the reviewers for their constructive comments and suggestions for improvement. We first address two suggestions made by Reviewer #1 and explain why we did not incorporate them in this revision:
Theoretical extensions to the model, e.g. a non-parametric model for K: This work showcases how hierarchical Bayesian models can be used to incorporate high-level expert knowledge during model design. The prior knowledge we assume includes knowledge about the correct number of modes required. Note that inference over the correct number of modes is a hard problem due to the ill-posedness of the data association problem. It is crucial to formulate a strong prior over acceptable associations to obtain interpretable solutions. We do agree that this observation is not obvious. We have therefore added a new experiment to showcase the effect of other choices of K.
Experiments on other benchmarks, e.g. the Industrial Benchmark: As this work is specifically about formulating a Bayesian model tailored to a problem and the available knowledge, adding experiments on another benchmark would require significant changes to the paper, including the formulation of a completely new model. The Industrial Benchmark is not multi-modal. Instead, its difficulties lie in its high dimensionality and latent information. We consider such a comparison out of scope for this submission.
Second, as Reviewer #1 expressed concern about the amount of novel material when compared to the ESANN submission, we give an explicit list of additions here:
- (Section 1) Extended introduction and related work
- (Section 3) Addition of a description of the inference scheme employed to train the transition model and policies
- (Section 4) A considerably more detailed analysis of the formulated transition model, insights obtained from data, and a discussion of the model's interpretability
- (Section 4) An extension of the original experiment with a comparison to an additional model (BNN+LV) as suggested by Reviewer #1
- (Section 4) A new experiment on how the interpretable model can be used for reward shaping
- (Section 4) A new experiment on the effects of model misspecification on the data efficiency as suggested by Reviewer #1