Browse Source

Add second paragraph introducing model pollution

Markus Kaiser 3 years ago
  1. 6


@ -48,14 +48,16 @@
Many real-world modelling tasks can be categorized by multiple operational regimes.
Many real-world modelling tasks can be categorized by multiple operational regimes~\parencite{hein_benchmark_2017}.
As an example, consider a model describing the lift resulting from airflow around a wing profile as a function of attack angle.
At a low angle the lift increases linearly with attack angle until the wing stalls and the characteristic of the airflow fundamentally changes.
Building a truthful model of such data requires learning two separate models and correctly associating the observed data to each of the dynamical regimes.
A similar example would be if our sensors that measures the lift are faulty in a manner such that we either get a correct reading or a noisy one.
Estimating a model in this scenario is often referred to as a \emph{data association problem}~\parencite{Bar-Shalom:1987, Cox93areview}, where we consider the data to have been generated by a mixture of processes and we are interested in factorising the data into these components.
\todo[inline]{Introduce model pollution? Cite Daniel? Mention Figure 1?}
\Cref{fig:choicenet_data} shows an example of faulty sensor data, where sensor readings are disturbed by uncorrelated and asymmetric noise.
Applying standard machine learning approaches to such data can lead to model pollution, where the expressive power of the model is used to explain noise instead of the underlying signal.
Solving the data association problem by factorizing the data into signal an noise gives rise to a principled approach to avoiding this behaviour.