# Max-min versus sum-product aggregation 

Recall the definition of plausibility degrees $\pi(y \given \vec{x}_{q})$ as introduced in Section {numref}`uqnl`.
The computation of $\pi(+1 \given \vec{x}_{q})$ according to {eq}`plaus` is illustrated in {numref}`inf`, where the hypothesis space $\cH$ is shown schematically as one of the axes. In comparison to Bayesian inference {eq}`pd`, two important differences are notable: 

- First, evidence of hypotheses is represented in terms of normalized likelihood $\pi_{\cH}(h)$ instead of posterior probabilities $\prob(h \given \cD)$, and support for a class $y$ in terms of $\pi(y \given h, \vec{x}_{q})$ instead of probabilities $h(\vec{x}_{q}) = \prob(y \given \vec{x}_{q})$. 

- Second, the "sum-product aggregation" in Bayesian inference is replaced by a "max-min aggregation". 


:::{figure-md} inf
<img src="pic-inference.jpg" alt="settings" width="600px">

The plausibility $\pi(+1 \given \vec{x}_{q})$ of the positive class is given by the maximum (dashed line) over the pointwise minima of the plausibility of hypotheses $h$ (blue line) and the corresponding plausibility of the positive class given $h$ (green line).
:::
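
As a small numerical illustration of the two aggregation schemes, the following sketch replaces the hypothesis space by five hypotheses. All numbers are invented for illustration, and the names `max_min` and `sum_product` are hypothetical helpers, not part of any library. The posterior is assumed to be given and already normalized.

```python
import numpy as np

# Hypothetical values for five hypotheses h_1, ..., h_5 (illustration only).
pi_H = np.array([0.2, 0.6, 1.0, 0.7, 0.3])        # normalized likelihood pi_H(h)
support = np.array([0.9, 0.8, 0.4, 0.1, 0.0])     # support pi(+1 | h, x_q)

posterior = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # assumed posterior p(h | D), sums to 1
class_prob = np.array([0.9, 0.8, 0.4, 0.1, 0.0])  # assumed class probability under h


def max_min(pi_h, supp):
    """Maximum over hypotheses of the pointwise minimum of the two curves."""
    return float(np.max(np.minimum(pi_h, supp)))


def sum_product(post, prob):
    """Posterior-weighted average of the class probabilities."""
    return float(np.sum(post * prob))


# Pointwise minima are 0.2, 0.6, 0.4, 0.1, 0.0.
plausibility = max_min(pi_H, support)
probability = sum_product(posterior, class_prob)
print(plausibility, probability)
```

With these numbers, the largest pointwise minimum is 0.6, attained at the second hypothesis, whereas the weighted average is approximately 0.43. The max-min result is determined by a single hypothesis, while the average blends the contributions of all of them.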

More formally, the meaning of sum-product aggregation is that {eq}`pd` corresponds to the computation of the standard (Lebesgue) integral of the class probability $\prob(y \given \vec{x}_{q})$ with respect to the (posterior) probability distribution $\prob(h \given \cD)$. Here, instead, the definition of $\pi(y \given \vec{x}_{q})$ corresponds to the Sugeno integral \citep{suge_to} of the support $\pi(y \given h, \vec{x}_{q})$ with respect to the possibility measure $\Pi_{\cH}$ induced by the distribution {eq}`noli` on $\cH$:
\begin{equation}
\pi(y \given \vec{x}_{q}) =  S \!\!\!\!\!\! \int_{\cH} \pi(y \given h, \vec{x}_{q}) \circ \Pi_{\cH}
\end{equation}
In general, given a measurable space $(X,\mathcal{A})$ and an $\mathcal{A}$-measurable function $f:\, X \longrightarrow [0,1]$, the Sugeno integral of $f$ with respect to a monotone measure $g$ (i.e., a set function $g:\, \mathcal{A} \longrightarrow [0,1]$ such that $g(\emptyset) = 0$, $g(X) = 1$, and $g(A) \leq g(B)$ for $A \subseteq B$) is defined as
\begin{equation}
S \!\!\!\!\!\! \int_X f(x) \circ g := \sup_{A \in \mathcal{A}} \left[ \min \left( \min_{x \in A} f(x) , g(A) \right) \right] = \sup_{\alpha \in [0,1]} \Big[ \min \big( \alpha , g(F_\alpha) \big) \Big] \, , 
\end{equation}
where $F_\alpha := \{ x \with f(x) \geq \alpha \}$.
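
For a finite set $X$, both forms of the definition can be evaluated directly. The sketch below is a minimal illustration under stated assumptions: $X$ is represented by the indices of a list, the monotone measure is a possibility measure $g(A) = \max_{x \in A} \pi(x)$ built from a hypothetical distribution `pi` with maximum 1, and the function names are invented for this example. In the finite case it suffices to check the $\alpha$-cuts at the values taken by $f$, because $g(F_\alpha)$ only changes at those values.

```python
from itertools import combinations


def possibility_measure(pi):
    """Return g with g(A) = max of pi over A, and g(empty set) = 0."""
    def g(A):
        return max((pi[x] for x in A), default=0.0)
    return g


def sugeno_subsets(f, g):
    """Subset form: sup over nonempty A of min(min_{x in A} f(x), g(A))."""
    best = 0.0
    for r in range(1, len(f) + 1):
        for A in combinations(range(len(f)), r):
            best = max(best, min(min(f[x] for x in A), g(A)))
    return best


def sugeno_alpha_cuts(f, g):
    """Alpha-cut form: sup over alpha of min(alpha, g(F_alpha))."""
    best = 0.0
    for alpha in set(f):  # candidate levels: the values taken by f
        F_alpha = [x for x in range(len(f)) if f[x] >= alpha]
        best = max(best, min(alpha, g(F_alpha)))
    return best


# Hypothetical values (illustration only).
pi = [0.2, 0.6, 1.0, 0.7, 0.3]  # possibility distribution on X
f = [0.9, 0.8, 0.4, 0.1, 0.0]   # function to be integrated
g = possibility_measure(pi)

by_subsets = sugeno_subsets(f, g)
by_cuts = sugeno_alpha_cuts(f, g)
by_max_min = max(min(fx, px) for fx, px in zip(f, pi))
print(by_subsets, by_cuts, by_max_min)
```

For this example all three computations agree, which reflects that the Sugeno integral with respect to a possibility measure reduces to the max-min expression. The subset enumeration is exponential in the size of $X$ and is included only as a check.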

In comparison to sum-product aggregation, max-min aggregation avoids the loss of information due to averaging and is more in line with the "existential" aggregation in version space learning. In fact, it can be seen as a graded generalization of {eq}`cbi`. Note that max-min inference requires the two measures $\pi_{\cH}(h)$ and $\pi(+1 \given h, \vec{x}_{q})$ to be commensurable. This is why the normalization of the likelihood according to {eq}`noli` is important. 


Compared to MAP inference {eq}`pd`, max-min inference takes more information into account. Indeed, MAP inference only looks at the probability of hypotheses but ignores the probabilities assigned to the classes. In contrast, a class can be considered plausible according to {eq}`plaus` even if it is not strongly supported by the most likely hypothesis $h^{ml}$: this merely requires sufficient support by another hypothesis $h$ that is not much less likely than $h^{ml}$. 
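
The following toy example sketches this effect with three hypotheses. The numbers are invented for illustration and the variable names are hypothetical. The most likely hypothesis gives the positive class little support, but a second hypothesis that is almost as likely supports it strongly.

```python
# Hypothetical values for three hypotheses (illustration only).
pi_H = [1.0, 0.9, 0.3]      # normalized likelihood; index 0 plays the role of h^ml
support = [0.1, 0.8, 0.95]  # support for the positive class under each hypothesis

# Looking only at the most likely hypothesis.
ml_index = max(range(len(pi_H)), key=lambda i: pi_H[i])
support_ml = support[ml_index]

# Max-min aggregation over all hypotheses (pointwise minima: 0.1, 0.8, 0.3).
plausibility = max(min(p, s) for p, s in zip(pi_H, support))

print(support_ml, plausibility)
```

Here the support under the most likely hypothesis is only 0.1, whereas max-min aggregation yields a plausibility of 0.8 because of the second hypothesis. The third hypothesis supports the class even more strongly but contributes only 0.3, since it is itself implausible.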


