Brief Report

Probabilistic or possibilistic expert knowledge modeling? Dunning-Kruger curve helps to choose!

[version 1; peer review: awaiting peer review]
PUBLISHED 26 Aug 2025

Abstract

Background

Interval estimates are a common way to express the uncertain knowledge of experts. To model them and aggregate multiple judgments, both probability theory and possibility theory are applicable. Previous studies have shown that the performances of the aggregated distributions obtained by these two approaches are similar on average; however, few works have investigated how a preference between them can be established in specific cases.

Methods

The distribution of expert-based interval estimates on the latent Dunning-Kruger curve, i.e., the correlation between their accuracy (estimation error) and confidence/precision (interval width), was determined. The judgments were modeled using both probabilistic and possibilistic approaches, and the estimation errors of the obtained aggregated distributions were compared and described by an advantage score. Its dependence on the confidence-accuracy interdependence of the expert judgments was investigated using estimates for multiple variables.

Results

Interval estimates of ten experts regarding nine properties of a manual waste sorting system were analyzed, including feed composition, product purity and yield. The results show that there is a strong correlation between the confidence-accuracy interdependence of expert judgments and the advantage score. When narrower interval estimates imply greater accuracy, the probabilistic approach is preferable. However, in the reverse case, the possibilistic method yields better results.

Conclusions

Our basic intuition is that narrower interval estimates are more accurate than wider ones. In this case, the probabilistic approach for modeling expert knowledge is appropriate. However, as the Dunning-Kruger effect highlights, sometimes its reverse is true; then, the possibilistic approach tends to be more suitable as it does not amplify the effect of narrow estimates. The results show that the choice between the two concepts can be based on the correlation trend between the accuracy and precision of judgments that could be deduced, e.g., from the composition of the expert group.

Keywords

possibility theory, probabilistic approach, expert knowledge, interval estimates, system monitoring, Dunning-Kruger effect

Introduction

Are narrower interval estimates more accurate? Most people would confidently say YES based on their basic intuition.1 However, the incorrectness of this claim has already been proven experimentally.2 In addition, the Dunning-Kruger effect draws attention to the fact that greater confidence does not necessarily result from greater expertise.3 Moreover, the uncertainty of estimates can also be affected by other factors, such as the pressure to give informative judgments even in cases of ignorance.4

To model and aggregate expert-based interval estimates, both probability and possibility theories are applicable. Considering interval estimates as a kind of (uncertain) measurement, their precision and accuracy can be characterized by the interval width and the estimation error, respectively.5 Probabilistic aggregation monotonically reduces the variance of low-precision estimates with the number of available judgments. On the other hand, it may preserve the bias of high-precision estimates if they have low accuracy. In this case, possibilistic modeling and aggregation can be more beneficial, as it can ensure that the aggregated distribution covers the true value, e.g., by using the union operator. However, when there are both low- and high-precision estimates, the decision between the probabilistic and possibilistic approaches is not so straightforward.

In the case of estimates with varying precision, the relationship between precision and accuracy may not be negligible when choosing the modeling technique. If higher precision does not imply greater expertise, the simplest and most commonly used probabilistic modeling technique (which favors narrower estimates) does not seem to be an acceptable solution. Possibility theory appears preferable: it not only assigns equal importance to judgments with different interval widths, but also makes it possible to investigate the consensus of estimates by simply analyzing the overlap of the resulting possibility distributions.6

Some previous works have compared the probabilistic and possibilistic approaches in different fields. When creating a measurement model, the preference strongly depends on the available a priori knowledge.7 In addition, a structural engineering example showed that problem complexity also matters in the case of uncertainty propagation.8 As for expert knowledge modeling, the most significant work in recent years was conducted by Rohmer and Chojnacki.9 They performed an extensive analysis involving many datasets to compare the probabilistic and possibilistic approaches regarding how well they represent the aggregated opinions of multiple experts, using accuracy- and informativeness-based measures. They were unable to show significant differences between the performances; however, they investigated the average scores of multiple different estimation tasks and did not consider the potentially different relationship between precision and accuracy in particular cases.

In this study, we take a deeper look into the modeling of expert-based interval estimates by investigating their distribution on the Dunning-Kruger curve, which illustrates the ambiguous relationship between confidence and expertise,3 equivalent to precision and accuracy in the case of interval estimates, respectively. Starting from the key differences caused by the different mathematical logic, we examine how the distribution of judgments on the Dunning-Kruger curve, i.e., the correlation trend of precision and accuracy, affects the relative performance of the probabilistic and possibilistic approaches, thus facilitating the decision between them.

Methods

Expert-based interval estimates are ambiguous. If we have an [x_L^e, x_U^e] estimate about variable x from expert e, the width of the interval generally represents the uncertainty of the knowledge about x. However, we usually do not have any information about the prioritization of values inside the interval: the expert may have thought that every value in the interval is equally probable, but (s)he might not have meant to convey such additional information by the estimate. Therefore, the commonly used probabilistic modeling technique, which assumes certain probability values over the interval, does not always correctly represent the real information content of the estimates.

Probabilistic and possibilistic modeling

The given interval can be represented probabilistically by a uniform distribution as:

(1)
p_e(x) = \begin{cases} \dfrac{1}{x_U^e - x_L^e} & \text{if } x_L^e \leq x \leq x_U^e \\ 0 & \text{otherwise} \end{cases}
which means that the probability density is positive and inversely proportional to the interval length if the value of x lies in the given interval, and zero otherwise. In this case, probability represents the degree of uncertainty expressed by the interval length: narrow estimates receive a higher probability density than wider ones, as they are less uncertain.
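As a minimal sketch, the uniform density of Eq. (1) can be implemented as follows; the function name p_e and the use of NumPy are our own choices, not part of the original work:

```python
import numpy as np

def p_e(x, x_l, x_u):
    """Uniform probability density of the interval estimate [x_l, x_u] (Eq. 1)."""
    x = np.asarray(x, dtype=float)
    inside = (x >= x_l) & (x <= x_u)
    return np.where(inside, 1.0 / (x_u - x_l), 0.0)

# A narrower interval yields a higher density at a covered point:
p_e(2.0, 1, 3)  # density 1/2
p_e(2.0, 1, 4)  # density 1/3
```

The inverse proportionality to the interval length is exactly what later lets narrow estimates dominate the probabilistic aggregation.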

Alternatively, in the possibilistic case, interval estimates can be modeled by fuzzy numbers, which express the degree of membership in the set and are defined by the tuple:

(2)
\pi_e(x) = \left( x_L^e - \alpha,\ x_L^e,\ x_U^e,\ x_U^e + \alpha \right)
where the first and last members define the support, and the second and third members define the core of the fuzzy number, as shown in Figure 1. Inside the core, the fuzzy number takes the value one, and outside the support (defined by the tunable bandwidth α), it takes zero.


Figure 1. Definition of the trapezoidal fuzzy number representing the [x_L^e, x_U^e] interval estimate.

α marks the user-defined bandwidth.

Notice that in this case, without a restriction on the integral, the given interval defines the core of the fuzzy number, so the x values inside [x_L^e, x_U^e] take the value one in all cases, regardless of the interval width.
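For illustration, the trapezoidal possibility distribution of Eq. (2) could be sketched with NumPy's interp; the helper name pi_e and the explicit handling of the crisp α = 0 case are our assumptions:

```python
import numpy as np

def pi_e(x, x_l, x_u, alpha):
    """Trapezoidal possibility distribution of [x_l, x_u] with bandwidth alpha (Eq. 2)."""
    x = np.asarray(x, dtype=float)
    if alpha <= 0:
        # crisp interval: possibility one inside, zero outside
        return np.where((x >= x_l) & (x <= x_u), 1.0, 0.0)
    # piecewise-linear trapezoid through (x_l - alpha, 0), (x_l, 1), (x_u, 1), (x_u + alpha, 0)
    return np.interp(x, [x_l - alpha, x_l, x_u, x_u + alpha], [0.0, 1.0, 1.0, 0.0])

# Regardless of width, every value inside the core takes possibility one:
pi_e(2.0, 1, 3, 0.5)   # 1.0
pi_e(2.0, 1, 10, 0.5)  # 1.0
```

Contrast this with the uniform density of Eq. (1), whose height shrinks as the interval widens.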

Aggregation of interval estimates

The anomaly mentioned above, namely that the probabilistic approach prioritizes narrow interval estimates while the possibilistic approach assigns equal importance to all of them, increasingly prevails when the judgments of multiple experts are aggregated.

The aim of the aggregation is to summarize the judgments of multiple experts in one probability/possibility distribution. In case of the probabilistic approach, averaging the distribution functions representing individual opinions is a conventional aggregation technique:

(3)
p_{\mathrm{aggr}}(x) = \frac{1}{N_e} \sum_{e} p_e(x)
where Ne marks the number of experts.

In this work, we use the average operator for the possibilistic approach as well. It avoids assigning zero possibility everywhere along the domain in the case of conflicting estimates (as the min operator would), and it also avoids covering an uninformatively large area with possibility one when the estimates are well distributed, which would hide any preferences (as the max operator would):

(4)
\pi_{\mathrm{aggr}}(x) = \frac{1}{N_e} \sum_{e} \pi_e(x)

In the case of averaging possibility distributions, all expert judgments are considered with equal importance, as values within an interval estimate of each expert are considered with a possibility equal to one during the aggregation. Consequently, the mean curve (disregarding bandwidth) roughly represents the voting ratio for certain x values as shown on the right in Figure 2. On the other hand, the x values falling within narrow intervals are overemphasized in the probabilistic case, as they take higher probability than the values belonging to wide-interval responses as can be seen on the left in Figure 2.


Figure 2. Illustrative example of the comparison of probabilistic and possibilistic representation and aggregation of interval estimates.

Three interval estimates are considered here: [1,3] , [1,4] and [3.5,4.5] . They are represented by uniform probability distributions on the left, and by trapezoidal fuzzy numbers with zero bandwidth ( α=0 ) on the right. The aggregated distributions are illustrated by red dashed lines, and the function values belonging to x=2 and x=4.25 by black points.

Three expert judgments with different interval widths were modeled and aggregated in Figure 2. It can be noticed that x=4.25 takes a higher probability but a lower possibility value than x=2 . This is a key consequence of the narrow-interval prioritization effect of the probabilistic approach.
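The numbers behind this observation can be reproduced in a few lines. This is a sketch of the Figure 2 example using crisp (α = 0) possibility distributions; the helper names are our own:

```python
import numpy as np

intervals = [(1.0, 3.0), (1.0, 4.0), (3.5, 4.5)]  # the three estimates of Figure 2

def p_aggr(x):
    """Averaged uniform densities (Eq. 3)."""
    return np.mean([1.0 / (u - l) if l <= x <= u else 0.0 for l, u in intervals])

def pi_aggr(x):
    """Averaged crisp possibility distributions (Eq. 4, alpha = 0)."""
    return np.mean([1.0 if l <= x <= u else 0.0 for l, u in intervals])

p_aggr(2.0), p_aggr(4.25)    # ~0.278 vs ~0.333: probability favors x = 4.25
pi_aggr(2.0), pi_aggr(4.25)  # ~0.667 vs ~0.333: possibility favors x = 2
```

The point x = 4.25 is covered by a single narrow interval, yet its high density outweighs the two votes received by x = 2 in the probabilistic aggregate.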

Additionally, it has to be mentioned that the aggregated curve in the probabilistic case is still a probability distribution (with integral equal to one). Meanwhile, the averaging in the possibilistic case results in a subnormal possibility distribution (with a maximum below one). If needed, it should be normalized for further calculations.10

Advantage score

An advantage score is defined to express the performance difference between the aggregated probability and possibility distributions. To determine the extent to which the probabilistic approach outperforms the possibilistic one, the accuracies of the resulting aggregated distributions are compared. Their means are calculated (in the possibilistic case, this corresponds to the centroid defuzzification method) and their distance from the correct value ( x_correct ) is evaluated:

(5)
e_p(x) = \left| E\!\left(p_{\mathrm{aggr}}(x)\right) - x_{\mathrm{correct}} \right|
(6)
e_\pi(x) = \left| E\!\left(\pi_{\mathrm{aggr}}(x)\right) - x_{\mathrm{correct}} \right|
where ep and eπ refer to the absolute errors of the mean of the aggregated probability and possibility distributions, respectively, and E(·) represents the expected value of the function in the argument.

The difference of errors defines the advantage score ( sadv ) as:

(7)
s_{\mathrm{adv}}(x) = \frac{e_\pi(x) - e_p(x)}{f_x}

The normalization factor f_x aims to bring the error differences to a common scale if the x variables are scaled differently; it can be, e.g., equal to the domain width of x. In this work, this normalization was omitted ( f_x = 1 ), as all variables were scaled equally in our case study. The s_adv metric is positive if the probabilistic approach performs better than the possibilistic one, and negative if the possibilistic approach yields more accurate results.
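Under the stated choices (f_x = 1, means computed on a discretized grid, α taken as zero for simplicity), the advantage score could be sketched as below; the function name, grid range, and resolution are our assumptions:

```python
import numpy as np

def s_adv(intervals, x_correct, grid=np.linspace(0.0, 100.0, 10001)):
    """Advantage score (Eq. 7) with f_x = 1; positive values favor the probabilistic approach."""
    # aggregated probability density (Eq. 3) and crisp possibility distribution (Eq. 4)
    p = np.mean([np.where((grid >= l) & (grid <= u), 1.0 / (u - l), 0.0)
                 for l, u in intervals], axis=0)
    pi = np.mean([np.where((grid >= l) & (grid <= u), 1.0, 0.0)
                  for l, u in intervals], axis=0)
    mean_p = np.sum(grid * p) / np.sum(p)     # expected value of the aggregated probability
    mean_pi = np.sum(grid * pi) / np.sum(pi)  # centroid defuzzification of the possibility
    return abs(mean_pi - x_correct) - abs(mean_p - x_correct)  # Eqs. (5)-(7)

# Negative score on the Figure 2 intervals if the correct value is 2:
s_adv([(1.0, 3.0), (1.0, 4.0), (3.5, 4.5)], 2.0)  # ~ -0.25, possibilistic approach wins
```

Note that dividing by the sums makes explicit normalization of the subnormal possibility distribution unnecessary for the centroid computation.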

Correlation analysis

Calculating correlation has two roles in this work. First, it is used to characterize the relationship between the interval length and the accuracy of the estimates about a variable, denoted by r. Then, having estimates about multiple variables, the correlation between these per-variable correlations and the corresponding advantage scores is calculated, denoted by R. Pearson's correlation coefficient is used in both cases.

The interval lengths and estimation errors of estimates about x from multiple experts are collected in d and e vectors, respectively:

(8)
\mathbf{d} = \left[ d_1, d_2, \ldots, d_{N_e} \right]
(9)
\mathbf{e} = \left[ e_1, e_2, \ldots, e_{N_e} \right]
where
(10)
d_e = x_U^e - x_L^e, \quad e = 1, \ldots, N_e
and the absolute errors e_e are defined as the deviation of the middle of the interval (if a point estimate had to be given, presumably it would be this) from the correct value:
(11)
e_e = \left| \frac{x_L^e + x_U^e}{2} - x_{\mathrm{correct}} \right|, \quad e = 1, \ldots, N_e

The confidence-accuracy interdependence belonging to an x variable can be defined as the correlation between interval lengths ( d ) and estimation errors ( e ) of the estimates from multiple experts:

(12)
r = \frac{\mathrm{cov}(\mathbf{d}, \mathbf{e})}{\sigma_d \sigma_e}
where the standard deviations of d and e are denoted by σd and σe , respectively.

If r shows a strong positive correlation, narrower interval estimates belong to a higher expertise level. However, if there is a significant negative correlation, the interval width increases with the level of expertise, which aligns with the Dunning-Kruger effect.
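A small numerical sketch of Eqs. (10)-(12), with entirely hypothetical estimates from five experts (not the study's data):

```python
import numpy as np

# hypothetical interval estimates [x_L, x_U] from five experts about one variable
estimates = np.array([[20, 30], [15, 45], [35, 40], [10, 60], [25, 35]], dtype=float)
x_correct = 30.0

d = estimates[:, 1] - estimates[:, 0]           # interval widths (Eq. 10)
e = np.abs(estimates.mean(axis=1) - x_correct)  # midpoint errors (Eq. 11)
r = np.corrcoef(d, e)[0, 1]                     # confidence-accuracy correlation (Eq. 12)
# here r comes out slightly negative: the widest intervals happen to be fairly accurate
```

Since Pearson's coefficient already divides the covariance by both standard deviations, np.corrcoef computes Eq. (12) directly.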

Having estimates about multiple variables ( x_i, i = 1, …, N_x ), we can calculate the confidence-accuracy correlation and the advantage score for each, collected as:

(13)
\mathbf{r} = \left[ r_1, r_2, \ldots, r_{N_x} \right]
(14)
\mathbf{s}_{\mathrm{adv}} = \left[ s_{\mathrm{adv}}(x_1), s_{\mathrm{adv}}(x_2), \ldots, s_{\mathrm{adv}}(x_{N_x}) \right]

Their relationship can also be described by a correlation coefficient ( R ) numerically:

(15)
R = \frac{\mathrm{cov}(\mathbf{r}, \mathbf{s}_{\mathrm{adv}})}{\sigma_r \sigma_{s_{\mathrm{adv}}}}

In this work, we wanted to explore whether the distribution of answers on the Dunning-Kruger curve (characterized by r ) has any effect on the performance difference between the probabilistic and possibilistic approaches ( sadv ). This potential effect is quantified by R .
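With the per-variable results in hand, Eq. (15) is a single call; the vectors below are purely illustrative, not values from the study:

```python
import numpy as np

r_vec = np.array([-0.6, -0.3, 0.1, 0.4, 0.8])      # hypothetical per-variable r values
s_adv_vec = np.array([-2.1, -0.9, 0.2, 1.0, 2.5])  # hypothetical advantage scores

R = np.corrcoef(r_vec, s_adv_vec)[0, 1]  # Eq. (15)
# a strongly positive R would indicate that r is predictive of the better approach
```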

Data source

Interval estimates are available for nine key variables of a manual waste sorting system, including product purity, yields and feed composition, each expressed as a percentage; their values are thus limited to between 0% and 100%. The data collection was performed in a classroom setting involving ten students (representing experts). Ethical approval for this study was obtained from the Institutional Research Ethics Committee of the University of Pannonia (approval number: KEB 2/2024. (12.03.)).

Results

The available interval estimates about the nine variables ( xi,i=1,,9 ) given by the ten experts ( Ne=10 ) are summarized in Figure 3.


Figure 3. Interval responses of the ten experts for the nine key variables ( x1–x9 ).

The correct value is illustrated by yellow vertical lines. The correlation coefficients between interval width and estimation error ( r1–r9 ) are also depicted.

It can be seen that the correlation is positive in only about half of the cases, which confirms the falsity of the claim that narrower estimates necessarily imply a higher level of expertise. A strong correlation was found in only one case ( x9 ), and a moderate correlation four times (two negative ( x1 , x8 ) and two positive ( x3 , x7 )). The estimates of four variables did not show a significant relationship between interval width and accuracy.

The aggregated probability and possibility distributions ( α=2% ) with their means and the related advantage scores can be seen in Figure 4. There are some cases where the probabilistic approach outperforms the possibilistic one ( x2 , x3 and x9 ), and there are examples where the reverse is true ( x1 and x8 ).


Figure 4. Comparison of the performance of the probabilistic and possibilistic approaches.

The aggregated probability (blue) and possibility (red) distributions are shown. Their means are plotted by dotted lines in the corresponding color, and the correct values are illustrated by yellow vertical lines. The advantage scores ( sadv ) are also depicted.

Finally, the relationship between the r and sadv(x) values was investigated, as illustrated in Figure 5. A strong linear correlation ( R=0.73 ) was detected between these two factors, which is an encouraging result in terms of making a decision between the probabilistic and possibilistic approaches based on the tendency characterized by r. Although the absolute errors ( ee ) needed to calculate r are not available in practical cases, as the correct value is not known, the correlation trend can be guessed, e.g., from the interval widths and the relative competences of the experts.


Figure 5. Relationship between the confidence-accuracy correlation ( r ) and the advantage score of the probabilistic approach ( sadv ).

The data points are labeled based on the variables ( x1–x9 ) they belong to, and the line fitted to them is depicted by a red dashed line. The correlation coefficient of r and sadv is R=0.73 .

Conclusions

Our results demonstrate that the performance difference between the probabilistic and possibilistic modeling approaches strongly depends on the nature of the relationship between confidence (precision) and accuracy. If the distribution of answers on the Dunning-Kruger curve is known, the preferable approach can be derived from it, as illustrated in Figure 6, where the red and blue points represent two different sets of estimates (e.g., about different variables).


Figure 6. Dunning-Kruger curve of expert-based interval estimates.

If the judgments show a negative correlation between accuracy and confidence, the possibilistic approach is preferable (red). However, if higher accuracy relates to narrower estimates, the probabilistic approach is recommended (blue).

Unfortunately, the type of confidence-accuracy correlation related to a certain task and a certain group of experts cannot be determined or predicted directly. It depends on several factors, such as the nature of the task, time pressure and the demand for informativeness, or the composition of the expert group. It has not yet been clearly described as a function of measurable factors, which defines a research gap in psychology and provides a future research direction of great practical relevance.

Although what determines the distribution of answers on the Dunning-Kruger curve, i.e., in the confidence-accuracy space, has not yet been fully explored, once it is, users will be able to choose with great certainty between the probabilistic and possibilistic approaches for modeling expert-based interval estimates. Until then, users can apply indirect solutions to deduce the nature of the precision-accuracy trend, e.g., by involving some a priori knowledge about the relative competence of the experts in the group and comparing it with the precision of their estimates.

Ethical considerations

Ethical approval for this study was obtained from the Institutional Research Ethics Committee of the University of Pannonia (Approval number: KEB 2/2024. (12.03.)). Written and signed informed consent was obtained from all participants.

Code availability

Archived source code at time of publication: https://doi.org/10.5281/zenodo.16680269.11

License: CC BY 4.0

How to cite this article: Kenyeres É, Abonyi J and Kummer A. Probabilistic or possibilistic expert knowledge modeling? Dunning-Kruger curve helps to choose! [version 1; peer review: awaiting peer review]. F1000Research 2025, 14:824 (https://doi.org/10.12688/f1000research.168801.1)