Brief Report

Probabilistic or possibilistic expert knowledge modeling? Dunning-Kruger curve helps to choose!

[version 1; peer review: awaiting peer review]
PUBLISHED 26 Aug 2025

Abstract

Background

Interval estimates are a common way to express the uncertain knowledge of experts. To model them and aggregate multiple judgments, both probability theory and possibility theory are applicable. Previous studies have shown that the performances of the aggregated distributions obtained by these two approaches are similar on average; however, few works have investigated how a preference between them can be established in specific cases.

Methods

The distribution of expert-based interval estimates on the latent Dunning-Kruger curve, i.e., the correlation between their accuracy (estimation error) and confidence/precision (interval width), was determined. The judgments were modeled using both probabilistic and possibilistic approaches, and the estimation errors of the obtained aggregated distributions were compared and described by an advantage score. Its dependence on the confidence-accuracy interdependence of the expert judgments was investigated using estimates for multiple variables.

Results

Interval estimates of ten experts regarding nine properties of a manual waste sorting system were analyzed, including feed composition, product purity and yield. The results show that there is a strong correlation between the confidence-accuracy interdependence of expert judgments and the advantage score. When narrower interval estimates imply greater accuracy, the probabilistic approach is preferable. However, in the reverse case, the possibilistic method yields better results.

Conclusions

Our basic intuition is that narrower interval estimates are more accurate than wider ones. In this case, the probabilistic approach for modeling expert knowledge is appropriate. However, as the Dunning-Kruger effect highlights, sometimes its reverse is true; then, the possibilistic approach tends to be more suitable as it does not amplify the effect of narrow estimates. The results show that the choice between the two concepts can be based on the correlation trend between the accuracy and precision of judgments that could be deduced, e.g., from the composition of the expert group.

Keywords

possibility theory, probabilistic approach, expert knowledge, interval estimates, system monitoring, Dunning-Kruger effect

Introduction

Are narrower interval estimates more accurate? Most people would confidently say YES based on their basic intuition.1 However, the incorrectness of this claim has already been proven experimentally.2 In addition, the Dunning-Kruger effect draws attention to the fact that greater confidence does not necessarily result from greater expertise.3 Moreover, the uncertainty of estimates can also be affected by other factors, such as the pressure to give informative judgments even in cases of ignorance.4

To model and aggregate expert-based interval estimates, both probability and possibility theories are applicable. Considering interval estimates as a kind of (uncertain) measurement, their precision and accuracy can be characterized by the interval width and the estimation error, respectively.5 Probabilistic aggregation monotonically reduces the variance of low-precision estimates with the number of available judgments. On the other hand, it may preserve the bias of high-precision estimates if they have low accuracy. In this case, possibilistic modeling and aggregation can be more beneficial, as it can ensure that the aggregated distribution covers the true value, e.g., by using the union operator. However, when there are both low- and high-precision estimates, the decision between the probabilistic and possibilistic approaches is not so straightforward.

In the case of estimates with varying precision, the relationship between precision and accuracy may not be negligible when choosing the modeling technique. If higher precision does not imply greater expertise, the simplest and most commonly used probabilistic modeling technique (which favors narrower estimates) does not seem to be an acceptable solution. Possibility theory appears preferable: it not only assigns equal importance to judgments with different interval widths, but also makes it possible to investigate the consensus of estimates by simply analyzing the overlap of the resulting possibility distributions.6

Some previous works have compared the probabilistic and possibilistic approaches in different fields. When creating a measurement model, the preference strongly depends on the available a priori knowledge.7 In addition, a structural engineering example showed that problem complexity also matters in the case of uncertainty propagation.8 As for expert knowledge modeling, the most significant work in recent years was conducted by Rohmer and Chojnacki.9 They performed an extensive analysis involving many datasets to compare the probabilistic and possibilistic approaches regarding how well they represent the aggregated opinions of multiple experts, using accuracy- and informativeness-based measures. They were unable to show significant differences between the performances; however, they investigated the average scores of multiple different estimation tasks and did not consider the potentially different relationship between precision and accuracy in particular cases.

In this study, we take a deeper look into the modeling of expert-based interval estimates by investigating their distribution on the Dunning-Kruger curve, which illustrates the ambiguous relationship between confidence and expertise,3 equivalent to precision and accuracy in the case of interval estimates, respectively. Starting from the key differences caused by the different mathematical logic, we examine how the distribution of judgments on the Dunning-Kruger curve, i.e., the correlation trend of precision and accuracy, affects the relative performance of the probabilistic and possibilistic approaches, thus facilitating the decision between them.

Methods

Expert-based interval estimates are ambiguous. If we have an [x_L^e, x_U^e] estimate about variable x from expert e, the width of the interval generally represents the uncertainty of the knowledge about x. However, we usually do not have any information about the prioritization of values inside the interval: the expert may have thought that every value in the interval is equally probable, but (s)he might not have meant to convey such additional information by the estimate. Therefore, the commonly used probabilistic modeling technique, which assumes certain probability values over the interval, does not always correctly represent the real information content of the estimates.

Probabilistic and possibilistic modeling

The given interval can be represented probabilistically by a uniform distribution as:

(1)
p_e(x) = \begin{cases} \dfrac{1}{x_U^e - x_L^e} & \text{if } x_L^e \leq x \leq x_U^e \\ 0 & \text{otherwise} \end{cases}
which means that the probability density is positive and inversely proportional to the interval length if the value of x lies in the given interval, and zero otherwise. In this case, probability represents the degree of uncertainty expressed by the interval length: narrow estimates receive a higher probability density than wider ones, as they are less uncertain.
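As a minimal sketch, the uniform density of Eq. (1) can be implemented as follows; the function name p_e and the use of NumPy are our own choices, not part of the original work:

```python
import numpy as np

def p_e(x, x_l, x_u):
    """Uniform probability density of the interval estimate [x_l, x_u] (Eq. 1)."""
    x = np.asarray(x, dtype=float)
    inside = (x >= x_l) & (x <= x_u)
    return np.where(inside, 1.0 / (x_u - x_l), 0.0)

# A narrower interval yields a higher density at a covered point:
p_e(2.0, 1, 3)  # density 1/2
p_e(2.0, 1, 4)  # density 1/3
```

The inverse proportionality to the interval length is exactly what later lets narrow estimates dominate the probabilistic aggregation.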

Alternatively, in the possibilistic case, interval estimates can be modeled by fuzzy numbers, which express the degree of membership in the set and are defined by the tuple:

(2)
\pi_e(x) = \left( x_L^e - \alpha,\ x_L^e,\ x_U^e,\ x_U^e + \alpha \right)
where the first and last members define the support, and the second and third members define the core of the fuzzy number, as shown in Figure 1. Inside the core, the fuzzy number takes the value one, and outside the support (defined by the tunable bandwidth α), it takes zero.


Figure 1. Definition of the trapezoidal fuzzy number representing the [x_L^e, x_U^e] interval estimate.

α marks the user-defined bandwidth.

Notice that in this case, without a restriction on the integral, the given interval defines the core of the fuzzy number, so the x values inside [x_L^e, x_U^e] take the value one in all cases, regardless of the interval width.
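For illustration, the trapezoidal possibility distribution of Eq. (2) could be sketched with NumPy's interp; the helper name pi_e and the explicit handling of the crisp α = 0 case are our assumptions:

```python
import numpy as np

def pi_e(x, x_l, x_u, alpha):
    """Trapezoidal possibility distribution of [x_l, x_u] with bandwidth alpha (Eq. 2)."""
    x = np.asarray(x, dtype=float)
    if alpha <= 0:
        # crisp interval: possibility one inside, zero outside
        return np.where((x >= x_l) & (x <= x_u), 1.0, 0.0)
    # piecewise-linear trapezoid through (x_l - alpha, 0), (x_l, 1), (x_u, 1), (x_u + alpha, 0)
    return np.interp(x, [x_l - alpha, x_l, x_u, x_u + alpha], [0.0, 1.0, 1.0, 0.0])

# Regardless of width, every value inside the core takes possibility one:
pi_e(2.0, 1, 3, 0.5)   # 1.0
pi_e(2.0, 1, 10, 0.5)  # 1.0
```

Contrast this with the uniform density of Eq. (1), whose height shrinks as the interval widens.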

Aggregation of interval estimates

The anomaly mentioned above, namely that the probabilistic approach prioritizes narrow interval estimates while the possibilistic approach assigns equal importance to all of them, increasingly prevails when the judgments of multiple experts are aggregated.

The aim of the aggregation is to summarize the judgments of multiple experts in one probability/possibility distribution. In case of the probabilistic approach, averaging the distribution functions representing individual opinions is a conventional aggregation technique:

(3)
p_{\mathrm{aggr}}(x) = \frac{1}{N_e} \sum_{e} p_e(x)
where Ne marks the number of experts.

In this work, we use the average operator for the possibilistic approach as well. It avoids assigning zero possibility everywhere along the domain in the case of conflicting estimates (as the min operator would), and it also avoids covering an uninformatively large area with possibility one when the estimates are well distributed, which would hide any preferences (as the max operator would):

(4)
\pi_{\mathrm{aggr}}(x) = \frac{1}{N_e} \sum_{e} \pi_e(x)

In the case of averaging possibility distributions, all expert judgments are considered with equal importance, as values within an interval estimate of each expert are considered with a possibility equal to one during the aggregation. Consequently, the mean curve (disregarding bandwidth) roughly represents the voting ratio for certain x values as shown on the right in Figure 2. On the other hand, the x values falling within narrow intervals are overemphasized in the probabilistic case, as they take higher probability than the values belonging to wide-interval responses as can be seen on the left in Figure 2.


Figure 2. Illustrative example of the comparison of probabilistic and possibilistic representation and aggregation of interval estimates.

Three interval estimates are considered here: [1,3] , [1,4] and [3.5,4.5] . They are represented by uniform probability distributions on the left, and by trapezoidal fuzzy numbers with zero bandwidth ( α=0 ) on the right. The aggregated distributions are illustrated by red dashed lines, and the function values belonging to x=2 and x=4.25 by black points.

Three expert judgments with different interval widths were modeled and aggregated in Figure 2. It can be noticed that x=4.25 takes a higher probability but a lower possibility value than x=2 . This is a key consequence of the narrow-interval prioritization effect of the probabilistic approach.
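The numbers behind this observation can be reproduced in a few lines. This is a sketch of the Figure 2 example using crisp (α = 0) possibility distributions; the helper names are our own:

```python
import numpy as np

intervals = [(1.0, 3.0), (1.0, 4.0), (3.5, 4.5)]  # the three estimates of Figure 2

def p_aggr(x):
    """Averaged uniform densities (Eq. 3)."""
    return np.mean([1.0 / (u - l) if l <= x <= u else 0.0 for l, u in intervals])

def pi_aggr(x):
    """Averaged crisp possibility distributions (Eq. 4, alpha = 0)."""
    return np.mean([1.0 if l <= x <= u else 0.0 for l, u in intervals])

p_aggr(2.0), p_aggr(4.25)    # ~0.278 vs ~0.333: probability favors x = 4.25
pi_aggr(2.0), pi_aggr(4.25)  # ~0.667 vs ~0.333: possibility favors x = 2
```

The point x = 4.25 is covered by a single narrow interval, yet its high density outweighs the two votes received by x = 2 in the probabilistic aggregate.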

Additionally, it has to be mentioned that the aggregated curve in the probabilistic case is still a probability distribution (with integral equal to one). Meanwhile, the averaging in the possibilistic case results in a subnormal possibility distribution (with a maximum below one). If needed, it should be normalized for further calculations.10

Advantage score

An advantage score is defined to express the performance difference between the aggregated probability and possibility distributions. To determine the extent to which the probabilistic approach outperforms the possibilistic one, the accuracies of the resulting aggregated distributions are compared. Their means are calculated (in the possibilistic case, this corresponds to the centroid defuzzification method) and their distance from the correct value ( x_correct ) is evaluated:

(5)
e_p(x) = \left| E\!\left(p_{\mathrm{aggr}}(x)\right) - x_{\mathrm{correct}} \right|
(6)
e_\pi(x) = \left| E\!\left(\pi_{\mathrm{aggr}}(x)\right) - x_{\mathrm{correct}} \right|
where ep and eπ refer to the absolute errors of the mean of the aggregated probability and possibility distributions, respectively, and E(·) represents the expected value of the function in the argument.

The difference of errors defines the advantage score ( sadv ) as:

(7)
s_{\mathrm{adv}}(x) = \frac{e_\pi(x) - e_p(x)}{f_x}

The normalization factor f_x aims to bring the error differences to a common scale if the x variables are scaled differently; it can be, e.g., equal to the domain width of x. In this work, this normalization was omitted ( f_x = 1 ), as all variables were scaled equally in our case study. The s_adv metric is positive if the probabilistic approach performs better than the possibilistic one, and negative if the possibilistic approach yields more accurate results.
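Under the stated choices (f_x = 1, means computed on a discretized grid, α taken as zero for simplicity), the advantage score could be sketched as below; the function name, grid range, and resolution are our assumptions:

```python
import numpy as np

def s_adv(intervals, x_correct, grid=np.linspace(0.0, 100.0, 10001)):
    """Advantage score (Eq. 7) with f_x = 1; positive values favor the probabilistic approach."""
    # aggregated probability density (Eq. 3) and crisp possibility distribution (Eq. 4)
    p = np.mean([np.where((grid >= l) & (grid <= u), 1.0 / (u - l), 0.0)
                 for l, u in intervals], axis=0)
    pi = np.mean([np.where((grid >= l) & (grid <= u), 1.0, 0.0)
                  for l, u in intervals], axis=0)
    mean_p = np.sum(grid * p) / np.sum(p)     # expected value of the aggregated probability
    mean_pi = np.sum(grid * pi) / np.sum(pi)  # centroid defuzzification of the possibility
    return abs(mean_pi - x_correct) - abs(mean_p - x_correct)  # Eqs. (5)-(7)

# Negative score on the Figure 2 intervals if the correct value is 2:
s_adv([(1.0, 3.0), (1.0, 4.0), (3.5, 4.5)], 2.0)  # ~ -0.25, possibilistic approach wins
```

Note that dividing by the sums makes explicit normalization of the subnormal possibility distribution unnecessary for the centroid computation.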

Correlation analysis

Calculating correlation has two roles in this work. First, it is used to characterize the relationship between the interval length and the accuracy of the estimates about a variable, denoted by r. Then, having estimates about multiple variables, the correlation between these per-variable correlations and the corresponding advantage scores is calculated, denoted by R. Pearson's correlation coefficient is used in both cases.

The interval lengths and estimation errors of estimates about x from multiple experts are collected in d and e vectors, respectively:

(8)
\mathbf{d} = \left[ d_1, d_2, \ldots, d_{N_e} \right]
(9)
\mathbf{e} = \left[ e_1, e_2, \ldots, e_{N_e} \right]
where
(10)
d_e = x_U^e - x_L^e, \quad e = 1, \ldots, N_e
and the absolute errors e_e are defined as the deviation of the middle of the interval (if a point estimate had to be given, presumably it would be this) from the correct value:
(11)
e_e = \left| \frac{x_L^e + x_U^e}{2} - x_{\mathrm{correct}} \right|, \quad e = 1, \ldots, N_e

The confidence-accuracy interdependence belonging to an x variable can be defined as the correlation between interval lengths ( d ) and estimation errors ( e ) of the estimates from multiple experts:

(12)
r = \frac{\mathrm{cov}(\mathbf{d}, \mathbf{e})}{\sigma_d \sigma_e}
where the standard deviations of d and e are denoted by σd and σe , respectively.

If r shows a strong positive correlation, narrower interval estimates belong to a higher expertise level. However, if there is a significant negative correlation, the interval width increases with the level of expertise, which aligns with the Dunning-Kruger effect.
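A small numerical sketch of Eqs. (10)-(12), with entirely hypothetical estimates from five experts (not the study's data):

```python
import numpy as np

# hypothetical interval estimates [x_L, x_U] from five experts about one variable
estimates = np.array([[20, 30], [15, 45], [35, 40], [10, 60], [25, 35]], dtype=float)
x_correct = 30.0

d = estimates[:, 1] - estimates[:, 0]           # interval widths (Eq. 10)
e = np.abs(estimates.mean(axis=1) - x_correct)  # midpoint errors (Eq. 11)
r = np.corrcoef(d, e)[0, 1]                     # confidence-accuracy correlation (Eq. 12)
# here r comes out slightly negative: the widest intervals happen to be fairly accurate
```

Since Pearson's coefficient already divides the covariance by both standard deviations, np.corrcoef computes Eq. (12) directly.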

Having estimates about multiple variables ( x_i, i = 1, …, N_x ), we can calculate the confidence-accuracy correlation and the advantage score for each, collected as:

(13)
\mathbf{r} = \left[ r_1, r_2, \ldots, r_{N_x} \right]
(14)
\mathbf{s}_{\mathrm{adv}} = \left[ s_{\mathrm{adv}}(x_1), s_{\mathrm{adv}}(x_2), \ldots, s_{\mathrm{adv}}(x_{N_x}) \right]

Their relationship can also be described by a correlation coefficient ( R ) numerically:

(15)
R = \frac{\mathrm{cov}(\mathbf{r}, \mathbf{s}_{\mathrm{adv}})}{\sigma_r \sigma_{s_{\mathrm{adv}}}}

In this work, we wanted to explore whether the distribution of answers on the Dunning-Kruger curve (characterized by r ) has any effect on the performance difference between the probabilistic and possibilistic approaches ( sadv ). This potential effect is quantified by R .
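With the per-variable results in hand, Eq. (15) is a single call; the vectors below are purely illustrative, not values from the study:

```python
import numpy as np

r_vec = np.array([-0.6, -0.3, 0.1, 0.4, 0.8])      # hypothetical per-variable r values
s_adv_vec = np.array([-2.1, -0.9, 0.2, 1.0, 2.5])  # hypothetical advantage scores

R = np.corrcoef(r_vec, s_adv_vec)[0, 1]  # Eq. (15)
# a strongly positive R would indicate that r is predictive of the better approach
```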

Data source

Interval estimates are available for nine key variables of a manual waste sorting system, including product purity, yields and feed composition, each expressed as a percentage; their values are thus limited to between 0% and 100%. The data collection was performed in a classroom setting involving ten students (representing experts). Ethical approval for this study was obtained from the Institutional Research Ethics Committee of the University of Pannonia (approval number: KEB 2/2024. (12.03.)).

Results

The available interval estimates about the nine variables ( xi,i=1,,9 ) given by the ten experts ( Ne=10 ) are summarized in Figure 3.


Figure 3. Interval responses of the ten experts for the nine key variables ( x1–x9 ).

The correct value is illustrated by yellow vertical lines. The correlation coefficients between interval width and estimation error ( r1–r9 ) are also depicted.

It can be seen that the correlation is positive in only about half of the cases, which confirms the falsity of the claim that narrower estimates necessarily imply a higher level of expertise. A strong correlation was found in only one case ( x9 ), and a moderate correlation four times (two negative ( x1 , x8 ) and two positive ( x3 , x7 )). The estimates of four variables did not show a significant relationship between interval width and accuracy.

The aggregated probability and possibility distributions ( α=2% ) with their means and the related advantage scores can be seen in Figure 4. There are some cases where the probabilistic approach outperforms the possibilistic one ( x2 , x3 and x9 ), and there are examples where the reverse is true ( x1 and x8 ).


Figure 4. Comparison of the performance of the probabilistic and possibilistic approaches.

The aggregated probability (blue) and possibility (red) distributions are shown. Their means are plotted by dotted lines in the corresponding color, and the correct values are illustrated by yellow vertical lines. The advantage scores ( sadv ) are also depicted.

Finally, the relationship between the r and sadv(x) values was investigated, as illustrated in Figure 5. A strong linear correlation ( R=0.73 ) was detected between these two factors, which is an encouraging result in terms of making a decision between the probabilistic and possibilistic approaches based on the tendency characterized by r. Although the absolute errors ( ee ) needed to calculate r are not available in practical cases, as the correct value is not known, the correlation trend can be guessed, e.g., from the interval widths and the relative competences of the experts.


Figure 5. Relationship between the confidence-accuracy correlation ( r ) and the advantage score of the probabilistic approach ( sadv ).

The data points are labeled based on the variables ( x1–x9 ) they belong to, and the line fitted to them is depicted by a red dashed line. The correlation coefficient of r and sadv is R=0.73 .

Conclusions

Our results demonstrate that the performance difference between the probabilistic and possibilistic modeling approaches strongly depends on the nature of the relationship between confidence (precision) and accuracy. If the distribution of answers on the Dunning-Kruger curve is known, the preferable approach can be derived from it, as illustrated in Figure 6, where the red and blue points represent two different sets of estimates (e.g., about different variables).


Figure 6. Dunning-Kruger curve of expert-based interval estimates.

If the judgments show a negative correlation between accuracy and confidence, the possibilistic approach is preferable (red). However, if higher accuracy relates to narrower estimates, the probabilistic approach is recommended (blue).

Unfortunately, the type of confidence-accuracy correlation related to a certain task and a certain group of experts cannot be determined or predicted directly. It depends on several factors, such as the nature of the task, time pressure and the demand for informativeness, or the composition of the expert group. It has not yet been clearly described as a function of measurable factors, which defines a research gap in psychology and provides a future research direction of great practical relevance.

Although what determines the distribution of answers on the Dunning-Kruger curve, i.e., in the confidence-accuracy space, has not yet been fully explored, once it is, users will be able to choose with great certainty between the probabilistic and possibilistic approaches for modeling expert-based interval estimates. Until then, users can apply indirect solutions to deduce the nature of the precision-accuracy trend, e.g., by involving some a priori knowledge about the relative competence of the experts in the group and comparing it with the precision of their estimates.

Ethical considerations

Ethical approval for this study was obtained from the Institutional Research Ethics Committee of the University of Pannonia (Approval number: KEB 2/2024. (12.03.)). Written and signed informed consent was obtained from all participants.

Code availability

Archived source code at time of publication: https://doi.org/10.5281/zenodo.16680269.11

License: CC BY 4.0

How to cite this article: Kenyeres É, Abonyi J and Kummer A. Probabilistic or possibilistic expert knowledge modeling? Dunning-Kruger curve helps to choose! [version 1; peer review: awaiting peer review]. F1000Research 2025, 14:824 (https://doi.org/10.12688/f1000research.168801.1)