Research Article
Revised

Assessing hard and loose “endpoints”: comparison of patient and expert Bristol Stool Scale scoring of 2280 fecal samples

[version 2; peer review: 2 approved]
PUBLISHED 19 Dec 2024

Abstract

Background

Stool consistency is an important outcome measure to evaluate in the investigation of several gastrointestinal diseases. The Bristol Stool Scale (BSS) is one of the most commonly used tools for evaluation of stool consistency. BSS ranges from 1-7 and each score is assigned to a given consistency of the feces. Self-reported characterizations can differ from an expert evaluation, and the reliability of BSS is unclear. We aimed to evaluate the reliability of BSS by comparing patient scores with expert scores.

Methods

Patients with inflammatory bowel disease collected stool samples throughout a 3-year follow-up. The stool's consistency was evaluated with BSS by the patients and matched with an expert score. Agreement between patient and expert scores was assessed using Cohen’s kappa.

Results

BSS scores from 2280 fecal samples collected from 992 patients at up to five time points were included. When all samples were compared, there was good to substantial agreement between patient and expert scores (Cohen’s weighted kappa: 0.66-0.72). When the BSS scores were simplified and categorized as 1 (scores 1-2), 2 (scores 3-5) or 3 (scores 6-7), the agreement improved slightly (Cohen’s weighted kappa: 0.73-0.77). When the scores from the first sample per patient were compared, the experts were more likely to assign higher scores compared to the patient. The proportion of the lowest assigned scores (1-2) was 12.1% for patients and 8.1% for experts.

Conclusions

The agreement between patient and expert BSS scores is good to substantial, especially when the BSS scores are simplified into three categories.

Keywords

Fecal sample, stool form assessment, diarrhea, constipation

Revised Amendments from Version 1

The following changes have been made from version 1 to version 2 in order to meet the reviewers' requests and comments:

  • The potential difference between IBS patients and IBD patients when it comes to self-reported assessment of stool consistency has been emphasized, as suggested by Reviewer 1.
  • The classification of Bristol stool category 5 as normal has been nuanced, as suggested by Reviewer 2.
  • The lack of data to calculate interobserver and intraobserver variabilities has been emphasized as a limitation, as suggested by Reviewer 2.
  • Stool frequency may influence stool consistency, and the lack of data on this variable has been mentioned as a limitation, as suggested by Reviewer 2.
  • The substantial agreement between patient reported BSS assessment of the entire stool delivery and expert evaluated BSS assessment of a small fecal sample has been emphasized (along with a suggestion to include photographs of the entire stool delivery for future research), as suggested by Reviewer 2.

See the authors' detailed response to the review by Friedemann Erchinger
See the authors' detailed response to the review by Bodil Ohlsson

Introduction

Stool consistency is a central component in the description of bowel habits and an important outcome measure in the investigation of several gastrointestinal (GI) diseases.1 Altered bowel habits are often a consequence of diseases in the GI tract, and the water content of the stool, which determines its consistency, can reflect the intestinal transit time. Rapid intestinal transit limits the absorption of water and causes loose or liquid stools (diarrhea), whereas slow intestinal transit causes extensive water absorption and harder stools (constipation).2 Stool consistency can be measured exactly by determining viscosity and stool water content; however, this method requires cumbersome laboratory analyses and is not well suited for routine purposes. The form of the stool, meaning its visually assessed shape and texture, can serve as a proxy measure for stool consistency.3

The Bristol Stool Scale (BSS) is one of the most commonly used tools for evaluation of stool consistency.1 The BSS was developed in the early 1990s as a surrogate marker for whole-gut transit time, but is now a commonly recommended tool in clinical and research settings for the evaluation of stool consistency, rather than transit time.4,5 BSS is designed to standardize the reporting of stool consistency by a 7-point ranking system, ranging from hard lumps (Type 1 on the scale) to liquid stool (Type 7 on the scale). Types 1 and 2 are considered hard stools corresponding to constipation, whereas types 6 and 7 are considered abnormally loose and watery stools corresponding to diarrhea when evaluated together with other symptoms. Types 3, 4 and 5 are generally considered “normal” stool forms.

The generalizability of BSS as a tool in research is unclear, and fecal water content or dry weight have been suggested as better objective measures.5 Reporting can be inaccurate since subjects are asked to report their “average” stool form for the day, but bowel movements may vary in form throughout the day. Moreover, subjects’ interpretation of the BSS scale may vary, leading to inaccurate reporting of stool form. It is suggested that the BSS score is more precise when evaluated by a trained expert than by the patient.6 Here, we aimed to evaluate the reliability of the BSS in a cohort of patients with inflammatory bowel disease (IBD), by comparing the patients’ subjective scores with scores assigned by the experienced bioengineers who received the fecal samples.

Methods

Design and study sample

The study is a sub-study of the IBSEN III project (Clinical Trials ID: NCT 02727959), a population-based observational inception cohort with prospective follow-up, consisting of newly diagnosed IBD patients and symptomatic non-IBD controls living in the South-Eastern Health Region of Norway, included during a 3-year period from 2017 to 2019. The study protocol for IBSEN III has been presented in detail in a previous publication.7 Here, we present results from the fecal samples delivered at baseline, after 3 and 6 months and after 1 and 3 years follow-up.

Bristol stool scale

BSS is a 7-point ranking system, where types 1-2 and 6-7 correspond to abnormal defecation with constipation and diarrhea, respectively.1 For statistical analyses, the BSS score was simplified into the three categories “1 = constipation” (scores 1-2), “2 = normal” (scores 3-5) and “3 = diarrhea” (scores 6-7). This classification was chosen to correspond with the Rome IV delineation for the classification of irritable bowel syndrome (IBS) subtypes (IBS-D and IBS-C), being an applicable cut-off both in clinical practice and in a research setting.8 It should be noted, however, that some authors consider a score of 5 as abnormal.
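The simplification described above can be sketched in a few lines of Python. This is an illustration only, not the authors' analysis code (the study used SPSS); the function name `simplify_bss` and the `LABELS` dictionary are hypothetical names introduced here.

```python
def simplify_bss(score: int) -> int:
    """Map an original BSS type (1-7) to the simplified category (1-3).

    1 = constipation (types 1-2), 2 = normal (types 3-5), 3 = diarrhea (types 6-7).
    """
    if score not in range(1, 8):
        raise ValueError(f"BSS score must be an integer from 1 to 7, got {score!r}")
    if score <= 2:
        return 1  # constipation
    if score <= 5:
        return 2  # normal
    return 3  # diarrhea


# Hypothetical label lookup for the three simplified categories.
LABELS = {1: "constipation", 2: "normal", 3: "diarrhea"}

if __name__ == "__main__":
    for bss_type in range(1, 8):
        category = simplify_bss(bss_type)
        print(bss_type, category, LABELS[category])
```

Treating type 5 as abnormal, as some authors do, would only require moving the second threshold from 5 to 4.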

Fecal samples

Fecal samples were collected in the IBSEN III trial from the first inclusions in 2017 through all follow-ups completed by 2022. The BSS was implemented only after a year of inclusion; hence, the fecal samples included in the current analysis are those collected from January 2018 until the end of 2022. The patients collected the stool samples at home and were instructed to score each sample with the BSS at the time of collection. They were then instructed to send the sample to the hospital by mail, on the day of defecation or the day after. The hospital received the sample one to four days after defecation, and the expert score was assigned immediately (upon freezing the sample). Two experienced bioengineers analyzing the fecal samples assigned the expert scores. They divided the work between them; hence, each expert score comes from one expert only and is not a mean value. All the fecal samples included were assessed by both an expert and a patient.

Ethical considerations

The IBSEN III study has been reviewed and approved by the Regional Committee for Medical and Health Research Ethics in South-Eastern Norway (reference number 2015/946), and was conducted in accordance with the Declaration of Helsinki. All participants provided written informed consent.

Statistical analysis

Categorical data are described with counts and percentages. Agreement between experts and patients was assessed using Cohen’s weighted kappa.9 The results are expressed as point estimates with 95% confidence intervals (CI). P-values <0.05 were considered statistically significant. All analyses were performed using SPSS version 28 (https://www.ibm.com/spss).
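For readers who want to see how a weighted kappa is obtained from a contingency table, the following is a minimal sketch in plain Python. It is not the authors' SPSS procedure. The text does not state which weighting scheme was used, so the linear weights used by default here are an assumption, and the function name `weighted_kappa` is hypothetical. The example table holds the 3×3 counts reported in Table 3 (rows: patient category, columns: expert category).

```python
def weighted_kappa(table, weights="linear"):
    """Cohen's weighted kappa from a square table of counts.

    table[i][j] = number of samples scored category i by rater A
    and category j by rater B. Disagreement weights are
    |i - j| / (k - 1) (linear) or that value squared (quadratic).
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least 2 categories")

    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weights == "linear" else 2

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            observed += w * table[i][j] / n
            expected += w * row_totals[i] * col_totals[j] / n**2

    if expected == 0:
        raise ValueError("kappa is undefined when all scores fall in one category")
    return 1.0 - observed / expected


# Counts from Table 3 (assessment point 1, simplified scores).
TABLE3 = [
    [69, 49, 2],
    [9, 593, 21],
    [2, 36, 211],
]

if __name__ == "__main__":
    print(f"linear weights:    {weighted_kappa(TABLE3):.3f}")
    print(f"quadratic weights: {weighted_kappa(TABLE3, 'quadratic'):.3f}")
```

With linear weights the Table 3 counts give a kappa of roughly 0.77, and with quadratic weights a somewhat higher value, so the choice of weighting scheme matters when comparing kappa values across studies. The confidence intervals in Table 1 need a variance estimate that this sketch does not cover.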

Results

Patient and stool sample characteristics

The study protocol and cohort’s baseline characteristics are presented in detail in a previous publication.7 In total, 2970 stool samples from 1359 different patients were available for inclusion. The study subjects delivered stool samples at baseline (100%), after 3 months (48%), six months (34%), one year (26%) and three years (9%), respectively. Of the available stool samples, 2280 samples (77%) from 992 different patients had BSS scores from both patients and experts and were included in the analysis. 992 samples with matched patient and expert scores were available from baseline (100%), 513 samples from the 3-month follow-up (52% of baseline patients), 363 samples from the six-month follow-up (37% of baseline patients), 290 samples from the one-year follow-up (29% of baseline patients) and 122 samples from the three-year follow-up (12% of baseline patients).
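The sample counts above can be checked with a few lines of arithmetic. The sketch below only uses numbers stated in the paragraph; the variable names are hypothetical.

```python
# Samples with matched patient and expert scores, per assessment time point.
matched = {
    "baseline": 992,
    "3 months": 513,
    "6 months": 363,
    "1 year": 290,
    "3 years": 122,
}
available_samples = 2970  # all stool samples available for inclusion

total_matched = sum(matched.values())
share_included = round(100 * total_matched / available_samples)

# Each follow-up expressed as a percentage of the baseline patients.
pct_of_baseline = {
    label: round(100 * count / matched["baseline"])
    for label, count in matched.items()
}

if __name__ == "__main__":
    print(total_matched, f"{share_included}% of available samples")
    for label, pct in pct_of_baseline.items():
        print(f"{label}: {pct}% of baseline patients")
```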

Patient versus expert BSS scores

The agreement between patient and expert scores for all assessed time points was good to substantial when assessed with Cohen’s weighted kappa, ranging from 0.66 to 0.72 (Table 1). Overall, the patients were likely to score lower than the experts. When the scores from the first sample per patient were compared, the experts were more likely to assign a higher score than the patient’s self-assessment. The proportion of the lowest assigned scores (scores 1-2) was 12.1% for patients and 8.1% for the experts. Table 2 presents the scores given by the patients and experts at the first assessment time point. The numbers in green, on the diagonal, represent perfect agreement between the experts and the patients. The numbers in yellow above the diagonal represent samples given a higher score by the expert than by the patient (over-estimation), and the numbers in red below the diagonal represent samples given a lower score by the expert than by the patient (under-estimation). The distribution of BSS scores reported by the patient for each value of score given by the expert is presented graphically in Figure 1.

Table 1. BSS score agreement between patients and experts assessed with Cohen’s weighted kappa.

                         Original BSS scores (1-7)     Simplified BSS scores (1-3)
Assessment time point    Weighted kappa   95% CI       Weighted kappa   95% CI
1 (n = 992)              0.72             0.69-0.75    0.78             0.74-0.81
2 (n = 513)              0.66             0.61-0.71    0.75             0.69-0.81
3 (n = 363)              0.71             0.66-0.76    0.73             0.66-0.80
4 (n = 290)              0.72             0.66-0.78    0.74             0.66-0.82
5 (n = 122)              0.74             0.66-0.82    0.74             0.63-0.85

Table 2. BSS score agreement between patients and experts at assessment point 1.

                      Expert BSS scores
Patient BSS scores      1     2     3     4     5     6     7
1                      13     4     5     0     2     0     0
2                       3    49    18    11    13     2     0
3                       0     3   139    44    24     5     0
4                       0     1     7   275    57    11     0
5                       1     4     7     7    33     5     0
6                       0     2     1     6    25   163     0
7                       0     0     0     1     3    19    29

Figure 1. Distribution of BSS scored by patients for each value of score given by experts.

The figure shows that the patients were likely to score lower than the experts. BSS: Bristol Stool Scale.
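The direction of disagreement can be read off the 7×7 table by summing the cells on, above and below the diagonal. The sketch below does this in Python and also collapses the table into the three simplified categories. The matrix is a transcription of Table 2 (rows: patient score, columns: expert score) and should be treated as an assumption to be checked against the published table; the function names are hypothetical.

```python
# Transcription of Table 2: TABLE2[p][e] = samples scored p+1 by the
# patient and e+1 by the expert (assumed reading, assessment point 1).
TABLE2 = [
    [13, 4, 5, 0, 2, 0, 0],
    [3, 49, 18, 11, 13, 2, 0],
    [0, 3, 139, 44, 24, 5, 0],
    [0, 1, 7, 275, 57, 11, 0],
    [1, 4, 7, 7, 33, 5, 0],
    [0, 2, 1, 6, 25, 163, 0],
    [0, 0, 0, 1, 3, 19, 29],
]


def direction_counts(table):
    """Return (exact agreement, expert scored higher, expert scored lower)."""
    agree = higher = lower = 0
    for p, row in enumerate(table):
        for e, count in enumerate(row):
            if e == p:
                agree += count
            elif e > p:
                higher += count
            else:
                lower += count
    return agree, higher, lower


def collapse(table):
    """Collapse a 7x7 BSS table into the 3x3 simplified categories."""
    groups = [(0, 1), (2, 3, 4), (5, 6)]  # types 1-2, 3-5, 6-7
    return [
        [sum(table[p][e] for p in rows for e in cols) for cols in groups]
        for rows in groups
    ]


if __name__ == "__main__":
    n = sum(sum(row) for row in TABLE2)
    agree, higher, lower = direction_counts(TABLE2)
    print(f"n={n}, agree={agree}, expert higher={higher}, expert lower={lower}")

    simplified = collapse(TABLE2)
    patient_low = sum(simplified[0])            # patient scored 1-2
    expert_low = sum(r[0] for r in simplified)  # expert scored 1-2
    print(f"lowest scores: patients {100 * patient_low / n:.1f}%, "
          f"experts {100 * expert_low / n:.1f}%")
```

Under this transcription the cells sum to the 992 baseline samples, and the shares of scores 1-2 match the 12.1% (patients) and 8.1% (experts) given in the text.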

BSS score simplified into three categories

When the BSS scores were simplified and all the samples categorized as 1 (scores 1 and 2), 2 (scores 3, 4 and 5) or 3 (scores 6 and 7), patient and expert agreement improved slightly, with Cohen’s weighted kappa ranging from 0.73 to 0.77, indicating substantial agreement for all the assessed time points (Table 1). The distribution of the simplified scores given by the patients and experts at assessment point 1 is presented in Table 3. Perfect agreement between the experts and the patients (the diagonal) is highlighted in green, the cells above the diagonal highlighted in yellow represent over-estimation by the experts, and the cells below the diagonal in red represent under-estimation by the experts. The distribution of the BSS scored by the patient for each value of score given by the expert at the first assessment point, with the BSS score simplified into three categories, is depicted in Figure 2.

Table 3. BSS score agreement between patients and experts at assessment point 1 using a simplified score.

                      Expert BSS scores
Patient BSS scores      1     2     3
1                      69    49     2
2                       9   593    21
3                       2    36   211

Figure 2. BSS scores categorized as 1, 2 and 3.

Score 1 equals original scores 1 and 2 (constipation), score 2 equals original scores 3, 4 and 5 (normal), and score 3 equals original scores 6 and 7 (diarrhea). The figure shows the distribution of the BSS scored by the patient for each score value given by the expert at the first assessment point when the BSS score was simplified into the three categories. BSS: Bristol Stool Scale.

Discussion

BSS is commonly used in both clinical and research settings for the evaluation of stool consistency; however, its reliability is unclear as the scores rely on subjective patient reports. Here, we show that there is good to substantial agreement between patient BSS scores and expert BSS scores when evaluating individual scores from 1-7. However, this agreement is even better when scores are compared in simplified categories corresponding to constipation (BSS score 1-2), normal stool (BSS score 3-5) and diarrhea (BSS score 6-7). Our findings highlight that the patient's exact BSS score is in good agreement with the expert score, and that the patient score is accurate for evaluation of main stool type (constipation, normal or diarrhea). As the main stool type is commonly the one of most interest for the classification of disease, and more subtle distinctions might not be of great clinical importance, we found BSS to be a reliable tool for stool form evaluation.

Several studies have previously reported on the validity and use of BSS. Our findings contradict those of a previous study investigating the rater reproducibility of BSS by comparing the BSS scores that thirty-four gastroenterology providers assigned to 35 different stool photographs.10 That study reported high reliability and agreement when BSS scores were used to assess the individual stool type from 1-7. When the experts categorized the stool type into the three categories constipation, normal and diarrhea (corresponding to the Rome III standard8), the reliability and agreement decreased.10 Of note, in contrast to these prior results, our simplification into three groups was derived from the original BSS scores and not independently assessed, which might explain the different findings. Also, our study compared subjective patient scores to expert scores, which is relevant as the BSS scoring is most often performed by the patients themselves.

A recent study from 2022 comparing IBS patients' subjective BSS scores with stool water content as an objective measure reported only modest conformity between the methods, highlighting that this can affect the classification of IBS subtype according to BSS score.11 Similarly, a study from 2016 validated the BSS by measuring stool form in 169 healthy adults and comparing it to stool water content and BSS scores from 19 patients with diarrhea-predominant IBS.6 The authors reported that the BSS demonstrated adequate validity and reliability; however, they detected difficulties around clinical decision points for types 2, 3, 5 and 6.6 Although the authors report the BSS to be a valid tool for stool evaluation, it has to be taken into consideration that the uncertainty around scorings affects the classification of stool type into constipation, normal or diarrhea, potentially affecting the clinical outcomes and diagnosis of IBS subtype. It should be emphasized that self-reported assessments may differ between IBS patients and IBD patients, since psychological factors may have a greater influence on IBS symptoms than on IBD symptoms.

The current literature thus presents highly variable results for the validity of the BSS when subjective measures are compared with expert measures or fecal water content. BSS is widely used both in clinical and research settings, and is currently used in an increasing number of studies analyzing microbiota to adjust for differences in fecal consistency.5 The unclear generalizability of BSS represents a challenge to interpreting such results. Interestingly, a recent study comparing a smartphone application using artificial intelligence (AI) to BSS scores performed by two experts reported high agreement between the AI and the experts. In addition, the trained AI was reported to be superior to subjects' self-reported BSS scores, emphasizing that AI assessments could provide more objective outcome measures for stool characterization in gastroenterology.12 This should be taken into consideration in further studies, as it highlights a tool that can potentially address the lack of reliability related to the subjective measures used in BSS today.

Our study has some limitations. As the follow-up period for the IBSEN III cohort is still not completed, we did not include stool samples from all subjects at all time points, and some patients were included with several samples, whereas others were only included with one sample. Due to the covid-19 pandemic, the 1-year follow-up was delayed for some of the subjects; hence, the proportion of included study subjects with more than three stool samples is small. As the BSS is a subjective measure, the uneven distribution of the number of samples per participant might be a weakness affecting the results. When a stool sample is collected, only a small part of the whole stool delivery is sampled for analysis. It is likely that parts of the discharge may have different consistencies and appearance; hence, the part sampled for analysis might not be representative of the whole stool. This might be of particular importance when evaluating patients with GI disorders such as IBD and IBS, as the stool might be a mix, with hard lumps at first followed by looser stools. It would have been interesting to know the participants' stool frequency, since the frequency may affect the consistency. Our study also only included two experts. It must be considered that a larger number of experts, or a mean of several expert scores, could have strengthened the results. Similarly, including an assessment of interobserver and intraobserver variabilities, both in the patient group and among the experts, would have strengthened our study substantially.

Conclusion

Taken together, our findings show that the agreement between patient and expert BSS scores is good, especially when divided into three main stool categories. We found the BSS to be a reliable tool for the categorization of stool type when the BSS scores are categorized into three categories corresponding to the clinically relevant concepts of constipation, normal stool or diarrhea. Indeed, our finding that expert scoring of only a small fecal sample is in good agreement with the patient scoring of the entire stool delivery is a novel observation that should be validated in future studies where photographs of the entire stool delivery should also be included.

Ethical approval and consent

The IBSEN III study has been reviewed and approved by the Regional Committee for Medical and Health Research Ethics in South-Eastern Norway (reference number 2015/946, approval date 1st July 2015) and was conducted in accordance with the Declaration of Helsinki. All participants provided written informed consent.

Author contributions

HFD, JV, GHM and JTF planned and designed the study. MLH and VK included patients in the IBSEN III trial. GHM and JTF analysed the fecal samples and performed the expert BSS scoring. MCS performed the statistical analyses. HFD wrote the manuscript. JV, MCS, GHM and JTF contributed to data interpretation and critical revision of the manuscript. All authors reviewed and approved the final manuscript.

How to cite this article
Dale HF, Hagen M, Malmstrøm GH et al. Assessing hard and loose “endpoints”: comparison of patient and expert Bristol Stool Scale scoring of 2280 fecal samples [version 2; peer review: 2 approved]. F1000Research 2024, 13:833 (https://doi.org/10.12688/f1000research.152496.2)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Version 2, published 19 Dec 2024 (revised)
Reviewer Report 24 Dec 2024
Friedemann Erchinger, University of Bergen, Bergen, Norway;  Kanalspesialistene AS, Bergen, Norway 
Approved
Erchinger F. Reviewer Report For: Assessing hard and loose “endpoints”: comparison of patient and expert Bristol Stool Scale scoring of 2280 fecal samples [version 2; peer review: 2 approved]. F1000Research 2024, 13:833 (https://doi.org/10.5256/f1000research.175975.r350717)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Version 1, published 25 Jul 2024
Reviewer Report 11 Dec 2024
Friedemann Erchinger, University of Bergen, Bergen, Norway;  Kanalspesialistene AS, Bergen, Norway 
Approved with Reservations
The article describes agreement between patients and experts when scoring stool consistency after the Bristol Stool Scale.
Using Cohen’s weighted kappa for all scales 1 to 7 or grouped scales, constipation (1-2) normal (3-5) and diare (6,7), agreement was substantial; ... Continue reading
Erchinger F. Reviewer Report For: Assessing hard and loose “endpoints”: comparison of patient and expert Bristol Stool Scale scoring of 2280 fecal samples [version 2; peer review: 2 approved]. F1000Research 2024, 13:833 (https://doi.org/10.5256/f1000research.167263.r344492)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 19 Dec 2024
    Jørgen Valeur, Unger-Vetlesen Institute, Lovisenberg Diaconal Hospital, Oslo, Norway
    19 Dec 2024
    Author Response
    We would like to thank the Reviewer for taking the time to read and comment on our paper, and are grateful for both the encouraging comments, as well as the ... Continue reading
Reviewer Report 24 Aug 2024
Bodil Ohlsson, Clinical Sciences, Lund University, Malmo, Sweden 
Approved
The study aimed to compare the assessment of coherence between the patients and the experts scores regarding Bristol Stool Scale (BSS). 2280 fecal samples collected from 992 patients with inflammatory bowel disease (IBD) at up to five time points for ... Continue reading
Ohlsson B. Reviewer Report For: Assessing hard and loose “endpoints”: comparison of patient and expert Bristol Stool Scale scoring of 2280 fecal samples [version 2; peer review: 2 approved]. F1000Research 2024, 13:833 (https://doi.org/10.5256/f1000research.167263.r315298)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 19 Dec 2024
    Jørgen Valeur, Unger-Vetlesen Institute, Lovisenberg Diaconal Hospital, Oslo, Norway
    19 Dec 2024
    Author Response
    We would like to thank the Reviewer for taking the time to read and comment on our paper, and are grateful for both the encouraging comments, as well as the ... Continue reading