Data Note

Dataset: A consolidated and harmonised Verbal Autopsy dataset from Health and Demographic Surveillance Sites in South Africa

[version 1; peer review: 1 approved, 2 approved with reservations]
PUBLISHED 19 May 2023

This article is included in the Health Services gateway.

Abstract

This data note describes the development of a Verbal Autopsy (VA) dataset produced with the South African Population Research Infrastructure Network (SAPRIN), drawing on data from Health and Demographic Surveillance Sites (HDSS) that cover a population of over 250,000 in two rural provinces of South Africa for the period 2012–2019. The dataset was assembled to refine an analytical tool within VA that provides unique information on care seeking and utilisation at and around the time of death, complementary to information on medical cause of death. For each individual, the dataset includes demographic data, probable cause of death, and data on care seeking and utilisation at or around the time of death, drawn from longitudinal population cohorts. This publication describes both the dataset and the methods used to format and process the data, for other researchers who may be interested in similar data. The data described in this paper can be requested from the respective HDSS repositories.

Keywords

South Africa; Verbal Autopsy; Cause of death; Circumstances of Mortality

Introduction

Every year, the medical causes of approximately 30 million deaths, half of all deaths worldwide, are not formally registered1. These deaths occur predominantly in low- and middle-income countries where there is a lack of complete and functioning civil registration and vital statistics (CRVS) systems2. Verbal autopsy (VA) is currently the only realistic alternative to medical certification of deaths in settings where CRVS is incomplete or absent. VA is a pragmatic survey-based method in which trained fieldworkers gather information from final caregivers on signs and symptoms of the deceased prior to death. VA data are then interpreted, by physicians or computer models, to determine probable cause(s) of death3. The method is used to quantify levels and causes of death in otherwise unregistered populations. The World Health Organization (WHO) leads the development of international standards for VA.

This data note describes the development of a Verbal Autopsy dataset produced with the South African Population Research Infrastructure Network (SAPRIN), drawing on data from Health and Demographic Surveillance Sites (HDSS). The dataset was assembled to refine an analytical tool within VA that provides unique information on care seeking and utilisation at and around the time of death, complementary to information on medical cause of death.

Acknowledging the social determinants of health as the fundamental causes of avoidable mortality and health inequalities, we sought to develop a systematic and scalable system for categorising the circumstantial drivers of death4. We previously devised an approach within VA tools called Circumstances of Mortality Categories (COMCAT)5. The system is designed for large-scale population assessment of the burden of disease that takes account of the needs and behaviours of individuals and the responsiveness of the health system to them6. For example, a woman whose cause of death is assigned as obstetric haemorrhage might have died at home, while another woman with the same cause of death might have reached a facility but been inadequately managed there. Measuring such scenarios at population level provides important information for health services seeking to reduce avoidable mortality.

Development of the COMCAT model began by supplementing existing interview questions on medical causes of death with input questions on care seeking and utilisation at and around the time of death; these questions were adopted in the 2012 WHO VA standard7. Models were then developed within existing automated VA data interpretation tools to assign, for each death, likelihoods to the following circumstantial categories: emergencies, recognition of illness severity, use of traditional medicine, accessing care, and perceptions of poor quality of care5.

This paper describes the collation and formatting of a mortality dataset from Health and Demographic Surveillance Sites (HDSS) in South Africa for use in refining the COMCAT system. HDSS are geographically defined populations that undergo continuous demographic monitoring. All vital events, such as births and deaths, are regularly recorded to track population change and highlight health and social care priorities8. The dataset harmonises and links routinely collected VA data from the South African Population Research Infrastructure Network (SAPRIN). SAPRIN is a national research infrastructure funded by the National Department of Science and Innovation that aims to harmonise and integrate South Africa’s HDSSs.

Methods

Each HDSS has its own VA questionnaire which, since 2012, has been broadly based on the WHO-2012 or WHO-2016 standard. VA data are collected electronically at household level by trained fieldworkers, who select responses to the questions from a specified set of answers, with logical skips and validation rules consistent with the WHO standard. Quality control is carried out on all captured questionnaires by supervisors in each HDSS team using either REDCap or Survey Solutions. We obtained all VA data collected by the three HDSSs in the SAPRIN network on deaths that occurred from 2012 onwards. This cut-off was chosen to increase the likelihood that the COMCAT input data were present, as these questions have been included in the WHO standard since 2012.
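The 2012 cut-off amounts to a simple date filter on each site's records. The following is a minimal sketch, assuming each record is a Python dict with a hypothetical `date_of_death` field in ISO format (YYYY-MM-DD); the field name, the handling of missing dates and the toy records are illustrative assumptions, not the sites' actual data layout.

```python
from datetime import date

# Cut-off from the text: COMCAT input questions entered the WHO VA standard in 2012.
COMCAT_START = date(2012, 1, 1)


def deaths_from_2012(records):
    """Keep records whose date of death is on or after 1 January 2012.

    Each record is assumed to be a dict with a hypothetical 'date_of_death'
    key holding an ISO-format string (YYYY-MM-DD). Records with no date are
    dropped, because their eligibility cannot be established.
    """
    kept = []
    for rec in records:
        raw = rec.get("date_of_death")
        if not raw:
            continue
        if date.fromisoformat(raw) >= COMCAT_START:
            kept.append(rec)
    return kept


# Toy records for illustration only.
example = [
    {"id": "A1", "date_of_death": "2011-12-31"},
    {"id": "A2", "date_of_death": "2012-01-01"},
    {"id": "A3", "date_of_death": ""},
    {"id": "A4", "date_of_death": "2018-06-15"},
]
print([r["id"] for r in deaths_from_2012(example)])  # ['A2', 'A4']
```

Dropping undated records is a conservative choice for this sketch; a real pipeline might instead flag them for review.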

Because each HDSS has a unique VA questionnaire, we aligned each site's questionnaire and its potential responses to the WHO-2016 standard. As the VA interpretation tools are based on the WHO standard, this ensured that the indicators required by both a VA data formatting package (pyCrossVA) and one of the automated VA interpretation tools were available to generate probable causes of death. We developed a common data specification that retained as much information as possible while remaining compatible with one of the VA interpretation tools. VA interpretation tools use mathematical formulae, such as Bayes' theorem, to calculate the probability of each cause of death from a prior set of probabilities relating to input indicators from the VA questionnaire9.
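To illustrate the kind of Bayesian calculation referred to above, the sketch below computes a generic naive Bayes posterior, P(cause | indicators) ∝ P(cause) × ∏ P(indicator | cause), and normalises it over causes. It is not InterVA-5's implementation: the three causes, the two indicators and every probability are hypothetical values invented for illustration, and indicators are assumed conditionally independent given the cause.

```python
# Hypothetical priors P(cause) for three illustrative causes; not InterVA-5 values.
PRIORS = {"cause_A": 0.5, "cause_B": 0.3, "cause_C": 0.2}

# Hypothetical conditional probabilities P(indicator present | cause).
CONDITIONALS = {
    "fever": {"cause_A": 0.8, "cause_B": 0.1, "cause_C": 0.3},
    "cough": {"cause_A": 0.2, "cause_B": 0.7, "cause_C": 0.4},
}


def posterior(responses):
    """Naive Bayes posterior P(cause | responses), normalised to sum to 1.

    responses maps an indicator name to True (present) or False (absent).
    Unanswered indicators are simply omitted from the dict. Indicators are
    assumed to be conditionally independent given the cause.
    """
    scores = {}
    for cause, prior in PRIORS.items():
        p = prior
        for indicator, present in responses.items():
            likelihood = CONDITIONALS[indicator][cause]
            p *= likelihood if present else 1.0 - likelihood
        scores[cause] = p
    total = sum(scores.values())
    return {cause: score / total for cause, score in scores.items()}


post = posterior({"fever": True, "cough": False})
print({cause: round(p, 3) for cause, p in post.items()})
# {'cause_A': 0.877, 'cause_B': 0.025, 'cause_C': 0.099}
```

In the example, a reported fever combined with no cough shifts most of the probability towards the hypothetical `cause_A`, whose prior and fever likelihood are both highest.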

Once the data specification had been drawn up, the data were examined to ensure that the dataset included the indicators required by a VA interpretation tool to output both a reliable probable cause of death and COMCATs. The different sites' questionnaires included a variety of indicators in addition to those in the WHO standard. These were not included in the consolidated dataset because the automated VA tool does not require them. However, individual case IDs were kept consistent throughout, so these additional indicators can be re-linked from the original datasets, if of interest, after the data have been processed by the VA interpretation tool. At this stage, we excluded one of the HDSSs, DIMAMO, as it did not have relevant data on the COMCAT input indicators. The data were then recoded and renamed in line with the new data specification using pyCrossVA, a Python package (Python Programming Language, RRID:SCR_008394) developed to convert VA data from the WHO standard into the formats required by the VA interpretation tools. We then processed the data using the InterVA-5.1 interpretation tool in R 3.6.1 (R Project for Statistical Computing, RRID:SCR_001905). InterVA-5 was selected because it is currently the only tool that outputs COMCATs, and refining these was the purpose for which the data were assembled.
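In this study pyCrossVA performed the recoding and renaming. The sketch below does not use pyCrossVA's API; it is a plain-Python illustration of the two ideas in the paragraph above: mapping site-specific variables onto a common specification while dropping extra indicators, and later re-linking those extras by case ID. The site column names (`q_sex`, `q_age`, `q_sought_care`), the response codes, the `case_id` field and the `site_extra` indicator are all hypothetical.

```python
# Hypothetical mapping from one site's column names to a common data specification.
COLUMN_MAP = {"q_sex": "sex", "q_age": "age_years", "q_sought_care": "sought_care"}

# Hypothetical recoding of site response codes to a shared vocabulary.
VALUE_MAP = {
    "sex": {"1": "male", "2": "female"},
    "sought_care": {"Y": "yes", "N": "no", "DK": "dk"},
}


def to_common_spec(record, id_field="case_id"):
    """Rename and recode one site record; columns outside COLUMN_MAP are dropped.

    The case ID is always kept so that dropped, site-specific indicators can
    be re-linked after processing.
    """
    out = {id_field: record[id_field]}
    for site_col, spec_col in COLUMN_MAP.items():
        value = record.get(site_col)
        codes = VALUE_MAP.get(spec_col)
        # Recode when a code table exists; unknown codes pass through unchanged.
        out[spec_col] = codes.get(value, value) if codes else value
    return out


def relink(processed, originals, extra_cols, id_field="case_id"):
    """Re-attach selected columns from the original records, matched on case ID."""
    by_id = {rec[id_field]: rec for rec in originals}
    linked = []
    for rec in processed:
        source = by_id.get(rec[id_field], {})
        merged = dict(rec)
        for col in extra_cols:
            merged[col] = source.get(col)
        linked.append(merged)
    return linked


# Toy record for illustration only; 'site_extra' stands for a non-WHO indicator.
raw = [{"case_id": "X1", "q_sex": "2", "q_age": 34, "q_sought_care": "Y",
        "site_extra": "clinic visit"}]
spec = [to_common_spec(r) for r in raw]
print(spec)  # [{'case_id': 'X1', 'sex': 'female', 'age_years': 34, 'sought_care': 'yes'}]
print(relink(spec, raw, ["site_extra"])[0]["site_extra"])  # clinic visit
```

Keeping the case ID untouched through every step is what makes the later re-linking a simple lookup rather than a fuzzy match.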

At all stages, data were processed separately for each HDSS. After processing through InterVA-5.1, we added a variable containing the HDSS name, so that records could be distinguished by location, before appending the two datasets. The final dataset includes records of 7980 deaths for the period 2012–2019 (5924 from Agincourt HDSS and 2056 from AHRI HDSS) and consists of 25 variables covering basic demographics, probable cause of death, COMCAT and COMCAT input indicators.
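The labelling and appending step can be sketched as follows, assuming each site's processed output is a list of dicts and using a hypothetical column name `hdss` for the site label. The toy records are placeholders, not real data; counting records per site after appending is a quick check that mirrors the reported totals (5924 + 2056 = 7980).

```python
def tag_and_append(datasets):
    """Add an 'hdss' label to every record, then append all sites into one list.

    datasets maps an HDSS name to that site's processed records (dicts).
    Records are copied so the per-site inputs are left unchanged.
    """
    combined = []
    for hdss_name, records in datasets.items():
        for rec in records:
            tagged = dict(rec)
            tagged["hdss"] = hdss_name
            combined.append(tagged)
    return combined


# Toy placeholder records, not real data: three deaths from one site, two from another.
sites = {
    "Agincourt": [{"case_id": f"AG{i}"} for i in range(3)],
    "AHRI": [{"case_id": f"AH{i}"} for i in range(2)],
}
combined = tag_and_append(sites)

# Per-site counts should sum to the combined total.
counts = {}
for rec in combined:
    counts[rec["hdss"]] = counts.get(rec["hdss"], 0) + 1
print(len(combined), counts)  # 5 {'Agincourt': 3, 'AHRI': 2}
```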

The data were subject to consistency checks in InterVA-5.1. These checks are run before the probable cause of each death is determined; where possible, InterVA-5.1 corrects apparent errors using responses to other questions. The checks generate warning messages that researchers can review. For example, a record of a male who is also recorded as pregnant generates a warning and, depending on the other information available, one of these inputs (i.e. male or pregnant) is deemed an error and corrected by InterVA-5.1. In addition, we excluded deaths recorded at ages over 100 years, as such ages are unlikely to be reliable given average life expectancy in the region.
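The sketch below shows the shape of the two steps above: a single consistency rule that emits a warning, and the age-based exclusion. It is only an illustration and does not reproduce InterVA-5.1's checks, which cover many more rules and also decide which conflicting input to correct. The field names (`sex`, `pregnant`, `age_years`, `case_id`), the response values and the choice to keep records with a missing age are assumptions.

```python
def check_record(rec):
    """Return warning messages for one record (one illustrative rule only).

    Flags a male recorded as pregnant. Unlike InterVA-5.1, this sketch does
    not decide which input is wrong; it only reports the conflict.
    """
    warnings = []
    if rec.get("sex") == "male" and rec.get("pregnant") == "yes":
        warnings.append(f"{rec['case_id']}: male recorded as pregnant")
    return warnings


def exclude_over_100(records, max_age=100):
    """Drop records with a recorded age above max_age years.

    Records with a missing age are kept here; that is an assumption of this
    sketch, not a rule stated for the dataset.
    """
    return [
        r for r in records
        if r.get("age_years") is None or r["age_years"] <= max_age
    ]


# Toy records for illustration only.
records = [
    {"case_id": "B1", "sex": "male", "pregnant": "yes", "age_years": 40},
    {"case_id": "B2", "sex": "female", "pregnant": "yes", "age_years": 28},
    {"case_id": "B3", "sex": "female", "pregnant": "no", "age_years": 103},
]
print([w for r in records for w in check_record(r)])  # ['B1: male recorded as pregnant']
print([r["case_id"] for r in exclude_over_100(records)])  # ['B1', 'B2']
```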

Software availability

The software packages used to format and process VA data are open source and available from https://github.com/verbal-autopsy-software. These packages also contain functions for analysing VA data.

Data availability statement

The data described in this study cannot be made available to the public in an open repository due to the sensitive nature of the data. However, the data are available to be requested from SAPRIN or the respective HDSS repositories. Requests for the data can be made at the following link https://saprindata.samrc.ac.za/index.php/catalog/33.

How to cite this article
Cowan E, D'Ambruoso L, Price J et al. Dataset: A consolidated and harmonised Verbal Autopsy dataset from Health and Demographic Surveillance Sites in South Africa [version 1; peer review: 1 approved, 2 approved with reservations]. F1000Research 2023, 12:520 (https://doi.org/10.12688/f1000research.55377.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 13 Sep 2024
Tom Smith, Swiss Tropical and Public Health Institute, Basel, Switzerland 
Approved with Reservations
The article provides a well written description of the rationale for the dataset. If anything, the importance of this is understated, since analysis of such dataset is crucial for understanding how to reduce mortality across the world.

HOW TO CITE THIS REPORT
Smith T. Reviewer Report For: Dataset: A consolidated and harmonised Verbal Autopsy dataset from Health and Demographic Surveillance Sites in South Africa [version 1; peer review: 1 approved, 2 approved with reservations]. F1000Research 2023, 12:520 (https://doi.org/10.5256/f1000research.58945.r174907)
Reviewer Report 06 Feb 2024
Bruno Masquelier, Université Catholique de Louvain, Louvain, Belgium 
Approved with Reservations
I read this short article with interest, as it deals with an important subject. I agree with the authors on the importance of complementing traditional verbal autopsies with additional, standardized information on the circumstances surrounding death, regarding the social determinants ...
HOW TO CITE THIS REPORT
Masquelier B. Reviewer Report For: Dataset: A consolidated and harmonised Verbal Autopsy dataset from Health and Demographic Surveillance Sites in South Africa [version 1; peer review: 1 approved, 2 approved with reservations]. F1000Research 2023, 12:520 (https://doi.org/10.5256/f1000research.58945.r237763)
Reviewer Report 21 Jun 2023
Tathagata Bhattacharjee, Department of Population Health, London School of Hygiene and Tropical Medicine, London, UK 
Approved
The paper brings out an important aspect for the need to prepare datasets for VA analysis. The method is crisply explained. However, sharing a sample anonymized dataset would have been more appreciated along with some sample code implementations for more ...
HOW TO CITE THIS REPORT
Bhattacharjee T. Reviewer Report For: Dataset: A consolidated and harmonised Verbal Autopsy dataset from Health and Demographic Surveillance Sites in South Africa [version 1; peer review: 1 approved, 2 approved with reservations]. F1000Research 2023, 12:520 (https://doi.org/10.5256/f1000research.58945.r174908)