This collection has now closed to submissions.
Large administrative data sources have played a critical role in epidemiologic research, and in pharmacoepidemiology in particular. Along with newer electronic health records and other sources of data, these so-called ‘real-world data’ provide opportunities to generate evidence about medications and other interventions. Machine learning tools developed in recent years have further expanded the toolkit for use of these data.
Multi-database studies, incorporating data from several sources, expand sample size and capacity to address novel questions. Design and analysis of observational studies in large datasets require attention to core statistical and epidemiologic principles. Modern machine learning methods provide opportunities for new analyses but must be implemented in a way that respects those principles.
 
Using the F1000Research publishing model, we aim to curate a comprehensive collection of current research on the challenges involving high-dimensional data sources in epidemiology. Topics of interest include, but are not limited to, pharmacoepidemiology, drug safety, public health issues and policy, aging epidemiology and clinical epidemiology.
This collection highlights analytical methods, study design, generalizability, and reproducibility of high-dimensional data. We welcome a full range of article types including research articles, brief reports, data notes, method articles, registered reports, reviews, software tool articles, case studies, living systematic reviews and many more.
 
Keywords: targeted learning, machine learning for prediction, high-dimensional propensity scores, multi-database studies, reproducibility, epidemiology
 
Deadline for submissions: 30th of July 2023
 
This collection is part of the Global Public Health Gateway.
Any questions about this collection? Please contact Maxine Dillon directly (maxine.dillon@tandf.co.uk).