<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="data-paper" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.178856.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Data Note</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>RxPairEvid: an auditable, machine-learning-ready dataset of drug&#x2013;drug pairs with pharmacovigilance signal features and MedDRA PT-code rationales</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 not approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Hashir</surname>
                        <given-names>Qadeer</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Asfand-E-Yar</surname>
                        <given-names>Muhammad</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Ali Shah</surname>
                        <given-names>Asghar</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0325-7579</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Shoukat</surname>
                        <given-names>Shabana</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Center of Excellence in Artificial Intelligence (CoE-AI), Department of Computer Science, Bahria University, Islamabad, 44000, Pakistan</aff>
                <aff id="a2">
                    <label>2</label>Department of Computer Science, Kateb University, Kabul, 1007, Afghanistan</aff>
                <aff id="a3">
                    <label>3</label>Medical ICU, Holy Family Hospital, Rawalpindi Medical University, Rawalpindi, Punjab, 46000, Pakistan</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:asghar.ali.shah@kateb.edu.af">asghar.ali.shah@kateb.edu.af</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>4</day>
                <month>5</month>
                <year>2026</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2026</year>
            </pub-date>
            <volume>15</volume>
            <elocation-id>662</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>14</day>
                    <month>4</month>
                    <year>2026</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2026 Hashir Q et al.</copyright-statement>
                <copyright-year>2026</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/15-662/pdf"/>
            <abstract>
                <sec>
                    <title>Background</title>
                    <p>Drug&#x2013;drug interactions (DDIs) remain a major source of preventable harm, yet many computational DDI resources are hard to reproduce, difficult to audit, or constrained by redistribution licenses. We created RxPairEvid-50&#x00a0;K to provide a small, license-clean, model-ready matrix of canonical drug pairs with conservative pharmacovigilance signal summaries and a rationale pointer that supports human interpretation.</p>
                </sec>
                <sec>
                    <title>Methods</title>
                    <p>RxPairEvid is derived from the FDA Adverse Event Reporting System (FAERS) by resolving drug mentions to a stable 14-character InChIKey stem (IK14), enumerating co-medication pairs per case, joining outcomes at the MedDRA Preferred Term (PT) code level, and computing disproportionality statistics (PRR, ROR, and the continuity-corrected lower 95% confidence bound of ROR) for each pair&#x2013;PT. Signals are rolled up to per-pair features under strict count floors (a_raw&#x2265;3, pair&#x2265;10, PT&#x00a0;&#x2265;&#x00a0;10). RxPairEvid-50&#x00a0;K is a deterministic stratified sample from the strict matrix.</p>
                </sec>
                <sec>
                    <title>Conclusions</title>
                    <p>RxPairEvid-50&#x00a0;K contains 50,000 drug&#x2013;drug pairs with stable identifiers, strict-regime FAERS signal features, PT-code rationale pointers, and audit artifacts. It is intended to support benchmarking, label construction, and exploratory modeling of interaction risk with transparent, reproducible processing steps.</p>
                </sec>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>drug&#x2013;drug interaction</kwd>
                <kwd>pharmacovigilance</kwd>
                <kwd>FAERS</kwd>
                <kwd>MedDRA</kwd>
                <kwd>disproportionality analysis</kwd>
                <kwd>PRR</kwd>
                <kwd>ROR</kwd>
                <kwd>dataset</kwd>
            </kwd-group>
            <funding-group>
                <funding-statement>The author(s) declared that no grants were involved in supporting this work.</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec4" sec-type="intro">
            <title>Introduction</title>
            <p>Spontaneous reporting systems such as the FDA Adverse Event Reporting System (FAERS) can surface DDI-related safety signals at scale, but their reuse in machine learning is often limited by unstable drug identifiers and opaque, non-auditable label construction.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup> RxPairEvid-50&#x00a0;K addresses these gaps by combining stable canonical drug identifiers (IK14), conservative rollups of disproportionality statistics, and a per-pair PT-code rationale pointer that links each pair to its single most strongly supported adverse outcome, so that a human reviewer can inspect it quickly. Outcomes are represented at the Preferred Term (PT) level of the MedDRA terminology.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> A compact pipeline overview is shown in 
                <xref ref-type="fig" rid="f1">
Figure 1</xref>, with further implementation detail in 
                <xref ref-type="fig" rid="f2">
Figures 2&#x2013;3</xref>.</p>
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>
Figure 1. </label>
                <caption>
                    <title>Graphical overview of the RxPairEvid-50&#x00a0;K pipeline: FAERS (2018 onward)&#x00a0;&#x2192;&#x00a0;ETL and IK14 harmonization &#x2192; pair&#x2013;PT contingency tables &#x2192; continuity-corrected PRR/ROR and ROR95_LCL&#x00a0;&#x2192;&#x00a0;strict per-pair roll-ups&#x00a0;&#x2192;&#x00a0;stratified deterministic sampling &#x2192; release bundle (CSV&#x00a0;+&#x00a0;schema + audits + codebook).</title>
                </caption>
                <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/197294/a6b81d51-bfd8-40fc-b474-ef114ff3fd9a_figure1.gif"/>
            </fig>
            <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                <label>
Figure 2. </label>
                <caption>
                    <title>End-to-end processing workflow, comprising identifier mapping, canonical pair ordering, MedDRA PT joins, per pair&#x2013;PT signal computation, loose/strict roll-ups, and release export.</title>
                    <p>Elements shown for TWOSIDES and DrugBank are optional layers in the authors&#x2019; internal build and are not redistributed in the public RxPairEvid-50&#x00a0;K release.</p>
                </caption>
                <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/197294/a6b81d51-bfd8-40fc-b474-ef114ff3fd9a_figure2.gif"/>
            </fig>
            <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                <label>
Figure 3. </label>
                <caption>
                    <title>Internal PostgreSQL integration view and feature-layer attachment points (stg/core/features/labels/ref).</title>
                    <p>The public release packages ddi_pairs_50k.csv with schema.sql, audits, and a codebook; the other sources in the diagram are attachment points for users who obtain licensed datasets from their original providers.</p>
                </caption>
                <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/197294/a6b81d51-bfd8-40fc-b474-ef114ff3fd9a_figure3.gif"/>
            </fig>
            <p>The public release is deliberately license-clean. It redistributes only derived FAERS signal features and does not redistribute third-party resources whose terms may restrict redistribution (e.g., DrugBank, KEGG, PDBbind). The accompanying PostgreSQL schema DDL documents attachment points for readers who obtain those resources from original providers.
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup>
            </p>
        </sec>
        <sec id="sec5" sec-type="methods">
            <title>Methods</title>
            <p>RxPairEvid-50&#x00a0;K is a curated subset of a larger internal PostgreSQL database (ddi) that integrates multiple evidence layers for DDI modeling, including chemical structure (DrugBank, ChEMBL, PubChem), pathway and pharmacology (KEGG), targets and network biology (STRING, STITCH), protein&#x2013;ligand bioactivity (BindingDB, PDB/PDBbind), transcriptomics (LINCS), curated adverse reactions (SIDER), and TWOSIDES signals, alongside FAERS pharmacovigilance.
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref17">17</xref>
                </sup> For public dissemination we release a compact, information-rich 50,000-pair matrix focused on FAERS-derived signal features and rationale pointers, together with provenance and audit artifacts.</p>
            <sec id="sec6">
                <title>Data sources</title>
                <p>Redistributed source and derived outputs:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>FAERS quarterly files (DRUG/REAC/DEMO) from 2018 onward (time window specified in provenance.md).
                                <sup>
                                    <xref ref-type="bibr" rid="ref1">1</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
                <p>Referenced as optional attachment points (not redistributed in RxPairEvid-50&#x00a0;K):
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>MedDRA (PT text not redistributed; PT codes only).
                                <sup>
                                    <xref ref-type="bibr" rid="ref4">4</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>DrugBank interaction knowledge and drug properties.
                                <sup>
                                    <xref ref-type="bibr" rid="ref5">5</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>KEGG pathway annotations.
                                <sup>
                                    <xref ref-type="bibr" rid="ref6">6</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>PDB and PDBbind structural/binding evidence.
                                <sup>
                                    <xref ref-type="bibr" rid="ref7">7</xref>,
                                    <xref ref-type="bibr" rid="ref8">8</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>STRING and STITCH network context.
                                <sup>
                                    <xref ref-type="bibr" rid="ref9">9</xref>,
                                    <xref ref-type="bibr" rid="ref10">10</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>SIDER adverse-effect associations.
                                <sup>
                                    <xref ref-type="bibr" rid="ref11">11</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>LINCS L1000 transcriptomic profiles.
                                <sup>
                                    <xref ref-type="bibr" rid="ref12">12</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>ChEMBL, PubChem, BindingDB for chemistry/bioactivity crosswalks.
                                <sup>
                                    <xref ref-type="bibr" rid="ref13">13</xref>&#x2013;
                                    <xref ref-type="bibr" rid="ref15">15</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>RxNorm/ATC for normalization/stratification where available.
                                <sup>
                                    <xref ref-type="bibr" rid="ref16">16</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>TWOSIDES as an external signal set for ablations in the internal build.
                                <sup>
                                    <xref ref-type="bibr" rid="ref17">17</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
            </sec>
            <sec id="sec7">
                <title>Processing environment and reproducible schema</title>
                <p>The verified internal build ran on PostgreSQL 14.19 (Ubuntu) with a psql 18.0 client. The exported schema report records the database layout (schemas stg, core, features, labels, ml, ref) and provides row estimates and column types for reproducibility. This build contains 11,521 canonical drugs (core.drug), 1,073,256 canonical drug pairs (features.pair_features_all), and 50,340 pairs that satisfy the strict floors (a_raw&#x2265;3, pair&#x2265;10, PT&#x00a0;&#x2265;&#x00a0;10) in the strict FAERS rollup table.</p>
            </sec>
            <sec id="sec8">
                <title>Canonical identifiers</title>
                <p>Every drug carries a standardized identifier, IK14 (the first 14 characters of the InChIKey), which stabilizes joins across sources and supports leakage-aware evaluation splits. Pairs are stored as unordered keys in canonical order (A&#x00a0;&lt;&#x00a0;B) to eliminate duplicates and ordering ambiguity.</p>
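                <p>A minimal sketch of how IK14 stems and canonical pair keys could be derived is given here for illustration. The function names (ik14, canonical_pair, case_pairs) are hypothetical and are not taken from the release; the example InChIKeys are the standard keys of aspirin and caffeine.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
from itertools import combinations


def ik14(inchikey):
    """Return the first 14 characters (first block) of a standard InChIKey."""
    key = inchikey.strip().upper()
    # A standard InChIKey has 27 characters with a hyphen after the first block.
    if len(key) != 27 or key[14] != "-":
        raise ValueError(f"not a standard 27-character InChIKey: {inchikey!r}")
    return key[:14]


def canonical_pair(a, b):
    """Order two IK14 stems so that the first sorts strictly before the second."""
    if a == b:
        raise ValueError("a drug cannot be paired with itself")
    return (a, b) if a < b else (b, a)


def case_pairs(stems):
    """Enumerate every unordered co-medication pair of one case exactly once."""
    # De-duplicating and sorting first guarantees A < B within each pair.
    return list(combinations(sorted(set(stems)), 2))


aspirin = ik14("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
caffeine = ik14("RYYVLZVUVIJVGH-UHFFFAOYSA-N")
print(canonical_pair(caffeine, aspirin))
print(case_pairs([caffeine, aspirin, caffeine]))
```
]]></code>
                <p>Because both helpers sort on the stem itself, the same pair key is produced regardless of the order in which the two drugs appear in a report.</p>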
            </sec>
            <sec id="sec9">
                <title>FAERS ingestion, normalization, and mapping quality control</title>
                <p>FAERS DRUG and REAC quarterly records are parsed into normalized staging tables, with robust handling of common encoding problems. Drug mentions are normalized (case folding, punctuation removal, removal of dosage/form/route tokens) and mapped to IK14 using token-based dictionaries and database indexes. In the verified build, the normalized FAERS name table holds 322,635 distinct normalized names and the final FAERS name-to-drug mapping table holds 59,507 mapped names, which provides an auditable interface for mapping quality checks (e.g., frequency-ranked inspection of unmapped names).</p>
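                <p>The normalization and audit steps described above could look roughly like the following sketch. The stop-token set, the strength pattern, and the function names are illustrative assumptions; the token dictionaries used in the actual build are not listed in this article.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import re
from collections import Counter

# Illustrative dosage/form/route tokens only (assumption, not the build's lists).
STOP_TOKENS = {
    "tablet", "tablets", "capsule", "capsules", "injection", "solution",
    "oral", "iv", "mg", "mcg", "ml", "g",
}
# Fused strength tokens such as "81mg" (assumption).
STRENGTH = re.compile(r"\d+(mg|mcg|ml|g)")


def normalize_name(raw):
    """Case-fold, strip punctuation, and drop dosage/form/route tokens."""
    text = raw.casefold()
    # ASCII-only for brevity: anything that is not a letter, digit, or space goes.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [
        t for t in text.split()
        if not t.isdigit() and t not in STOP_TOKENS and not STRENGTH.fullmatch(t)
    ]
    return " ".join(tokens)


def unmapped_by_frequency(mentions, name_to_ik14):
    """Rank normalized names that have no IK14 mapping by mention count."""
    counts = Counter(normalize_name(m) for m in mentions)
    return [
        (name, n) for name, n in counts.most_common()
        if name and name not in name_to_ik14
    ]


mapping = {"aspirin": "BSYNRYMUTXBXSQ"}  # toy dictionary for the example
mentions = ["ASPIRIN 81 MG TABLET.", "Foozumab 10mg", "foozumab", "Aspirin"]
print(unmapped_by_frequency(mentions, mapping))
```
]]></code>
                <p>Ranking unmapped names by frequency puts the highest-impact mapping gaps at the top of the audit list, which is the inspection the text refers to.</p>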
            </sec>
            <sec id="sec10">
                <title>Disproportionality statistics and strict vs loose regimes</title>
                <p>For each (drug A, drug B, PT) triple we construct a 2&#x00a0;&#x00d7;&#x00a0;2 contingency table and compute disproportionality measures. PRR is widely used for signal generation from spontaneous-report data.
                    <sup>
                        <xref ref-type="bibr" rid="ref2">2</xref>
                    </sup> ROR is also widely used, and using its lower confidence bound helps reduce false positives that arise from small counts.
                    <sup>
                        <xref ref-type="bibr" rid="ref3">3</xref>
                    </sup> We apply a Haldane&#x2013;Anscombe (+0.5) continuity correction before computing log-scale standard errors:
                    <disp-formula id="e1">

                        <mml:math display="block">
                            <mml:mi mathvariant="sans-serif">PRR</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mrow>
                                <mml:mo stretchy="true">(</mml:mo>
                                <mml:mi mathvariant="sans-serif">a</mml:mi>
                                <mml:mo>/</mml:mo>
                                <mml:mrow>
                                    <mml:mo stretchy="true">(</mml:mo>
                                    <mml:mi mathvariant="sans-serif">a</mml:mi>
                                    <mml:mo>+</mml:mo>
                                    <mml:mi mathvariant="sans-serif">b</mml:mi>
                                    <mml:mo stretchy="true">)</mml:mo>
                                </mml:mrow>
                                <mml:mo stretchy="true">)</mml:mo>
                            </mml:mrow>
                            <mml:mo>/</mml:mo>
                            <mml:mrow>
                                <mml:mo stretchy="true">(</mml:mo>
                                <mml:mi mathvariant="sans-serif">c</mml:mi>
                                <mml:mo>/</mml:mo>
                                <mml:mrow>
                                    <mml:mo stretchy="true">(</mml:mo>
                                    <mml:mi mathvariant="sans-serif">c</mml:mi>
                                    <mml:mo>+</mml:mo>
                                    <mml:mi mathvariant="sans-serif">d</mml:mi>
                                    <mml:mo stretchy="true">)</mml:mo>
                                </mml:mrow>
                                <mml:mo stretchy="true">)</mml:mo>
                            </mml:mrow>
                        </mml:math>
</disp-formula>

                    <disp-formula id="e2">

                        <mml:math display="block">
                            <mml:mi mathvariant="sans-serif">ROR</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mrow>
                                <mml:mo stretchy="true">(</mml:mo>
                                <mml:mi mathvariant="sans-serif">a</mml:mi>
                                <mml:mo>/</mml:mo>
                                <mml:mi mathvariant="sans-serif">b</mml:mi>
                                <mml:mo stretchy="true">)</mml:mo>
                            </mml:mrow>
                            <mml:mo>/</mml:mo>
                            <mml:mrow>
                                <mml:mo stretchy="true">(</mml:mo>
                                <mml:mi mathvariant="sans-serif">c</mml:mi>
                                <mml:mo>/</mml:mo>
                                <mml:mi mathvariant="sans-serif">d</mml:mi>
                                <mml:mo stretchy="true">)</mml:mo>
                            </mml:mrow>
                        </mml:math>
</disp-formula>

                    <disp-formula id="e3">

                        <mml:math display="block">
                            <mml:mi mathvariant="sans-serif">SE</mml:mi>
                            <mml:mspace width="0.25em"/>
                            <mml:mrow>
                                <mml:mo stretchy="true">(</mml:mo>
                                <mml:mo mathvariant="sans-serif">log</mml:mo>
                                <mml:mspace width="0.25em"/>
                                <mml:mi mathvariant="sans-serif">ROR</mml:mi>
                                <mml:mo stretchy="true">)</mml:mo>
                            </mml:mrow>
                            <mml:mo>=</mml:mo>
                            <mml:mtext mathvariant="sans-serif">sqrt</mml:mtext>
                            <mml:mrow>
                                <mml:mo stretchy="true">(</mml:mo>
                                <mml:mn mathvariant="sans-serif">1</mml:mn>
                                <mml:mo>/</mml:mo>
                                <mml:mi mathvariant="sans-serif">a</mml:mi>
                                <mml:mo>+</mml:mo>
                                <mml:mn mathvariant="sans-serif">1</mml:mn>
                                <mml:mo>/</mml:mo>
                                <mml:mi mathvariant="sans-serif">b</mml:mi>
                                <mml:mo>+</mml:mo>
                                <mml:mn mathvariant="sans-serif">1</mml:mn>
                                <mml:mo>/</mml:mo>
                                <mml:mi mathvariant="sans-serif">c</mml:mi>
                                <mml:mo>+</mml:mo>
                                <mml:mn mathvariant="sans-serif">1</mml:mn>
                                <mml:mo>/</mml:mo>
                                <mml:mi mathvariant="sans-serif">d</mml:mi>
                                <mml:mo stretchy="true">)</mml:mo>
                            </mml:mrow>
                        </mml:math>
</disp-formula>

                    <disp-formula id="e4">

                        <mml:math display="block">
                            <mml:mi mathvariant="sans-serif">ROR</mml:mi>
                            <mml:mn mathvariant="sans-serif">95</mml:mn>
                            <mml:mo>_</mml:mo>
                            <mml:mi mathvariant="sans-serif">LCL</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mo mathvariant="sans-serif">exp</mml:mo>
                            <mml:mspace width="0.25em"/>
                            <mml:mrow>
                                <mml:mo stretchy="true">(</mml:mo>
                                <mml:mo mathvariant="sans-serif">log</mml:mo>
                                <mml:mspace width="0.25em"/>
                                <mml:mrow>
                                    <mml:mo stretchy="true">(</mml:mo>
                                    <mml:mi mathvariant="sans-serif">ROR</mml:mi>
                                    <mml:mo stretchy="true">)</mml:mo>
                                </mml:mrow>
                                <mml:mo>&#x2212;</mml:mo>
                                <mml:mn mathvariant="sans-serif">1.96</mml:mn>
                                <mml:mo>&#x00d7;</mml:mo>
                                <mml:mi mathvariant="sans-serif">SE</mml:mi>
                                <mml:mo stretchy="true">)</mml:mo>
                            </mml:mrow>
                            <mml:mo>.</mml:mo>
                        </mml:math>
</disp-formula>
                </p>
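                <p>The four formulas above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the build's own code: the cell layout (a: reports with the pair and the PT; b: pair without the PT; c: PT without the pair; d: neither) is the conventional one and is assumed here, and the function name is hypothetical. The +0.5 correction is added to every cell before any ratio is taken.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import math


def disproportionality(a, b, c, d, correction=0.5):
    """PRR, ROR, SE(log ROR), and lower 95% bound of ROR for one 2x2 table."""
    # Haldane-Anscombe correction: add 0.5 to every cell so that zero cells
    # do not produce division by zero or an infinite standard error.
    a, b, c, d = (x + correction for x in (a, b, c, d))
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a / b) / (c / d)
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ror95_lcl = math.exp(math.log(ror) - 1.96 * se_log_ror)
    return {
        "prr": prr,
        "ror": ror,
        "se_log_ror": se_log_ror,
        "ror95_lcl": ror95_lcl,
    }


# Toy counts, not taken from FAERS.
example = disproportionality(a=12, b=88, c=300, d=9600)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```
]]></code>
                <p>Because the lower bound shrinks as the standard error grows, a large ROR that rests on very few reports yields a small ROR95_LCL, which is why the bound is the more conservative ranking quantity.</p>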
                <p>The pipeline defines two evidence regimes: a coverage-oriented loose regime and a conservative strict regime. The strict regime imposes minimum-count floors (a_raw&#x2265;3, pair&#x2265;10, PT&#x00a0;&#x2265;&#x00a0;10) to remove small-count instability and improve the interpretability of retained signals.</p>
            </sec>
            <sec id="sec11">
                <title>Per-pair rollups and rationale pointer</title>
                <p>Pair-level features are produced by rolling up eligible PT rows and retaining maxima and coverage counts:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
n_faers_reports: co-report count for the pair.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
faers_prr_max_strict: maximum PRR across PTs under strict floors.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
faers_ror95_lcl_max_strict: maximum lower 95% bound of ROR across PTs under strict floors.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
faers_pt_covered_strict: number of distinct PTs meeting strict floors.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
faers_best_pt_code_strict: PT code corresponding to the maximum strict ROR95_LCL signal.</p>
                        </list-item>
                    </list>
                </p>
                <p>The PT-code pointer enables audit-friendly interpretation without redistributing MedDRA PT text.</p>
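                <p>As an illustration of the roll-up, the sketch below filters pair&#x2013;PT rows by the strict floors and derives the five per-pair features listed above. The row layout, the reading of the pair and PT floors as report counts, the tie-break on the smaller PT code, and the placeholder PT codes are all assumptions made for the example; only the output field names come from the dataset.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairPtRow:
    pair_id: str
    pt_code: int      # MedDRA PT code (placeholder values below)
    a_raw: int        # uncorrected count of reports with the pair and this PT
    pair_n: int       # reports containing the pair (assumed "pair" floor)
    pt_n: int         # reports containing this PT (assumed "PT" floor)
    prr: float
    ror95_lcl: float


def passes_strict(row):
    """Strict floors: a_raw >= 3, pair >= 10, PT >= 10."""
    return row.a_raw >= 3 and row.pair_n >= 10 and row.pt_n >= 10


def rollup_strict(rows):
    """Collapse eligible pair-PT rows into one feature record per pair."""
    eligible = {}
    for row in rows:
        if passes_strict(row):
            eligible.setdefault(row.pair_id, []).append(row)
    features = {}
    for pair_id, group in eligible.items():
        # Best PT = highest ROR95_LCL; ties go to the smaller PT code (assumption).
        best = max(group, key=lambda r: (r.ror95_lcl, -r.pt_code))
        features[pair_id] = {
            "n_faers_reports": group[0].pair_n,
            "faers_prr_max_strict": max(r.prr for r in group),
            "faers_ror95_lcl_max_strict": best.ror95_lcl,
            "faers_pt_covered_strict": len({r.pt_code for r in group}),
            "faers_best_pt_code_strict": best.pt_code,
        }
    return features


rows = [
    PairPtRow("A|B", 10000001, 5, 40, 200, 2.0, 1.4),
    PairPtRow("A|B", 10000002, 4, 40, 50, 3.5, 1.1),
    PairPtRow("A|B", 10000003, 2, 40, 500, 9.0, 0.8),   # fails a_raw floor
    PairPtRow("C|D", 10000001, 6, 9, 200, 4.0, 2.0),    # fails pair floor
]
print(rollup_strict(rows))
```
]]></code>
                <p>Note that the maximum PRR and the best PT pointer may come from different PTs, as in the toy rows: the pointer follows ROR95_LCL, not PRR.</p>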
            </sec>
            <sec id="sec12">
                <title>Public subset construction (50,000 rows)</title>
                <p>RxPairEvid-50&#x00a0;K is a deterministic stratified sample of the strict FAERS rollup matrix. Strata are defined by (i) ROR95_LCL bins, (ii) PT coverage bins, and (iii) a coarse ATC grouping where one is available. Within each stratum, pairs are ordered deterministically by md5(pair_id) and a predefined quota is taken, yielding 50,000 rows in total. The selection can therefore be reproduced in any rebuild that uses the same strict matrix and strata definitions.</p>
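                <p>A rough sketch of hash-ordered stratified sampling follows. The pair_id string format, the stratum labels, and the quota values are hypothetical; the article specifies only that pairs are ordered by md5(pair_id) within each stratum and that a predefined quota is taken.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import hashlib
from collections import defaultdict


def md5_hex(pair_id):
    """Hex digest used purely as a stable pseudo-random sort key."""
    return hashlib.md5(pair_id.encode("utf-8")).hexdigest()


def stratified_sample(pairs, quotas):
    """Pick up to quotas[stratum] pairs per stratum in md5(pair_id) order.

    pairs  : iterable of (pair_id, stratum) tuples
    quotas : dict mapping stratum to the number of pairs to keep
    """
    by_stratum = defaultdict(list)
    for pair_id, stratum in pairs:
        by_stratum[stratum].append(pair_id)
    selected = []
    for stratum in sorted(by_stratum):
        # Secondary key pair_id keeps the order total even if digests collide.
        ordered = sorted(by_stratum[stratum], key=lambda p: (md5_hex(p), p))
        selected.extend(ordered[: quotas.get(stratum, 0)])
    return selected


# Stratum = (ROR95_LCL bin, PT coverage bin, coarse ATC group); toy labels.
pairs = [
    ("AAA|BBB", ("lcl_high", "pt_many", "C")),
    ("AAA|CCC", ("lcl_high", "pt_many", "C")),
    ("BBB|CCC", ("lcl_high", "pt_many", "C")),
    ("AAA|DDD", ("lcl_low", "pt_few", "N")),
    ("BBB|DDD", ("lcl_low", "pt_few", "N")),
    ("XXX|YYY", ("lcl_low", "pt_few", "unknown")),
]
quotas = {("lcl_high", "pt_many", "C"): 2, ("lcl_low", "pt_few", "N"): 1}
sample = stratified_sample(pairs, quotas)
print(sample)
```
]]></code>
                <p>Since the ordering depends only on the pair identifiers and not on input order or a random seed, repeated runs over the same strict matrix return the same rows.</p>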
            </sec>
            <sec id="sec13">
                <title>Data records</title>
                <p>The dataset is deposited on Mendeley Data
                    <sup>
                        <xref ref-type="bibr" rid="ref18">18</xref>
                    </sup> and is distributed as flat files. 
                    <xref ref-type="table" rid="T1">
Table 1</xref> lists the main fields of the primary CSV (ddi_pairs_50k.csv), and 
                    <xref ref-type="table" rid="T2">
Table 2</xref> lists the files included in the Mendeley record. 
                    <xref ref-type="fig" rid="f1">
Figure 1</xref> gives a high-level overview of how RxPairEvid-50&#x00a0;K is generated and of the release bundle, 
                    <xref ref-type="fig" rid="f2">
Figure 2</xref> details the FAERS processing and strict-regime roll-up steps that precede sampling, and 
                    <xref ref-type="fig" rid="f3">
Figure 3</xref> summarizes the internal database integration and the optional attachment points for licensed resources.</p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>
Table 1. </label>
                    <caption>
                        <title>Main fields in ddi_pairs_50k.csv (RxPairEvid-50&#x00a0;K).</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Field</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Description</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">drug_a_ik14, drug_b_ik14
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Canonical drug identifiers using 14-character InChIKey stems (IK14).</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">a_name, b_name</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Preferred drug names for display.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">pair_id</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Stable unordered pair key&#x00a0;=&#x00a0;LEAST(IK14_A, IK14_B)&#x00a0;+&#x00a0;&#x2018;::&#x2019;&#x00a0;+&#x00a0;GREATEST(IK14_A, IK14_B).</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">n_faers_reports
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">FAERS co-report count for the (A,B) pair.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">faers_prr_max_strict
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Maximum PRR across PTs under strict floors.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">faers_ror95_lcl_max_strict
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Maximum lower 95% CI bound of ROR across PTs under strict floors.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">faers_pt_covered_strict
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Number of distinct PT codes meeting strict floors for the pair.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">faers_best_pt_code_strict
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">PT code corresponding to the strongest strict lower-bound signal.</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <table-wrap id="T2" orientation="portrait" position="float">
                    <label>
Table 2. </label>
                    <caption>
                        <title>Files deposited in the RxPairEvid Mendeley Data record.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">File</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Purpose</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">ddi_pairs_50k.csv</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Primary dataset: 50,000 drug&#x2013;drug pairs with FAERS-derived strict rollups and PT-code rationale pointer.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">schema.sql</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">PostgreSQL DDL (structure-only) to recreate tables and load the CSV consistently.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">codebook.md</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Field-level definitions and data types.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">provenance.md</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Pipeline notes including FAERS window (2018 onward), mapping rules, and audit guidance.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">audit_subset_signal_quantiles.csv</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Quantile summaries of key signal fields for validation and reporting.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">audit_subset_strata_counts.csv</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Counts per sampling stratum used to form the deterministic 50&#x00a0;K subset.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">checksums.txt</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">SHA-256 checksums for integrity verification.</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
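                <p>The pair_id definition in Table 1 can be reproduced outside the database with a short sketch. It assumes that IK14 stems consist of 14 uppercase ASCII letters, in which case Python string ordering is expected to agree with the LEAST/GREATEST ordering; under other database collations or inputs this may not hold. The function name and the example stems are hypothetical.</p>
                <code language="python"><![CDATA[
```python
def make_pair_id(ik14_a, ik14_b):
    """Order-independent pair key: smaller stem, '::', larger stem."""
    for stem in (ik14_a, ik14_b):
        ok = (len(stem) == 14 and stem.isascii()
              and stem.isalpha() and stem.isupper())
        if not ok:
            raise ValueError(f"not a 14-letter uppercase stem: {stem!r}")
    low, high = sorted((ik14_a, ik14_b))
    return f"{low}::{high}"


# Made-up stems; real values come from drug_a_ik14 and drug_b_ik14.
key_ab = make_pair_id("BBBBBBBBBBBBBB", "AAAAAAAAAAAAAA")
key_ba = make_pair_id("AAAAAAAAAAAAAA", "BBBBBBBBBBBBBB")
```
]]></code>
                <p>Both calls return the same key, which is what makes the identifier usable for joins and for deduplicating (A,B) and (B,A) rows.</p>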
            </sec>
            <sec id="sec14">
                <title>Data validation</title>
                <p>Validation focuses on auditability and integrity rather than on predictive performance. The release includes the sampling strata counts, signal quantile summaries, and SHA-256 checksums, which support both integrity verification and rapid sanity checks. In addition, the strict floors and confidence-bound screening reduce small-denominator instability, and the rationale pointers support manual inspection of high-signal pairs.</p>
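                <p>An integrity check against checksums.txt could look like the sketch below. It assumes that each line of the manifest follows the common sha256sum layout (a hexadecimal digest, whitespace, then a file name) and that the listed files sit in the same directory as the manifest; the actual layout of the deposited file should be confirmed before use. The function names are hypothetical.</p>
                <code language="python"><![CDATA[
```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(manifest_path):
    """Return {file name: True/False} for every entry in the manifest.

    Assumed line format: '<hex digest>  <file name>' (sha256sum style).
    A missing file is reported as False.
    """
    manifest = Path(manifest_path)
    results = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()  # '*' marks binary mode in sha256sum
        target = manifest.parent / name
        results[name] = (target.is_file()
                         and sha256_of(target) == expected.lower())
    return results
```
]]></code>
                <p>Calling verify_checksums("checksums.txt") in the download directory returns one entry per listed file; any False value indicates a missing or altered file.</p>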
            </sec>
            <sec id="sec15">
                <title>Reuse notes and limitations</title>
                <p>FAERS-derived signals are associative and subject to reporting bias, confounding, and stimulated reporting; they should not be interpreted as causal evidence of a DDI. We suggest using RxPairEvid-50&#x00a0;K as (i) a benchmarking dataset for signal-based modeling, (ii) an evidence layer for constructing labels in multi-evidence pipelines, and (iii) a transparent baseline for ablation analyses. For leakage-aware evaluation, split at the drug level (IK14-disjoint) first and then form pair splits, and use PR-AUC as the primary metric under class imbalance.</p>
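                <p>One way to realise the drug-level split is sketched below. It assigns each drug to a partition from a hash of its IK14 stem and keeps a pair only when both drugs fall in the same partition; pairs that straddle the two partitions are set aside. The hash-based assignment, the 20% test fraction, and the decision to drop straddling pairs are assumptions made for illustration, not part of the released dataset.</p>
                <code language="python"><![CDATA[
```python
import hashlib


def drug_bucket(ik14, test_fraction=0.2):
    """Assign a drug to 'train' or 'test' from a hash of its IK14 stem."""
    value = int(hashlib.md5(ik14.encode("utf-8")).hexdigest(), 16)
    return "test" if (value % 10_000) / 10_000 < test_fraction else "train"


def split_pairs(pair_ids, test_fraction=0.2):
    """Form IK14-disjoint pair splits from 'A::B' style pair identifiers."""
    splits = {"train": [], "test": [], "dropped": []}
    for pair_id in pair_ids:
        drug_a, drug_b = pair_id.split("::")
        side_a = drug_bucket(drug_a, test_fraction)
        side_b = drug_bucket(drug_b, test_fraction)
        # A pair is usable only if both drugs live in the same partition.
        key = side_a if side_a == side_b else "dropped"
        splits[key].append(pair_id)
    return splits


def drugs_in(pair_ids):
    """Set of all drug stems that occur in a list of pair identifiers."""
    return {drug for pair_id in pair_ids for drug in pair_id.split("::")}


# Made-up stems ('AAAAAAAAAAAAAA', 'BBBBBBBBBBBBBB', ...) and all their pairs.
drugs = [chr(65 + i) * 14 for i in range(20)]
pairs = [f"{a}::{b}" for i, a in enumerate(drugs) for b in drugs[i + 1:]]
splits = split_pairs(pairs)
overlap = drugs_in(splits["train"]) & drugs_in(splits["test"])
```
]]></code>
                <p>The overlap set is empty by construction, so no drug seen in training appears in a test pair. Dropping straddling pairs reduces the usable sample, which is the cost of a strict IK14-disjoint evaluation.</p>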
            </sec>
        </sec>
        <sec id="sec16">
            <title>Ethics and consent</title>
            <p>Not applicable. RxPairEvid is derived from publicly available, de-identified secondary data sources and does not involve direct collection of human participant data.</p>
        </sec>
    </body>
    <back>
        <sec id="sec19" sec-type="data-availability">
            <title>Data availability</title>
            <sec id="sec20">
                <title>Underlying data</title>
                <p>Mendeley Data: RxPairEvid doi:
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.17632/zrvzpfmzcz.1">https://doi.org/10.17632/zrvzpfmzcz.1</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref18">18</xref>
                    </sup>
                </p>
                <p>This project contains the following underlying data:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
audit_subset_signal_quantiles.csv</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
audit_subset_strata_counts.csv</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>checksums.txt</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>codebook.md</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
ddi_pairs_50k.csv</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>provenance.md</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>README.md</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>schema.sql</p>
                        </list-item>
                    </list>
                </p>
                <p>Data is available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International licence</ext-link>.</p>
            </sec>
        </sec>
        <ack>
            <title>Acknowledgements</title>
            <p>We thank the clinicians and domain experts who advised on label design for internal tasks and the auditability of evidence fields.</p>
        </ack>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="other">
                    <collab>FDA</collab>:
                    <article-title>FDA Adverse Event Reporting System (FAERS).</article-title>
                    <ext-link ext-link-type="uri" xlink:href="https://www.fda.gov/drugs/drug-approvals-and-databases/fda-adverse-event-reporting-system-faers-database">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Evans</surname>
                            <given-names>SJW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Waller</surname>
                            <given-names>PC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Davis</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports.</article-title>
                    <source>

                        <italic toggle="yes">Pharmacoepidemiol Drug Saf.</italic>
</source>
                    <year>2001</year>;<volume>10</volume>(<issue>6</issue>):<fpage>483</fpage>&#x2013;<lpage>486</lpage>.
                    <pub-id pub-id-type="doi">10.1002/pds.677</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Puijenbroek</surname>
                            <given-names>EP</given-names>
                            <prefix>van</prefix>
                        </name>

                        <name name-style="western">
                            <surname>Bate</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Leufkens</surname>
                            <given-names>HGM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A comparison of measures of disproportionality for signal detection in spontaneous reporting systems for adverse drug reactions.</article-title>
                    <source>

                        <italic toggle="yes">Pharmacoepidemiol Drug Saf.</italic>
</source>
                    <year>2002</year>;<volume>11</volume>(<issue>1</issue>):<fpage>3</fpage>&#x2013;<lpage>10</lpage>.
                    <pub-id pub-id-type="pmid">11998548</pub-id>
                    <pub-id pub-id-type="doi">10.1002/pds.668</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Brown</surname>
                            <given-names>EG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wood</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wood</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>The medical dictionary for regulatory activities (MedDRA).</article-title>
                    <source>

                        <italic toggle="yes">Drug Saf.</italic>
</source>
                    <year>1999</year>;<volume>20</volume>(<issue>2</issue>):<fpage>109</fpage>&#x2013;<lpage>117</lpage>.
                    <pub-id pub-id-type="doi">10.2165/00002018-199920020-00002</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Knox</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wilson</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Klinger</surname>
                            <given-names>CM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>DrugBank 6.0: the DrugBank knowledgebase for 2024.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2024</year>;<volume>52</volume>(<issue>D1</issue>):<fpage>D1265</fpage>&#x2013;<lpage>D1275</lpage>.
                    <pub-id pub-id-type="pmid">37953279</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkad976</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10767804</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kanehisa</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Furumichi</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sato</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>KEGG for taxonomy-based analysis of pathways and genomes.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2023</year>;<volume>51</volume>(<issue>D1</issue>):<fpage>D587</fpage>&#x2013;<lpage>D592</lpage>.
                    <pub-id pub-id-type="pmid">36300620</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkac963</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9825424</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <collab>wwPDB consortium</collab>:
                    <article-title>Protein Data Bank: the single global archive for 3D macromolecular structure data.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2019</year>;<volume>47</volume>(<issue>D1</issue>):<fpage>D520</fpage>&#x2013;<lpage>D528</lpage>.
                    <pub-id pub-id-type="doi">10.1093/nar/gky949</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fang</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The PDBbind database: methodologies and updates.</article-title>
                    <source>

                        <italic toggle="yes">J Med Chem.</italic>
</source>
                    <year>2005</year>;<volume>48</volume>(<issue>12</issue>):<fpage>4111</fpage>&#x2013;<lpage>4119</lpage>.
                    <pub-id pub-id-type="doi">10.1021/jm048957q</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Szklarczyk</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gable</surname>
                            <given-names>AL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lyon</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The STRING database in 2023: protein&#x2013;protein association networks and functional enrichment analyses.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2023</year>;<volume>51</volume>(<issue>D1</issue>):<fpage>D638</fpage>&#x2013;<lpage>D646</lpage>.
                    <pub-id pub-id-type="doi">10.1093/nar/gkac1000</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Szklarczyk</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Santos</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mering</surname>
                            <given-names>C</given-names>
                            <prefix>von</prefix>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>STITCH 5: augmenting protein&#x2013;chemical interaction networks with tissue and affinity data.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2016</year>;<volume>44</volume>(<issue>D1</issue>):<fpage>D380</fpage>&#x2013;<lpage>D384</lpage>.
                    <pub-id pub-id-type="pmid">26590256</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkv1277</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4702904</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kuhn</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Letunic</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jensen</surname>
                            <given-names>LJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The SIDER database of drugs and side effects.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2016</year>;<volume>44</volume>(<issue>D1</issue>):<fpage>D1075</fpage>&#x2013;<lpage>D1079</lpage>.
                    <pub-id pub-id-type="pmid">26481350</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkv1075</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4702794</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Subramanian</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Narayan</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Corsello</surname>
                            <given-names>SM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A next generation connectivity map: L1000 platform and the first 1,000,000 profiles.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2017</year>;<volume>171</volume>(<issue>6</issue>):<fpage>1437</fpage>&#x2013;<lpage>1452.e17</lpage>.
                    <pub-id pub-id-type="pmid">29195078</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2017.10.049</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5990023</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zdrazil</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Felix</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hunter</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The ChEMBL database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2024</year>;<volume>52</volume>(<issue>D1</issue>):<fpage>D1180</fpage>&#x2013;<lpage>D1192</lpage>.
                    <pub-id pub-id-type="pmid">37933841</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkad1004</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10767899</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cheng</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>PubChem 2023 update.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2023</year>;<volume>51</volume>(<issue>D1</issue>):<fpage>D1373</fpage>&#x2013;<lpage>D1380</lpage>.
                    <pub-id pub-id-type="pmid">36305812</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkac956</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9825602</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lin</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wen</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>BindingDB: a web-accessible database of experimentally determined protein&#x2013;ligand binding affinities.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2007</year>;<volume>35</volume>:<fpage>D198</fpage>&#x2013;<lpage>D201</lpage>.
                    <pub-id pub-id-type="pmid">17145705</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkl999</pub-id>
                    <pub-id pub-id-type="pmcid">PMC1751547</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nelson</surname>
                            <given-names>SJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zeng</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kilbourne</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Normalized names for clinical drugs: RxNorm at 6 years.</article-title>
                    <source>

                        <italic toggle="yes">J Am Med Inform Assoc.</italic>
</source>
                    <year>2011</year>;<volume>18</volume>(<issue>4</issue>):<fpage>441</fpage>&#x2013;<lpage>448</lpage>.
                    <pub-id pub-id-type="pmid">21515544</pub-id>
                    <pub-id pub-id-type="doi">10.1136/amiajnl-2011-000116</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3128404</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tatonetti</surname>
                            <given-names>NP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ye</surname>
                            <given-names>PP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Daneshjou</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Data-driven prediction of drug effects and interactions.</article-title>
                    <source>

                        <italic toggle="yes">Sci Transl Med.</italic>
</source>
                    <year>2012</year>;<volume>4</volume>(<issue>125</issue>).
                    <pub-id pub-id-type="doi">10.1126/scitranslmed.3003377</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hashir</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Asfand-e-Yar</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Algarni</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>RxPairEvid. Mendeley Data.</article-title>
                    <pub-id pub-id-type="doi">10.17632/zrvzpfmzcz.1</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report482736">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.197294.r482736</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Pirmohamed</surname>
                        <given-names>Munir</given-names>
                    </name>
                    <xref ref-type="aff" rid="r482736a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-7534-7266</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Bright</surname>
                        <given-names>Matthew</given-names>
                    </name>
                    <xref ref-type="aff" rid="r482736a1">1</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r482736a1">
                    <label>1</label>University of Liverpool, Liverpool, England, UK</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>4</day>
                <month>6</month>
                <year>2026</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2026 Pirmohamed M and Bright M</copyright-statement>
                <copyright-year>2026</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport482736" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.178856.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors have produced a tool (RxPairEvid) to assess DDIs using data from FAERS. Comments are provided below.</p>
            <p>The paper is very hard to read, so it is difficult to see what the authors have done beyond some drug name standardisation and fairly standard data cleansing. They mention a stratification scheme, but it is only hazily described.</p>
            <p>Page 3: &#x2018;&#x2026;but their reuse in machine learning is often limited by unstable drug identifiers and opaque, non-auditable label construction.&#x2019; It&#x2019;s unclear what this means, and the reference given is just to the FAERS database, so it&#x2019;s unlikely to support the assertion.</p>
            <p>Page 3: &#x2018;For public dissemination we release a compact, information-rich 50,000-pair matrix focused on FAERS derived..&#x2019;: Various data sources are mentioned, but it is unclear how they are used to produce a valid DDI data set. It&#x2019;s also unclear which &#x2018;signal features&#x2019; are meant by the FAERS signal features and rationale pointers that are released together with provenance and audit artifacts.</p>
            <p>Page 3: &#x2018;FAERS DRUG quarterly and REAC quarterly records are broken out into normalized staging tables, with powerful management of typical encoding problems&#x2019;: It&#x2019;s not clear what this means. How were the problems &#x2018;managed&#x2019;? What is a &#x2018;normalised staging table&#x2019;?</p>
            <p>Page 4: &#x2018;We construct a 2 x 2 contingency table and calculate measures of disproportionality of each (drug A, drug B, PT). PRR is widely employed in signal generation of spontaneous-report data.2 ROR is as well a popular tool and application of the lower confidence bound will assist in minimizing the false positives based on the limited number of counts.&#x2019;: A PRR computed from only the drug-pair and PT counts is inappropriate in the DDI setting: it will show a signal if one of the drugs causes the effect alone, or if both do but there is no actual interaction. It would be better to use statistics designed specifically for interactions, such as the Omega statistic [Nor&#x00e9;n GN, et al. Statist. Med. 27: 3057&#x2013;3070 (2008)] or INTSS [Almenoff JS, et al. <italic>Pharmacoepidemiology and Drug Safety</italic> 12.6 (2003): 517&#x2013;521].</p>
            <p>Page 5: &#x2018;RxPairEvid-50 K is deterministic stratified sample of strict FAERS rollup matrix.&#x2019;: More detail is needed on how the &#x2018;rollups&#x2019; were done. Why is the <italic>maximum</italic> PRR/ROR figure selected from groupings?</p>
            <p>Page 5: &#x2018;a predefined quota is sampled to obtain 50,000 rows&#x2019;: How is this sampling done?</p>
            <p>Page 6: &#x2018;Validation is more about auditability and integrity, and not about predictive performance.&#x2019;: If the claim is that this data is a genuine database of DDIs, then this statement needs further justification. The authors would need to show that the pairs function as a good set of controls.</p>
            <p>Page 7: &#x2018;Signals derived by FAERS are associative and prone to reporting bias, confounding, and stimulated reporting and should not be used as causal data of a DDI. We suggest RxPairEvid-50 K since&#x2026;&#x2019;: It is certainly the case that PRR/ROR signals would be unreliable (see above). It is hard to see how the dataset could be useful in any of the applications discussed if it is not, or is at least unlikely to be, a list of actual DDIs.</p>
            <p>Are sufficient details of methods and materials provided to allow replication by others?</p>
            <p>No</p>
            <p>Is the rationale for creating the dataset(s) clearly described?</p>
            <p>Yes</p>
            <p>Are the datasets clearly presented in a useable and accessible format?</p>
            <p>No</p>
            <p>Are the protocols appropriate and is the work technically sound?</p>
            <p>No</p>
            <p>Reviewer Expertise:</p>
            <p>Pharmacovigilance, clinical pharmacology</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to state that we do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
    </sub-article>
</article>
