Method Article

Using the “Uniform Scale” to facilitate meta-analysis where exposure variables are qualitative and vary between studies – methodology, examples and software

[version 1; peer review: 1 approved with reservations]
PUBLISHED 22 Jan 2020

Abstract

Meta-analyses often combine covariate-adjusted effect estimates (odds ratios or relative risks) and confidence intervals relating a specified endpoint to a given exposure.  Standard techniques are available to do this where the exposure is a simple presence/absence variable, or can be expressed in defined units.  However, where the definition of exposure is qualitative and may vary between studies, meta-analysis is less straightforward.  We introduce a new “Uniform Scale” approach allowing expression of effect estimates in a consistent manner, comparing individuals with the most and least possible exposure. 
 
In 2008, we presented methodology and made available software to obtain estimates for specific pairwise comparisons of exposure, such as any versus none, where the source paper provides estimates for multiple exposure categories, expressed relative to a common reference group.  This methodology takes account of the correlation between the effect estimates for the different levels.  We have now extended our software, available in Excel, SAS and R, to obtain effect estimates per unit of exposure, whether the exposure is defined or is to be expressed in the “Uniform Scale”.  Examples of its use are presented.

Keywords

systematic review, meta-analysis, contrast, dose response

Introduction

Results from individual studies relating an exposure of interest to risk of a disease are often recorded as a set of covariate-adjusted effect estimates (odds ratios (ORs), or relative risks (RRs)), with 95% confidence intervals (CIs), for differing levels of exposure relative to a reference (base) level. Associated with this, data on the numbers of subjects are often presented. For case-control or cross-sectional studies these are subdivided by presence of disease. For prospective studies the numbers with disease and the numbers at risk are typically presented.

For the purpose of conducting meta-analyses, it is often the situation that meta-analysts with no access to the raw data of a study require estimates of covariate-adjusted ORs or RRs for pairwise comparisons other than those presented. For example, if the base level (0) is never smokers, and levels 1 to 4 are, respectively, former smokers and current smokers of 1–10, 11–20 and 21+ cigarettes per day, one may wish to derive estimates for current vs never (levels 2 to 4 combined vs level 0), current vs non (levels 2 to 4 combined vs levels 0 and 1 combined) or current vs former (levels 2 to 4 combined vs level 1). As the effect estimates (OR or RR) for each level are not independent, having a common reference level, one cannot derive these estimates straightforwardly. Thus, for example, using fixed-effects meta-analysis to combine effect estimates for levels 2 to 4 to get an estimate for current smoking is not correct.

A solution to this problem, described in a paper we wrote in 2008 (Hamling et al., 2008), is based on methodology developed much earlier by Greenland & Longnecker (1992). Where there are k exposure levels, the method involves deriving, using the effect estimates and their 95% CIs together with the marginal totals of numbers of subjects, a set of pseudo-numbers for the relevant 2 x (k + 1) table. These pseudo-numbers (which have no direct meaning by themselves) produce the same effect estimates and 95% CIs for comparison with the reference level, and can be combined as appropriate to produce an adjusted estimate for any pairwise comparison of different sets of levels. Our earlier paper (Hamling et al., 2008) gives examples of the methodology in action. That paper not only shows how relative effect estimates can be derived for alternative comparisons, but also presents methodology for deriving alternative comparisons when results are given by categories of disease rather than categories of exposure. It also makes available software to derive the necessary estimates using both an Excel and a SAS implementation. These implementations also produce chi-squared and p-values for heterogeneity and trend corresponding to the table of pseudo-numbers of subjects and based on trend coefficients entered by the user, using formulae 4.38 and 4.39 of Breslow & Day (1980) for case-control studies and modified versions of these formulae for prospective studies described in their later publication (Breslow & Day, 1987).

While meta-analyses are often carried out for a specific comparison of exposure, such as current smokers vs never smokers, one often wishes to quantify the effect per unit exposure. Where exposure is measured in a consistent way in each study, and is known for each level of exposure considered in each study, standard techniques are available (described in the Methods section) to derive such trend estimates. However, where exposure may be measured in various ways, and is only defined semi-quantitatively (e.g. high, medium, low) this is not the case.

Here we introduce a new, “Uniform Scale”, approach to deal with this problem. It is based on the assumption that the exposures range from 0 (least possible) to 1 (most possible), and that the N participants in any study have equally spaced exposures ranging from 1/(2N) to 1−1/(2N). By attempting always to derive an effect estimate relating to a difference of 1 unit of exposure, this “Uniform Scale” approach allows the combination of effect estimates using different measures of an underlying common exposure.

We describe how effect estimates using this “Uniform Scale” approach can be derived.

We also make available an extended version of our software to allow estimation of trend estimates, whether the exposure is quantified in standard units or expressed in the “Uniform Scale”. This extended software, now available in R as well as in Excel and SAS, can be used for cross-sectional studies as well as for case-control and prospective studies. The new software avoids problems some users had with the original Excel spreadsheet in generating the pseudo-numbers, which arose from the solver routine used, a Microsoft add-in. The new Excel software is written mainly in Visual Basic, to allow matrix inversion for matrices of variable size (which is needed for the new methodology but is not readily possible using Excel formulae). Note that the extensions of the software relating to the inclusion of trend estimates make little sense for data subdivided by level of disease, so that part of the software is essentially unchanged.

Methods

Including effect estimates per unit of exposure

Given a set of pseudo-numbers for the levels of exposure for a study, together with an estimate of the mean exposure for each level, it is often required to estimate the effect (and 95% CI) corresponding to a 1 unit increase in exposure. This enables the meta-analyst to produce combined estimates over studies based on effect estimates expressed for differing levels of exposure. Thus, for example, if interest is in the risk increase per cigarette smoked, one can combine these trend estimates over study, even when one study might report results by levels of, say, 0, 1–5, 6–10, 11–15, 16–20, 21–30 and 31+ cigarettes per day and another reports results by levels of 0, 1–19, 20 and 21+ cigarettes per day. Provided one can derive reasonable mean consumption estimates for each exposure level and one assumes that there is a linear relationship between dose and the logarithm of the effect estimate, these trend estimates of risk increase per cigarette per day can then be readily combined in a meta-analysis.
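
As a simple illustration of that final combination step (which is separate from the software described here), per-unit trend estimates can be pooled on the log scale using inverse-variance (fixed-effect) weighting. The sketch below is in R, and the study values are invented purely for illustration.

# Hypothetical per-cigarette RRs (95% CIs) from three studies, pooled on the log scale
rr <- c(1.08, 1.12, 1.05)
lo <- c(1.03, 1.06, 1.00)
hi <- c(1.13, 1.18, 1.10)
beta <- log(rr)
se <- (log(hi) - log(lo)) / (2 * 1.96)   # SEs recovered from the CI widths
w <- 1 / se^2                            # inverse-variance weights
pooled <- sum(w * beta) / sum(w)
pooled_se <- sqrt(1 / sum(w))
exp(c(pooled, pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se))  # pooled RR and 95% CI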

The methodology used for case-control and cross-sectional studies is that described by Berlin et al. (1993), together with the correction for the non-independence of results by exposure level given by Greenland & Longnecker (1992). Orsini et al. (2012) provided the modifications to be used for prospective studies. The method first derives the variance of the effect estimate for each exposure level using the width of its 95% CI. The table of pseudo-numbers is then used to estimate the correlation of pairs of results, those values then being used, together with the variance values, to estimate the covariance of each pair. The variance-covariance matrix is then inverted and used, together with the effect estimates and dose values (mean exposure levels), to estimate beta, the coefficient of the linear relationship between the dose and the logarithm of the effect estimate, and its variance. Finally, beta and its confidence limits (derived from its variance) are exponentiated to give the rate of increase in the effect estimate per unit increase in dose, with its 95% CI.
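
To make the weighted-least-squares step concrete, the following is a minimal sketch in R (not the authors' published code) for a case-control study. It assumes the Greenland & Longnecker (1992) approximation in which two log odds ratios sharing a reference group have covariance 1/A0 + 1/B0, where A0 and B0 are the pseudo-numbers of cases and controls at the reference level; all function and argument names are illustrative.

# Trend per unit dose, following the Berlin et al. (1993) / Greenland & Longnecker (1992) approach
# or, lo, hi: reported ORs and 95% CI limits for the k non-reference levels
# dose: mean exposure for those levels (with the reference-level dose already subtracted)
# a, b: pseudo-numbers of cases and controls for levels 0 (reference) to k
gl_trend <- function(or, lo, hi, dose, a, b) {
  L <- log(or)                                   # log effect estimates
  s <- (log(hi) - log(lo)) / (2 * 1.96)          # SEs from the reported CI widths
  k <- length(or)
  # variances and shared-reference covariance implied by the pseudo-number table
  v_star <- 1 / a[-1] + 1 / b[-1] + 1 / a[1] + 1 / b[1]
  c_star <- 1 / a[1] + 1 / b[1]
  S <- diag(s^2, k)                              # variance-covariance matrix of the log ORs
  if (k > 1) {
    for (i in 1:(k - 1)) for (j in (i + 1):k) {
      r <- c_star / sqrt(v_star[i] * v_star[j])  # correlation estimated from the pseudo-numbers
      S[i, j] <- S[j, i] <- r * s[i] * s[j]      # rescaled to the reported variances
    }
  }
  W <- solve(S)                                  # invert the variance-covariance matrix
  vb <- 1 / sum(dose * (W %*% dose))             # var(beta) for the no-intercept GLS fit
  beta <- vb * sum(dose * (W %*% L))             # beta = (x'Wx)^-1 x'WL
  exp(c(rate = beta, lower = beta - 1.96 * sqrt(vb), upper = beta + 1.96 * sqrt(vb)))
}

For prospective studies the same structure applies, with the variance and covariance terms implied by the pseudo-numbers taking the form appropriate to relative risks (as in Orsini et al., 2012).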

The methodology described above assumes that the unexposed group has a dose value of zero. If the unexposed dose is a non-zero value, this value is subtracted from each of the dose values specified. Subtracting the same value from each dose value does not change the slope of the relationship, so the estimate of beta is unaffected.

Including effect estimates for the “Uniform Scale”

In some situations, effect estimates are presented by level of exposure where the level is merely expressed as, for example, low, medium or high, with no quantitative estimate of the extent of exposure. An example is data relating initiation of smoking in adolescents to “connectedness” (a feeling of belonging to or having affinity with a person, social group or organisation), where connectedness may be measured in various different ways, e.g. connectedness to school, connectedness to parents, or social connectedness. If these measures all relate to a common underlying scale, one could consider combining effect estimates for the various measures in a single meta-analysis. But what scale could be used?

One possible approach, and the one we suggest here, is to imagine an underlying “Uniform Scale”, where 0 indicates the least possible connectedness and 1 the most possible connectedness, with the population considered to be made up of N individuals with equally spaced scores ranging from 1/(2N) to 1−1/(2N). Thus, if there are 100 individuals in total, the individuals would have scores of 0.005, 0.015, 0.025 … 0.975, 0.985 and 0.995, with a mean score of 0.5. If there are 30 individuals in the low group, 50 in the medium group and 20 in the high group, the mean scores in the three groups would then be 0.15, 0.55 and 0.90. Or more formally, with a total of N subjects divided into k exposure groups of size N1, N2, … Nk respectively, the mean scores in the k groups would be 1/N times, respectively, N1/2, N1 + N2/2, N1 + N2 + N3/2, N1 + N2 + N3 + N4/2, and so on.
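
As a small illustration (not part of the published software, and with a name of our own choosing), these group mean scores can be computed directly from the ordered group sizes:

# "Uniform Scale" mean score for each of k ordered exposure groups, given only the
# group sizes n = (N1, ..., Nk): score_j = (N1 + ... + N(j-1) + Nj/2) / N
uniform_scores <- function(n) (cumsum(n) - n / 2) / sum(n)
uniform_scores(c(30, 50, 20))   # 0.15 0.55 0.90, as in the example above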

These scores are estimated from the numbers of controls for case-control studies, from the numbers at risk for prospective studies and from the total numbers (of cases and non-cases) for cross-sectional studies. The method of estimating the increase in risk per unit exposure is then identical to that described in the previous section.

Note that, within the software provided, the estimated rate of increase given for the “Uniform Scale” is based on scores derived from the table of pseudo-numbers. If the user wishes to see the estimate based on scores derived from the actual distribution, these scores have to be calculated by the user, and then entered as the dose values, the estimated rate of increase then appearing as the “Dose as entered” estimate per unit exposure. The software gives the rate of increases in risk per unit exposure for both methods, so that the user can decide which method is more appropriate and so select the relevant results.

Including studies that provide other forms of dose assessment in “Uniform Scale” meta-analysis

Sometimes results by dose may be presented in ways for which the “Uniform Scale” values can be calculated without the need to use the table of pseudo-numbers or the numbers of individuals in the study.

In some studies, the effect estimate presented relates to a comparison of two groups, one with low and one with high exposure, where the groups together cover the whole population. To include these in “Uniform Scale” meta-analysis one should square the effect estimate (and its 95% CI). To demonstrate this, assume that a proportion x has the high value, and 1−x the low value. As the mean scores for the low and high value are respectively (1−x)/2 and 1−x/2, they differ by 0.5, regardless of x, and as a linear relationship is assumed between the score and the logarithm of the effect estimate, the effect estimate should be raised to the power (1/0.5), i.e. squared.

Note that where the source only provides an effect estimate for high vs low exposure, with intermediate exposures possible, application of the “Uniform Scale” methodology requires additional assumptions. Even if, for example, high and low represent the upper and lower thirds of the distribution, giving mean scores of 0.833 and 0.167 respectively, raising the effect estimate to the appropriate power, 1/(0.833 − 0.167) = 1.5, would assume that the dose-response relationship adequately fitted the complete data, even though no information on the effect for the intermediate exposures was provided.

In other studies, the effect estimate (and CI) may be presented in relation to a one point increase in a continuous scale ranging from 0 (least possible exposure) to m (most possible exposure). Here the effect may be simply converted to our “Uniform Scale”, with a range of 0 to 1 corresponding to the difference between least and most possible exposures, by raising the effect estimate (and CI) to the mth power.

Where the effect estimate (and CIs) presented relates to a one level increase in a scale with m levels, where each level relates to a range of exposures (e.g. five levels with 1=very low, 2=low, 3=medium, 4=high and 5=very high), approximate effect estimates (and CIs) may also be obtained by mth power transformation. Thus, in the example with five levels (and hence a range of 4 between the extreme levels), transforming to the 5th power, rather than the 4th, is appropriate. If the levels represent m-tiles, this approximation would be exact, as a one level increase would then be equivalent to an increase of 1/m on the “Uniform Scale”.

In the situations described above in this section the software we provide is not required, as the meta-analyst can readily convert effect estimates (and 95% CIs) to the required “Uniform Scale” by simply raising them to the required power.
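
For these simple conversions a one-line helper is enough; the sketch below (the name and the input values are purely illustrative) raises the estimate and both confidence limits to the chosen power.

# Express a reported effect estimate and its CI per 1 unit of the "Uniform Scale"
to_uniform <- function(est, lower, upper, power) c(est, lower, upper)^power
to_uniform(1.30, 1.05, 1.61, 2)   # high vs low split of the whole population: square it
to_uniform(1.10, 1.02, 1.19, 5)   # per-level estimate on a 5-level (quintile-like) scale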

In these calculations of the “Uniform Scale” effect estimates, as illustrated in the Results section below, it is sometimes necessary to know the standard deviation (SD) of the N individual values on the “Uniform Scale”, namely 1, 3, 5, 7, …, (2N−1), each divided by 2N. As the numbers are equally spaced, the mean is 0.5, so that Σx = N/2. To estimate the SD we also need Σx². Since the sum of squares of the first N odd numbers is N(2N+1)(2N−1)/3, we can divide this by 4N² to get the required value of Σx². For large N, Σx² = N/3. Based on the formula for the SD, it can then be shown that for large N, the SD is 1/√12 or 0.2887. This approximation is very good for the values of N usually reported for studies. Thus, for N = 50, SD = 0.2915 and for N = 100, it is 0.2901.
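
These values are easy to verify numerically; the figures quoted above correspond to the sample SD (the n − 1 form), which is what R's sd() returns.

# SD of the N equally spaced "Uniform Scale" scores 1/(2N), 3/(2N), ..., (2N-1)/(2N)
usd <- function(N) sd((2 * (1:N) - 1) / (2 * N))
usd(50)        # 0.2915
usd(100)       # 0.2901
1 / sqrt(12)   # 0.2887, the large-N limit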

Implementation of the software provided

Allowing entry of results from cross-sectional studies. The original version of the software (Hamling et al., 2008) considered data only from case-control and prospective studies. The updated software also allows entry of data from cross-sectional studies. In nearly all aspects the methodology for cross-sectional studies is identical to that for case-control studies, with ORs for case-control studies based on the relative frequency of cases and controls replaced by ORs for cross-sectional studies based on the relative frequency of those with and without the disease of interest. Exceptionally, when using the “Uniform Scale”, the calculation differs between case-control and cross-sectional studies. Scores are based on the distribution of exposure in the whole target population, which is best approximated by the distribution of the whole study population in cross-sectional studies and by the distribution of the controls for case-control studies. This is because controls in case-control studies are selected to have a distribution of exposure relevant to the whole target population.

The Excel implementation. The Excel spreadsheet described here is similar to that made available with our earlier paper (Hamling et al., 2008) except that it provides estimates of trend per unit dose, including trend based on the “Uniform Scale”.

The methodology for estimating trend involves matrix inversion. The spreadsheet needs to handle matrices of variable size, depending on the number of exposure levels included in analysis. In Excel this is not possible using formulae, so much of the new Excel code is written in Visual Basic. Visual Basic is also used for additional validity checks on the data entered.

The updated spreadsheet attempts to avoid problems encountered in the original software. The Excel software uses a Solver routine (a Microsoft add-in) when generating the table of pseudo-numbers. Updates to the Microsoft operating system meant that this Solver routine ceased to work for some users. The new version of the software avoids these problems by using methods to ensure that the Solver add-in relevant to the user’s operating system is available for use. This has been tested on several versions of the Microsoft operating system.

The updated Excel spreadsheet and its documentation can be downloaded from Zenodo RREst_trend.xlsm and RREst_trend.pdf respectively (Lee et al., 2019). Also available at that site are details of the testing of the spreadsheet: see RREst_Trend_Test_Files.zip, which contains Testing of RREst_Trend in R SAS and Excel.pdf (describing the testing carried out) and the .xlsm files (which provide the details entered and results of each test).

The Excel spreadsheet is provided in .xlsm format because it uses Visual Basic code. This file format can be accessed using Microsoft Excel 2007 and later versions.

Before opening the spreadsheet in Excel the user should ensure that the Solver add-in has been installed. Within Excel look for the Add-Ins option within Tools or Developer.

The use of macros needs to be enabled within Excel. As the spreadsheet is opened in Excel the user may be asked to confirm that they wish to continue opening a file containing macros.

As in the previous version, the spreadsheet provides drop-boxes for selecting categorization (by exposure levels or by categories of disease) and study type, the updated version allowing for cross-sectional studies as well as for case-control and prospective studies.

The actions to be taken by the user are:

(1) Select categorization and study type using the drop-boxes

(2) Enter the 2 × 2 table of numbers of participants. For studies categorized by exposure the rows of the table are always “unexposed” and “all exposed”, while the columns vary by study type: cases and controls for case-control studies; cases and at risk for prospective studies; and cases and non-cases for cross-sectional studies. For studies categorized by disease, the rows and columns are transposed. In this 2 × 2 table the “unexposed” is the reference category presented in the study report, while “all exposed” represents the sum of all the other categories reported. Together they represent the whole study population.

(3) Enter, for each category (“exposed” level or category of disease) its name, the OR/RR estimate and its lower and upper 95% confidence limit.

(4) Enter values in the contrast column. Rows given value 0 or 1 are included in analysis, rows given value -1 are excluded. In the estimation of overall risk, rows given value 0 constitute the baseline, rows given value 1 constitute the exposed.

(5) Enter the dose values for the trend tests. The doses should be proportional to the amount of exposure. They are not meaningful for results categorized by disease type. Dose values for excluded rows are not used in analysis.

(6) Click the Calculate button to generate the estimated numbers of subjects (the pseudo-numbers which appear in the columns to the right of the entered category details) and to produce the required results. The overall risk for the specified contrast (OR/RR and 95% CI) and the results for the heterogeneity and trend test (chi-squared and p values) are as in the earlier version of the spreadsheet. The new results are the “Trend: rate of increase in risk per unit dose” (Rate and 95% CI) giving estimates both using the dose as entered and using the “Uniform Scale”.

The data entry area and the results all appear on the left-hand side of the spreadsheet, in columns A to H. This area also provides space to enter a heading (in rows 1 and 2) and notes (in rows 54–60). Saving the spreadsheet to a relevant file name and location preserves the details of the text and data entered, and the results produced.

Columns J to AF give additional information. Rows 1 to 19 give instructions to the user and notes, while rows 20 onwards give details of the underlying calculations. These include the adjusted dose values for the doses as entered (column AA) if the dose for the reference level is not zero, the dose-values using the “Uniform Scale” estimated from the pseudo-numbers (column AB) and the adjusted version of these (column AC). Adjustment simply involves subtracting the dose value for the reference level from all the other dose values.

This spreadsheet has been tested under Microsoft operating systems WIN 7, WIN 8.1 and WIN 10 using Excel 2010 and 2013. It has also been tested on MacBook using operating system Mac OS 10-13 (High Sierra) running Excel 2016.

The R implementation. This was developed as a web application using the Shiny “Web Application Framework for R” package. The application can be accessed at https://roelee.shinyapps.io/R_RRest/. The R code is available from Zenodo as the file app.R (Lee et al., 2019). Also available at that site are the method for estimating goodness of fit (described in the file Goodness of fit tests for fitted RRs.pdf) and details of the testing of the R code: see RREst_Trend_Test_Files.zip, which contains Testing of RREst_Trend in R SAS and Excel.pdf (describing the testing carried out) and the .csv files that give the input data used and the results generated in each test.

The Shiny app can be accessed in various browsers including those available for Windows, Apple and Android operating systems. The source code is provided to allow users to inspect and possibly modify the code. The code was developed in R Version 3.5.1 (2018-07-02).

Data entry and obtaining the required statistics are very similar in the R and the Excel implementations.

In the Shiny app, tabs at the top of the screen allow the user to “enter data for study”, see the “result for specified contrast”, see the table of “pseudo-numbers” or read the “notes” on using the application.

In “enter data for study”, the user enters first the number of exposed levels, the title and the study type. The user must then enter, in the “2x2 Table” and in “RRs, Contrasts and Doses”, the information described in items 2 to 5 of The Excel implementation above. Pressing the “solve” button on the left will then generate the pseudo-numbers and the required results.

The results include all those given in the Excel implementation. Additionally, for both trend types, the trend coefficient (the logarithm of the increase in risk per unit dose) and its standard error are shown; together with a measure of the goodness-of-fit of the trend (chi-squared value, its degrees of freedom and p value).

For a prospective study, where the pseudo-numbers of cases are A_i (i = 0, 1, …, k) and the pseudo-numbers at risk are N_i (i = 0, 1, …, k), the goodness-of-fit test is obtained by determining fitted numbers of cases, F_i (i = 0, 1, …, k), which satisfy the formulae

$$\sum_{i=0}^{k} F_i = \sum_{i=0}^{k} A_i \qquad \text{and} \qquad R_i = \frac{F_i N_0}{F_0 N_i}$$

where R_i are the RR values fitted using the dose and the estimated beta value. These formulae can be solved directly, and the goodness-of-fit chi-squared statistic is then derived in the usual way from the formula

$$\chi^2 = \sum_{i=0}^{k} \frac{(A_i - F_i)^2}{F_i}$$

on k − 1 degrees of freedom.
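
Because these constraints can be solved directly, the calculation for a prospective study is short. The sketch below (in R, with illustrative names rather than the authors' code) takes the pseudo-numbers of cases a and numbers at risk n for levels 0 to k, together with the fitted RRs r (with r[1] = 1 for the reference level), and returns the chi-squared statistic.

# Goodness of fit for a prospective study: fitted cases F_i proportional to r_i * n_i,
# rescaled so that sum(F) = sum(A); Pearson chi-squared on k - 1 degrees of freedom
gof_prospective <- function(a, n, r) {
  f <- r * n / n[1]
  f <- f * sum(a) / sum(f)    # satisfies sum(F) = sum(A) and R_i = (F_i * N_0) / (F_0 * N_i)
  chi2 <- sum((a - f)^2 / f)
  df <- length(a) - 2         # k - 1
  c(chi2 = chi2, df = df, p = pchisq(chi2, df, lower.tail = FALSE))
}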

For a case-control study, where the pseudo-numbers of cases are A_i (i = 0, 1, …, k) and the pseudo-numbers of controls are B_i (i = 0, 1, …, k), the goodness-of-fit test is obtained by determining fitted numbers of cases, F_i (i = 0, 1, …, k), and controls, G_i (i = 0, 1, …, k), which satisfy the formulae

$$\sum_{i=0}^{k} F_i = \sum_{i=0}^{k} A_i, \qquad \sum_{i=0}^{k} G_i = \sum_{i=0}^{k} B_i \qquad \text{and} \qquad O_i = \frac{F_i G_0}{F_0 G_i},$$

where O_i are the OR values fitted using the dose and the estimated beta value. These formulae can be solved by numerical methods (such as Newton-Raphson) and the goodness-of-fit chi-squared statistic is then derived using the formula

$$\chi^2 = \sum_{i=0}^{k} \frac{(A_i - F_i)^2}{F_i} + \sum_{i=0}^{k} \frac{(B_i - G_i)^2}{G_i}$$

on 2k − 1 degrees of freedom.

Goodness-of-fit testing for a cross-sectional study is equivalent to that for a case-control study, with controls replaced by non-cases.

The R implementation also allows the user easily to load data from a previously saved .csv file and to save the study data and results as a .csv file.

The Shiny application has been tested using Mozilla Firefox under Microsoft operating systems WIN 7, WIN 8.1 and WIN 10 and also using Microsoft Edge (under WIN 10) and Internet Explorer (under WIN 8.1). It has also been tested using Safari on MacBook under operating system Mac OS 10–13 (High Sierra).

The SAS implementation. The SAS implementation is provided as the macro RREst_trend.sas and is available from Zenodo (Lee et al., 2019). The documentation of the SAS implementation is also given on that website as RREst_trend SAS.pdf. Also available at that site are the method for estimating goodness of fit (described in the file Goodness of fit tests for fitted RRs.pdf) and details of the testing of the SAS code: see RREst_Trend_Test_Files.zip, which contains Testing of RREst_Trend in R SAS and Excel.pdf (describing the testing carried out) and the file SAS_RREst_Test_Results.pdf which provides the details of each test.

Users need a licenced installation of SAS in order to use the SAS code provided.

The macro has the following parameters:

  • ds1 - Dataset 1, the name of the input dataset containing the OR/RR, Lower CI, Upper CI, contrast and dose values for each of the exposed levels.

  • ds2 - Dataset 2, the name of the input dataset containing the 2x2 table.

  • Type - Study type. Values: CC (case-control), PR (prospective), XS (cross-sectional); (default value CC).

  • levels - How the study data is categorised. Values: EX (by exposure), DI (by disease); (default value EX).

  • out - Name of the output dataset that will hold the pseudo-numbers (default _RREst_).

  • alpha - Error probability used for the confidence intervals of the data entered (default value 0.05, equivalent to 95% CI).

  • trend - Report the trend tests? Values: 1 = Yes, 0 = No (default value 0).

  • details - Output the detailed results (details of each iteration and the final P’ and Z’ values)? Values: 1 = Yes, 0 = No (default value 0).

  • grid - Step size (out of 0–1) that should be used for finding a starting point for the iterative process (default value 0.01).

  • ini_beta - Starting point for the iterative process (default value: use the “grid” parameter’s starting point).

The first two of these parameters must be specified. If they are entered as the first two parameters, the parameter names (ds1 and ds2) are assumed and so need not be entered. The other parameters are optional. They are specified using the format parameter name = value. If not specified, the default value will be used. For example,

%RREst(mydata1,mydata2);

Here the macro is called only specifying the two input datasets containing, respectively, the details of the exposure categories and the 2x2 table. All other parameters take their default values, including study type case-control with data presented by levels of exposure and giving the pseudo-numbers dataset the name _RREst. If the study is prospective, the SAS macro could be called using

%RREst(mydata1,mydata2,type=PR) ;

Dataset 1 (ds1) should contain the details of each exposure level (or disease type) that makes up the study population, including the unexposed level. The fields within the dataset should be named as follows:

  • level - Equivalent to the Category column in the Excel implementation, giving a description of the exposure category (or disease type).

  • Est - OR/RR value. Ignored for the unexposed level.

  • lower - Lower confidence limit. Ignored for the unexposed level.

  • upper - Upper confidence limit. Ignored for the unexposed level.

  • dose - Mean exposure level (trend coefficients). Equivalent to the Dose column in the Excel implementation. If not included, dose values are assumed to be 0, 1, 2, and so on.

In addition, one or more contrast fields should be entered. These are equivalent to the Contrast column in the Excel implementation. Results will be generated for each contrast specified. Unlike the other fields, these contrast fields can be given any name.

Dataset 2 (ds2) is equivalent to the 2x2 table in the Excel implementation. It must contain two fields, their names depending on study type and categorisation:

  • Any study categorised by disease type - “Exposed” and “Unexposed”

  • Case-control study, by exposure - “Cases” and “Controls”

  • Prospective study, by exposure - “Cases” and “At_Risk”

  • Cross-sectional study, by exposure - "Cases" and "Non-cases"

In order to contain the 2x2 table values, it needs to have two data rows.

The output from the SAS macro is presented in the output window. This output includes the trend results using the “Uniform Scale” and the goodness-of-fit results as described above for the R implementation. The output is also written to output files, as described in the detailed documentation (available from Zenodo, file RREst_trend SAS.pdf) (Lee et al., 2019).

This SAS code has been tested using SAS 9.4 (64 bit) under Microsoft operating system WIN 7 and using SAS 9.4 (32 bit) under WIN 10.

Results

Data used

In this section we give examples of applying the methodology. We do this using the results presented in two papers, one a report of smoking habits and lung cancer risk in Norway (Engeland et al., 1996) and the other a report on initiation of tobacco use among adolescents in the USA (Karcher & Finn, 2005).

We used the first of these (Engeland et al., 1996) to demonstrate using the method to estimate the effect per unit dose of exposure. This paper reports several dose measures, including age of starting smoking and intensity of pipe smoking. We considered the dose measure intensity of cigarette smoking, measured as the number of cigarettes habitually smoked per day. The outcome of interest was lung cancer of any type. The paper reports a study of 8,905 men born between 1893 and 1927 who were followed up for 28 years. It provides a risk assessment with confidence interval for each of five categories of number of cigarettes smoked per day. We use these results to estimate the increase in risk associated with the consumption of one extra cigarette per day.

The second paper (Karcher & Finn, 2005) reports a cross-sectional study of 303 middle and high school students from a rural town in Midwest USA. It used a Measure of Adolescent Connectedness (MAC) instrument to assess the adolescents’ degree of caring for and involvement in specific relationships. This instrument included an assessment of parental connectedness, reported as low, medium and high connectedness. The paper reports odds ratios for experimental smoking by levels of parental connectedness. These levels of connectedness cannot be assessed as a numerical dose, but the “Uniform Scale” approach can be used.

All data input are available as underlying data (Lee et al., 2019).

Effect estimates per unit of exposure

Identifying the data to be used. In the cohort study on lung cancer risk in Norwegian men and women, Engeland et al. (1996) presented data in men on the numbers of lung cancer cases and of person-years for seven groups – never smokers (the reference group), former smokers and current smokers of, respectively, 1–4, 5–9, 10–14, 15–19 and 20+ cigarettes per day. Dividing the numbers of person-years by the number of years of follow-up (28) to give an approximate indicator of the numbers at risk, and combining the results for all the exposed groups (the last six groups), the numbers of cases were 27 in never and 306 in ever smokers, while the corresponding numbers at risk were 2,097 and 6,017. These numbers were used to populate the 2x2 table of numbers in the Excel program. Note that this table is only used as a starting point to the iterative process for estimating the pseudo-numbers so does not have to be precise.

The estimation of the pseudo-numbers also requires the adjusted RRs (95% CIs) for the seven groups, which are given in the paper as, respectively, 1.0, 1.3 (0.8 to 2.2), 1.4 (0.6 to 3.7), 4.1 (1.7 to 10.0), 7.0 (2.9 to 17.0), 11.0 (4.2 to 28.0) and 15.0 (6.1 to 37.0). For dose assessment, the midpoint consumption levels for the categories of current smokers are also needed; we derived these, using the standard distribution described by Fry & Lee (2000), as 2.5, 6.5, 10.88, 15.83 and 26.03.

As shown in Table 1, we entered the relevant data into the Excel spreadsheet. Although we subsequently excluded the result for former smokers from analysis, the whole study population was counted in the 2x2 table of numbers of participants and details of each exposure level (never smokers, former smokers and the five levels of smoking intensity) were entered because the estimation of pseudo-numbers would then be based on all the information we have about the study. For this example study we have available the numbers of participants (cases and at risk) for each of the exposure levels. Often a study will report only a summary of the numbers of participants such as the totals exposed and unexposed or the overall total numbers of participants and the proportions exposed (cases and at risk). These summary details are sufficient to allow the method to be used. The results shown indicate the following:

Table 1. Example of deriving effect estimates per unit of exposure for number of cigarettes per day (Engeland et al., 1996).

Entered data

Study type: Prospective
Categorised by: Exposure

Number of participants | Cases | At risk^c
Unexposed^a | 27 | 2,097
All exposed^b | 306 | 6,017

Exposure category | RR | Lower CI | Upper CI | Contrast^d | Dose
Never | 1.0 | - | - | 0 | 0
Former | 1.3 | 0.8 | 2.2 | −1 | 0
Current 1–4 cigs/day | 1.4 | 0.6 | 3.7 | 1 | 2.5
Current 5–9 cigs/day | 4.1 | 1.7 | 10.0 | 1 | 6.5
Current 10–14 cigs/day | 7.0 | 2.9 | 17.0 | 1 | 10.88
Current 15–19 cigs/day | 11.0 | 4.2 | 28.0 | 1 | 15.83
Current 20+ cigs/day | 15.0 | 6.1 | 37.0 | 1 | 26.03

Results

Pseudo-numbers
Exposure | Cases | At risk
Never | 19.0200 | 661.9306
Former | 61.9758 | 1659.1342
Current 1–4 cigs/day | 5.8414 | 145.2091
Current 5–9 cigs/day | 5.7557 | 48.8557
Current 10–14 cigs/day | 5.2392 | 26.0478
Current 15–19 cigs/day | 3.7340 | 11.8138
Current 20+ cigs/day | 3.5471 | 8.2298

Overall risk for the specified contrast: 3.4950 (1.9517 to 6.2585)
Heterogeneity: χ² = 69.3160, p-value 0.0000
Trend (Breslow, 1980): χ² = 66.7185, p-value 0.0000
Trend: rate of increase in risk per unit dose
  Using the doses as entered^e: 1.1228 (1.0879 to 1.1588)

^a The reference group in the study population: the never smokers

^b All non-reference participants in the study population: the sum of values for former smokers and the five groups of current smokers

^c Approximate indicators – see text

^d These should be 0 or 1 for the levels to be included in the analysis. The Overall risk result compares the exposed group levels (value 1) with the reference group level(s) (value 0).

^e We have valid dose information for the exposure categories so results based on the “Uniform Scale” are ignored.

Table of pseudo-numbers. This table is consistent with the input RR and 95% CI values in that using the standard formulae for estimating each RR (CI) from a 2 x k table will generate the results input. It is notable that the numbers of cases in smokers are much lower than the original numbers cited in the paper. This is partly due to the effects of adjustment, but also because the original numbers given in the paper are not consistent with the varying widths of the 95% CIs, which are much narrower for former smokers than for each level of current smoking.

Overall risk for the specified contrast. This calculation is identical to that provided in the original software. Given the contrast values chosen of 0, −1, 1, 1, 1, 1 and 1 the overall risk value is an estimate of the RR in all current smokers combined (contrast value 1) relative to never smokers (contrast value 0) with former smokers excluded (contrast value -1). The contrast values are not relevant to the trend analysis, except that the selection of −1 for former smokers causes that exposure level to be excluded from analysis, so the trend analysis is based only on the data for never and current smokers.

Heterogeneity and trend (Breslow, 1980). These values are identical to those given in the original software, and confirm that the trend is very highly significant, and also that the trend explains the major part of the heterogeneity between levels.

Trend: Rate of increase in risk per unit dose. Using the dose values input, the estimated RR (95% CI) is 1.1228 (1.0879 to 1.1588). This 12.28% increase per unit dose implies that the fitted risks for the five current smoking groups are 1.34, 2.12, 3.52, 6.25 and 20.26, as compared with the input values of 1.4, 4.1, 7, 11 and 15. As discussed elsewhere (Fry et al., 2013), the shape of the dose-response relationship of lung cancer risk to amount smoked may be better fitted by alternative non-linear models.

Effect estimates using the “Uniform Scale”

In the cross-sectional study relating connectedness to experimental smoking among rural youth, Karcher & Finn (2005) reported that there were 135 experimental smokers, 43 with low parental connectedness, 54 with medium connectedness and 38 with high connectedness. As these numbers formed respectively 55%, 50% and 32% of the numbers of youths at each level of connectedness, we could estimate that the numbers who were not experimental smokers were 35 for low, 54 for medium and 81 for high parental connectedness, giving totals by level of 78, 108 and 119.

The exposure scale is qualitative so the method of calculating effect estimates using actual dose values cannot be used. However, our “Uniform Scale” methodology can be used to give a result that is comparable with those for other measures of connectedness in the same or a similar population.

This study provides the numbers of participants in each exposure level (78, 108 and 119 for low, medium and high connectedness respectively) so, as a demonstration of the method, we can calculate the “Uniform Scale” scores without using the software. Calculating N1/2, N1 + N2/2 and N1 + N2 + N3/2 gives the values 39, 132 and 245.5. Scaling these to the range 0–1 is achieved by dividing by the total number of participants (305), giving scores on the “Uniform Scale” of 0.1279, 0.4328 and 0.8049.
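
The arithmetic is easily reproduced (this snippet simply repeats the calculation just described):

n <- c(78, 108, 119)            # low, medium, high parental connectedness
(cumsum(n) - n / 2) / sum(n)    # 0.1279 0.4328 0.8049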

The software uses the same method to derive “Uniform Scale” scores but bases them not on the actual numbers of participant by exposure level (which are often not available in a study report) but instead on the table of pseudo-numbers. Using this example we can compare the scores and the related trend results estimated using the pseudo-numbers with those based on the actual numbers of participants (above).

The authors also presented adjusted ORs (95% CIs) of 1.26 (0.69 to 2.27) for low vs medium parental connectedness and of 2.55 (1.40 to 4.66) for low vs high parental connectedness. As we wish to estimate ORs relative to low, we inverted these to get 0.7937 (0.4405 to 1.4493) for medium vs low and 0.3922 (0.2146 to 0.7143) for high vs low.

As shown in Table 2, we entered the relevant data into the Excel spreadsheet. In the Dose column we entered the “Uniform Scale” scores calculated above (based on the actual numbers of participants) so that the two sets of trend results generated both related to the “Uniform Scale” scores, one using the actual numbers of participants and the other based on the pseudo-numbers. Papers do not usually give numbers of participants in each exposure level and so “Uniform Scale” dose values based on actual numbers of participants cannot be calculated and entered. In these circumstances the results based on the Dose column values (Trend “Using the doses as entered”) should be ignored.

Table 2. Example of use of the “Uniform Scale” based on parental connectedness data (Karcher & Finn, 2005).

Entered data

Study type: Cross-sectional
Categorised by: Exposure

Number of participants | Cases | Non-cases
Unexposed^a | 43 | 35
All exposed^b | 92 | 135
Total | 135 | 170

Exposure category | OR | Lower CI | Upper CI | Contrast^c | Dose^d
Low | 1 | - | - | 0 | 0.1279
Medium | 0.7937 | 0.4405 | 1.4493 | 1 | 0.4328
High | 0.3922 | 0.2146 | 0.7143 | 1 | 0.8049

Results

Pseudo-numbers
Exposure | Cases | Non-cases
Low | 41.2377 | 33.6214
Medium | 51.5297 | 52.9358
High | 36.9146 | 76.7467

“Uniform Scale” values based on the pseudo-numbers: 0.1278 (low), 0.4338 (medium), 0.8060 (high)
Overall risk for the specified contrast: 0.5560 (0.3274 to 0.9443)
Heterogeneity: χ² = 11.0022, p-value 0.0041
Trend (Breslow, 1980): χ² = 10.4889, p-value 0.0012
Trend: rate of increase in risk per unit dose
  Using the doses as entered^d: 0.2382 (0.0990 to 0.5730)
  Using the “Uniform Scale” derived from the pseudo-numbers: 0.2388 (0.0994 to 0.5738)

^a The reference group in the study population: low parental connectedness

^b All non-reference participants in the study population: medium + high parental connectedness

^c These should be 0 or 1 for the levels to be included in the analysis. The Overall risk result compares the exposed group levels (value 1) with the reference group level(s) (value 0).

^d In this example these have been set to the “Uniform Scale” values calculated using the actual numbers of participants in each level
The results include the table of pseudo-numbers. This table is again consistent with the input OR and 95% CI values. The numbers are slightly lower than the original numbers, due to the increase in variance following adjustment.

Another result presented is the overall risk for the specified contrast. Given the input contrast values of 0, 1 and 1 the resulting OR relates to the pairwise comparison of medium and high combined to low. The contrast values are not relevant to the trend analysis except that entering a contrast value of −1 would have caused the exposure level to be excluded from analysis, though not from the calculation of the pseudo-numbers.

The output also shows the results of the trend analysis – the rate of increase in risk per unit dose. Using the scores calculated above, based on the actual numbers, the estimated OR (95% CI) for a 1 unit difference in exposure (most exposed compared with least possible exposure) is 0.2382 (0.0990 to 0.5730). This is very similar to the estimate based on the pseudo-numbers, of 0.2388 (0.0994 to 0.5738), because the “Uniform Scale” scores based on the pseudo-numbers are very similar to those based on the actual numbers. Note that the estimated reduction in risk is larger than that for the high vs low comparison (OR 0.3922, 95% CI 0.2146 to 0.7143), as that comparison is only based on an estimated difference in exposure of about 0.68 units rather than being based on a difference of 1 unit.

Additional examples of using the “Uniform Scale”

In addition to the example above, it is useful to give some other examples where the nature of the information presented needs particular consideration in order to provide effect estimates in the “Uniform Scale”.

One example is the study by Lloyd-Richardson et al. (2002), which reports results from a cross-sectional study based on analyses of 19 818 adolescents, of whom 10 924 had at least experimented with smoking. The authors present an OR (CI) of 1.16 (1.03–1.30) for low school connectedness, assessed using an eight item scale, the number of levels per item not being given. The authors note that, in coding the data, the variable was standardized by “subtracting off the median and dividing them by the distance between the median and the third quartile”. As the median corresponds to a mean score of 0.5 and the third quartile to a mean score of 0.625 (midway between 0.5 and 0.75) this suggests the difference in mean score is 0.125, so that the OR (CI) should be raised to the eighth power (1/0.125 = 8) to give the required result of 3.28 (1.27–8.16).

Another example is the study by Simons-Morton & Haynie (2003) in which 973 students completed surveys at the beginning and end of the sixth grade. Their Table 4 gives an OR for high v low social competence of 0.71 (0.52–0.98). The authors note that social competence is measured using eight items, with a mean of 22.37 (SD 5.39). It is unclear how many levels there are for each item, though possibly four given the mean. Nor is it clear whether high v low is just a simple breakdown of the population into two groups, in which case the OR (CI) should be squared, as noted above, to give 0.50 (0.27–0.96). Alternatively, if high is the highest quartile, and low the lowest quartile, one is comparing groups with mean scores of 0.875 and 0.125, a difference of 0.75 so that the OR (CIs) should be raised to the power of 1/0.75 = 4/3, giving a result of 0.63 (0.42–0.97). More information would be needed before this study could be included in any meta-analysis.

Finally we consider the study by Kandel et al. (2004) who compared smoking onset factors in 5,374 adolescents who had never smoked based on data from the National Longitudinal Study. They report ORs of 0.59 (0.45–0.77) for positive scholastic attitudes and 0.93 (0.75–1.16) for parent-child connectedness. For positive scholastic attitudes they averaged four 5-point items, so presumably the OR related to a 1 point difference in a scale that can vary by 4 points. This indicates that one should power up the 0.59 (0.45–0.77) by 5 to give 0.07 (0.02–0.27). For parent connectedness the authors refer to a 13 item scale, but state neither the number of points in each scale nor whether the values for individual items were summed or averaged. The authors refer to Resnick et al. (1997) as using this 13-item scale, that paper stating that this variable was standardized “to a mean of 0 and an SD of 1”, so that “parameter estimates can be interpreted as standardized B”. Assuming that the paper considered here did the same, and noting that, as derived earlier, the standard deviation of our “Uniform Scale” is approximated by 1/√12 = 0.2887 for large N, one needs to power the OR up by √12 to give the units we require. This gives an OR for parent-child connectedness of 0.78 (0.37–1.67).

Note that in all three of these last examples, the new software is not required, the user only having to raise reported effect estimates to the appropriate power to obtain the required estimate, corresponding to the difference in risk between the most and least possible exposures.
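
The conversions in these three examples reduce to raising the reported values to the appropriate power, as the following check of the figures quoted above shows (rounding to two decimal places):

pow <- function(est, lower, upper, p) round(c(est, lower, upper)^p, 2)
pow(1.16, 1.03, 1.30, 8)          # Lloyd-Richardson et al.: 3.28 1.27 8.16
pow(0.71, 0.52, 0.98, 2)          # Simons-Morton & Haynie, simple high/low split: 0.50 0.27 0.96
pow(0.71, 0.52, 0.98, 4 / 3)      # ... or extreme quartiles: 0.63 0.42 0.97
pow(0.59, 0.45, 0.77, 5)          # Kandel et al., positive scholastic attitudes: 0.07 0.02 0.27
pow(0.93, 0.75, 1.16, sqrt(12))   # Kandel et al., parent-child connectedness: 0.78 0.37 1.67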

Note also that, when conducting meta-analyses, it is important to be sure that the effect estimates are calculated in the same direction. If some give results for high v low and some for low v high, it will be necessary to invert the estimates as appropriate to ensure combinability.

Comparing results using a known dose scale and using the “Uniform Scale”

It is of interest to see what results would have been obtained in the example shown in Table 1 had we not had estimates of the dose for never smokers and for current smokers by amount smoked, but had instead used the “Uniform Scale” methodology to estimate doses for each level based on the distribution of the at risk population. Here the RR (CI) is estimated as 29.5493 (11.0943 to 78.7039), as compared with the original trend estimate of 1.1228 (1.0879 to 1.1588). The RR of 29.5493 is equal to 1.1228 raised to the power 29.23. As the original estimate is per cigarette per day, this implies that the “Uniform Scale” relates to a full range of 29.23 cigarettes per day. While no doubt some smokers smoke more cigarettes per day than this, the result does not seem implausible, given that the original calculation was based on dose means, with that for the heaviest smoking group estimated as 26.03 cigarettes per day.
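
The quoted equivalence is simply the ratio of the two slopes on the log scale:

log(29.5493) / log(1.1228)   # about 29.2, the implied full range in cigarettes per day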

Discussion/conclusions

Since we published our original paper in 2008 (Hamling et al., 2008), we have carried out numerous meta-analyses making extensive use of the software (e.g. Forey et al., 2011; Lee et al., 2012; Lee et al., 2017; Lee et al., 2016a; Lee et al., 2016b; Lee & Hamling, 2009; Lee & Hamling, 2016). We have also carried out dose-response meta-analyses (Fry et al., 2013), though we have only recently updated the Excel software. While the methodology has proved useful to us and to other researchers in situations where the raw data for a study are not available, there are some difficulties in applying it. These include presentation of the original data to insufficient accuracy; non-availability of the required 2x2 table (e.g. cases/controls x unexposed/exposed for a case-control study), so that approximate estimates have to be used; and occasional failure of the estimation of the pseudo-numbers to converge to a solution, even after using different starting points for the iterative process. Nevertheless, the method has given what appear to be plausible estimates in many practical applications.

Extending the software to estimate increases in risk per unit of exposure does bring a few additional problems. One is the difficulty in obtaining an estimate of the midpoint exposure for data grouped by ranges of exposure, especially when the highest range is open-ended. Another is that the trend estimation assumes an underlying dose-response shape that may not necessarily apply. Nevertheless, the extensions to the software to include trend estimation do provide the meta-analyst with a useful tool for combining dose-response results from studies that group dose in varying ways.

The extension of the software to use the “Uniform Scale” also allows the meta-analyst to attempt combination of results from studies using qualitative rather than quantitative estimates of exposure, possibly derived using a variety of measures of some underlying exposure. By attempting to derive an effect estimate for the range from the most exposed possible (with score 1) to the least exposed possible (with score 0), the meta-analyst is given a way of combining results on a consistent basis. Clearly, the underlying assumption - that the study population is made up of individuals with successively increasing exposure by an equal amount - is dubious, but we nevertheless feel that the method is useful. Other approaches, but with a different specification of the distribution of exposure, may well be possible, but we have not investigated these so far.

Data availability

Underlying data

Zenodo: Software for use in meta-analysis, providing Effect estimates per unit of exposure, including the Uniform scale. http://doi.org/10.5281/zenodo.3582481 (Lee et al., 2019)

This project contains the following underlying data:

  • RREst_Trend_Test_Files.zip (various files describing the testing carried out, including .xlsm files used in testing the Excel version and .pdf files describing the R and SAS testing: the Methods section above names the files relevant to the testing of each implementation).

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Software availability

Archived source code at time of publication: http://doi.org/10.5281/zenodo.3582481 (Lee et al., 2019)

License: Creative Commons Attribution 4.0 International license (CC-BY 4.0)

This contains the following files:

  • app.R (the code for the R implementation)

  • RREst_trend.SAS (the SAS implementation)

  • RREst_trend.xlsm (the Excel implementation)

  • RREst_trend.pdf (the Excel implementation documentation)

  • RREst_trend SAS.pdf (the SAS implementation documentation)

  • Goodness of fit tests for fitted RRs.pdf (a document describing how the goodness-of-fit statistics are calculated)

How to cite this article:
Lee PN, Hamling J, Fry JS et al. Using the “Uniform Scale” to facilitate meta-analysis where exposure variables are qualitative and vary between studies – methodology, examples and software [version 1; peer review: 1 approved with reservations]. F1000Research 2020, 9:33 (https://doi.org/10.12688/f1000research.21900.1)

Peer review discontinued

At the request of the author(s), this article is no longer under peer review.
Reviewer Report 11 Feb 2020
Chang Xu, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, China 
Approved with Reservations
Lee et al. have presented an interesting and important work for estimating the missing data in dose-response meta-analysis. It is so appreciated for these authors for their contribution in this area. I have some further comments that hope will be …
Author Response 04 Apr 2024
Peter Lee, Director, P.N. Lee Statistics and Computing Ltd, Sutton, SM2 5DA, UK
I thank the reviewer for his kind comments. However, I have decided not to follow his specific suggestions for amending the paper.
Competing Interests: None