Grading Scale

Evidence Table Column Explanation

The evidence table contains ten columns:

  • Condition
  • Study Design
  • Author / Year
  • N
  • Statistically Significant?
  • Quality of Study (0-2 = poor; 3-4 = good; 5 = excellent)
  • Magnitude of Benefit
  • Absolute Risk Reduction
  • Number Needed to Treat
  • Comments
Condition

Refers to the medical condition or disease targeted by a therapy.

Study Design

Common types include:

  • Randomized controlled trial (RCT): An experimental trial in which participants are randomly assigned to receive either the intervention being tested or a placebo. Note that Natural Standard defines RCTs as being placebo-controlled, while studies using active controls are classified as equivalence trials (see below). In RCTs, participants and researchers are often blinded (i.e., unaware of group assignments), although unblinded and quasi-blinded RCTs are also often performed. True random allocation to trial arms, proper blinding, and sufficient sample size are the basis of an adequate RCT.
  • Equivalence trial: An RCT which compares two active agents. Equivalence trials often compare new treatments to usual (standard) care, and may not include a placebo arm.
  • Before and after comparison: A study that reports only the change in outcome within each group and does not report between-group comparisons. This is a common error in studies that claim to be RCTs.
  • Case series: A description of a group of patients with a condition, treatment, or outcome (e.g., 20 patients with migraine headache underwent acupuncture and 17 reported feeling better afterwards). Case series are considered weak evidence of efficacy.
  • Case-control study: A study in which patients with a certain outcome are selected and compared to similar patients (without the outcome) to see if certain risk factors/predictors are more common in patients with that outcome. This study design is not common in the complementary & alternative medicine literature.
  • Cohort study: A study which assembles a group of patients with certain baseline characteristics (for example, use of a drug), and follows them forward in time for outcomes. This study design is not common in the complementary & alternative medicine literature.
  • Meta-analysis: A pooling of multiple trials to increase statistical power (often used to pool data from a number of RCTs with small sample sizes, none of which demonstrates significance alone but which in aggregate can achieve significance). Multiple difficulties are encountered when designing/reviewing these analyses; in particular, outcome measures or therapies may differ from study to study, hindering direct comparison.
  • Review: An author’s description of his or her opinion based on personal, non-systematic review of the evidence.
  • Systematic review: A review conducted according to pre-specified criteria in an attempt to limit bias from the investigators. Systematic reviews often include a meta-analysis of data from the included studies.
  • P: Pending verification.
Author, Year

Identifies the study being described in a row of the table.

N

The total number of subjects included in a study (treatment group plus placebo group). Some studies recruit a larger number of subjects initially, but do not use them all because they do not meet the study’s entry criteria. In this case, it is the second, smaller number that qualifies as N. N includes all subjects that are part of a study at the start date, even if they drop out, are lost to follow-up, or are deemed unsuitable for analysis by the authors. Trials with a large number of drop-outs that are not included in the analysis are considered to be weaker evidence for efficacy. (For systematic reviews the number of studies included is reported. For meta-analyses, the number of total subjects included in the analysis or the number of studies may be reported.) P = pending verification.

Statistically Significant?

Results are noted as being statistically significant if a study’s authors report statistical significance, or if quantitative evidence of significance is present (such as p values). P = pending verification.

Quality of Study

A numerical score between 0-5 is assigned as a rough measure of study design/reporting quality (0 being weakest and 5 being strongest). This number is based on a well-established, validated scale developed by Jadad et al. (Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Controlled Clinical Trials 1996;17[1]:1-12). This calculation does not account for all study elements that may be used to assess quality (other aspects of study design/reporting are addressed in the "Evidence Discussion" sections of monographs).

  • A Jadad score is calculated using the seven items in the table below. The first five items are indications of good quality, and each counts as one point towards an overall quality score. The final two items indicate poor quality, and a point is subtracted for each if its criteria are met. The range of possible scores is 0 to 5. (P = Pending Verification)
Jadad Score Calculation

  • Was the study described as randomized (this includes words such as randomly, random, and randomization)? (0 or 1)
  • Was the method used to generate the sequence of randomization described and appropriate (table of random numbers, computer-generated, etc.)? (0 or 1)
  • Was the study described as double blind? (0 or 1)
  • Was the method of double blinding described and appropriate (identical placebo, active placebo, dummy, etc.)? (0 or 1)
  • Was there a description of withdrawals and dropouts? (0 or 1)
  • Deduct one point if the method used to generate the sequence of randomization was described and it was inappropriate (patients were allocated alternately, or according to date of birth, hospital number, etc.). (0 or -1)
  • Deduct one point if the study was described as double blind but the method of blinding was inappropriate (e.g., comparison of tablet vs. injection with no double dummy). (0 or -1)
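The scoring procedure above can be sketched in code. This is an illustrative helper only, not part of Natural Standard's tooling; the function name, parameter names, and the floor at zero (the text gives the range of possible scores as 0 to 5) are assumptions:

```python
def jadad_score(
    described_randomized: bool,
    randomization_method_appropriate: bool,
    described_double_blind: bool,
    blinding_method_appropriate: bool,
    withdrawals_described: bool,
    randomization_method_inappropriate: bool,
    blinding_method_inappropriate: bool,
) -> int:
    """Return a Jadad score on the 0-5 scale described above."""
    # The first five items each add one point when met.
    score = sum([
        described_randomized,
        randomization_method_appropriate,
        described_double_blind,
        blinding_method_appropriate,
        withdrawals_described,
    ])
    # The two deduction items each subtract one point when met.
    score -= int(randomization_method_inappropriate)
    score -= int(blinding_method_inappropriate)
    # Clamp to the stated 0-5 range (an assumption for degenerate inputs).
    return max(score, 0)
```

For example, a trial described as randomized and double blind with both methods appropriate and withdrawals reported scores the maximum of 5.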
Magnitude of Benefit

This summarizes how strong a benefit is: small, medium, large, or none. If results are not statistically significant, "NA" ("not applicable") is entered. In order to define small, medium, and large benefits consistently across different studies and monographs, Natural Standard defines the magnitude of benefit in terms of the standard deviation (SD) of the outcome measure. Specifically, the benefit is considered (P = Pending Verification):

  • Large: if >1 SD
  • Medium: if 0.5 to 0.9 SD
  • Small: if 0.2 to 0.4 SD

In many cases, studies do not report the standard deviation of change of the outcome measure. However, the effect size (the between-group difference expressed in standard deviation units) can be calculated: subtract the mean (or mean difference) in the placebo/control group from the mean (or mean difference) in the treatment group, and divide that quantity by the pooled standard deviation (Effect size = [Mean Treatment - Mean Placebo]/SDp).
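The effect-size formula and the magnitude bands above can be sketched as follows. The helper names are hypothetical, and the handling of values that fall between the printed bands (e.g., 0.45 or 0.95) is not specified by the text, so the band boundaries used here are assumptions:

```python
def effect_size(mean_treatment: float, mean_placebo: float, pooled_sd: float) -> float:
    """Effect size = (Mean Treatment - Mean Placebo) / pooled SD."""
    return (mean_treatment - mean_placebo) / pooled_sd


def magnitude_of_benefit(es: float) -> str:
    """Map an effect size onto the small/medium/large bands given above."""
    es = abs(es)
    if es > 1.0:
        return "large"    # stated band: > 1 SD
    if es >= 0.5:
        return "medium"   # stated band: 0.5 to 0.9 SD (upper edge assumed)
    if es >= 0.2:
        return "small"    # stated band: 0.2 to 0.4 SD (upper edge assumed)
    return "none"
```

For example, a treatment mean of 12 against a placebo mean of 10 with a pooled SD of 4 gives an effect size of 0.5, a medium benefit under these bands.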

Absolute Risk Reduction

This describes the difference between the percent of people in the control/placebo group experiencing a specific outcome (the control event rate) and the percent of people in the experimental/therapy group experiencing that same outcome (the experimental event rate). Mathematically, absolute risk reduction (ARR) equals the control event rate minus the experimental event rate. ARR is better able to discriminate between large and small treatment effects than relative risk reduction (RRR), a calculation that is often cited in studies ([control event rate - experimental event rate]/control event rate). Many studies do not include adequate data to calculate the ARR, in which case "NA" is entered into this column. (P = Pending Verification)
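The two risk measures can be sketched as follows. These are illustrative helpers with hypothetical names; ARR is written in its conventional form (control event rate minus experimental event rate), which is the same convention the RRR formula above uses:

```python
def absolute_risk_reduction(control_event_rate: float, experimental_event_rate: float) -> float:
    """ARR: control event rate minus experimental event rate."""
    return control_event_rate - experimental_event_rate


def relative_risk_reduction(control_event_rate: float, experimental_event_rate: float) -> float:
    """RRR: the ARR expressed as a fraction of the control event rate."""
    return (control_event_rate - experimental_event_rate) / control_event_rate
```

For example, if 20% of the control group and 15% of the treatment group experience the outcome, the ARR is 0.05 (5 percentage points) while the RRR is 0.25 (a 25% relative reduction), illustrating how RRR can make the same effect look larger.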

Number Needed to Treat

This is the number of patients who would need to use the therapy under investigation, for the period of time described in the study, in order for one person to experience the specified benefit. It is calculated by dividing the Absolute Risk Reduction into 1 (1/ARR). (P = Pending Verification)
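As a sketch of the calculation (the helper name is hypothetical, and rounding up to a whole patient is a common reporting convention rather than something stated above):

```python
import math


def number_needed_to_treat(arr: float) -> int:
    """NNT = 1 / ARR, rounded up to a whole number of patients (assumed convention)."""
    if arr <= 0:
        raise ValueError("NNT is defined only for a positive absolute risk reduction")
    return math.ceil(1.0 / arr)
```

For example, an ARR of 0.05 gives an NNT of 20: twenty patients would need to use the therapy for the study period for one to experience the specified benefit.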

Comments

When appropriate, this brief section may comment on design flaws (inadequately described subjects, lack of blinding, brief follow-up, lack of intention-to-treat analysis, etc.), notable study design elements (crossover, etc.), dosing, and/or specifics of study groups/sub-groups (age, gender, etc.). More detailed description of studies is found in the "Evidence Discussion" section that follows the "Evidence Table" in Natural Standard monographs.