Stepped-wedge trial

In medicine, a stepped-wedge trial (SWT) is a type of randomised controlled trial (RCT). An RCT is a scientific experiment designed to reduce bias when testing a new medical treatment, a social intervention, or another testable hypothesis.
In a traditional RCT, the researcher randomly divides the participants into two groups at the same time:

One group receives the treatment.
The other group does not receive the treatment.

In an SWT, a logistical constraint typically prevents all participants from being treated at once. Instead, all or most participants receive the treatment in waves, or "steps".
For instance, suppose a researcher wants to measure whether teaching college students how to cook several meals increases their propensity to cook at home instead of eating out.

In a traditional RCT, a sample of students would be selected and some would be trained on how to cook these meals, whereas the others would not. Both groups would be monitored to see how frequently they ate out. In the end, the number of times the treatment group ate out would be compared to the number of times the control group ate out, most likely with a t-test or some variant.
If, however, the researcher could train only a limited number of students each week, the researcher could employ an SWT, randomly assigning students to the week in which they would be trained.

The term "stepped wedge" was coined in the Gambia Hepatitis Intervention Study, after the stepped-wedge shape that is apparent in a schematic illustration of the design. The crossover is in one direction, typically from control to intervention, and the intervention is not removed once implemented. The stepped-wedge design can be used for individually randomized trials, i.e., trials in which individuals are treated sequentially, but it is more commonly used in cluster randomized trials (CRTs).

Experiment design

The stepped-wedge design begins with a baseline period in which observations are collected while no clusters are exposed to the intervention. Then, at regular intervals, or steps, a randomly selected cluster (or group of clusters) crosses over to the intervention, and all participants are measured again. This process continues until all clusters have received the intervention. Finally, one more measurement is made after all clusters have received the intervention.
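The rollout described above can be pictured as a 0/1 exposure matrix with one row per cluster and one column per measurement period. The Python below is a minimal, hypothetical sketch rather than a standard implementation. The function name `stepped_wedge_schedule` and the parameter `clusters_per_step` are illustrative. The sketch assumes an equal number of clusters crossing over at each step and a layout of one all-control baseline period followed by one measurement period per step.

```python
import random


def stepped_wedge_schedule(n_clusters, n_steps, clusters_per_step=1, seed=None):
    """Return (order, X) for a hypothetical stepped-wedge rollout.

    X[i][j] is 1 if cluster i is exposed to the intervention in period j, else 0.
    Period 0 is the all-control baseline; at step s (s = 1..n_steps) the next
    batch of clusters crosses over and everyone is measured in period s.
    Exposure is never removed, so each row is non-decreasing.
    """
    if n_clusters != n_steps * clusters_per_step:
        raise ValueError("this sketch assumes the same number of clusters per step")
    rng = random.Random(seed)
    order = list(range(n_clusters))
    rng.shuffle(order)  # randomize the order in which clusters cross over

    n_periods = n_steps + 1  # baseline + one period per step
    X = [[0] * n_periods for _ in range(n_clusters)]
    for position, cluster in enumerate(order):
        step = position // clusters_per_step + 1  # steps are numbered 1..n_steps
        for period in range(step, n_periods):
            X[cluster][period] = 1
    return order, X


order, X = stepped_wedge_schedule(n_clusters=4, n_steps=4, seed=1)
for cluster in order:  # rows listed in crossover order show the "wedge"
    print(cluster, X[cluster])
```

Listed in crossover order, the rows are always [0, 1, 1, 1, 1], [0, 0, 1, 1, 1], [0, 0, 0, 1, 1] and [0, 0, 0, 0, 1]. Only the cluster labels depend on the seed. In this simplified layout, the measurement taken after the last step doubles as the final all-treated measurement.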

Appropriateness

Hargreaves and colleagues offer a series of five questions that researchers should answer to decide whether an SWT is the optimal design and how to proceed at each step of the study. Specifically, researchers should be able to identify:
;The reasons SWT is the preferred design: If measuring a treatment effect is the primary goal of the research, an SWT may not be the optimal design. SWTs are appropriate when the research focuses on the effectiveness of the treatment rather than on the mere existence of an effect. For pragmatic studies, logistical and other practical concerns are generally considered the best reasons to choose a stepped-wedge design. Also, if the treatment is expected to be beneficial and it would be unethical to deny it to some participants, an SWT lets every participant receive the treatment by the end of the study while still allowing comparison with a control condition. Note, however, that delaying access to the treatment for some participants may itself raise ethical issues.
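;Which SWT design is most suitable: SWTs can follow three main designs: a closed cohort, an open cohort, or continuous recruitment with short exposure. In a closed cohort, all subjects participate in the experiment from beginning to end, and outcomes are measured repeatedly at fixed time points that may or may not coincide with the steps. In an open cohort, many participants take part for some time, but some leave and others join during the trial. With continuous recruitment and short exposure, few or no participants are identified at the start: participants become eligible during the trial and are exposed for only a short period, so each contributes outcomes under either the control or the intervention condition.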
;Which analysis strategy is appropriate: Linear mixed models (LMM), generalized linear mixed models (GLMM), and generalized estimating equations (GEE) are the principal approaches recommended for analyzing the results. While an LMM offers higher power than a GLMM or GEE, it can be inefficient if cluster sizes vary or if the response is not continuous and normally distributed. If either assumption is violated, GLMM and GEE are preferred.
;How big the sample should be: Methods for power analysis and sample size calculation are available. Generally, SWTs require a smaller sample to detect a given effect, since they exploit both between- and within-cluster comparisons.
;Best practices for reporting the design and results of the trial: Reporting the design, sample profile, and results can be challenging, because the standard Consolidated Standards of Reporting Trials (CONSORT) guidelines were not designed for SWTs; a CONSORT extension for stepped-wedge cluster randomized trials was published only in 2018. Some studies have also provided formalizations and flow charts that help with reporting results and with maintaining a balanced sample across the waves.

Model

While several other methods exist for modeling outcomes in an SWT, the work of Hussey and Hughes "first described methods to determine statistical power available when using a stepped wedge design." What follows is their model.
Suppose there are $N$ samples divided into $I$ clusters. At each time point $j = 1, \dots, T$, preferably equally spaced in actual time, some number of clusters are treated. Let $X_{ij}$ be $1$ if cluster $i$ has been treated at time $j$ and $0$ otherwise. In particular, note that if $X_{ij} = 1$, then $X_{i,j+1} = 1$.
For each participant $\ell$ in cluster $i$, measure the outcome to be studied, $y_{ij\ell}$, at time $j$. Note that the notation allows for clustering by including $i$ in the subscripts of $y$, $X$, $u$, and $e$. We model these outcomes as:
$$y_{ij\ell} = \mu + u_i + \beta_j + X_{ij}\theta + e_{ij\ell}$$
where:

$\mu$ is a grand mean,
$u_i$ is a random, cluster-level effect on the outcome,
$\beta_j$ is a time point-specific fixed effect,
$\theta$ is the measured effect of the treatment, and
$e_{ij\ell}$ is the residual noise.

This model can be viewed as a hierarchical linear model. At the lowest level, $y_{ij\ell} = \mu_{ij} + e_{ij\ell}$, where $\mu_{ij}$ is the mean of cluster $i$ at time $j$. At the cluster level, each cluster mean is $\mu_{ij} = \mu + u_i + \beta_j + X_{ij}\theta$.
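To make the model concrete, the sketch below simulates cluster-period means from $y_{ij\ell} = \mu + u_i + \beta_j + X_{ij}\theta + e_{ij\ell}$ and then recovers $\theta$. It is a hedged illustration, not the method of Hussey and Hughes. The names `simulate_sw` and `two_way_fe_theta` are hypothetical, and the sketch makes three assumptions: normal distributions for $u_i$ and $e_{ij\ell}$, equal numbers of participants per cluster and period, and a complete cluster-by-period table. The estimator treats the cluster effects as fixed rather than random. It fits a two-way fixed-effects regression by double-demeaning the table, which is a simpler stand-in for the mixed-model analysis.

```python
import random


def simulate_sw(n_clusters, n_per_cell, mu, beta, theta, tau, sigma, seed=None):
    """Simulate cluster-period means under y_ijl = mu + u_i + beta_j + X_ij*theta + e_ijl.

    Hypothetical sketch: u_i ~ N(0, tau^2), e_ijl ~ N(0, sigma^2), n_per_cell
    participants per cluster and period. beta holds one time effect per period;
    period 0 is the all-control baseline and clusters cross over in random order.
    """
    rng = random.Random(seed)
    n_periods = len(beta)
    n_steps = n_periods - 1
    order = list(range(n_clusters))
    rng.shuffle(order)

    X = [[0] * n_periods for _ in range(n_clusters)]
    for pos, i in enumerate(order):
        first = pos * n_steps // n_clusters + 1  # spread clusters evenly over steps
        for j in range(first, n_periods):
            X[i][j] = 1  # once treated, always treated

    ybar = []
    for i in range(n_clusters):
        u_i = rng.gauss(0.0, tau)  # random cluster-level effect
        row = []
        for j in range(n_periods):
            cell_mean = mu + u_i + beta[j] + theta * X[i][j]
            ys = [cell_mean + rng.gauss(0.0, sigma) for _ in range(n_per_cell)]
            row.append(sum(ys) / n_per_cell)  # mu_ij plus averaged residual noise
        ybar.append(row)
    return X, ybar


def two_way_fe_theta(X, ybar):
    """Estimate theta with cluster and period fixed effects on a complete table.

    Subtracting row and column means and adding back the grand mean removes any
    additive cluster and period effects exactly when every cell is observed.
    """
    I, T = len(X), len(X[0])

    def double_demean(M):
        rows = [sum(r) / T for r in M]
        cols = [sum(M[i][j] for i in range(I)) / I for j in range(T)]
        grand = sum(rows) / I
        return [[M[i][j] - rows[i] - cols[j] + grand for j in range(T)] for i in range(I)]

    xt, yt = double_demean(X), double_demean(ybar)
    num = sum(a * b for xr, yr in zip(xt, yt) for a, b in zip(xr, yr))
    den = sum(a * a for xr in xt for a in xr)
    return num / den


X, ybar = simulate_sw(n_clusters=12, n_per_cell=20, mu=10.0,
                      beta=[0.0, 0.5, 1.0, 1.5, 2.0], theta=2.0,
                      tau=1.0, sigma=3.0, seed=42)
print(round(two_way_fe_theta(X, ybar), 2))  # an estimate of theta = 2.0, with sampling error
```

With the noise switched off (`sigma=0`), the estimate equals $\theta$ up to floating-point error, because the cluster and time effects are purely additive and are removed exactly by the demeaning.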

Estimate of variance

The design effect of a stepped-wedge design is given by the formula:
$$DE_{sw} = \frac{1+\rho\,(ktn+bn-1)}{1+\rho\left(\tfrac{1}{2}ktn+bn-1\right)} \cdot \frac{3(1-\rho)}{2t\left(k-\tfrac{1}{k}\right)}$$
where:

$\rho$ is the intra-cluster correlation (ICC),
$n$ is the number of subjects within a cluster,
$k$ is the number of steps,
$t$ is the number of measurements after each step, and
$b$ is the number of baseline measurements.

The required sample size is then obtained by scaling the unadjusted sample size by the design effect:
$$N_{sw} = N_u \cdot DE_{sw}$$
where $N_{sw}$ is the required sample size for the SWT and $N_u$ is the total unadjusted sample size that would be required for a traditional RCT.
Note that increasing $k$, $t$, or $b$ decreases the required sample size for an SWT.
Further, the required number of clusters $c$ is given by:
$$c = \frac{N_{sw}}{n}$$
and the number of clusters $c_s$ that need to switch from the control to the treatment condition at each step is:
$$c_s = \frac{c}{k}$$
If $c$ and $c_s$ are not integers, they need to be rounded up to the next integer, and the clusters distributed as evenly as possible among the $k$ steps.
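The sample-size arithmetic can be sketched in Python. This is a minimal illustration under stated assumptions. It uses the design-effect expression written out in the docstring, with ρ, n, k, t, and b as defined above, and takes N_sw = N_u · DE, c = N_sw / n, and c_s = c / k. It rounds c up and then spreads the clusters as evenly as possible over the k steps. The function names and input values are hypothetical.

```python
import math


def design_effect_sw(rho, n, k, t, b):
    """Stepped-wedge design effect (assumed form):
    DE = [1 + rho*(k*t*n + b*n - 1)] / [1 + rho*(k*t*n/2 + b*n - 1)]
         * 3*(1 - rho) / (2*t*(k - 1/k))
    """
    ratio = (1 + rho * (k * t * n + b * n - 1)) / (1 + rho * (0.5 * k * t * n + b * n - 1))
    return ratio * 3 * (1 - rho) / (2 * t * (k - 1 / k))


def sw_sample_size(n_unadjusted, rho, n, k, t, b):
    """Return (DE, N_sw, c, clusters switching at each step).

    Assumes N_sw = N_u * DE, c = N_sw / n and c_s = c / k; c is rounded up and
    the c clusters are spread as evenly as possible over the k steps.
    """
    de = design_effect_sw(rho, n, k, t, b)
    n_sw = n_unadjusted * de
    c = math.ceil(n_sw / n)
    per_step = [c // k + (1 if s < c % k else 0) for s in range(k)]
    return de, n_sw, c, per_step


# Hypothetical inputs: an individually randomized RCT would need 200 subjects;
# ICC 0.05, 10 subjects per cluster, 4 steps, 1 measurement per step, 1 baseline.
de, n_sw, c, per_step = sw_sample_size(200, rho=0.05, n=10, k=4, t=1, b=1)
print(round(de, 3), round(n_sw, 1), c, per_step)  # → 0.535 107.0 11 [3, 3, 3, 2]
```

Here c = 10.7 is rounded up to 11 clusters, and c_s = 11 / 4 is handled by letting three steps switch three clusters and one step switch two.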

Advantages

The stepped-wedge design has several advantages over traditional RCTs.

First, SWTs are most appropriate, both ethically and practically, when the intervention is expected to produce a positive outcome. Since all subjects eventually receive the intervention, ethical concerns can be eased, and recruiting participants may become easier.
Secondly, SWTs "can reconcile the need for robust evaluations with political or logistical constraints." Specifically, the design can be used to measure the effects of a treatment when resources for delivering the intervention are scarce.
Thirdly, since each cluster experiences both the control and the treatment condition by the end of the trial, both between- and within-cluster comparisons are possible. This increases statistical power while allowing a substantially smaller sample than a traditional RCT would need.
Fourth, a design effect for the stepped-wedge design has been derived, showing that a stepped-wedge CRT can reduce the number of patients required compared with other designs.
Finally, because clusters switch from control to treatment at different, randomly assigned time points, it is possible to examine time effects. For example, researchers can study how repeated or long-term exposure to the experimental stimuli affects the effectiveness of the treatment. Repeated measurements at regular intervals can average out noise, which increases the precision of estimates. This advantage is greatest when measurements are noisy and outcome autocorrelation is low.

Disadvantages

SWTs also have several drawbacks.

First, because the study period in an SWT lasts longer and all subjects eventually receive the treatment, costs may increase substantially. Because the design can be expensive, SWTs may not be the optimal choice when measurement precision and outcome autocorrelation are high. Moreover, since everyone is eventually treated, SWTs leave no untreated control group for downstream analysis, such as long-term follow-up comparisons.
Secondly, in an SWT, more clusters are exposed to the intervention in later time periods than in earlier ones. As a result, an underlying temporal trend may confound the intervention effect, so the confounding effect of time must be accounted for both in pre-trial power calculations and in post-trial analysis. Specifically, generalized linear mixed models or generalized estimating equations are recommended for post-trial analysis.
Finally, the design and analysis of stepped-wedge trials are more complex than for other types of randomized trials. Previous systematic reviews highlighted poor reporting of sample size calculations and a lack of consistency in the analysis of such trials. Hussey and Hughes were the first authors to propose a structure and formula for estimating power in stepped-wedge studies in which data were collected at every step. This approach has since been extended to designs in which observations are not made at every step, and to designs with multiple levels of clustering.
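The confounding by time described above can be demonstrated with a small, noise-free sketch. The setup is hypothetical: four clusters, four steps, a secular trend of one unit per period, no cluster effects, and a true treatment effect of zero. A naive treated-versus-control comparison picks up the trend, because treated observations are concentrated in later periods. Adjusting for period removes the trend. The adjustment here uses a two-way fixed-effects estimate computed by double-demeaning, a simpler substitute for the GLMM or GEE analyses recommended in the text. All function names are illustrative.

```python
def stepped_x(n_clusters, n_periods):
    """Exposure matrix where cluster i crosses over at period i + 1 (no randomization)."""
    return [[1 if j > i else 0 for j in range(n_periods)] for i in range(n_clusters)]


def naive_difference(X, y):
    """Mean of treated cells minus mean of control cells, ignoring time."""
    cells = [(X[i][j], y[i][j]) for i in range(len(X)) for j in range(len(X[0]))]
    treated = [v for x, v in cells if x == 1]
    control = [v for x, v in cells if x == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def time_adjusted_theta(X, y):
    """Two-way fixed-effects estimate of the treatment effect on a complete table."""
    I, T = len(X), len(X[0])

    def double_demean(M):
        rows = [sum(r) / T for r in M]
        cols = [sum(M[i][j] for i in range(I)) / I for j in range(T)]
        grand = sum(rows) / I
        return [[M[i][j] - rows[i] - cols[j] + grand for j in range(T)] for i in range(I)]

    xt, yt = double_demean(X), double_demean(y)
    num = sum(a * b for xr, yr in zip(xt, yt) for a, b in zip(xr, yr))
    den = sum(a * a for xr in xt for a in xr)
    return num / den


X = stepped_x(n_clusters=4, n_periods=5)
# Outcome = baseline 10 + secular trend of 1 per period; the true treatment effect is 0.
y = [[10.0 + 1.0 * j for j in range(5)] for _ in range(4)]

print(naive_difference(X, y))     # → 2.0: a spurious "effect" produced by the time trend
print(time_adjusted_theta(X, y))  # → 0.0: the trend is absorbed by the period effects
```

In this layout the treated cells average period 3 and the control cells average period 1, so a one-unit-per-period trend alone produces the 2.0 gap.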

Ongoing work

The number of studies using the design has been increasing. In 2015, a thematic series was published in the journal Trials. In 2016, the first international conference dedicated to the topic was held at the University of York.