Chapter 7 Improving Precision and Power: Blocked Designs

7.1 Introduction

In our discussion of the vendor examples in Section 3.3, we observed huge gains in power and precision when allocating each vendor to one of two samples per mouse, contrasting the resulting enzyme levels for each pair of samples individually, and then averaging the resulting differences.

This design is an example of blocking, where we organize the experimental units into groups (or blocks), such that units within the same group are more similar than those in different groups. For \(k\) treatments, a block size of \(k\) experimental units per group allows us to randomly allocate each treatment to one unit per block and we can estimate any treatment contrast within each block and then average these estimates over the blocks. Hence, this strategy removes any differences between blocks from the contrast estimates and tests, resulting in lower residual variance and increased precision and power without increases in sample size. As R. A. Fisher noted:

Uniformity is only requisite between objects whose response is to be contrasted (that is, objects treated differently). (Fisher 1971, 33)

To be effective, blocking requires that we find some property by which we can group our experimental units such that variances within each group are smaller than between groups. This property can be intrinsic, such as animal body weight, litter, or sex, or non-specific to the experimental units, such as day of measurement in a longer experiment, batch of some necessary chemical, or the machine used for incubating samples or recording measurements.

A simple yet powerful design is the randomized complete block design (RCBD), where each block has as many units as there are treatments, and we randomly assign each treatment to one unit in each block. We can extend it to a generalized randomized complete block design (GRCBD) by using more than one replicate of each treatment per block. If the block size is smaller than the number of treatments, a balanced incomplete block design (BIBD) still allows treatment allocations balanced over blocks such that all pair contrasts are estimated with the same precision.

Several blocking factors can be combined in a design by nesting—allowing estimation of each blocking factor’s contribution to variance reduction—or crossing—allowing simultaneous removal of several independent sources of variation. Two classic designs with crossed blocks are latin squares and Youden squares.

7.2 The Randomized Complete Block Design

7.2.1 Experiment and Data

We continue with our example of how three drug treatments in combination with two diets affect enzyme levels in mice. To keep things simple, we only consider the low fat diet for the moment, so the treatment structure only contains Drug with three levels. Our aim is to improve the precision of contrast estimates and increase the power of the omnibus \(F\)-test. To this end, we arrange (or block) mice into groups of three and randomize the drugs separately within each group, such that each drug occurs once per group. Ideally, the variance between animals in the same group is much smaller than between animals in different groups. A common choice for blocking mice is by litter, since sibling mice often show more similar responses as compared to mice from different litters (Perrin 2014). Litter sizes are typically in the range of 5–7 animals in mice (Watt 1934), which would easily allow us to select three mice from each litter for our experiment.

Our experiment is illustrated in Figure 7.1A: we use \(b=8\) litters of \(n=3\) mice each, resulting in an experiment size of \(N=bn=24\) mice. In contrast to a completely randomized design, our randomization is restricted since we require that each treatment occurs exactly once per litter, and we randomize drugs independently for each litter.

The data are shown in Figure 7.1B, where we connect the three observed enzyme levels in each block by a line, akin to an interaction plot. The vertical dispersion of the lines indicates that enzyme levels within each litter are systematically different from those in other litters. The lines are roughly parallel, which shows that all three drug treatments are affected equally by these systematic differences, there is no litter-by-drug interaction, and treatment contrasts are unaffected by systematic differences between litters.

Comparing enzyme levels for placebo, D1, and D2 under low fat diet using randomized complete block design. A: Data layout with eight litters of size three, treatments independently randomized in each block. B: Observed enzyme levels for each drug; lines connect responses of mice from the same litter.

Figure 7.1: Comparing enzyme levels for placebo, D1, and D2 under low fat diet using randomized complete block design. A: Data layout with eight litters of size three, treatments independently randomized in each block. B: Observed enzyme levels for each drug; lines connect responses of mice from the same litter.

7.2.2 Model and Hasse Diagram

Hasse Diagram

Deriving the experiment structure diagram in Figure 7.2C poses no new challenges: the treatment structure contains the single treatment factor Drug with three levels while the unit structure contains the blocking factor (Litter), which groups the units from (Mouse). We consider the specific eight litters in our experiment as a random sample, which makes both unit factors random. The factors Drug and (Litter) are crossed and their combinations are levels of the interaction factor (Litter:Drug); since (Litter) is random, so is the interaction. The randomization allocates levels of Drug on (Mouse), and the experiment structure is similar to a two-way factorial design, but (Litter) originates from the unit structure, is random, and is not randomly allocated to (Mouse), but rather groups levels of (Mouse) by an intrinsic property of each mouse.

Randomized complete block design for determining effect of two different drugs and placebo using eight mice per drug and a single measurement per mouse. Each drug occurs once per block and assignment to mice is randomized independently for each block.

Figure 7.2: Randomized complete block design for determining effect of two different drugs and placebo using eight mice per drug and a single measurement per mouse. Each drug occurs once per block and assignment to mice is randomized independently for each block.

The Block-by-Treatment Interaction

Each treatment occurs only once per block, and the variations due to interactions and residuals are completely confounded. This design is called a randomized complete block design (RCBD). If we want to analyze data from an RCBD, we need to assume that the block-by-treatment interaction is negligible. We can then merge the interaction and residual factors and use the sum of their variation for estimating the residual variance (Fig. 7.2D).

An appreciable block-by-treatment interaction means that the differences in enzyme levels between drugs depend on the litter. Treatment contrasts are then litter-specific and this systematic heterogeneity of treatment effects complicates the analysis and precludes a straightforward interpretation of the results. Such interaction is likely caused by other biologically relevant factors that influence the effect of (some) treatments but that have not been accounted for in our experiment and analysis.

We cannot test the interaction factor and therefore require a non-statistical argument to justify ignoring the interaction. Since we have full control over which property we use for blocking the experimental units, we can often employ subject-matter knowledge to exclude interactions between our chosen blocking factor and the treatment factor. In our particular case, for example, it seems unlikely that the litter affects drugs differently, which justifies treating the litter-by-drug interaction as negligible.

Linear Model

Because the blocking factor levels are random, so are the cell means \(\mu_{ij}\). Moreover, there is a single observation per cell and each cell mean is confounded with the residual for that cell; a two-factor cell means model is therefore not useful for an RCBD.

The parametric model (including block-by-treatment interaction, Fig. 7.2C) is \[ y_{ij} = \mu + \alpha_i + b_j + (\alpha b)_{ij} + e_{ij}\;, \] where \(\mu\) is the grand mean, \(\alpha_i\) the deviation of the average response for drug \(i\) compared to the grand mean, and \(b_j\) is the random effect of block \(j\). We assume that the residuals and block effects are normally distributed as \(e_{ij}\sim N(0,\sigma_e^2)\) and \(b_j\sim N(0,\sigma_b^2)\), and are all mutually independent. The interaction of drug and block effects is a random effect with distribution \((\alpha b)_{ij}\sim N(0,\sigma^2_{\alpha b})\).

For an RCBD, the interaction is completely confounded with the residuals; its parameters \((\alpha b)_{ij}\) cannot be estimated from the experiment without replicating treatments within each block. If interactions are negligible, then \((\alpha b)_{ij}=0\) and we arrive at the additive model (Fig. 7.2D) \[\begin{equation} y_{ij} = \mu + \alpha_i + b_j + e_{ij} \tag{7.1}\;. \end{equation}\] The average response to treatment level \(i\) is then \(\mu_i=\mu+\alpha_i\); its unbiased estimator is \[ \hat{\mu}_i = \frac{1}{b}\sum_{j=1}^b y_{ij}\;, \] with variance \(\text{Var}(\hat{\mu}_i)=(\sigma_b^2+\sigma_e^2)/b\). We then base our contrast analysis on these estimated marginal means, effectively using a cell means model for the treatment factor after correcting for the block effects.

Analysis of Variance

The total sum of squares for the RCBD with an additive model decomposes into sums of squares for each factor involved, such that \[ \text{SS}_\text{tot} = \text{SS}_\text{trt} + \text{SS}_\text{block} + \text{SS}_\text{res}\;, \] and the degrees of freedom decompose accordingly.

The treatment factor is tested using the \(F\)-statistic \[ F = \frac{\text{SS}_\text{trt}/\text{df}_\text{trt}}{\text{SS}_\text{res}/\text{df}_\text{res}}\;, \] and mean squares are formed by dividing each sum of squares by its corresponding degrees of freedom.

For \(k\) treatment factor levels and \(b\) blocks, the test statistic has a \(F_{k-1,\, (b-1)(k-1)}\)-distribution under the omnibus null-hypothesis \(H_0:\mu_1=\mu_2=\cdots =\mu_k\) that all treatment group means are equal. Note that without blocking, the denominator sum of squares would be \(\text{SS}_\text{block} + \text{SS}_\text{res}\) with corresponding loss of power.

We derive the model specification from the Hasse diagrams in Figure 7.2: since fixed and random factors are exclusive to the treatment respectively unit structure, the corresponding terms are 1+drug and Error(litter/mouse); we reduce these terms to drug and Error(litter) and the model is fully specified as y~drug+Error(litter). The analysis of variance table then contains two error strata: one for the block effect and one for the within-block residuals.

We randomized the treatment on the mice within litters, and the within-block variance \(\sigma_e^2\) therefore provides the mean squares for the \(F\)-test denominator; this agrees with the fact that (Mouse) is the closest random factor below Drug in the experiment structure diagram (Fig. 7.2D). The between-block error stratum contains no further factors or tests, since there are no factors randomized on (Litter) in this design. The ANOVA table is then

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
Error stratum: Litter
Residuals	7	37.08	5.3
Error stratum: Within
drug	2	74.78	37.39	84.55	1.53e-08
Residuals	14	6.19	0.44

The omnibus \(F\)-test for the treatment factor provides clear evidence that the drugs affect the enzyme levels differently and the differences in average enzyme levels between drugs is about 85 times larger than the residual variance.

The effect size \(\eta^2_\text{drug}=\) 63% is large, but measures the variation explained by the drug effects compared to the overall variation, including the variation removed by blocking. It is more meaningful to compare the drug effect to the within-block residuals alone using the partial effect size \(\eta_\text{p, drug}^2=\) 92%, since the litter-to-litter variation is removed by blocking and has no bearing on the precision of the treatment effect estimates. The large effect sizes confirm that the overwhelming fraction of the variation of enzyme levels in each litter is due to the drugs acting differently.

In contrast to a two-way ANOVA with factorial treatment structure, we cannot simplify the analysis to a one-way ANOVA with a single treatment factor with \(b\cdot k\) levels. This is because the blocking factor is random, and the resulting one-way factor would also be a random factor. The omnibus \(F\)-test for this factor is difficult to interpret, and a contrast analysis would be a futile exercise, since we would compare randomly sampled factor levels among each other.

Linear Mixed Model

In contrast to our previous designs, the analysis of an RCBD requires two variance components in our model: the residual within-block variance \(\sigma_e^2\) and the between-block variance \(\sigma_b^2\). The classical analysis of variance handles these by using one error stratum per variance component, but this makes subsequent analyses more tedious. For example, estimates of the two variance components are of direct interest, but while the estimate \(\hat{\sigma}_e^2=\text{MS}_{\text{res}}\) is directly available from the ANOVA result, an estimate for the between-block variance is \[ \hat{\sigma}_b^2 = \frac{\text{MS}_{\text{block}}-\text{MS}_{\text{res}}}{n}\;, \] and requires manual calculation. For our example, we find \(\hat{\sigma}_e^2=\) 0.44 and \(\hat{\sigma}_b^2=\) 1.62 based on the ANOVA mean squares.

An attractive alternative is the linear mixed model, which explicitly considers the different random factors for estimating variance components and parameters of the linear model in Equation (7.1). Linear mixed models offer a very general and powerful extension to linear regression and analysis of variance, but their general theory and estimation are beyond the scope of this book. For our purposes, we only need a small fraction of their possibilities and we use the lmer() function from package lme4 for all our calculations. In specifying a linear mixed model, we use terms of the form (1|X) to introduce a random offset for each level of the factor X; this construct replaces the Error()-term from aov(). The fixed effect part of the model specification remains unaltered. For our example, the model specification is then y~drug+(1|litter), which asks for a fixed effect (\(\alpha_i\)) for each level of Drug, and allows a random offset (\(b_j\)) for each litter.

The Fixed effects section of the result gives the parameter estimates for \(\alpha_i\), but they are of secondary interest and depend on the coding (cf. Section 4.7).

	Estimate	Std. Error	df	t value	Pr(>\|t\|)
(Intercept)	10.01	0.51	9.4	19.73	5.73e-09
drugD1	4.23	0.33	14.0	12.71	4.46e-09
drugD2	2.91	0.33	14.0	8.74	4.83e-07

Note that the degrees of freedom for estimating the intercept (which here corresponds to the average enzyme level in the placebo group) is no longer an integer, because block effects and residuals are used with different weights in its estimation. The Random effects section of the result provides the variance and standard deviation for each variance component:

Group	Name	Variance	Std. Dev.
litter	(Intercept)	1.62	1.27
Residual		0.44	0.67

These correspond to the two variance components \(\hat{\sigma}_b^2=\) 1.62 (litter) and \(\hat{\sigma}_e^2=\) 0.44 (Residual), in agreement with the ANOVA results.

We calculate our familiar ANOVA table from an estimated linear mixed model m using anova(m), which yields

	Sum Sq	Mean Sq	NumDF	DenDF	F value	Pr(>F)
drug	74.78	37.39	2	14	84.55	1.53e-08

Linear mixed models and ‘traditional’ analysis of variance use the same linear model to analyze data from a given design. Their main difference is the way they handle models with multiple variance components. Analysis of variance relies on the crude concept of error strata, which makes direct estimation of variance more cumbersome and leads to loss in efficiency if information on effects is distributed between error strata (such as in incomplete block designs, discussed in Section 7.3). Linear mixed models use different techniques for estimation of the model’s parameters that make use of all available information. Variance estimates are directly available, and linear mixed models do not suffer from problems with unbalanced group sizes. Colloquially, we can say that the analysis of variance approach provides a convenient framework to phrase design questions, while the linear mixed model provides a more general tool for the subsequent analysis.

7.2.3 Contrasts

In a blocked design with random block factor, the only useful contrasts are between levels of (fixed) treatment factors, since a contrast involving levels of the blocking factor would define a comparison of specific instances of random factor levels. To estimate the difference between the average enzyme level of the first and second litter, for example, only has a useful interpretation if these two arbitrary litters are again used in a replication of the experiment. This is also reflected in the population marginal means, of which there are only three—\(\mu_1,\dots,\mu_3\), one per treatment group—in our example, rather than one per block-treatment combination. We then define and estimate treatment contrasts exactly as before, and briefly exemplify the procedure for our experiment with three contrasts: the comparison of \(D1\) respectively \(D2\) to placebo, and the comparison of the average of \(D1\) and \(D2\) to placebo. The contrasts are \[ \Psi(\mathbf{w}_1) = \mu_2-\mu_1 \,,\quad \Psi(\mathbf{w}_2)=\mu_3-\mu_1 \,,\quad\text{and}\quad \Psi(\mathbf{w}_3)=\frac{\mu_2+\mu_3}{2}-\mu_1\;. \] The following code defines and estimates the contrasts based on the linear mixed model:

# Estimate linear mixed model
lmm.rcbd = lmer(y~drug+(1|litter), data=drugs_litters)
# Estimate group means
em.lmm = emmeans(lmm.rcbd, ~drug)
# Manually define contrasts
psi = list("D1 - Placebo"=c(-1,+1,0),
           "D2 - Placebo"=c(-1,0,+1),
           "(D1+D2)/2 - Placebo"=c(-1,+1/2,+1/2))
# Estimate contrasts
ct.lmm= contrast(em.lmm, psi)
# Get 95%-CI
ci.lmm = confint(ct.lmm)

The contrast estimates for our experiment are shown in Table 7.1 for an analysis based on aov(), the above analysis using lmer(), and an incorrect analysis of variance that ignores the blocking factor and treats the data as coming from a completely randomized design. Ignoring the blocking factor results in a residual variance of \(\sigma_b^2+\sigma_e^2\) instead of \(\sigma_e^2\) for the correct analysis. The resulting decrease in precision is clearly visible.

Table 7.1: Contrast estimates and confidence intervals based on three linear models.
Contrast	Estimate	se	df	LCL	UCL
ANOVA: aov(y~drug+Error(litter))
D1 - Placebo	4.23	0.33	14	3.51	4.94
D2 - Placebo	2.91	0.33	14	2.19	3.62
(D1+D2)/2 - Placebo	3.57	0.29	14	2.95	4.18
Mixed model: lmer(y~drug+(1\|litter))
D1 - Placebo	4.23	0.33	14	3.51	4.94
D2 - Placebo	2.91	0.33	14	2.19	3.62
(D1+D2)/2 - Placebo	3.57	0.29	14	2.95	4.18
INCORRECT: aov(y~drug)
D1 - Placebo	4.23	0.72	21	2.73	5.72
D2 - Placebo	2.91	0.72	21	1.41	4.40
(D1+D2)/2 - Placebo	3.57	0.62	21	2.27	4.86

7.2.4 Evaluating and Choosing a Blocking Factor

Evaluating a Blocking Factor

Once the data are recorded, we are interested in quantifying how ‘good’ the blocking performed in the experiment. This information would allow us to better predict the expected residual variance for a power analysis of our next experiment and to determine if we should continue using the blocking factor.

One way of evaluating the blocking is by treating the blocking factor as fixed, and specify the ANOVA model as y~drug+litter. The ANOVA table then contains a row for the blocking factor with an associated \(F\)-test (Samuels, Casella, and McCabe 1991). This test only tells us if the between-block variance is significantly different from zero, but does not quantify the advantage of blocking.

A more meaningful alternative is the calculation of appropriate effect sizes, and we can determine the percentage of variation removed from the analysis by blocking using the effect size \(\eta^2_\text{block}\), which evaluates to 31% for our example. In addition, the partial effect \(\eta^2_\text{p, block}=\) 86% shows that the vast majority of non-treatment sum of squares is due to the litter-to-litter variation and blocking by litter was a very successful strategy.

Alternatively, the intraclass correlation coefficient \(\text{ICC}=\sigma_b^2 / (\sigma_b^2+\sigma_e^2)\) uses the proportion of variance and we find an ICC of 79% for our example, confirming that the blocking works well. The ICC is directly related to the relative efficiency of the RCBD compared to a non-blocked CRD, since \[\begin{equation} \text{RE}(\text{CRD},\text{RCBD}) = \frac{\sigma_b^2+\sigma_e^2}{\sigma_e^2} = \frac{1}{1-\text{ICC}} \tag{7.2} \;, \end{equation}\] and \(\sigma_b^2+\sigma_e^2\) is the residual variance of a corresponding CRD. From our estimates of the variance components, we find a relative efficiency of 4.66; to achieve the same precision as our blocked design with 8 mice per treatment group would require about 37 mice per treatment group for a completely randomized design.

We can alternatively find the relative efficiency as \[ \text{RE}(\text{CRD},\text{RCBD}) = \frac{\widehat{\text{MS}}_\text{res, CRD}}{\text{MS}_\text{res, RCBD}}\;, \] where we estimate the residual mean squares of the CRD from the ANOVA table of the RCBD as a weighted average of the block and the residual sums of squares, weighted by the corresponding degrees of freedom: \[ \widehat{\text{MS}}_\text{res, CRD} = \frac{\text{df}_\text{block}\cdot \text{MS}_\text{block}\, +\, (\text{df}_\text{trt}+\text{df}_\text{res})\cdot \text{MS}_\text{res}}{\text{df}_\text{block}+\text{df}_\text{trt}+\text{df}_\text{res}}\;. \] For our example, \(\widehat{\text{MS}}_\text{res, CRD}=\) 1.92, resulting in a relative efficiency estimate of RE(CRD,RCBD)=4.34. It differs slightly from our previous estimate because it uses an approximation based on the ANOVA results rather than estimates of the variance components based on the linear mixed model.

Choosing a Blocking Factor

The main purpose of blocking is to decrease the residual variance for improving the power of tests and precision of estimates by grouping the experimental units before the allocation of treatments, such that units within the same block are more similar than units from different blocks. The blocking factor must therefore describe a property of experimental units that affects all treatments equally to avoid a block-by-treatment interaction, and that is independent of the treatment outcome, which precludes, e.g., grouping ‘responders’ and ‘non-responders’ based on the measured response.

Some examples of grouping units using intrinsic properties are (i) litters of animals; (ii) parts of a single animal or plant, such as left/right kidney or leafs; (iii) initial weight; (iv) age or co-morbidities; (v) biomarker values, such as blood pressure, state of tumor, or vaccination status. Properties non-specific to the experimental units include (i) batches of chemicals used for the experimental unit; (ii) device used for measurements; or (iii) date in multi-day experiments. These are often necessary to account for systematic differences from the logistics of the experiment (such as batch effects); they also increase the generalizability of inferences due to the broader experimental conditions.

The golden rule for designing experiments with blocks is

Block what you can, randomize what you can not.

As always, practical considerations should be taken into account when deciding upon blocking and randomization. There is no point in designing an overly complicated blocked experiment that becomes too difficult to implement correctly. On the other hand, there is no harm if we use a blocking factor that turns out to have no or minimal effect on the residual variance.

Creating a randomized complete block design from a completely randomized design. A: A CRD randomizes drugs on mice. B: Introducing a blocking factor 'above' groups the experimental units. C: Subdividing each experimental units and randomizing the treatments on the lower level creates a new experimental unit factor 'below'.

Figure 7.3: Creating a randomized complete block design from a completely randomized design. A: A CRD randomizes drugs on mice. B: Introducing a blocking factor ‘above’ groups the experimental units. C: Subdividing each experimental units and randomizing the treatments on the lower level creates a new experimental unit factor ‘below.’

We can think about creating a blocked design by starting from a completely randomized design and ‘splitting’ the experimental unit factor into a blocking and a nested (potentially new) unit factor. Two examples are shown in Figure 7.3, starting from the CRD (Figure 7.3A) randomly allocating drug treatments on mice. In the first RCBD (Figure 7.3B), we create a blocking factor ‘above’ the original experimental unit factor and group mice by their litters. This restricts the randomization of Drug to mice within litters. In the second RCBD (Figure 7.3C), we subdivide the experimental unit into smaller units by taking multiple samples per mouse. This re-purposes the original experimental unit factor as the blocking factor and introduces a new factor ‘below,’ but requires that we now randomize Drug on (Sample) to obtain an RCBD and not pseudo-replication.

7.2.5 Power Analysis and Sample Size

Power analysis for a randomized complete block design with no block-by-treatment interaction is very similar to a completely randomized design, the only difference being that we use the within-block variance \(\sigma_e^2\) as our residual variance. For a main effect with parameters \(\alpha_i\), the noncentrality parameter is thus \(\lambda_R=n\sum_i \alpha_i^2/\sigma_e^2=n\cdot k\cdot f_R^2\), while it is \(\lambda_C=n\sum_i \alpha_i^2/(\sigma_b^2+\sigma_e^2)=n\cdot k \cdot f_C^2\) for the same treatment structure in a completely randomized design. The two effect sizes are different for the same raw effect size, and \(f_C^2\) is smaller than \(f_R^2\) for the same values of \(n\), \(k\), and \(\alpha_i\); this directly reflects the increased precision in an RCBD.

The denominator degrees of freedom are \((k-1)(b-1)=kb-k-(b-1)\) for an RCBD and we loose \(b-1\) degrees of freedom compared to the \(k(b-1)=kb-k\) degrees of freedom for a CRD. The corresponding loss in power is very small and more than compensated by the simultaneous increase in power due to the smaller residual variance unless the blocking is very ineffective and the number of blocks large.

As an example, we consider a single treatment factor with \(k=\) 3 levels, a total sample size of \(N=b\cdot k=\) 30, and an effect size of \(f_{C,0}^2=\) 0.0625 for the treatment main effect at a significance level of \(\alpha=5\%\). The residual variance is \(\sigma^2=\) 2. Under a completely randomized design, we have a power of 20% for detecting the effect. This power increases dramatically to 61% when we organize the experimental units into \(b=\) 10 blocks with between-block variance \(\sigma_b^2=\) 1.5 and within-block variance \(\sigma_e^2=\) 0.5. The much smaller residual variance increases the standardized effect size to \(f_{R,0}^2=(\sigma_b^2+\sigma_e^2)/\sigma_e^2\cdot f^2_{C,0}=\) 0.25; the increase is exactly the relative efficiency (Eq. (7.2)).

Power for different sample sizes n for a completely randomized design with residual variance two and four randomized complete block designs with same overall variance, but four different within-block residual variances. Inset: same curves for small sample sizes show lower power of RCBD than CRD for identical residual variance due to lower residual degrees of freedom in the RCBD.

Figure 7.4: Power for different sample sizes n for a completely randomized design with residual variance two and four randomized complete block designs with same overall variance, but four different within-block residual variances. Inset: same curves for small sample sizes show lower power of RCBD than CRD for identical residual variance due to lower residual degrees of freedom in the RCBD.

The relation between power and within-block variance is shown in Figure 7.4 for five scenarios with significance level \(\alpha=5\%\) and an omnibus-\(F\) test for a main effect with \(k=\) 3 treatment levels. The solid grey curve shows the power for a CRD with residual variance \(\sigma_e^2+\sigma_b^2=2\) as a function of sample size per group and a minimal detectable effect size of \(f^2_\text{C}=\) 0.0625. The black curves show the power for an RCBD with four different within-block variances from \(\sigma_e^2=0.25\) to \(\sigma_e^2=2\). For the same sample size, the RCBD then has larger effect size \(f^2_\text{R}\), which translates into higher power. The inset figure shows that an RCBD has negligibly less power than a CRD for the same residual variance of \(\sigma_e^2=2\), due to the loss of \(b-1\) degrees of freedom.

7.2.6 Replication Within Blocks

Within-block replication by using each treatment multiple times in each block is usually not a profitable strategy, and we should increase replication in an RCBD by introducing more blocks. A rare exception occurs when the blocking factor is not very efficient or measurement error is large, and the residual variance \(\sigma_e^2\) is large compared to the between-block variance \(\sigma_b^2\). Observing each treatment multiple times per block might then increase the precision of within-block contrast estimates.

In some scenarios, however, it is necessary to use more than one replicate of each treatment per block. This is typically the case if our blocking factor is non-specific and introduced to capture grouping due to the logistics of the experiment. For example, we might conduct our drug comparisons in two different laboratories. We then replicate a full CRD in the two laboratories, leading to a design with (Lab) as a blocking factor, but multiple observations of each treatment in each laboratory. This design is helpful to account for laboratory-specific differences.

Another purpose for using a multi-laboratory experiment is to broaden the inference for our experiment. If we find stable estimates of contrasts between laboratories despite the expected systematic differences, then we demonstrated experimentally that our scientific findings are independent of ‘typical’ alterations in the protocol or chemicals used, for example.

The resulting design is called a generalized randomized complete block design (GRCBD) and its diagrams are shown in Figure 7.5 for eight mice per drug and two laboratories

A (generalized) randomized complete block design (GRCBD) with four mice per drug and laboratory. Completely randomized design for estimating drug effects of three drugs replicated in two laboratories.

Figure 7.5: A (generalized) randomized complete block design (GRCBD) with four mice per drug and laboratory. Completely randomized design for estimating drug effects of three drugs replicated in two laboratories.

Each mouse is used in exactly one laboratory and (Lab) thus groups (Mouse). We use the same drugs in both laboratories, making (Lab) and Drug crossed factors. The replication of treatments in each block allows estimation of the lab-by-drug interaction, which provides information about the stability of the effects and contrasts. The linear model with interaction is \[ y_{ijk} = \mu + \alpha_i + b_j + (\alpha b)_{ij} + e_{ijk}\;, \] where \(\alpha_i\) are the (fixed) treatment effect parameters, \(b_j\) the random block effect parameters with \(b_j\sim N(0,\sigma_b^2)\), \((\alpha b)_{ij}\sim N(0,\sigma_{\alpha b}^2)\) describes the interaction of treatment \(i\) and laboratory \(j\), and \(e_{ijk}\sim N(0,\sigma_e^2)\) are the residuals. The specification for a linear mixed model is y ~ drug + (1|lab) + (1|lab:drug) for our example.

If the lab-by-treatment interaction is negligible with \(\sigma_{\alpha b}^2\approx 0\), we remove the interaction term from the analysis and arrive at the additive model. Then, (Lab) acts as a blocking factor just like (Litter) in the previous example and eliminates the lab-to-lab variation from treatment contrasts.

If the interaction is appreciable and cannot be neglected, then the contrasts between drugs are different between the two labs, and we cannot determine generalizable drug effects. Now, (Lab:Drug) is the first random factor below Drug and provides the factor for comparing levels of Drug, and (Mouse) is sub-sampled from (Lab:Drug). Consequently, the variation against which drug effects are tested is larger and replication is lower, leading to decrease in precision and power for treatment contrasts.

7.2.7 Fixed Blocking Factors

We can typically consider the levels of a blocking factor as randomly drawn from a set of potential levels. The blocking factor is then random, and we are not interested in contrasts involving its levels, for example, but rather use the blocking factor to increase precision and power by removing parts of the variation from treatment contrasts.

A typical example of a non-random classification factor is the sex of an animal. The Hasse diagrams in Figure 7.6 show an experiment design to study the effects of our three drugs on both female and male mice. Each treatment group has eight mice, with half of them female, the other half male. The experiment design looks similar to a factorial design of Chapter 6, but the interpretation of its analysis is rather different. Most importantly, while the factor Sex is fixed with only two possible levels, its levels are not randomly assigned to mice. This is reflected in the fact that Sex groups mice by an intrinsic property and hence belongs to the unit structure. In contrast, levels of Drug are randomly assigned to mice, and Drug therefore belongs to the treatment structure of the experiment.

A blocked design using sex of mouse as a fixed classification factor, with four female and four male mice in of three each treatment groups.

Figure 7.6: A blocked design using sex of mouse as a fixed classification factor, with four female and four male mice in of three each treatment groups.

In contrast to a randomized complete block design, we cannot increase the number of blocks to increase the replication. With two levels of Sex fixed, we instead need to increase the experiment size by using multiple mice of each sex for each drug. Since Sex and Drug are fixed, so is their interaction, and all fixed factors are tested against the same error term from (Mouse).

The specification for the analysis of variance is y~drug*sex, and it looks exactly like the specification of a two-way ANOVA. However, since Sex is a unit factor that is not randomized, the roles of Sex and Drug are not symmetric and the interpretation is rather different from our previous two-way ANOVA with treatment factors Drug and Diet. In the factorial design, the presence of a Drug:Diet interaction means that we can modify the effect of a drug by using a different diet. In the fixed-block design, a Sex:Drug interaction means that the effect of drugs is not stable over the sexes and some drugs affect female and male mice differently. We can use a drug to alter the enzyme level in a mouse, but we cannot alter the sex of that mouse to modulate the drug effect.

7.2.8 Factorial Treatment Structure

It is straightforward to extend an RCBD from a single treatment factor to factorial treatment structures by crossing the entire treatment structure with the blocking factor. Each block then contains one full replicate of the factorial design, and the required block size rapidly increases with the number of factors and factor levels. In practice, only smaller factorials can be blocked by this method since heterogeneity between experimental units often increases with block size, diminishing the advantages of blocking. We discuss more sophisticated designs for blocking factorials that overcome this problem by using only a fraction of all treatment combinations per block in Chapter 9.

Figure 7.7: Experiment layout for three drugs under two diets, four litter blocks. Two-way crossed treatment structure within each block.

We illustrate the extension using our drug-diet example from Chapter 6 where three drugs were combined with two diets. The six treatment combinations can still be accommodated when blocking by litter, and using four litters of size six results in the experimental layout shown in Figure 7.7 with 24 mice in total. Already including a medium-fat diet would require atypical litter sizes of nine mice, however.

Model and Hasse Diagram

For constructing the Hasse diagrams in Figure 7.8 we note that the blocking factor is crossed with the full treatment structure—both treatment factors and the treatment interaction factor. We assume that our blocking factor does not differentially change the diet or drug effects and consequently ignore all three block-by-treatment interactions (Litter:Diet), (Litter:Drug), and (Litter:Diet:Drug).

Randomized complete block design for determining effect of two different drugs and placebo combined with two different diets using four mice per drug and diet, and a single response measurement per mouse. All treatment combinations occur once per block and are randomized independently.

Figure 7.8: Randomized complete block design for determining effect of two different drugs and placebo combined with two different diets using four mice per drug and diet, and a single response measurement per mouse. All treatment combinations occur once per block and are randomized independently.

We can easily derive a corresponding linear model from the Hasse diagrams. It is identical to our ‘usual’ model for an RCBD, except that its mean structure additionally contains the parameters for the second treatment factor and the interaction: \[ y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + b_k + e_{ijk}\;, \] where \(\alpha_i\) and \(\beta_j\) are the main effects of drug and diet, respectively, \((\alpha\beta)_{ij}\) the interaction effects between them, \(b_k\sim N(0,\sigma_b^2)\) are the random block effects, and \(e_{ijk}\sim N(0,\sigma_e^2)\) are the residuals. Again we assume that all random effects are mutually independent.

The enzyme levels \(y_{ijk}\) are shown in Figure 7.9 as dark grey points in an interaction plot separately for each block. We can clearly see the block effects, which systematically shift enzyme levels for all conditions. After estimating the linear model, the shift \(b_k\) can be estimated for each block, and the light grey points show the resulting ‘normalized’ data \(y_{ijk}-\hat{b}_k\) after accounting for the block effects.

Enzyme levels for placebo (point), drug D1 (triangle), and drug D2 (square). Data are shown separately for each of four litters (blocks). Lines connect mean values over litters of same drug under low and high fat diets. Dark grey points are raw data, light grey points are enzyme levels adjusted for block effect.

Figure 7.9: Enzyme levels for placebo (point), drug D1 (triangle), and drug D2 (square). Data are shown separately for each of four litters (blocks). Lines connect mean values over litters of same drug under low and high fat diets. Dark grey points are raw data, light grey points are enzyme levels adjusted for block effect.

ANOVA Table

We analyze the resulting data using either analysis of variance or a linear mixed model. The two model specifications derive directly from the Hasse diagrams and consist of two main effects and interaction for the treatment factors and an error stratum respectively random intercept for the litter unit factor. The specifications are y~drug*diet+Error(litter), respectively y~drug*diet+(1|litter).

The resulting ANOVA table based on the linear mixed model is

	Sum Sq	Mean Sq	NumDF	DenDF	F value	Pr(>F)
drug	74.1	37.05	2	15	91.54	3.93e-09
diet	2.59	2.59	1	15	6.39	2.32e-02
drug:diet	15.41	7.71	2	15	19.04	7.65e-05

Blocking by litter reduces the residual variance from 1.19 for the non-blocked design in Chapter 6 to 0.64, the relative efficiency is RE(CRD,RCBD)= 1.87. Due to the increase in power, the Diet main effect is now significant.

Contrasts

The definition and analysis of linear contrasts work exactly as for the two-way ANOVA in Section 6.6, and contrasts are defined on the six treatment group means. For direct comparison with our previous results, we estimate the two interaction contrasts of Table 6.6 in the blocked design. They compare the difference in enzyme levels for D1 (resp. D2) under low and high fat diet to the corresponding difference in the placebo group; estimates and Bonferroni-corrected confidence intervals are shown in Table 7.2.

Table 7.2: Interaction contrast estimates and Bonferroni-adjusted 95%-confidence intervals.
Contrast	Estimate	se	df	LCL	UCL
(D1 low - D1 high) - (Pl. low - Pl. high)	0.54	0.64	15	-1.04	2.13
(D2 low - D2 high) - (Pl. low - Pl. high)	3.64	0.64	15	2.06	5.22

As expected, the estimates are similar to those we found previously, but confidence intervals are substantially narrower due to the increase in precision.

7.3 Incomplete Block Designs

7.3.1 Introduction

The randomized complete block design can be too restrictive if the number of treatment levels is large or the available block sizes small. Fractional factorial designs offer a solution for factorial treatment structures by confounding some treatment effects with blocks (Chapter 9). For a single treatment factor, we can use incomplete block designs (IBD), where we deliberately relax the complete balance of the previous designs and use only a subset of the treatments in each block.

The most important example is the balanced incomplete block design (BIBD), where each pair of treatments is allocated to the same number of blocks, and so are therefore all individual treatments. This specific type of balance ensures that pair-wise contrasts are all estimated with the same precision, but precision decreases if more than two treatment groups are compared.

We illustrate this design with our drug example, where three drug treatments are allocated to 24 mice on a low fat diet. If the sample preparation is so elaborate that only two mice can be measured at the same time, then we have to run this experiment in twelve batches, with each batch containing one of three treatment pairs: (placebo, \(D1\)), (placebo, \(D2\)), or (\(D1\), \(D2\)). A possible balanced incomplete block design is shown in Figure 7.10, where each treatment pair is used in four blocks, and the treatments are randomized independently to mice within each block. The resulting data are shown in Table 7.3.

Figure 7.10: Experiment layout for a balanced incomplete block design with three drugs in twelve batches of size two.

Table 7.3: Data for a BIBD with 12 blocks of size 2 and 3 treatment levels.

Batch	Drug	y
Block 1	Placebo	8.79
Block 1	D1	12.89
Block 2	D2	13.49
Block 2	Placebo	10.58
Block 3	D1	12.76
Block 3	D2	11.64
Block 4	Placebo	10.14
Block 4	D1	14.65

Batch	Drug	y
Block 5	D2	14.05
Block 5	Placebo	11.93
Block 6	D1	12.98
Block 6	D2	11.47
Block 7	Placebo	8.06
Block 7	D1	12.80
Block 8	D2	11.22
Block 8	Placebo	9.10

Batch	Drug	y
Block 9	D1	13.87
Block 9	D2	13.92
Block 10	Placebo	9.03
Block 10	D1	13.67
Block 11	D2	13.42
Block 11	Placebo	10.62
Block 12	D1	15.63
Block 12	D2	12.74

7.3.2 Defining a Balanced Incomplete Block Design

The key requirement of a BIBD is that all pairs of treatments occur the same number of times in a block. This requirement restricts the possible combinations of number of treatments, block size, and number of blocks. With three pairs of treatments as in our example, the number of blocks in a BIBD is restricted to be a multiple of three. The relations between the parameters of a BIBD are known as the two defining equations \[\begin{equation} rk = bs \quad\text{and}\quad \lambda\cdot (k-1) = r\cdot (s-1)\;. \tag{7.3} \end{equation}\] They relate the number of treatments \(k\), number of blocks \(b\), block size \(s\) to the number of times \(r\) that each treatment occurs and the number of times \(\lambda\) that each pair of treatments occurs in the same block. If these equations hold, then the corresponding design is a BIBD.

The equations are derived as follows: with \(b\) blocks of \(s\) units each, the product \(bs\) is the total number of experimental units in the experiment. This number has to equal \(rk\), since each of the \(k\) treatments occurs in \(r\) blocks. Moreover, each particular treatment occurs \(\lambda\) times with each of the remaining \(k-1\) treatments. It also occurs in \(r\) blocks, and is combined with \(s-1\) other treatments in each block.

For our example, we have \(k=3\) treatment levels and each treatment occurs in \(r=8\) blocks, we have \(b=12\) blocks each of size \(s=2\), resulting in \(\lambda=4\) co-occurrences of each pair in the same block. This satisfies the defining equations and our design is a BIBD. In contrast, for only \(b=5\) blocks, we are unable to satisfy the defining equations and no corresponding BIBD exists.

In order to generate a balanced incomplete block design, we need to find parameters \(s\), \(r\), and \(\lambda\) such that the defining equations (7.3) hold. A particularly simple way is by generating an unreduced or combinatorial design and use as many blocks as there are ways to select \(s\) out of \(k\) treatment levels. The number of blocks \(b\) is then \(b={k \choose s}=k!/(k-s)!s!\) and increases rapidly with \(k\).

We can often improve on the unreduced design, and generate a valid BIBD with substantially fewer blocks. Experimental design books sometimes contain tables of BIBDs for different number of treatment and block sizes, see for example Cochran and Cox (1957). As a further example, we consider constructing a BIBD for \(k=6\) treatments in blocks of size \(s=3\). The unreduced combinatorial design requires \(b={6 \choose 3}=20\) blocks (experiment size \(N=60\)). Not all of these are needed to fulfill the defining equations, however, and a smaller BIBD exists with only \(b=10\) blocks (and \(N=30\)).

7.3.3 Analysis

The balanced incomplete block design is not fully balanced: since only a fraction of the available treatment levels occur in each block, some combinations of block and treatment levels are not observed, resulting in some-cells-empty data. The treatment and block main effects are no longer independent, but the requirements encoded in the two defining equations ensure that unbiased estimates exist for all treatment effects, and that standard ANOVA techniques are available for the analysis. However, the treatment and block factor sums of squares are not independent, and while their sum remains the same, their individual values now depend on the order of the two factors in the model specification. Moreover, part of the information about treatment effects is captured by the differences in blocks; in particular, there are two omnibus \(F\)-tests for the treatment factor, the first based on the inter-block information and located in the ANOVA table in the block-factor error stratum, and the second based—as for an RCBD—on the intra-block information and located in the residual error stratum. A linear mixed model automatically combines these information, and analysis of variance and mixed model results no longer concur exactly for a BIBD.