Breeding theory

 

 

Statistics review – linear models, means, variances, LSDs, and repeatability.

 

 

  • Review the linear model for plot measurements in variety trials and nurseries, and the statistics derived from it.

  • Understand the purpose of replication in breeding programs.

  • Review the concept of experimental repeatability.

  • Model the relationship between replication, the standard error of a cultivar mean (SEM), the least significant difference (LSD) between the means of two cultivars, and repeatability.

 

 

 

 

 

 

 

Introduction

One of the most important tasks of the breeder is to distinguish between the effects of genotype and environment in nurseries and trials. The genotypic differences between lines in a nursery or trial can be thought of as the “signal” the breeder is trying to detect; the effects of experimental units (plots) are “noise”. Genotype and environmental effects are both part of the phenotypic value, or measurement taken on a field plot.  

 

Genotypic and environmental effects are incorporated into the linear models used to analyze the results of field trials. A familiarity with the linear model for phenotypic value is a prerequisite to further discussions on how to increase the precision of variety trials, maximize selection response and use breeding resources efficiently.  

 

In this lesson, we will examine the effect of field variability and replication on the precision with which cultivar effects are estimated.

 

 

 

 

 

 

1. The linear model for plot measurements

 

Genotype means

 

Plot measurements are analysed using a linear model that permits the separation of genotypic effects from random error or “noise” resulting from plot differences in soil quality or water availability. In a completely randomized design (CRD), or a design with no blocking, a measurement on a plot (P) in a nursery or variety trial can be decomposed into the following components:

$$P = \mu + G_i + e_j \tag{5.1}$$

Where:

$\mu$ = the mean of all plots

$G_i$ = the effect of the ith genotype, expressed as a deviation from the mean of all plots

$e_j$ = the “residual” effect of the jth plot, expressed as a deviation from the mean of all plots

 

 

 

Thus, in an unreplicated nursery, measurements on single plots completely confound (mix) genotype and plot effects. In replicated trials or nurseries, the mean of variety i over plots ($\bar{Y}_i$) has the expected value $\mu + G_i$. In practice, the mean of a variety estimated from a trial is:

$$\bar{Y}_i = \mu + G_i + \frac{\sum e_j}{r}$$

where r is the number of replicates and the sum runs over the r plots to which variety i was assigned.

 

As all agronomists know, some locations in the field are more productive than others. If we randomly assign a genotype to many different plots (i.e., if replication is high), the positive and negative plot effects tend to cancel each other out, and the mean residual $\sum e_j / r$ approaches 0. However, when replication is low (r < 4), the mean residual can cause the mean we estimate from plots to differ greatly from the true genotype mean $\mu + G_i$. The reason we replicate breeding trials and nurseries is to minimize the effect of the plot residuals on our estimate of the genotype mean, not to generate an error term for testing the significance of differences (and not to compensate for lost plots)!
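The cancelling-out of plot effects can be illustrated with a small simulation. This is only a sketch under stated assumptions: the plot residuals are drawn from a normal distribution with a standard deviation of 1 (an arbitrary choice), and the function name `mean_residual_spread` is made up for this example.

```python
import random
import statistics

def mean_residual_spread(r, sigma_e=1.0, n_trials=5000, seed=1):
    """Spread (standard deviation) of the mean plot residual, sum(e_j) / r,
    across many simulated trials with r replicates each.

    Assumption: residuals are independent and normally distributed.
    """
    rng = random.Random(seed)
    mean_residuals = []
    for _ in range(n_trials):
        residuals = [rng.gauss(0.0, sigma_e) for _ in range(r)]
        mean_residuals.append(sum(residuals) / r)
    return statistics.pstdev(mean_residuals)

# Compare the simulated spread with the theoretical value sigma_e / sqrt(r).
spread = {r: mean_residual_spread(r) for r in (1, 2, 4, 8, 16)}
for r, s in spread.items():
    print(f"r = {r:2d}: spread of mean residual = {s:.3f} (theory {1 / r ** 0.5:.3f})")
```

The spread of the mean residual shrinks as r grows, which is why the estimated mean moves closer to the true genotype mean with more replication.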

 

 

 

Variances

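The variance of a genotype mean is:

$$\sigma^2_{\bar{Y}} = \frac{\sigma^2_e}{r}$$

where $\sigma^2_e$ is the variance of the plot residuals and r is the number of replicates.

The standard error of a mean (SEM) is the square root of $\sigma^2_{\bar{Y}}$.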

 

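Breeders are often interested in the variance of the difference between two varieties. The difference (D) may be expressed as follows:

$$D = \bar{Y}_1 - \bar{Y}_2$$

For two independent variables, the variance of the sum equals the sum of the variances. Note that if $Y = cX$, where c is a constant, then $\sigma^2_Y = c^2\sigma^2_X$. Subtracting the second mean corresponds to $c = -1$, so its variance is added, not subtracted. Therefore,

$$\sigma^2_D = \sigma^2_{\bar{Y}_1} + \sigma^2_{\bar{Y}_2} = \frac{2\sigma^2_e}{r}$$

The standard error of a difference between two means (SED) is the square root of $\sigma^2_D$.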

 

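The Least Significant Difference (LSD) is the critical value of a t-test of the difference between two means. It is calculated as:

$$LSD_{\alpha} = t_{\alpha/2,\,edf} \times SED = t_{\alpha/2,\,edf}\sqrt{\frac{2\sigma^2_e}{r}}$$

Where $t_{\alpha/2,\,edf}$ is the t value with an upper-tail probability of α/2 (that is, a two-sided test at significance level α), and edf is the number of degrees of freedom in the error term of the analysis of variance of the trial or nursery.

For α = 0.05, $t_{\alpha/2,\,edf}$ is approximately 2 unless edf is small (it is 2.09 at 20 edf and 2.00 at 60 edf). As an approximation,

$$LSD_{.05} \approx 2 \times SED = 2\sqrt{\frac{2\sigma^2_e}{r}}$$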

 

 

 

It is important to be clear about the meaning of the LSD value.

 

The LSD.05 is the difference between estimates of the means of 2 lines with the same genotypic value that is expected to be exceeded by chance in 5% of the repetitions of a variety trial.

 

 

 

Put another way, suppose you harvested seed of one variety from a single plot, split it into two bags, and gave each bag a different name. You then compared the two samples in a replicated yield trial, each on its own set of plots. By chance alone, the difference in mean yield between these samples of identical material would exceed the LSD.05 value in 5% of such trials.
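The two-bag thought experiment can be mimicked in a short simulation. The sketch below rests on simplifying assumptions that are not part of the lesson: residuals are normal, the error variance is treated as known (so the normal critical value of about 1.96 stands in for the t value), and the function name and default settings are illustrative only.

```python
import random
from statistics import NormalDist

def false_positive_rate(r=4, sigma_e=1.0, n_trials=20000, alpha=0.05, seed=7):
    """Share of simulated trials in which two samples of the SAME variety
    differ in mean by more than the LSD.

    Assumption: error variance is known, so a normal critical value is used
    in place of the t value.
    """
    rng = random.Random(seed)
    sed = (2 * sigma_e ** 2 / r) ** 0.5          # standard error of a difference
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    lsd = crit * sed
    exceed = 0
    for _ in range(n_trials):
        # Both bags share the same mu + G, so only plot residuals differ.
        mean_a = sum(rng.gauss(0.0, sigma_e) for _ in range(r)) / r
        mean_b = sum(rng.gauss(0.0, sigma_e) for _ in range(r)) / r
        if abs(mean_a - mean_b) > lsd:
            exceed += 1
    return exceed / n_trials

rate = false_positive_rate()
print(f"Share of trials where identical samples differ by more than LSD.05: {rate:.3f}")
```

The share should come out close to 0.05, whatever values are chosen for r and the residual standard deviation.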

 

 

The SEM, SED, and LSD are important measures of the precision of a variety trial, or its ability to detect small differences.
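As a worked illustration, the three statistics can be computed from an error variance and a replicate number. This is a minimal sketch: the function name `precision_stats`, the error variance of 0.16 (t/ha)² and the default t value of 2 are assumptions chosen for the example, not values from a real trial.

```python
import math

def precision_stats(sigma2_e, r, t_crit=2.0):
    """Return (SEM, SED, LSD) for error variance sigma2_e and r replicates.

    t_crit defaults to 2, the approximate t value for alpha = 0.05;
    pass the exact value for your error degrees of freedom if you have it.
    """
    sem = math.sqrt(sigma2_e / r)        # standard error of a mean
    sed = math.sqrt(2 * sigma2_e / r)    # standard error of a difference
    lsd = t_crit * sed                   # least significant difference
    return sem, sed, lsd

# Hypothetical trial: error variance 0.16 (t/ha)^2, 4 replicates.
sem, sed, lsd = precision_stats(sigma2_e=0.16, r=4)
print(f"SEM = {sem:.3f}, SED = {sed:.3f}, LSD.05 = {lsd:.3f}")
```

With these made-up inputs the SEM is about 0.20, the SED about 0.28 and the LSD.05 about 0.57, all in the units of the trait.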

 

 

 

 

 

 

 

2. Genetic and phenotypic variances

 

The genetic (or genotypic) variance in a cultivar trial is the variance of the cultivar effects, or the $G_i$’s in equation 5.1. It is denoted $\sigma^2_G$.

The phenotypic variance in a cultivar trial is the variance among cultivar means, each calculated across reps. It is denoted $\sigma^2_P$.

Because means are based on plot measurements, which contain both $G_i$’s and $e_j$’s, $\sigma^2_P$ contains both the genetic variance and a portion of the residual variance:

$$\sigma^2_P = \sigma^2_G + \frac{\sigma^2_e}{r}$$

where $\sigma^2_e$ is the plot residual or error variance from the ANOVA, and r is the number of reps.

$\sigma^2_G$ and $\sigma^2_e$ are estimated from the ANOVA as follows:

 

Source of variation    Df            Mean square    Expected mean square

Replicates             r-1

Genotypes              g-1           MSG            $r\sigma^2_G + \sigma^2_e$

Residual               (r-1)(g-1)    MSE            $\sigma^2_e$

 

Thus, to estimate $\sigma^2_G$:

$$\hat{\sigma}^2_G = \frac{MSG - MSE}{r}$$

$\sigma^2_e$ is estimated directly as the error variance of the experiment (MSE).
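The estimation steps can be sketched in code for a balanced trial with replicates as blocks and no missing plots. The data below are invented for illustration, and `variance_components` is a hypothetical helper, not a function from any statistics package.

```python
def variance_components(plots):
    """Estimate variance components from a balanced genotype x replicate table.

    plots[i][k] is the measurement for genotype i in replicate k.
    Assumes every genotype appears once in every replicate.
    """
    g = len(plots)        # number of genotypes
    r = len(plots[0])     # number of replicates
    grand_mean = sum(sum(row) for row in plots) / (g * r)
    geno_means = [sum(row) / r for row in plots]
    rep_means = [sum(plots[i][k] for i in range(g)) / g for k in range(r)]

    # Sums of squares for genotypes and for the residual.
    ss_geno = r * sum((m - grand_mean) ** 2 for m in geno_means)
    ss_resid = sum(
        (plots[i][k] - geno_means[i] - rep_means[k] + grand_mean) ** 2
        for i in range(g)
        for k in range(r)
    )
    msg = ss_geno / (g - 1)
    mse = ss_resid / ((r - 1) * (g - 1))

    sigma2_g = (msg - mse) / r         # genetic variance
    sigma2_p = sigma2_g + mse / r      # phenotypic variance of genotype means
    return {"MSG": msg, "MSE": mse, "sigma2_G": sigma2_g,
            "sigma2_e": mse, "sigma2_P": sigma2_p}

# Hypothetical yields (t/ha): 3 genotypes (rows) x 3 replicates (columns).
plots = [
    [5.0, 5.4, 5.2],
    [4.1, 4.6, 4.2],
    [6.0, 6.3, 6.6],
]
vc = variance_components(plots)
for name, value in vc.items():
    print(f"{name}: {value:.4f}")
```

Note that the estimate of the genetic variance can come out negative when MSG is smaller than MSE; in that case it is usually reported as zero.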

 

 

 

3. The repeatability of variety trials

 

The repeatability of a variety trial is the proportion of the variation among line means that is due to the variation in genotype effects. This statistic, also called broad-sense heritability (H), is calculated as:

$$H = \frac{\sigma^2_G}{\sigma^2_P} = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_e / r}$$

H is an important measure of the precision (reliability) of an experiment. It is an estimate of the correlation expected between different repetitions or runs of exactly the same experiment: the correlation you would expect between two sets of means for the same cultivars if the trial were run twice in the same field, re-randomizing the cultivars to different plots each time. Values of H are between 0 and 1. H is not a constant! As the number of replicates increases, or as the error variance decreases, H approaches 1. Thus, you can “buy” higher H for your experiment by investing in more replication.
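The interpretation of H as a correlation between repeated runs can be checked with a simulation. The sketch below assumes normally distributed genotype effects and residuals, uses arbitrary variance values, and introduces the hypothetical helpers `pearson` and `simulate_repeatability`.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_repeatability(g=4000, r=3, sigma2_g=1.0, sigma2_e=2.0, seed=11):
    """Run the same trial twice and correlate the two sets of genotype means.

    Assumption: genotype effects and plot residuals are independent normals.
    Returns (H from the formula, observed correlation between runs).
    """
    rng = random.Random(seed)
    geno = [rng.gauss(0.0, sigma2_g ** 0.5) for _ in range(g)]

    def run_trial():
        # Each genotype mean = true effect + mean of r fresh plot residuals.
        return [
            gi + sum(rng.gauss(0.0, sigma2_e ** 0.5) for _ in range(r)) / r
            for gi in geno
        ]

    means_run1 = run_trial()
    means_run2 = run_trial()
    expected_h = sigma2_g / (sigma2_g + sigma2_e / r)
    return expected_h, pearson(means_run1, means_run2)

expected_h, observed_r = simulate_repeatability()
print(f"H from formula: {expected_h:.2f}, correlation between runs: {observed_r:.2f}")
```

A very large number of genotypes is used here only so that the observed correlation settles close to H; with the 30 or fewer varieties of a small trial, the correlation would vary much more from run to run.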

 

Note that H estimates pertain only to the trial in which they were estimated. H estimates from trials with fewer than 30 varieties are not very reliable.

 

Note also that H does not  tell us anything about the Mendelian transmissibility of a trait from parent to offspring. It is a purely statistical measure of the repeatability of a trait in a particular experiment.

 

 

 

 

 

 

4. Modelling the effect of replication on the precision and repeatability of variety trials

 

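You can use the equations

$$LSD_{\alpha} = t_{\alpha/2,\,edf}\sqrt{\frac{2\sigma^2_e}{r}}$$

and

$$H = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_e / r}$$

to model the effect of changing the replicate number in a trial.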

 

These equations are therefore useful to breeders in planning their programs.  

 

Because replication is expensive, it is important for breeders to examine whether trials are over- or under-replicated.
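One way to examine this is to tabulate the LSD and H over a range of replicate numbers. The following is a sketch only: the variance components (0.10 and 0.30) are invented, the t value is fixed at 2 even though the error degrees of freedom change with r, and the function names are illustrative.

```python
import math

def lsd_05(sigma2_e, r, t_crit=2.0):
    """Approximate LSD.05 for error variance sigma2_e and r replicates."""
    return t_crit * math.sqrt(2 * sigma2_e / r)

def repeatability(sigma2_g, sigma2_e, r):
    """Repeatability (broad-sense H) of genotype means over r replicates."""
    return sigma2_g / (sigma2_g + sigma2_e / r)

# Hypothetical variance components, e.g. in (t/ha)^2.
sigma2_g, sigma2_e = 0.10, 0.30

rows = [(r, lsd_05(sigma2_e, r), repeatability(sigma2_g, sigma2_e, r))
        for r in range(1, 9)]

print(" r   LSD.05      H")
for r, lsd, h in rows:
    print(f"{r:2d}   {lsd:6.3f}  {h:5.2f}")
```

Reading down the table shows how much each extra replicate buys: the drop in LSD and the rise in H are largest for the first few replicates and become smaller with each one added.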

 

 

 

 

 

 

Modelling Exercise:

 

 

 


 

Summary

 

  1. In field trials and nurseries, genotype and plot effects are always confounded.  Plot effects are the “noise” in breeding experiments that obscures estimates of genotypic value.

  2. In breeding programs, the purpose of replication is to separate plot and genotype effects,  reducing their confounding, improving the precision of trials and increasing our ability to identify superior genotypes.

  3. Genotypic and error variances estimated from the analysis of variance can be used to estimate both the LSD and repeatability or broad-sense H.

  4. Increasing replication will reduce the LSD, increase H, and increase the precision and repeatability of the trial.

  5. The gains in precision from increasing replication diminish quickly with each additional replicate. It is rarely cost-effective to use more than 4 replicates.

  6. Equations for LSD and H can be used to model the effect of increasing or decreasing replication on the precision and repeatability of trials.

 

 

Next lesson

 

In the next lesson, we will discuss choosing parents and managing a pedigree breeding program.