## Abstract

Epistasis contributes significantly to intrapopulation variation in floral morphology, development time, and male fitness components of *Mimulus guttatus*. This is demonstrated with a replicated line-cross experiment involving slightly over 7000 plants. The line-cross methodology is based on estimates for means. It thus has greater power than the variance partitioning approaches historically used to estimate epistasis within populations. The replication of the breeding design across many pairs of randomly extracted, inbred lines is necessary given the diversity of multilocus genotypes residing within an outbred deme. Male fitness is shown to exhibit synergistic epistasis, an accelerating decline in fitness with inbreeding. Synergism is a necessary, but not sufficient, condition for a mutational deterministic hypothesis for the evolutionary maintenance of sexual reproduction. Unlike male fitness measures, flower morphology and development time yield positive evidence of epistasis but not of synergism. The results for these traits suggest that epistatic effects are variable across genetic backgrounds or sets of interacting loci.

EPISTASIS is the interaction between different genetic loci in determining phenotype. The prevalence and nature of epistasis are central to a range of questions in evolutionary biology. Sewall Wright developed his shifting balance theory to provide a mechanism for the evolution of favorable gene combinations (Wright 1931; Wade 1992). The evaluation of this theory, and more generally of the interaction between epistasis and population structure, has been a major impetus to experimental work (Mallet and Singer 1987; Husband and Barrett 1992a,b; see reviews by Coyne *et al.* 1997; Wade and Goodnight 1998). Ronald Fisher, the principal antagonist to the shifting balance model, himself invoked “modifiers” in a number of evolutionary hypotheses (*e.g.*, Fisher 1958, Chap. 3). Modifiers, which are epistatic by definition, were also a recurring theme of Fisher's colleagues in the ecological genetics school (Ford 1971). More recently, evolutionary geneticists have explored the role of gene interactions on the evolution of sex and recombination (Peters and Lively 2000), developmental integration (Rice 1998), speciation (Rieseberg *et al.* 1996; Johnson 2000), and the rate and pattern of adaptive diversification (Wade 2000).

There is a great deal of evidence that epistasis contributes to the genetic differences among species (*e.g.*, Doebley *et al.* 1995; Fishman and Willis 2001) and among populations within a species (*e.g.*, Hard *et al.* 1992; Burton *et al.* 1999; Fenster and Galloway 2000). This has been established by comparing the average phenotype of various hybrids (the F_{1}, F_{2}, or backcross progeny) to the corresponding values of the parental species or lines. Without epistasis, the mean phenotype of F_{2} individuals should be exactly intermediate between the F_{1} and midparent means (the midparent is the average of the parental species). The often observed “F_{2} breakdown,” where the average F_{2} phenotype is substantially lower than expected, is thus an example of epistasis (Grant 1975; Burton 1990). Fine mapping of the variants responsible for hybrid failure in interspecies crosses has subsequently revealed complicated patterns of gene interaction (Cabot *et al.* 1994; Palopoli and Wu 1994; but also see Zeng *et al.* 2000).

The finding of epistasis in interspecies crosses is not compelling evidence that selection acted on gene combinations to produce the phenotypic divergence between these species (Whitlock *et al.* 1995, pp. 612–613). Nor does it directly imply that epistatic variation was present within the ancestral species. In fact, the loci interacting to cause hybrid failure may never have been simultaneously polymorphic within the ancestral population. The classical Dobzhansky-Muller model of hybrid breakdown yields interspecies epistasis without selection ever acting directly on gene combinations (Dobzhansky 1936; Muller 1939; Orr 1995). We need to consider the genetic variability present within species to evaluate the evolutionary consequences of epistasis.

Barker (1979) reviewed attempts to estimate the contribution of epistasis to quantitative trait variation within contiguous natural populations, *i.e*., demes. Considering studies of different types (*e.g.*, variance partitioning, response to selection and inbreeding, and linkage disequilibria), he concluded that the evidence was “equivocal and unsatisfying.” This could be interpreted to mean that epistasis is simply more important in determining differences among highly divergent genotypes (species or lines) than the more modest differences among individuals within demes. Alternatively, the differing results from inter- and intraspecific studies may simply reflect differences in statistical power (Fenster *et al.* 1997). The line-cross method used in agricultural and interspecies studies is based on statistical comparisons of mean phenotypic values. In contrast, the variance-partitioning approaches frequently used to estimate epistasis within populations (*e.g.*, Miller *et al.* 1963; Barker 1974) are based on second moments, variances, and covariances. For a given sample size, means are typically estimated with far greater accuracy than variances or covariances.

The traditional method of model evaluation in line-cross studies is the joint scaling test (Mather 1949; Anderson and Kempthorne 1954; Hayman 1960; Mather and Jinks 1982). Following Lynch and Walsh (1998, Chap. 9), I refer to the various categories of individuals such as the parental lines, F_{1}, F_{2}, and backcrosses as “line-cross derivatives.” In a joint scaling test, the means for each line-cross derivative are used to calibrate and then test models of inheritance. Analysis generally begins by considering the simplest model, additive inheritance. With strict additivity, the means of all line-cross derivatives can be predicted given only two parameters, the overall mean (*M*) and the additive effect (*A*). A set of expected values for each line-cross derivative is obtained by estimating these parameters. The correspondence between observed and expected means, and hence the sufficiency of the model, can then be evaluated using the chi-square test. If the strictly additive model is rejected, one then proceeds to evaluate a model allowing dominance (three parameters: *M*, *A*, and *D*, the dominance deviation). The rejection of this model implies epistasis. With a sufficient number of distinct line-cross derivates, increasingly complicated models of gene interaction can be tested (*e.g.*, Jinks and Perkins 1969).

A frequently cited shortcoming of line-cross studies is that the results, estimates for dominance and epistasis, are specific to the particular pair of inbred lines chosen as parentals. This is not a serious difficulty for many agricultural applications, because a limited number of cultivars/varieties/genotypes are widely used by farmers. However, there are an enormous number of distinct multilocus genotypes in an outbred deme. It is difficult, perhaps even impossible, to realistically concentrate this variability into a single pair of inbred lines.

This article describes a study using the line-cross methodology to investigate variation resident within a single natural population. I replicate the joint scaling test across a large collection of inbred lines that are representative of the ancestral outbred population in terms of allele frequencies at quantitative trait loci. Lines were randomly paired without consideration of their phenotypic attributes and each pair of lines was used to synthesize a set of line-cross derivatives (Figure 1). The progeny produced from this array of crosses were grown under common conditions and measured for development time, flower morphology, and male fitness components. This experiment addresses two basic questions: Do these traits exhibit significant epistasis? If so, are epistatic effects consistent in their direction across sets of interacting loci?

The characters chosen for study, flower morphology, development time, and pollen production, are primary determinants of plant fitness in the wild. The genetic analysis of measurements on these traits consists of two parts, both of which are based on comparisons of mean phenotypic values. The first considers the phenotypic response to inbreeding. In the context of the ancestral population, line-cross derivatives have inbreeding coefficients (*f*) that range from 0 to 1 (Figure 1). A simple test for epistasis within a diploid population is to determine whether the relationship between the average trait value and *f* is nonlinear. If loci contribute additively to trait variation, the mean phenotype should change linearly with *f* (Wright 1951; Kempthorne 1957). In contrast, consistently positive or consistently negative interactions among loci predict a nonlinear relationship between mean phenotype and *f* (Crow and Kimura 1970, pp. 77–85).

The failure to detect nonlinearity in the relationship between mean phenotype and *f* does not conclusively indicate the absence of epistasis. An approximately linear relationship may be obtained if epistatic effects are important, but variable in direction across loci (Elena and Lenski 1997; Phillips *et al.* 2000). However, such variable epistatic effects should be evident as deviations between observed and expected means within individual cross-families. For example, the mean value of F_{2} plants may be significantly greater than the expected values in some families, but less than those in others. These deviations may cancel when considering the population as a whole because all F_{2} plants have the same inbreeding coefficient. However, these fluctuating deviations are appropriately integrated in the *Replicated joint scaling test* to produce a population-wide test for epistasis.

## MATERIALS AND METHODS

#### Study system:

*Mimulus guttatus* is rapidly becoming one of the most thoroughly studied plants in ecology and evolutionary biology (Grant 1924; Kiang 1972, 1973; Vickery 1978; Macnair and Cumbes 1989; Fenster and Ritland 1992, 1994a,b; Willis 1996, 1999a,b; Dudash and Carr 1998). Yellow monkeyflower is a self-compatible hermaphrodite that occurs throughout western North America. It reproduces by a mixture of outcrossing and self-fertilization, but the selfing rate differs greatly among natural populations (Ritland and Ganders 1987a,b; Dudash and Ritland 1991; Dole and Ritland 1993; Willis 1993b). Local populations also differ extensively in floral morphology (Waser *et al.* 1982), in the amount of genetic variation for floral traits (Carr and Fenster 1994; Robertson *et al.* 1994; Kelly and Arathi 2003), and in the timing of selfing over the life span of the flower (Dole 1990; Dudash and Ritland 1991; Leclerc-Potvin and Ritland 1994; Arathi *et al.* 2002).

This experiment investigates genetic variation within one particular natural population called “Iron Mountain,” which is located in Oregon's Western Cascades (see Willis 1996, 1999a,b). Iron Mountain contains several hundred thousand individuals continuously distributed over an area of ∼400 square meters. The population contains a tremendous amount of variation at microsatellite loci (Kelly and Willis 1998, 2002). Some loci have 20–30 alleles and expected heterozygosities >0.8, indicating that the population has not suffered a recent bottleneck in size. Despite limitations on both seed and pollen dispersal, microsatellite polymorphisms exhibit little or no spatial structure within this population (Sweigart *et al.* 1999). The implication is that gene flow is sufficient so that Iron Mountain can be considered a cohesive genetic population for evolutionary studies. However, the microsatellite data do not imply that spatial structure is absent at loci that might be under stronger selection.

John H. Willis initiated ∼1200 independent lines from Iron Mountain in August of 1995. Each line was founded from the seed set of a separate field-collected plant. Each line was subsequently maintained by single-seed descent (self-fertilization) for seven to nine generations. Over the course of these generations, most lines went extinct due to the random fixation of lethal and/or sterile alleles. However, over 300 survived to at least generation seven. The progeny of these surviving lines now have an inbreeding coefficient of >0.99. As expected, the surviving lines are almost completely homozygous at highly polymorphic microsatellite loci with different lines fixed for different alleles (Willis 1999a; L. Holeski, unpublished results).

#### Crossing design:

Two hundred of these lines were used as parental plants in a breeding design (Figure 1). Pairs of lines, P_{1} and P_{2}, were randomly selected to initiate independent “line-cross families.” These lines were crossed to produce an F_{1} family. The F_{1} individuals were self-fertilized to produce F_{2} families and also crossed to each parental line to produce backcross families: Backcross (BC)_{1} is the family derived from the cross between P_{1} and F_{1} while BC_{2} is the family derived from P_{2} and F_{1}. Seeds from the F_{2} family were grown to maturity, self-fertilized to produce an F_{3} family, and backcrossed to an F_{1} individual. I denote progeny from the last cross as BC_{X}. Due to failure of crosses or seed germination, only 76 line-cross families had at least 4 line-cross derivatives, and for reasons described below, the replicated joint scaling tests are limited to these data.

The controlled matings and self-fertilizations described in Figure 1 were conducted over the span of three plant generations, distinct from the generations in which plants were measured (see *Phenotypic measurements and plant growth*). Most of the line-cross derivatives within a line-cross family were synthesized from multiple crosses, each from a pairing of distinct individuals of the parental types. Many of the crosses were done reciprocally, with both parental line-cross derivatives serving as the maternal plant. For example, the collection of F_{1} plants in line-cross family 84 consisted of five full-sib families, each derived from a distinct pairing of P_{1} and P_{2} individuals. Multiple crosses were done for a number of reasons. First, they effectively eliminated any association between the age of seed when sown and the inbreeding coefficient (Lynch and Walsh 1998, pp. 262–263). Seeds for P_{1} and P_{2} were generated in the first generation of the mating scheme, simultaneously with the first set of F_{1} plants. A second set of P_{1} and P_{2} seeds was created in the second generation of the mating scheme, simultaneously with the synthesis of BC_{1}, BC_{2}, and F_{2} plants; and a third set was created in the final generation, simultaneously with the synthesis of BC_{X} and F_{3} plants. Seeds for several of the other derivates, BC_{1}, BC_{2}, F_{1}, and F_{2}, were also generated during different generations of the breeding scheme.

Synthesizing line-cross derivatives from multiple crosses also alleviates potential biases caused by maternal effects. The phenotype of an individual may depend not only on its own inbreeding coefficient, but also on that of its mother (Vogler *et al.* 1999). The maternal inbreeding effect can be estimated by comparing plants from reciprocal crosses in which the parents differ in their respective inbreeding coefficients. In the present crossing design (Figure 1), each backcross family was derived from a cross between a completely inbred plant (P_{1} or P_{2}) and an outbred F_{1}. The two sets of progeny in the reciprocal cross have the same individual inbreeding coefficients (*f* = 0.5), but different maternal inbreeding coefficients (either 0 or 1). A consistent difference in phenotype across line-cross families, say reduced pollen production by plants with inbred moms, would indicate an effect of maternal inbreeding. In addition to the backcrosses, the cross generating BC_{X} was done reciprocally between parents with differing inbreeding coefficients.

Three of the eight line-cross derivatives (P_{1}, P_{2}, and F_{1}) are genetically homogenous. All individuals within these categories are genetically identical, both within and among full-sib families (at least for the autosomal genome within a particular line-cross family). The F_{1} plants may be heterogeneous in the cytoplasmic genome, as it may be descended from different parental lines in different full-sib families. The other five line-cross derivatives (BC_{1}, BC_{2}, F_{2}, F_{3}, and BC_{X}) will contain a mixture of genotypes even in the autosomal genome. The parents are genetically homogeneous for the first three of these derivates, BC_{1}, BC_{2}, and F_{2}. Thus, for the autosomal genome, there is no greater genetic relatedness of individuals within full-sib families than among full-sib families (again we are referring to the different progeny sets within these line-cross derivatives of a particular line-cross family). However, for the last two derivates, F_{3} and BC_{X}, at least one of the parents is an F_{2} individual. Since the F_{2} is genetically heterogeneous, there is increased genetic relatedness of individuals within full-sibs of the F_{3} and BC_{X} line-cross derivatives. These considerations do not directly impact the calculations described below, because they are based entirely on the mean values of line-cross derivatives. However, they are relevant to the more rigorous treatment that includes the variance within line-cross derivatives to be described in a future article (J. K. Kelly, unpublished results).

#### Phenotypic measurements and plant growth:

Phenotypic data were collected from plants grown in three successive intervals. The first “block” was initiated on May 8, 2003, using only six of the line-cross derivatives within each line-cross family, the P_{1}, P_{2}, F_{1}, BC_{1}, BC_{2}, and F_{2}. The second block of plants, initiated on July 30, 2003, also consisted of plants from the first six line-cross derivatives. The third block, the seeds of which were sown on Jan 20, 2004, consisted of plants from all eight line-cross derivatives. Seeds were germinated under standard conditions in the greenhouse (see Kelly and Arathi 2003), and 2 weeks later, individual seedlings were transplanted into 98-well plug trays. These trays were maintained in a growth room with 18-hr days and 6-hr nights. Plants were fertilized once per week by bottom watering using a high P nutrient mixture. Trays were rotated daily. I noted the day on which each plant produced its first flower and then measured corolla width, corolla length, pistil length, and anther length. Stigma-anther separation is calculated as the difference between the latter two measurements.

Anthers were collected from the first flower into a microcentrifuge tube and stored. Pollen counts were later done using a Coulter (Miami) counter model Z1 dual. The sample from each tube was then diluted into 20 ml of electrolyte solution and run through the machine to produce particle counts within two size ranges. My estimate for total pollen per flower is the sum of these two counts × 20 (since only 1 ml of solution is processed in a run of the machine). The pollen size index (PSI) is the proportion of grains in the larger size category. There is a strong positive correlation between PSI and pollen viability as measured through standard staining techniques (Kelly *et al.* 2002). The proportion of viable grains (PV) is accurately estimated from the equation: PV = 0.l + PSI. For the rare samples in which PSI > 0.9, I set PV = 1.0. Viable pollen per flower was calculated as the product of PV and pollen number. In data analysis, I treat pollen traits as characteristics of the parental plant (the sporophyte).

## ANALYSIS AND RESULTS

Table 1 summarizes the means, standard deviations, and sample sizes for each trait in each of the three blocks. Since each block contains different proportions of inbred and outbred individuals, the differences in mean trait values are only partly reflective of a “block effect.” A substantial number of measurements are missing from the 7053 plants in the study. Day of first flower was noted for all plants and the great majority were measured for corolla diameter. The remaining measurements were taken on most, but not on all plants.

Each trait was subjected to two analyses, first the *combined population analysis* and then the *replicated joint scaling test*. Both involve comparisons of different statistical models. The model parameters are estimated from means and standard errors using generalized least squares. The fitted model is then used to establish a set of “expected values” for each mean. Observed and expected values are compared, using chi-square statistics to assess significance. I wrote C programs to conduct the relevant calculations following the theory described in Lynch and Walsh (1998, pp. 198–204 and 216–219). The code for these programs and a sample input file (plant data) are provided in the supplemental materials (http://www.genetics.org/supplemental/).

#### Combined population analysis:

This procedure determines the effects of block, individual inbreeding, and maternal inbreeding on each character. Let μ denote the outbred grand mean for a particular trait and *B _{i}* represent the effect of block

*i*. I assume that, if maternal inbreeding matters, the effect (

*v*) is linearly related to the inbreeding coefficient of the maternal plant (

*F*

_{m}). Finally, δ

_{j}represents the effect of individual inbreeding coefficient

*j*on trait values. Individual inbreeding coefficients were 0 (F

_{1}plants), 0.5 (BC

_{1}, BC

_{2}, F

_{2}, and BC

_{X}), 0.75 (F

_{3}), or 1 (P

_{1}and P

_{2}) in this experiment (Figure 1). The general model for the expected mean phenotype can then be written as(1)There are seven distinct parameters in this model for this study (μ,

*B*

_{2},

*B*

_{3},

*v*, δ

_{0.5}, δ

_{0.75}, and δ

_{1.0}). The parameter μ essentially estimates the mean of outbred plants with outbred mothers when grown in block 1.

Of the 36 possible combinations of block, individual inbreeding, and maternal inbreeding, 14 were actually represented in this study. By generalized least squares, I estimated the seven parameters in Equation 1 from this set of means and standard errors. A chi-square value was calculated from the observed and expected means, and this statistic was used in tests for phenotypic responses to both maternal and individual inbreeding. Both of the null hypotheses described below are defined as simplified versions of Equation 1.

To test whether the level of maternal inbreeding had an effect on a character, I set *v* = 0 and then reestimated the remaining parameters of Equation 1. The chi-square value produced when the expected values from this reduced model are compared to the observed means must be greater than or equal to the corresponding chi-square from the full model. A sufficiently large difference indicates a significant effect of maternal inbreeding. Since the two models differ by a single parameter (1 d.f.), the appropriate critical value from the chi-square distribution is 3.84. To test whether the response to individual inbreeding is linear, I estimated a model in which δ_{0.5} = 0.5*b*, δ_{0.75} = 0.75*b*, and δ_{1.0} = *b*, where *b* is the slope. This reduces the number of parameters by 2 and the appropriate critical value is 5.99. By similar means, one could address additional hypotheses, *e.g.*, for an effect of block or for whether an individual inbreeding coefficient has any effect on trait values. I do not present these analyses because the effects are obvious, have been demonstrated previously, or are tangential to the purposes of this article.

Table 2 summarizes model tests for each trait on its original scale of measurement and also for log-transformed male fitness measures. The pollen traits (pollen number, PSI, and viable pollen) were log-transformed because, in many situations, the “null model” for fitness assumes that loci and environmental effects combine multiplicatively (Crow and Kimura 1970). These dependences become additive after log-transformation. Only number of days to flower exhibits a significant effect of maternal inbreeding, although log-transformed viable pollen is only marginally nonsignificant. Response to individual inbreeding is approximately linear for each morphological trait and for total pollen number (Figure 2A). However, both PSI and viable pollen show a significantly nonlinear response to inbreeding. Both fitness measures exhibit “synergistic” or “reinforcing” epistasis (Crow and Kimura 1970, pp. 80–81), an accelerating decline in value as inbreeding coefficient increases (Figure 2B).

#### Replicated joint scaling test:

For each character, I first applied a standard joint scaling test to each line-cross family. Prior to this analysis, the data were corrected for the effects of maternal inbreeding (for days to flower) and block effects (for all characters) as estimated in the *Combined population analysis*. Then, by generalized least squares, I estimated the three parameters of the additive-dominance model from the observed means and standard errors of each line-cross derivative. Using the formulation of Lynch and Walsh (1998, p. 209), the expected values are given in Table 3. The parameters *M*, *A*, and *D* are specific to each line-cross family, their values defined by the genotypes of the parental lines.

If the number of estimated line-cross derivative means exceeds the number of estimated parameters, then a test of model sufficiency is possible. Thus, we require only four line-cross derivatives to test the additive-dominance model, but more derivatives (up to eight here) increase power to detect epistasis. The hypothesis test for a particular line-cross family is similar to that used previously. I first established expected values for each line-cross derivative using the estimated values for *M*, *A*, and *D*. Deviations between observed and expected means were measured using a chi-square statistic. This statistic was then compared to the chi-square distribution with the number of degrees of freedom equal to the number of line-cross derivatives minus the number of estimated parameters (three).

Figure 3 provides a sample of results for corolla width and PSI within three line-cross families. The degree of correspondence between the observed means of line-cross derivatives and those predicted from the nonepistatic model is highly variable, both among line-cross families for a particular character and among traits within a line-cross family. For most characters, the magnitude and direction of deviations for particular derivatives are idiosyncratic. For example, the observed mean corolla width of F_{3} plants is substantially less than expected in line-cross family 6, slightly less than expected in family 41, but substantially greater than expected in family 61 (Figure 3, left column). However, the nonlinear relationship between PSI and inbreeding coefficient (Figure 2) does yield some patterns for this trait. Consistent with a convex relationship, observed means are often less than expected for the line-cross derivatives at the ends of the range of inbreeding coefficients (P_{1}, P_{2}, and F_{1}), but greater than expected for derivates with intermediate inbreeding coefficients (Figure 3, right column).

Results from the full collection of line-cross families can be synthesized into a single test by noting that each family is independent. Each one is based on measurements of distinct plants derived from distinct lines, randomly extracted from the ancestral population. As a consequence, I can simply sum the chi-square values and degrees of freedom across families. This provides an overall test for each trait and eliminates any “multiple-tests problem” associated with attempting to draw conclusions from the collection of *P*-values from individual line-cross families. The cumulative chi-square values and degrees of freedom are given in Table 4. In the full data set, all traits exhibit highly significant deviations from the nonepistatic model.

The complete collection of estimated values (across line-cross families) allows a range of comparisons. For example, the correlations of F_{1} means for different traits are given in Table 5. Measures of flower size (corolla width, corolla length, anther length, and pistil length) exhibited strong, positive correlations. The number of pollen grains and the estimated number of viable grains showed lower, but still significantly positive correlations with flower size. There was a significant positive relationship between number of pollen grains and PSI (see also Willis 1999b). Stigma-anther separation was negatively correlated with PSI. Development time was not significantly correlated with any other measure, at least not for F_{1} mean values.

Given the variability among line-cross families, it is worthwhile to consider the factors most closely related to the magnitude of deviations from the nonepistatic model. One might expect the chi-square value to increase with the amount of phenotypic differentiation between parental lines (given the greater evidence for epistasis in inter- *vs.* intraspecific crosses, see Introduction). However, this expectation is not born out in these data: There was no relationship between chi square and the amount of difference between P_{1} and P_{2} for any character.

The majority of estimates for *D*, the dominance deviation, were negative for days to flower, but positive for other traits. These observations are consistent with “heterosis,” the tendency for crossing of lines to increase fitness or yield. Moll *et al.* (1965) noted that the magnitude of heterosis in corn appears to be greatest in crosses between lines that are divergent to an intermediate degree (Lynch and Walsh 1998, pp. 225–226). I plotted estimates of *D vs.* the absolute *phenotypic* difference between P_{1} and P_{2} for each character. As with chi-square values, the relationships between line differentiation and *D*-estimates were weak and generally nonsignificant.

#### Scale transformation:

The logarithm is only one of many nonlinear but monotonic functions that can be applied to measurements prior to analysis. The simplest form of epistasis is scalar. Loci affecting a trait contribute additively to a genetic value, *g*. The phenotype, *z*, is equal to *F*(*g* + *e*), where *e* is the environmental effect and *F*(*) is a simple nonlinear, but generally monotonic, function. Scalar epistasis can be “removed” by applying a transformation to the measurements: *z*′ = *F*^{−1}(*z*) = *g* + *e*. The standard model accurately describes inheritance of the transformed variable, *z*′.

It is thus useful to determine if the significant deviations from the additive-dominance model (documented in Table 6) can be eliminated by applying scale transformations. To evaluate this possibility, I applied the power series transformations over the four orders of magnitude spanning the original scale of measurement (Box and Cox 1964; Sokal and Rohlf 2000, p. 419). Letting *Z* denote the original values, the following series of transformations were applied to each trait: 1/*Z*, 1/, Ln(*Z*), , *Z*^{2}, and *Z*^{3}. While scale transformation did reduce cumulative chi-square values of some traits, all replicated joint scaling tests remained highly significant.

The same set of scale transformations can also be applied to measurements prior to the combined population analysis. For most characters this is unwarranted given the approximately linear response to inbreeding on the original scale of measurement. The reciprocal (1/*Z*) and 1/ transformations do reduce the magnitude of nonlinearity in PSI and viable pollen, but deviations remain significant. Moreover, since scale of measurement is not arbitrary for fitness components (or at least not for their evolutionary interpretation), an explicit nonlinear regression is more informative (Figure 2B, see discussion).

## DISCUSSION

Epistasis seems inevitable when one considers the interconnected nature of biochemical pathways and gene networks. Yet evidence of epistasis in quantitative character variation is surprisingly weak, at least within contiguous natural populations (demes). This study supports the view that our failure to see epistasis is not due to its absence. Statistical power is a central issue in practically all studies of quantitative variation. Attempts to distinguish the contributions of additive effects, dominance deviations, and gene interactions to trait variation are confounded by the large statistical error associated with variance estimates. This analysis of *M. guttatus*, based on comparisons of phenotypic means, provides clear evidence of epistasis in development time, floral morphology, and male fitness components.

The increasing use of genomic techniques, *e.g.*, QTL mapping, is frequently cited as the key to elucidating the contribution of epistasis to quantitative trait variation (*e.g.*, Erickson *et al.* 2004). The measured genotype approach, in which phenotype is determined for collections of individuals sharing the same molecular genotypes, has yielded both positive and negative results regarding the importance of epistasis (positive results: Long *et al.* 1995; Clark and Wang 1997; Elena and Lenski 1997; Routman and Cheverud 1997; Lukens and Doebley 1999; Templeton 2000) (negative results: Tanksley 1993 and references therein). In some studies (*e.g.*, Weber *et al.* 1999), results have been ambiguous despite extensive replication. The ability to isolate particular genomic regions and directly associate phenotype with a limited collection of genotypes is certainly a favorable feature of measured genotype studies. However, their power to detect epistasis is largely a consequence of the fact that these studies are simple comparisons of mean values, those associated with different multilocus genotypes.

While based on phenotypic values, the line-cross methodology provides a surprisingly simple and direct test for epistasis (Mather and Jinks 1982). The method capitalizes on the essential simplification of the additive/dominance model, the direct extrapolation from individual loci to the whole genotype. For a single locus with two alleles, there are three distinct genotypes and thus three mean values to be estimated. Given estimates for these genotypic values, one can predict the mean phenotype of all descendants from a cross between the alternative homozygotes. Under the additive/dominance model, the contributions of different loci sum and single-locus effects conveniently distill into aggregate parameters (Table 3, *M*, *A*, and *D*). These are sufficient to predict the means of all line-cross derivatives. If there is epistasis, single-locus effects do not combine in a simple fashion and the aggregate parameters will not accurately predict the means of all line-cross derivatives. This is the basis of the test.

When epistasis is detected in line-cross studies, the general practice is to parse deviations from the nonepistatic model into components, *e.g.*, the additive-by-additive and additive-by-dominance components (Mather and Jinks 1982; Clark and Wang 1997; Routman and Cheverud 1997). I have not done this here. The main practical difficulty is that many of the line-cross families are incomplete, with one or more line-cross derivatives absent. Because the type and complexity of model that can be fit to the data depend on the number and type of derivatives that are present, different sorts of model would have to be fit to different line-cross families. This hinders interpretation because the same genetic effect would likely be absorbed into different terms (*e.g.*, additive-by-additive instead of additive-by-dominance) in different families.

The larger conceptual difficulty is that, beyond documenting the existence of epistasis, interaction variance statistics have limited inferential value. They are useful in predicting how a population will respond to inbreeding or genetic drift (Kempthorne 1957; Goodnight 1988), but essentially uninformative about the nature of gene interactions (Mackay 2001). Phillips *et al.* (2000, p. 26) go as far at to state that the “epistatic variance is so far removed from the actual interactions as to be basically useless for determining the impact of gene interactions on evolutionary processes.” There does not seem to be any simple, general relationship between interaction variance statistics and response to selection. In contrast, the standard additive genetic variance is useful for predicting response to selection even when there is epistasis (Turelli and Barton 1994, p. 933; see below).

#### The nature of gene interactions:

The standard nonepistatic model of quantitative genetics is often considered a linear approximation of the complicated mapping from genotype to phenotype. It is thus natural to attempt to characterize gene interactions with an explicit nonlinear function that predicts phenotype from one or more genetic metrics. These simple kinds of gene interaction, denoted “directional epistasis” by Phillips *et al*. (2000), have been considered in a variety of evolutionary models. Epistatic interactions among deleterious mutations have been characterized using a function with fitness as the dependent variable and the total number of deleterious mutations (summed across loci) as the independent variable. Different treatments have considered a quadratic function (Kimura and Maruyama 1966), an exponential quadratic function (Charlesworth 1990; Charlesworth *et al.* 1991), or a step function (Kondrashov 1985, 1988). Rice (1998, 2000) models the phenotype as a nonlinear function of multiple “underlying factors.” These factors may then be treated as genetic and environmental effects in the standard quantitative genetic model. Turelli and Orr (1995) model hybrid breakdown by allowing loci to contribute additively to a “breakdown scale” and assume that fitness is a monotonically decreasing function of this breakdown value (see also Turelli and Orr 2000). Finally, Hansen and Wagner (2001) analyze a model in which the effects of allelic differences at a particular locus are a linear function of the genetic background.

The utility of these approaches, in which the interaction of many loci is distilled into a simple mathematical function, has become a subject of debate. These models will probably not prove to be mechanistically accurate in most cases. However, they may nonetheless capture essential features of the genotype-to-phenotype mapping, allowing theorists to consider otherwise intractable problems and providing experimentalists with meaningful parameters to estimate. Countering this view, Phillips *et al.* (2000) argue that the multidimensional nature of genotypic space cannot be justifiably reduced to simple one- or two-dimensional representations.

The main purpose of the combined population analysis (Table 2, Figure 2) is to identify simple directional forms of gene interaction, at least types that generate a nonlinear relationship between average trait values and *f*, the inbreeding coefficient of plants. If loci contribute additively to trait variation, the mean phenotype should change linearly with *f* (Wright 1951; Kempthorne 1957). If loci contribute multiplicatively, the relationship should be log-linear. Floral morphology and pollen number vary in an approximately linear way with *f*, while PSI and viable pollen exhibited significant deviations from linearity. The relationship between log-transformed PSI and *f* is more accurately described by a quadratic function (Figure 2B, dashed line) than by a line. This suggests that pollen viability corresponds more closely to the exponential quadratic model than to either the additive or the multiplicative model.

The response to inbreeding test for epistasis has been applied to a substantial number of species, both natural and agricultural (Lynch and Walsh 1998, Chap. 10). Both linear (*e.g.*, Hallauer and Sears 1973) and nonlinear (*e.g.*, Carr and Dudash 1997) responses have been documented, but much of the evidence is equivocal. Most experiments fail to include appropriate controls or suffer statistical flaws (Lynch 1988; Lynch and Walsh 1998, p. 272). Oftentimes, studies have been conducted over multiple generations with different levels of inbreeding assessed in different generations, which makes it impossible to distinguish inbreeding effects from environmental effects (*e.g.*, differences in growth conditions in different generations). Statistical power is also limited when the maximum inbreeding coefficient is low (<0.25). A linear model will likely provide a good fit within a limited range of values for the inbreeding coefficient, regardless of the level of epistasis.

Perhaps the most serious difficulty in interpreting previous results concerns the potential effects of genetic purging. The linear prediction is based on the assumption that experimental inbreeding changes genotype frequencies without changing allele frequencies. Random fluctuations in allele frequencies can be accounted for in a statistical analysis, but deterministic changes are assumed to be absent. Unfortunately, some allele frequency changes are inevitable during line formation. Lines that become homozygous for mutations that are lethal or cause sterility in homozygous form will go extinct and eliminate these alleles from the experimental population. Selection within lines will have similar effects even in lineages that survive. As a consequence, the frequencies of deleterious mutations will differ between inbred and outbred individuals. Depending on circumstances, genetic purging of this kind can either produce artificial nonlinearities or obscure genuine epistasis.

This experiment was designed to avoid this list of difficulties. An advantage of using fully inbred lines as parents in a biometric breeding design is that they are already fully purged. Subsequent inbreeding or outcrossing should result in predictable changes in homozygosity without changes in allele frequency (apart from the random changes due to finite sample sizes). This experiment also grew plants spanning the full range of inbreeding coefficients simultaneously. The multigenerational crossing scheme eliminated any association between seed age and inbreeding coefficient. Reciprocal crosses between plants inbred to differing degrees disentangle individual and maternal inbreeding effects.

Lynch (1988) also noted that many statistical analyses of response to inbreeding have failed to account for the genetic relatedness of individuals within an inbreeding lineage. This can elevate the type I error rate, increasing the likelihood of “significant” nonlinearity even when there is no epistasis. The family structure of the present experiment is fully accounted in *Replicated joint scaling test*, but not in *Combined population analysis*. However, the nonlinearity apparent for log-transformed male fitness measures (Table 2, Figure 2B) is not a statistical artifact. Seventy-three line-cross families produced at least one plant that was fully outbred (*f* = 0), inbred to an intermediate level (*f* = 0.5), and fully inbred (*f* = 1). We can distill each of these families into a pair of values: Δ_{1} is the difference in mean trait value between plants with *f* = 0 and *f* = 0.5 and Δ_{2} is the difference between mean values for plants with *f* = 0.5 and *f* = 1. If the true relationship between trait values and inbreeding coefficient is linear, we expect that Δ_{1} will be >Δ_{2} in about half of the families. With synergistic epistasis, Δ_{2} should be >Δ_{1} in most line-cross families. For Ln[PSI] in this study, 52 of 73 cross families exhibited a greater decline at higher inbreeding coefficients (Δ_{2} > Δ_{1}). For Ln[viable pollen], Δ_{2} > Δ_{1} in 48 of 73 families. Both trends are statistically significant (Wilcoxon sign rank test, *P* < 0.01).

#### Synergism in monkeyflowers:

The response of *M. guttatus* to inbreeding has been investigated extensively (Willis 1993a,b, 1996, 1999a,b; Carr and Dudash 1995, 1996, 1997; Carr *et al.* 1997; Dudash and Carr 1998; Kelly and Willis 2001; Carr and Eubanks 2002; Carr *et al.* 2003). Willis (1993a) and Carr and Dudash (1997) measured fitness-related characters of plants with a range of inbreeding coefficients and their results are thus comparable to those in Table 2. Carr and Dudash (1997) generated plants of six different inbreeding levels (*f* = 0, 0.5, 0.75, 0.875, 0.938, and 0.969) from two California populations, “S” and “T.” They estimated the relationship between inbreeding coefficient and mean trait values to be approximately linear for female fitness measures in both populations. However, male fitness in population T exhibited an accelerating decline with *f*, a result consistent with synergistic epistasis. Their measure of male fitness is most directly comparable to viable pollen of this study.

Willis (1993a) synthesized plants with four levels of inbreeding (*f* = 0, 0.25, 0.5, and 0.75) from two Oregon populations, “Cone Peak” and the Iron Mountain population that I consider here. Pollen viability exhibited synergistic epistasis for the Cone Peak population, but the relationship with *f* was not significantly nonlinear for the Iron Mountain plants. In contrast, the present study of Iron Mountain provides compelling evidence for synergism in log-transformed values of both pollen viability (as estimated by PSI) and total viable pollen (Figure 2B). The difference may simply reflect statistical power. Willis (1993a) did note a nonsignificant trend toward synergism in his study (see Figure 1f of that article) and the present study of Iron Mountain considers a larger range of inbreeding coefficients with ∼10 times as many plants.

Perhaps the simplest explicit genetic model that yields synergism is complementary gene action (Brookfield 1997). Consider two loci involved in pollen production. Each locus has recessive mutations that reduce pollen viability. If the two loci are complementary (functional duplicates in the extreme case), pollen viability is significantly reduced only in individuals that are homozygous for mutant alleles at both loci. Loss or reduction of function at only one locus is effectively compensated by full function of the complementary locus. This mode of action has been demonstrated for some chlorophyll deficiency mutations in *M. guttatus* (Willis 1992). More generally, the high levels of genetic redundancy apparent in the Arabidopsis genome suggest that complementary gene action may not be uncommon.

#### Variable epistatic effects:

Flower traits, development time, and pollen number did not exhibit nonlinear responses to inbreeding. However, all yielded highly significant evidence of epistasis in the replicated joint scaling tests (Figure 3, Table 4). Moreover, the results from each character remained significant even after a full series of scale transformations (Table 6), which indicates gene interactions are not scalar in nature. This difference in outcomes between the two analyses suggests that epistatic interactions may be variable across the loci influencing these traits (see, for example, de Visser *et al.* 1997). Certain pairs of loci may interact strongly and others may act more or less independently. However, it would be premature to conclude that these data are inconsistent with more complicated models of directional epistasis (*e.g.*, Rice 1998, 2000; Hansen and Wagner 2001)only on the basis of Table 6.

Scale transformations could often eliminate significant deviations between observed and expected values within individual line-cross families. For example, cubing values for corolla width reduces chi square for line-cross family 6 from 15.5 to 9.7, which is not significant given 5 d.f. (see Figure 3, top left). Unfortunately, the same transformation exaggerates deviations in other families such that the cumulative chi square actually increases. This result is reminiscent of Powers' studies of tomato fruit characters, in which transformations useful in some crosses (yielding approximate additivity) were detrimental in others (Powers 1950, 1951).

Comparisons among line-cross families are also informative about the nature of gene interactions. Noting the greater evidence of epistasis in interspecies crosses, one might expect that line-cross families in which P_{1} and P_{2} are very different would yield higher chi-square values than line-cross families constituted from phenotypically similar lines. This prediction may also follow from the idea that the nonepistatic model is a “linear approximation” of the nonlinear mapping from genotype to phenotype. Linear approximations are generally taken around a specific point on the phenotypic axis and are most accurate close to that point. It is thus noteworthy that, in contrast to these expectations, I failed to find any relationship between the level of phenotypic differentiation of parental lines and the strength of evidence for epistasis.

#### Maternal effects:

In both plants and animals, an organism's phenotype is often substantially influenced by the characteristics of its mother. Seed provisioning, in terms of both nutrients and genetic instruction (mRNAs), is a likely avenue for maternal effects in plants. Since inbreeding clearly influences individual traits, it is reasonable to expect that it will also affect the quality of maternal effects. In fact, this has been clearly demonstrated in both animals (White 1972) and plants (Melchinger *et al.* 1986; Vogler *et al.* 1999).

Surprisingly, I found little evidence of maternal inbreeding effects in this study, despite both large sample sizes and strong comparisons (Table 2). The latter refers to line-cross derivatives synthesized by reciprocal crosses between parents with substantially different inbreeding coefficients (BC_{1}, BC_{2}, and BC_{X}). Which parent served as mother seemed to have minimal effects on most traits. Only days to flower exhibited a significant effect. All else being equal, plants with fully inbred mothers produced their first flower an average of 0.25 days later than plants with outbred mothers.

Maternal effects may be significant but unrelated to level of inbreeding or at least only weakly related. For this reason, the plants derived from a single cross (a full-sib family) may exhibit a consistent deviation from the expected value of their line-cross derivative even without epistasis. These deviations average out if each line-cross derivative is synthesized from a substantial number of distinct crosses (*e.g.*, Jinks and Perkins 1969). However, with a limited number of crosses per derivative, maternal effects must be accounted statistically to distinguish them from epistasis. The sequel to this article develops the theory necessary for a more rigorous treatment of the data, on the basis of not only the means of line-cross derivatives but also patterns of variability within and between derivatives (J. K. Kelly, unpublished results). A maximum-likelihood-based application of this theory to the present data set confirms that both maternal effects and epistasis contribute to deviations between observed and expected line-cross means. However, epistasis remains the primary cause.

#### Evolutionary implications:

The purpose of this study was to determine if gene interactions influence characters of *M. guttatus*, and if they do, to investigate the nature of these interactions. Regarding the first part, there does seem to be significant epistasis affecting development time, floral morphology, and male fitness components. Male fitness involves synergism, a tendency for alleles at different loci that reduce pollen viability to have greater effects when combined than in isolation (Figure 2B). Synergism among deleterious mutations is a central tenet of the mutational deterministic hypothesis for the evolutionary maintenance of sexual reproduction (Charlesworth 1990; Kondrashov 1993). Synergistic interactions can substantially reduce the mutational load in a sexual population, but not in an asexual population (Kimura and Maruyama 1966). This, combined with a sufficiently high rate of deleterious mutation, can allow sexual genotypes to outcompete asexual genotypes in the face of the twofold cost of sex.

Despite the results of Figure 2B, support for the mutational deterministic model from *M. guttatus* remains equivocal. First, the quantitative relationship between viable pollen per flower (as measured here) and evolutionary fitness in the wild has yet to be determined empirically. Synergism between mutations affecting the latter is necessary for the model. Second, estimates for the genomic deleterious mutation rate in *M. guttatus* fall largely within the range 0.1–1.0 (Willis 1999a; Kelly 2003). These estimates are somewhat lower than what is thought necessary to favor sex and, for a variety of reasons, are more likely to be overestimates than underestimates.

Flower morphology and development time yielded negative results in the combined population analysis, but positive evidence of epistasis in the replicated joint scaling test. Such a difference is expected if epistatic effects are variable across genetic backgrounds or sets of interacting loci. Similar results have been obtained in measured genotype studies (de Visser *et al.* 1997; Elena and Lenski 1997). These findings are intriguing in light of theoretical studies exploring the implications of variable interactions among loci (Phillips et al 2000). To consider an example noted previously, consistent synergistic epistasis among deleterious mutations can favor increased rates of recombination (Charlesworth 1990; Barton 1995). If interactions are variable, however, modifiers that reduce the rate of recombination may actually be favored (Otto and Feldman 1997).

An important question is how the existence of epistasis affects the application of quantitative genetics methods. Should we disregard analyses or conclusions based on the additive/dominance model if there is epistasis? Perhaps the most fundamental of such applications is the “breeder's equation” for predicting response to selection (Lush 1937). Under this model, the change in mean phenotype, , equals the product of the narrow sense heritability () and the selection differential (*S*). When selection is imposed on multiple traits, the predicted response is proportional to the additive genetic variance/covariance matrix (Lande and Arnold 1983; Falconer and Mackay 1996). In either case, the genetic quantities can be estimated from the resemblance of relatives, independent of the observed response to selection.

The breeder's equation, and its multitrait generalization, is one of the most important predictive models in evolutionary biology (Grant and Grant 1995) and it is worth considering whether it fails when there is epistasis. This can be directly evaluated for corolla width of *M. guttatus*, one of the traits shown to exhibit epistasis here (Table 4). We have estimated the heritability of this character for the Iron Mountain population (Kelly and Arathi 2003) and, in other experiments, have imposed selection on corolla diameter (*e.g.*, Kelly and Willis 2001). In the largest of these studies, bidirectional truncation selection was imposed for seven generations. The observed divergence between the means of high- and low-selected populations (5.2 phenotypic standard deviations) differs from the prediction of the breeder's equation by <6% (J. K. Kelly, unpublished results). Given the myriad of factors other than epistasis that can generate deviations, one has to conclude that the model performs quite well.

These results from *M. guttatus* are complementary to those from previous studies. Clayton *et al.* (1957) demonstrated the accuracy of the breeder's equation in predicting short-term evolution of bristle number in *Drosophila melanogaster*. While this trait was often cited as an example of additive inheritance, mapping studies have recently demonstrated that variation is substantially influenced by interactions among QTL (Long *et al.* 1995, 1996; Mackay 2001). Two complementary conclusions emerge from these selection experiments. First, short-term responses to selection that are consistent with the additive/dominance model (breeder's equation) do not imply the absence of epistasis in the selected traits. Second, the existence of epistasis does not necessarily invalidate conclusions derived from simpler, nonepistatic models.

It seems likely that the additive genetic variance is itself a function of genetic interactions among QTL. In fact, theorists have begun to explore the interaction of epistasis, mutation, and natural selection in determining the genetic architecture of quantitative characters (*e.g.*, Hermisson *et al.* 2003). Experimentalists will continue to identify and describe particular gene interactions, at least in model species (Templeton 2000). At present, there is a substantial gap between theory and experiment. I suggest that this is due, at least in part, to the absence of meaningful statistics, measurable quantities with clear relevance to evolutionary questions. Identifying such statistics, if they do exist, should facilitate the synthesis of theory and experiment.

## Acknowledgments

I thank T. Marriage, J. Gleason, L. Holeski, M. Orive, D. Houle, and two anonymous reviewers for comments on this manuscript. Kim Milne, L. Holeski, B. Harris, R. How, J. Schaller, D. Deole, L. Richman, H. Nelson, and S. Jablonski each helped to complete the enormous volume of crossing and plant measurements.

## Footnotes

Communicating editor: D. Houle

- Received February 3, 2005.
- Accepted May 1, 2005.

- Copyright © 2005 by the Genetics Society of America