The aim of experimental design is to efficiently obtain sufficient data, with the least effort and cost, from which scientifically and statistically valid conclusions can be drawn. Understanding the experimental goals and processes, as well as the amount and sources of experimental variability, is important to designing successful siRNA experiments. This article describes different sources of variation and how to determine the ideal number of replicates for your experiments.
Sources of Variation
All experimental data have variability that comes from several sources. Understanding these sources can lead to improved experimental design and results.
Biological Variation. Biological variation depends on the characteristics of the population being studied. For example, heights measured in a random group of people will vary more than heights measured in a study limited to people of one age or sex. For human gene expression, the coefficient of variation ranges from 20 to 100%.
Process Variation. Process variation refers to variability in the data that is exhibited when the same sample is run independently multiple times. Process variation results from the following:
Random (or Common-Cause) Variation
Random variation includes unpredictable and natural variations that may affect some, but not all, samples (e.g., a pipetting error). Efforts should be made to identify and reduce these variations, but they can never be completely eliminated. Taking the most accurate measurements possible and carefully following the experimental protocol or standard operating procedure (SOP) also help to control random variation.
Systemic (or Special) Variation
Systemic variation affects the experimental process, so samples may be biased. Examples of systemic variation are equipment that is out of calibration and causes a bias, an unexpected temperature change during the experiment, or a delay from the normal timing that results in a change to the experimental procedure and affects the samples, process, and outcome. Another example is a cell culture plate with a large row effect [1] that is sufficient to mask or distort biological effects. The scientist may or may not be aware of process variations.
System Variation. System variation comes from the instrument used to take measurements. The variability of the measurement system contributes to the process variability and can be a common cause or a special cause.
A standard ruler is an example of a measurement system. Accuracy is usually taken to be half of the smallest division mark (e.g., ±0.5 mm, if the ruler has 1 mm marks). This is based on the assumption that estimating halfway between any two marks is relatively easy, while smaller fractions are not as accurately estimated by eye. Additional implicit assumptions are that the person taking the measurement has good eyesight and that the ruler markings are accurate. If the ruler manufacturer mismarked the ruler, the ruler would have a bias.
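As a minimal sketch of the ruler example, the snippet below separates bias (a systematic offset) from random scatter by measuring a reference object of known length several times. The function name `bias_and_spread` and the readings are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

def bias_and_spread(readings, true_value):
    """Return (bias, standard deviation) for repeated readings of a known reference."""
    bias = mean(readings) - true_value   # systematic offset, e.g. a mismarked ruler
    spread = stdev(readings)             # random scatter around the observed mean
    return bias, spread

# Hypothetical readings (mm) of a 100.0 mm reference block.
readings = [100.4, 100.6, 100.5, 100.5]
bias, spread = bias_and_spread(readings, true_value=100.0)
print(f"bias = {bias:.2f} mm, spread (SD) = {spread:.2f} mm")
```

A large bias with a small spread suggests a special cause such as a mismarked or miscalibrated instrument, whereas a small bias with a large spread points to common-cause variation.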
Experimental Variation. Experimental variation is the total variation seen in an experiment and comes from both the process and biological population variability.
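One common way to think about how the two contributions combine is to add them in quadrature. The sketch below assumes that the biological and process sources are independent and that both are expressed as %CV relative to the same mean; the article does not state this formula, and the function name `total_cv` is hypothetical.

```python
import math

def total_cv(biological_cv, process_cv):
    """Combine two independent sources of variation, each given as %CV.

    Assumption (not from the article): the sources are independent,
    so their variances add and the CVs combine in quadrature.
    """
    return math.sqrt(biological_cv ** 2 + process_cv ** 2)

# Hypothetical values: 30% biological CV and 40% process CV.
print(total_cv(30.0, 40.0))
```

Under this assumption the larger source dominates the total, which is why reducing the smaller source alone gives little improvement.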
Variability in siRNA Experiments
In TechNotes 14.3, we explained that both accuracy and precision are components of experimental variability but are not related to each other [2]. For siRNA experiments, scientists usually run replicates to help determine both accuracy and precision. For definitions, see the sidebar, Accuracy versus Precision. There are two types of replicates:
- Biological replicates are separate biological samples that have undergone the same treatment. Examples include animals, tissues, parts of an organ, or wells of a cell culture plate that were treated using the same protocol.
- Technical replicates are multiple aliquots from the same source run through the process independently.
Technical replicates address the variability of the process. Biological replicates address the variability of the population but are also subject to the variability of the process.
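The distinction above can be illustrated with a small sketch that splits observed variability into a process component (from technical replicates) and a biological component (from the spread of per-sample means). It assumes a balanced design with the same number of technical replicates for every biological replicate; the function name `variance_components` and the data are hypothetical.

```python
from statistics import mean, variance

def variance_components(samples):
    """Estimate (biological_variance, process_variance) from nested replicates.

    samples: list of lists; each inner list holds the technical replicate
    measurements for one biological replicate (balanced design assumed).
    """
    n_tech = len(samples[0])
    if len(samples) < 2 or n_tech < 2 or any(len(s) != n_tech for s in samples):
        raise ValueError("need >= 2 biological replicates with equal counts (>= 2) of technical replicates")
    # Process variance: average spread among technical replicates of the same sample.
    process_var = mean([variance(s) for s in samples])
    # The spread of per-sample means contains biological variance plus process_var / n_tech.
    between_var = variance([mean(s) for s in samples])
    biological_var = max(0.0, between_var - process_var / n_tech)
    return biological_var, process_var

# Hypothetical data: 3 biological replicates, 2 technical replicates each.
data = [[9.0, 11.0], [19.0, 21.0], [29.0, 31.0]]
bio_var, proc_var = variance_components(data)
print(bio_var, proc_var)
```

In this toy data set the biological component is far larger than the process component, which is the situation the article describes as typical for biological assays.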
How Many Replicates?
The goal of the experiment is the first determinant for how many and what type of replicates are needed. Questions that can be asked include:
- Is it a research assay, screening assay, or release assay?
- Will the procedure be run many times or infrequently?
- How much validation was done?
The reason the data are being collected determines the quality of the data required. Additional technical replicates will provide information about the process variability but no further information about the population variability. If the variability of the sample population is unknown or high, more biological replicates are needed. Increasing the number of technical replicates might be important for a more variable process or a less experienced technician. Multi-stage processes might require different types of technical replicates.
An example would be an siRNA procedure that involves transfection of cell cultures in a 96-well plate, cell lysis and RNA isolation, cDNA synthesis, and finally RT-PCR assays to measure silencing of a specific gene. The biological replicates come from the transfection of the same siRNA in different wells of the 96-well plate. Technical replicates could be created at the RNA isolation stage, cDNA generation stage, or the RT-PCR procedure. The earlier in the process a technical replicate is added, the larger the increase in the workload, effect on sample throughput, and cost of the experiment. One way to limit the number of replicates is to add technical replicates only at the more variable steps of the process. For most biological assays, the largest variability comes from the sample population, and the number of biological replicates is usually the largest and most important factor.
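The trade-off described above can be sketched with a simple two-level model, in which each biological replicate is measured with a fixed number of technical replicates. This is an illustrative assumption rather than the article's own calculation: it treats the whole process as a single technical stage, and the function name `variance_of_mean` and the variance values are hypothetical.

```python
def variance_of_mean(bio_var, process_var, n_bio, n_tech):
    """Variance of the overall mean in a balanced two-level design.

    Assumes n_bio independent biological replicates, each measured with
    n_tech independent technical replicates.
    """
    if n_bio < 1 or n_tech < 1:
        raise ValueError("replicate counts must be at least 1")
    return bio_var / n_bio + process_var / (n_bio * n_tech)

# Hypothetical variances in which the biological component dominates.
bio_var, process_var = 4.0, 1.0
for n_bio, n_tech in [(3, 1), (3, 3), (6, 1)]:
    v = variance_of_mean(bio_var, process_var, n_bio, n_tech)
    print(f"{n_bio} biological x {n_tech} technical: variance of mean = {v:.3f}")
```

With these assumed values, six biological replicates measured once give a more precise mean than three biological replicates measured in triplicate, even though the latter involves more measurements.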
Figure 1. Four Representative Populations. Populations A and B have small system and population variability as well as a large fold difference between the means of the samples. Compared to populations A and B, populations C and D both have a larger dispersion of the data around the mean and a small fold difference between the means.
The greater the fold change between the means, the fewer biological replicates are needed. The greater the population variability, the more biological replicates are needed. Figure 2 gives an estimate of the sample numbers needed, depending on the experimental precision and expected biological difference. Here are two theoretical examples:
- For the populations described in Figure 1, approximately three replicates are sufficient to examine Populations A and B, while 7 to 18 replicates would be needed to examine Populations C and D.
- When two siRNAs targeting the same mRNA result in gene expression knockdown that differs from the baseline by 10-fold and the experimental variation is low (e.g., 25%), only 3 biological replicates are needed to reliably detect the change in expression. In contrast, if gene expression knockdown differs from baseline by only 1.5-fold and the experimental variation is high (e.g., 75%), 38 biological replicates are needed to detect changes in expression.
Figure 2. Estimating the Number of Biological Replicates to Use. Fold difference is the difference between the means of two populations you wish to distinguish, and experimental variation is (standard deviation/mean) × 100 (%CV). These numbers are based on the one-tailed t-test, which places all of the area for α in the positive direction. CV = coefficient of variation.
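For readers who want to explore how fold difference and %CV interact, the following is a rough sketch of a one-tailed sample-size estimate. It is not the calculation behind Figure 2: it swaps in a normal approximation for the t-test, and the significance level (`alpha=0.05`), power (`power=0.80`), the equal-standard-deviation assumption, and the function name `replicates_per_group` are all assumptions made here, so its results should not be expected to match the figure. The floor of three replicates follows the minimum the article recommends.

```python
import math
from statistics import NormalDist

def replicates_per_group(fold_difference, cv_percent, alpha=0.05, power=0.80, min_n=3):
    """Approximate biological replicates per group for a one-tailed comparison.

    Assumptions (not from the article): normal approximation to the t-test,
    both groups share a standard deviation equal to cv_percent of the
    baseline mean, and the difference to detect is (fold_difference - 1)
    times the baseline mean.
    """
    if fold_difference <= 1.0 or cv_percent <= 0.0:
        raise ValueError("fold_difference must exceed 1 and cv_percent must be positive")
    delta = fold_difference - 1.0     # difference in units of the baseline mean
    sigma = cv_percent / 100.0        # standard deviation in the same units
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # one-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    n = 2.0 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return max(min_n, math.ceil(n))

for fold, cv in [(10.0, 25.0), (1.5, 25.0), (1.5, 75.0)]:
    print(f"{fold}-fold at {cv}% CV: about {replicates_per_group(fold, cv)} replicates per group")
```

The sketch reproduces the qualitative trend in the text: large fold differences with low variation need only the minimum number of replicates, while small fold differences with high variation need many more.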
With no history of experimental precision or population variation, a minimum of three biological replicates, plus two or three technical replicates, should be performed. As the population variability and the precision of the experimental process become clearer, the number of technical and biological replicates may be adjusted to achieve the quality of data needed.
Understanding the purpose of the experiment and the capability of the procedure provides the most accurate results at the lowest cost. It enables the scientist to best determine how to balance the number of biological or technical replicates against the cost of adding either type of replicate. It also increases confidence in the accuracy of the data and defines what factors affect assay precision.
Scientific Contributors
Ann Hartman • Applied Biosystems, Austin, TX
John Pfeifer • Applied Biosystems, Houston, TX