In engineering, science, and statistics, replication is the process of repeating a study or experiment under the same or similar conditions to support the original claim, which crucial to confirm the accuracy of results as well as for identifying and correcting the flaws in the original experiment.^[1] ASTM, in standard E1847, defines replication as "... the repetition of the set of all the treatment combinations to be compared in an experiment. Each of the repetitions is called a replicate."

For a full factorial design, replicates are multiple experimental runs with the same factor levels. You can replicate combinations of factor levels, groups of factor level combinations, or even entire designs. For instance, consider a scenario with three factors, each having two levels, and an experiment that tests every possible combination of these levels (a full factorial design). One complete replication of this design would comprise 8 runs (2^3). The design can be executed once or with several replicates.^[2]

There are two main types of replication in statistics. First, there is a type called “exact replication” (also called "direct replication"), which involves repeating the study as closely as possible to the original to see whether the original results can be precisely reproduced.^[3] For instance, repeating a study on the effect of a specific diet on weight loss using the same diet plan and measurement methods. The second type of replication is called “conceptual replication.” This involves testing the same theory as the original study but with different conditions.^[3] For example, Testing the same diet's effect on blood sugar levels instead of weight loss, using different measurement methods.

Both exact (direct) replications and conceptual replications are important. Direct replications help confirm the accuracy of the findings within the conditions that were initially tested. On the hand conceptual replications examine the validity of the theory behind those findings and explore different conditions under which those findings remain true. In essence conceptual replication provides insights, into how generalizable the findings are.^[4]

The difference between replicates and repeats

Replication is not the same as repeated measurements of the same item. Both repeat and replicate measurements involve multiple observations taken at the same levels of experimental factors. However, repeat measurements are collected during a single experimental session, while replicate measurements are gathered across different experimental sessions.^[2] Replication in statistics evaluates the consistency of experiment results across different trials to ensure external validity, while repetition measures precision and internal consistency within the same or similar experiments.^[5]

Replicates Example: Testing a new drug's effect on blood pressure in separate groups on different days.

Repeats Example: Measuring blood pressure multiple times in one group during a single session.

Statistical methods in replication

In replication studies within the field of statistics, several key methods and concepts are employed to assess the reliability of research findings. Here are some of the main statistical methods and concepts used in replication:

P-Values: The p-value is a measure of the probability that the observed data would occur by chance if the null hypothesis were true. In replication studies p-values help us determine whether the findings can be consistently replicated. A low p-value in a replication study indicates that the results are not likely due to random chance.^[6] For example, if a study found a statistically significant effect of a test condition on an outcome, and the replication find statistically significant effects as well, this suggests that the original finding is likely reproducible.

Confidence Intervals: Confidence intervals provide a range of values within which the true effect size is likely to fall. In replication studies, comparing the confidence intervals of the original study and the replication can indicate whether the results are consistent.^[6] For example, if the original study reports a treatment effect with a 95% confidence interval of [5, 10], and the replication study finds a similar effect with a confidence interval of [6, 11], this overlap indicates consistent findings across both studies.

Example

As an example, consider a continuous process which produces items. Batches of items are then processed or treated. Finally, tests or measurements are conducted. Several options might be available to obtain ten test values. Some possibilities are:

One finished and treated item might be measured repeatedly to obtain ten test results. Only one item was measured so there is no replication. The repeated measurements help identify observational error.
Ten finished and treated items might be taken from a batch and each measured once. This is not full replication because the ten samples are not random and not representative of the continuous nor batch processing.
Five items are taken from the continuous process based on sound statistical sampling. These are processed in a batch and tested twice each. This includes replication of initial samples but does not allow for batch-to-batch variation in processing. The repeated tests on each provide some measure and control of testing error.
Five items are taken from the continuous process based on sound statistical sampling. These are processed in five different batches and tested twice each. This plan includes proper replication of initial samples and also includes batch-to-batch variation. The repeated tests on each provide some measure and control of testing error.
For proper sampling, a process or batch of products should be in reasonable statistical control; inherent random variation is present but variation due to assignable (special) causes is not. Evaluation or testing of a single item does not allow for item-to-item variation and may not represent the batch or process. Replication is needed to account for this variation among items and treatments.

Each option would call for different data analysis methods and yield different conclusions.

References

Bibliography

ASTM E122-07 Standard Practice for Calculating Sample Size to Estimate, With Specified Precision, the Average for a Characteristic of a Lot or Process
"Engineering Statistics Handbook", NIST/SEMATEK
Pyzdek, T, "Quality Engineering Handbook", 2003, ISBN 0-8247-4614-7.
Godfrey, A. B., "Juran's Quality Handbook", 1999, ISBN 9780070340039.

Statistics

Descriptive statistics

Continuous data

Center	Mean Arithmetic Arithmetic-Geometric Cubic Generalized/power Geometric Harmonic Heronian Heinz Lehmer Median Mode
Dispersion	Average absolute deviation Coefficient of variation Interquartile range Percentile Range Standard deviation Variance
Shape	Central limit theorem Moments Kurtosis L-moments Skewness

Count data

Index of dispersion

Summary tables

Dependence

Graphics

Data collection

Study design	Effect size Missing data Optimal design Population Replication Sample size determination Statistic Statistical power
Survey methodology	Sampling Cluster Stratified Opinion poll Questionnaire Standard error
Controlled experiments	Blocking Factorial experiment Interaction Random assignment Randomized controlled trial Randomized experiment Scientific control
Adaptive designs	Adaptive clinical trial Stochastic approximation Up-and-down designs
Observational studies	Cohort study Cross-sectional study Natural experiment Quasi-experiment

Statistical inference

Statistical theory

Frequentist inference

Point estimation	Estimating equations Maximum likelihood Method of moments M-estimator Minimum distance Unbiased estimators Mean-unbiased minimum-variance Rao–Blackwellization Lehmann–Scheffé theorem Median unbiased Plug-in
Interval estimation	Confidence interval Pivot Likelihood interval Prediction interval Tolerance interval Resampling Bootstrap Jackknife
Testing hypotheses	1- & 2-tails Power Uniformly most powerful test Permutation test Randomization test Multiple comparisons
Parametric tests	Likelihood-ratio Score/Lagrange multiplier Wald

Specific tests

Z-test (normal) Student's t-test F-test
Goodness of fit	Chi-squared G-test Kolmogorov–Smirnov Anderson–Darling Lilliefors Jarque–Bera Normality (Shapiro–Wilk) Likelihood-ratio test Model selection Cross validation AIC BIC
Rank statistics	Sign Sample median Signed rank (Wilcoxon) Hodges–Lehmann estimator Rank sum (Mann–Whitney) Nonparametric anova 1-way (Kruskal–Wallis) 2-way (Friedman) Ordered alternative (Jonckheere–Terpstra) Van der Waerden test

Bayesian inference

Correlation	Pearson product-moment Partial correlation Confounding variable Coefficient of determination
Regression analysis	Errors and residuals Regression validation Mixed effects models Simultaneous equations models Multivariate adaptive regression splines (MARS)
Linear regression	Simple linear regression Ordinary least squares General linear model Bayesian regression
Non-standard predictors	Nonlinear regression Nonparametric Semiparametric Isotonic Robust Heteroscedasticity Homoscedasticity
Generalized linear model	Exponential families Logistic (Bernoulli) / Binomial / Poisson regressions
Partition of variance	Analysis of variance (ANOVA, anova) Analysis of covariance Multivariate ANOVA Degrees of freedom

Categorical / Multivariate / Time-series / Survival analysis

Categorical

Multivariate

Time-series

General	Decomposition Trend Stationarity Seasonal adjustment Exponential smoothing Cointegration Structural break Granger causality
Specific tests	Dickey–Fuller Johansen Q-statistic (Ljung–Box) Durbin–Watson Breusch–Godfrey
Time domain	Autocorrelation (ACF) partial (PACF) Cross-correlation (XCF) ARMA model ARIMA model (Box–Jenkins) Autoregressive conditional heteroskedasticity (ARCH) Vector autoregression (VAR)
Frequency domain	Spectral density estimation Fourier analysis Least-squares spectral analysis Wavelet Whittle likelihood

Survival

Survival function	Kaplan–Meier estimator (product limit) Proportional hazards models Accelerated failure time (AFT) model First hitting time
Hazard function	Nelson–Aalen estimator
Test	Log-rank test

Applications

Biostatistics	Bioinformatics Clinical trials / studies Epidemiology Medical statistics
Engineering statistics	Chemometrics Methods engineering Probabilistic design Process / quality control Reliability System identification
Social statistics	Actuarial science Census Crime statistics Demography Econometrics Jurimetrics National accounts Official statistics Population statistics Psychometrics
Spatial statistics	Cartography Environmental statistics Geographic information system Geostatistics Kriging

v t e Design of experiments
Scientific method	Scientific experiment Statistical design Control Internal and external validity Experimental unit Blinding Optimal design: Bayesian Random assignment Randomization Restricted randomization Replication versus subsampling Sample size
Treatment and blocking	Treatment Effect size Contrast Interaction Confounding Orthogonality Blocking Covariate Nuisance variable
Models and inference	Linear regression Ordinary least squares Bayesian Random effect Mixed model Hierarchical model: Bayesian Analysis of variance (Anova) Cochran's theorem Manova (multivariate) Ancova (covariance) Compare means Multiple comparison
Designs Completely randomized	Factorial Fractional factorial Plackett–Burman Taguchi Response surface methodology Polynomial and rational modeling Box–Behnken Central composite Block Generalized randomized block design (GRBD) Latin square Graeco-Latin square Orthogonal array Latin hypercube Repeated measures design Crossover study Randomized controlled trial Sequential analysis Sequential probability ratio test
Glossary Category Mathematics portal Statistical outline Statistical topics

The difference between replicates and repeats

Statistical methods in replication

Example

See also

References

Bibliography