The significance of non-significance. (1/2102)

We discuss the implications of empirical results that are statistically non-significant. Figures illustrate the interrelations among effect size, sample sizes and their dispersion, and the power of the experiment. All calculations (detailed in the Appendix) are based on actual noncentral t-distributions, with no simplifying mathematical or statistical assumptions, and the contribution of each tail is determined separately. We emphasize the importance of reporting, wherever possible, the a priori power of a study so that the reader can see what the chances were of rejecting a null hypothesis that was false. As a practical alternative, we propose that non-significant inference be qualified by an estimate of the sample size that would be required in a subsequent experiment in order to attain an acceptable level of power under the assumption that the observed effect size in the sample is the same as the true effect size in the population; appropriate plots are provided for a power of 0.8. We also point out that successive outcomes of independent experiments, each of which may not be statistically significant on its own, can be easily combined to give an overall p value that often turns out to be significant. And finally, in the event that the p value is high and the power sufficient, a non-significant result may stand and be published as such.
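
The abstract does not say which rule is used to combine p values from independent experiments. As an illustration only, the sketch below assumes Fisher's method, a common choice; the function name `fisher_combined_p` is hypothetical and is not taken from the paper.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p values with Fisher's method (an assumed choice,
    not necessarily the rule used in the paper)."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic X = -2 * sum(ln p) is chi-square with 2k degrees of
    # freedom under the null hypothesis; we work with X / 2 directly.
    half_stat = -sum(math.log(p) for p in p_values)
    # Closed-form chi-square survival function for even df = 2k:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total
```

For example, three hypothetical results of p = 0.08, 0.09, and 0.07, none significant at the 0.05 level on its own, combine under this method to roughly 0.019.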

A simulation study of confounding in generalized linear models for air pollution epidemiology. (2/2102)

Confounding between the model covariates and causal variables (which may or may not be included as model covariates) is a well-known problem in regression models used in air pollution epidemiology. This problem is usually acknowledged but hardly ever investigated, especially in the context of generalized linear models. Using synthetic data sets, the present study shows how model overfit, underfit, and misfit in the presence of correlated causal variables in a Poisson regression model affect the estimated coefficients of the covariates and their confidence levels. The study also shows how this effect changes with the ranges of the covariates and the sample size. There is qualitative agreement between these study results and the corresponding expressions in the large-sample limit for the ordinary linear models. Confounding of covariates in an overfitted model (with covariates encompassing more than just the causal variables) does not bias the estimated coefficients but reduces their significance. Model underfit (with some causal variables excluded as covariates) or misfit (with covariates encompassing only noncausal variables), on the other hand, leads not only to erroneous estimated coefficients but also to a misguided confidence, represented by large t-values, that the estimated coefficients are significant. The results of this study indicate that models which use only one or two air quality variables, such as particulate matter ≤10 µm and sulfur dioxide, are probably unreliable, and that models containing several correlated and toxic or potentially toxic air quality variables should also be investigated in order to minimize the situation of model underfit or misfit.

Laboratory assay reproducibility of serum estrogens in umbilical cord blood samples. (3/2102)

We evaluated the reproducibility of laboratory assays for umbilical cord blood estrogen levels and its implications for sample size estimation. Specifically, we examined correlation between duplicate measurements of the same blood samples and estimated the relative contribution of variability due to study subject and assay batch to the overall variation in measured hormone levels. Cord blood was collected from a total of 25 female babies (15 Caucasian and 10 Chinese-American) from full-term deliveries at two study sites between March and December 1997. Two serum aliquots per blood sample were assayed, either at the same time or 4 months apart, for estrone, total estradiol, weakly bound estradiol, and sex hormone-binding globulin (SHBG). Correlation coefficients (Pearson's r) between duplicate measurements were calculated. We also estimated the components of variance for each hormone or protein associated with variation among subjects and variation between assay batches. Pearson's correlation coefficients were >0.90 for all of the compounds except for total estradiol when all of the subjects were included. The intraclass correlation coefficients, defined as the proportion of the total variance due to between-subject variation, for estrone, total estradiol, weakly bound estradiol, and SHBG were 92, 80, 85, and 97%, respectively. The magnitude of measurement error found in this study would increase the sample size required for detecting a difference between two populations for total estradiol and SHBG by 25 and 3%, respectively.
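
The reported sample size increases are consistent with the classical relationship in which the required sample size scales with 1/ICC. The sketch below assumes that relationship; the abstract does not state the formula the authors used, and the function name is hypothetical.

```python
def sample_size_inflation(icc):
    """Fractional increase in required sample size caused by measurement
    error, assuming required n scales as 1 / ICC (an assumption; the
    abstract does not give the authors' formula)."""
    if not 0.0 < icc <= 1.0:
        raise ValueError("ICC must lie in (0, 1]")
    return 1.0 / icc - 1.0


# ICC values reported in the abstract for total estradiol and SHBG.
for name, icc in [("total estradiol", 0.80), ("SHBG", 0.97)]:
    print(f"{name}: +{sample_size_inflation(icc) * 100:.0f}% sample size")
```

Under this assumption, an ICC of 0.80 gives a 25% increase and an ICC of 0.97 gives about 3%, matching the figures quoted for total estradiol and SHBG.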

A note on power approximations for the transmission/disequilibrium test. (4/2102)

The transmission/disequilibrium test (TDT) is a popular method for detection of the genetic basis of a disease. Investigators planning such studies require computation of sample size and power, allowing for a general genetic model. Here, a rigorous method is presented for obtaining the power approximations of the TDT for samples consisting of families with either a single affected child or affected sib pairs. Power calculations based on simulation show that these approximations are quite precise. By this method, it is also shown that a previously published power approximation of the TDT is erroneous.
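
For readers unfamiliar with the test whose power is being approximated, the following is a minimal sketch of the standard TDT statistic in its McNemar form, (b - c)^2 / (b + c), referred to a chi-square distribution with 1 degree of freedom. It illustrates the test itself, not the paper's power approximation; the function name and the counts in the usage note are hypothetical.

```python
import math


def tdt(b, c):
    """Standard TDT statistic and p value.

    b: number of heterozygous parents transmitting the candidate allele
       to an affected child
    c: number of heterozygous parents transmitting the other allele
    """
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts with b + c > 0")
    stat = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

With hypothetical counts of 60 transmissions and 40 non-transmissions, `tdt(60, 40)` gives a statistic of 4.0 and a p value just under 0.05.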

Comparison of linkage-disequilibrium methods for localization of genes influencing quantitative traits in humans. (5/2102)

Linkage disequilibrium has been used to help in the identification of genes predisposing to certain qualitative diseases. Although several linkage-disequilibrium tests have been developed for localization of genes influencing quantitative traits, these tests have not been thoroughly compared with one another. In this report we compare, under a variety of conditions, several different linkage-disequilibrium tests for identification of loci affecting quantitative traits. These tests use either single individuals or parent-child trios. When we compared tests with equal samples, we found that the truncated measured allele (TMA) test was the most powerful. The trait allele frequencies, the stringency of sample ascertainment, the number of marker alleles, and the linked genetic variance affected the power, but the presence of polygenes did not. When there were more than two trait alleles at a locus in the population, power to detect disequilibrium was greatly diminished. The presence of unlinked disequilibrium (D'*) increased the false-positive error rates of disequilibrium tests involving single individuals but did not affect the error rates of tests using family trios. The increase in error rates was affected by the stringency of selection, the trait allele frequency, and the linked genetic variance but not by polygenic factors. In an equilibrium population, the TMA test is most powerful, but, when adjusted for the presence of admixture, Allison test 3 becomes the most powerful whenever D'*>.15.

Measurement of continuous ambulatory peritoneal dialysis prescription adherence using a novel approach. (6/2102)

OBJECTIVE: The purpose of the study was to test a novel approach to monitoring the adherence of continuous ambulatory peritoneal dialysis (CAPD) patients to their dialysis prescription. DESIGN: A descriptive observational study was done in which exchange behaviors were monitored over a 2-week period of time. SETTING: Patients were recruited from an outpatient dialysis center. PARTICIPANTS: A convenience sample of patients undergoing CAPD at Piedmont Dialysis Center in Winston-Salem, North Carolina was recruited for the study. Of 31 CAPD patients, 20 (64.5%) agreed to participate. MEASURES: Adherence of CAPD patients to their dialysis prescription was monitored using daily logs and an electronic monitoring device (the Medication Event Monitoring System, or MEMS; APREX, Menlo Park, California, U.S.A.). Patients recorded in their logs their exchange activities during the 2-week observation period. Concurrently, patients were instructed to deposit the pull tab from their dialysate bag into a MEMS bottle immediately after performing each exchange. The MEMS bottle was closed with a cap containing a computer chip that recorded the date and time each time the bottle was opened. RESULTS: One individual's MEMS device malfunctioned and thus the data presented in this report are based upon the remaining 19 patients. A significant discrepancy was found between log data and MEMS data, with MEMS data indicating a greater number and percentage of missed exchanges. MEMS data indicated that some patients concentrated their exchange activities during the day, with shortened dwell times between exchanges. Three indices were developed for this study: a measure of the average time spent in noncompliance, and indices of consistency in the timing of exchanges within and between days. Patients who were defined as consistent had lower scores on the noncompliance index compared to patients defined as inconsistent (p = 0.015). 
CONCLUSIONS: This study describes a methodology that may be useful in assessing adherence to the peritoneal dialysis regimen. Of particular significance is the ability to assess the timing of exchanges over the course of a day. Clinical implications are limited due to issues of data reliability and validity, the short-term nature of the study, the small sample, and the fact that clinical outcomes were not considered in this methodology study. Additional research is needed to further develop this data-collection approach.

Statistical power of MRI monitored trials in multiple sclerosis: new data and comparison with previous results. (7/2102)

OBJECTIVES: To evaluate the durations of the follow up and the reference population sizes needed to achieve optimal and stable statistical powers for two period cross over and parallel group design clinical trials in multiple sclerosis, when using the numbers of new enhancing lesions and the numbers of active scans as end point variables. METHODS: The statistical power was calculated by means of computer simulations performed using MRI data obtained from 65 untreated relapsing-remitting or secondary progressive patients who were scanned monthly for 9 months. The statistical power was calculated for follow up durations of 2, 3, 6, and 9 months and for sample sizes of 40-100 patients for parallel group and of 20-80 patients for two period cross over design studies. The stability of the estimated powers was evaluated by applying the same procedure on random subsets of the original data. RESULTS: When using the number of new enhancing lesions as the end point, the statistical power increased for all the simulated treatment effects with the duration of the follow up until 3 months for the parallel group design and until 6 months for the two period cross over design. Using the number of active scans as the end point, the statistical power steadily increased until 6 months for the parallel group design and until 9 months for the two period cross over design. The power estimates in the present sample and the comparisons of these results with those obtained by previous studies with smaller patient cohorts suggest that statistical power is significantly overestimated when the size of the reference data set decreases for parallel group design studies or the duration of the follow up decreases for two period cross over studies. CONCLUSIONS: These results should be used to determine the duration of the follow up and the sample size needed when planning MRI monitored clinical trials in multiple sclerosis.

Power and sample size calculations in case-control studies of gene-environment interactions: comments on different approaches. (8/2102)

Power and sample size considerations are critical for the design of epidemiologic studies of gene-environment interactions. Hwang et al. (Am J Epidemiol 1994;140:1029-37) and Foppa and Spiegelman (Am J Epidemiol 1997;146:596-604) have presented power and sample size calculations for case-control studies of gene-environment interactions. Comparisons of calculations using these approaches and an approach for general multivariate regression models for the odds ratio previously published by Lubin and Gail (Am J Epidemiol 1990; 131:552-66) have revealed substantial differences under some scenarios. These differences are the result of a highly restrictive characterization of the null hypothesis in Hwang et al. and Foppa and Spiegelman, which results in an underestimation of sample size and overestimation of power for the test of a gene-environment interaction. A computer program to perform sample size and power calculations to detect additive or multiplicative models of gene-environment interactions using the Lubin and Gail approach will be available free of charge in the near future from the National Cancer Institute.