The significance of non-significance. (1/292)

We discuss the implications of empirical results that are statistically non-significant. Figures illustrate the interrelations among effect size, sample sizes and their dispersion, and the power of the experiment. All calculations (detailed in the Appendix) are based on actual noncentral t-distributions, with no simplifying mathematical or statistical assumptions, and the contribution of each tail is determined separately. We emphasize the importance of reporting, wherever possible, the a priori power of a study so that the reader can see what the chances were of rejecting a null hypothesis that was false. As a practical alternative, we propose that non-significant inference be qualified by an estimate of the sample size that would be required in a subsequent experiment in order to attain an acceptable level of power under the assumption that the observed effect size in the sample is the same as the true effect size in the population; appropriate plots are provided for a power of 0.8. We also point out that successive outcomes of independent experiments, each of which may not be statistically significant on its own, can be easily combined to give an overall p value that often turns out to be significant. And finally, in the event that the p value is high and the power sufficient, a non-significant result may stand and be published as such.
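The abstract does not say which rule is used to combine p values from independent experiments. As an illustration only, the sketch below assumes Fisher's method, one standard choice: the statistic -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The function names and the example p values are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no special functions are needed.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values):
    """Combine independent p values with Fisher's method (an assumed choice)."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(statistic, 2 * len(p_values))


# Three experiments, none significant at 0.05 on its own (made-up values).
combined = fisher_combined_p([0.08, 0.10, 0.12])
print(f"combined p = {combined:.4f}")
```

With these made-up inputs the combined p value falls below 0.05 even though no single experiment does, which is the situation the abstract describes.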

The comparison of mixed distribution analysis with a three-criteria model as a method for estimating the prevalence of iron deficiency anaemia in Costa Rican children aged 12-23 months. (2/292)

BACKGROUND: A maximum likelihood method of mixed distribution analysis (MDA) is presented as a method to estimate the prevalence of iron deficiency anaemia (IDA) in Costa Rican infants 12-23 months old. MDA characterizes the parameters of the admixed distributions of iron deficient anaemics and non-iron-deficient-anaemics (NA) from the frequency distribution of haemoglobin concentration of the total sample population. METHODS: Data collected by Lozoff et al. (1986) from 345 Costa Rican infants 12-23 months old were used to estimate the parameters of the IDA and NA haemoglobin distributions determined by MDA and the widely used three-criteria model of iron deficiency. The estimates of the prevalence of IDA by each of the methods were compared. The sensitivity and specificity of MDA compared to diagnosis by the three-criteria method were assessed. Simulations were carried out to assess the comparability of MDA and the three-criteria method in low and high prevalence scenarios. RESULTS: The mean and standard deviation (SD) of the NA haemoglobin distribution determined by both methods were 12.1 ± 1.0 g/dL. The IDA haemoglobin distribution determined by MDA had a mean and SD of 10.2 ± 1.3 g/dL while the IDA distribution by the three-criteria method had a mean and SD of 10.4 ± 1.3 g/dL. The prevalences of IDA as estimated by MDA and the three-criteria method were 24% and 29%, respectively. The sensitivity and specificity of MDA were 95% and 97%, respectively. The performance of MDA was similar to the three-criteria method at a simulated high prevalence of IDA and less similar at a low prevalence of IDA. CONCLUSIONS: Compared to the reference three-criteria method, MDA provides a more accurate estimate of the true prevalence of IDA than the haemoglobin cutoff method in a population of children aged 12-23 months with a moderate to high prevalence of IDA. MDA is a less costly method for estimating the severity of IDA in populations with moderate to high prevalences of IDA, and for assisting in the design, monitoring and evaluation of iron intervention programmes.
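The idea of recovering two admixed distributions from one pooled frequency distribution can be sketched with a generic expectation-maximization (EM) fit of a two-component normal mixture. This is not the authors' MDA implementation, whose details the abstract does not give. The function names are hypothetical, and the data are synthetic with deliberately well-separated components so that the toy fit is stable.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate_mixture(n_low, n_high, seed=0, mean_low=8.0, mean_high=13.0, sd=1.0):
    """Synthetic pooled sample from two normal components (illustrative values)."""
    rng = random.Random(seed)
    low = [rng.gauss(mean_low, sd) for _ in range(n_low)]
    high = [rng.gauss(mean_high, sd) for _ in range(n_high)]
    return low + high


def fit_two_normal_mixture(values, n_iter=200):
    """Maximum likelihood fit by EM.

    Returns (weight_low, mean_low, sd_low, mean_high, sd_high), where
    weight_low plays the role of the estimated prevalence of the lower group.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    # Crude starting values: split the sorted sample in the middle.
    mean_low = sum(data[:half]) / half
    mean_high = sum(data[half:]) / (n - half)
    overall = sum(data) / n
    sd_low = sd_high = math.sqrt(sum((x - overall) ** 2 for x in data) / n)
    weight_low = 0.5
    for _ in range(n_iter):
        # E-step: probability that each observation belongs to the lower component.
        resp = []
        for x in data:
            a = weight_low * normal_pdf(x, mean_low, sd_low)
            b = (1.0 - weight_low) * normal_pdf(x, mean_high, sd_high)
            resp.append(a / (a + b))
        # M-step: responsibility-weighted means, SDs and mixing proportion.
        n_low = sum(resp)
        n_high = n - n_low
        mean_low = sum(r * x for r, x in zip(resp, data)) / n_low
        mean_high = sum((1.0 - r) * x for r, x in zip(resp, data)) / n_high
        sd_low = max(1e-6, math.sqrt(
            sum(r * (x - mean_low) ** 2 for r, x in zip(resp, data)) / n_low))
        sd_high = max(1e-6, math.sqrt(
            sum((1.0 - r) * (x - mean_high) ** 2 for r, x in zip(resp, data)) / n_high))
        weight_low = n_low / n
    return weight_low, mean_low, sd_low, mean_high, sd_high


sample = simulate_mixture(500, 1500, seed=1)
w, m_lo, s_lo, m_hi, s_hi = fit_two_normal_mixture(sample)
print(f"estimated proportion in lower component: {w:.3f}")
```

Real haemoglobin distributions overlap far more than this synthetic sample, so a practical fit would need more careful starting values and convergence checks.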

A genomic screen of autism: evidence for a multilocus etiology. (3/292)

We have conducted a genome screen of autism, by linkage analysis in an initial set of 90 multiplex sibships, with parents, containing 97 independent affected sib pairs (ASPs), with follow-up in 49 additional multiplex sibships, containing 50 ASPs. In total, 519 markers were genotyped, including 362 for the initial screen, and an additional 157 were genotyped in the follow-up. As a control, we also included in the analysis unaffected sibs, which provided 51 discordant sib pairs (DSPs) for the initial screen and 29 for the follow-up. In the initial phase of the work, we observed increased identity by descent (IBD) in the ASPs (sharing of 51.6%) compared with the DSPs (sharing of 50.8%). The excess sharing in the ASPs could not be attributed to the effect of a small number of loci but, rather, was due to the modest increase in the entire distribution of IBD. These results are most compatible with a model specifying a large number of loci (perhaps ≥15) and are less compatible with models specifying ...

Testing the robustness of the likelihood-ratio test in a variance-component quantitative-trait loci-mapping procedure. (4/292)

Detection of linkage to genes for quantitative traits remains a challenging task. Recently, variance components (VC) techniques have emerged as among the more powerful of available methods. As often implemented, such techniques require assumptions about the phenotypic distribution. Usually, multivariate normality is assumed. However, several factors may lead to markedly nonnormal phenotypic data, including (a) the presence of a major gene (not necessarily linked to the markers under study), (b) some types of gene x environment interaction, (c) use of a dichotomous phenotype (i.e., affected vs. unaffected), (d) nonnormality of the population within-genotype (residual) distribution, and (e) selective (extreme) sampling. Using simulation, we have investigated, for sib-pair studies, the robustness of the likelihood-ratio test for a VC quantitative-trait locus-detection procedure to violations of normality that are due to these factors. Results showed (a) that some types of nonnormality, such as leptokurtosis, produced type I error rates in excess of the nominal, or alpha, levels whereas others did not; and (b) that the degree of type I error-rate inflation appears to be directly related to the residual sibling correlation. Potential solutions to this problem are discussed. Investigators contemplating use of this VC procedure are encouraged to provide evidence that their trait data are normally distributed, to employ a procedure that allows for nonnormal data, or to consider implementation of permutation tests.
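To show what a permutation test looks like in a sib-pair setting, the sketch below swaps in a much simpler statistic than the VC likelihood ratio: the negative covariance between each pair's squared trait difference and its proportion of alleles shared IBD, in the spirit of a Haseman-Elston regression. Shuffling the IBD values across pairs gives a null distribution that does not rely on normality of the trait. The function names and the toy data are hypothetical, and this is not the procedure evaluated in the paper.

```python
import random


def covariance(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def permutation_p_value(squared_diffs, ibd_sharing, n_perm=2000, seed=0):
    """One-sided permutation p value for linkage.

    Under linkage, pairs sharing more alleles IBD tend to have smaller squared
    trait differences, so the statistic is the negative covariance.
    """
    rng = random.Random(seed)
    observed = -covariance(squared_diffs, ibd_sharing)
    shuffled = list(ibd_sharing)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between trait and IBD sharing
        if -covariance(squared_diffs, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: 60 sib pairs whose squared difference falls as IBD sharing rises.
ibd = [0.0, 0.5, 1.0] * 20
sq_diff = [4.0 - 3.0 * s for s in ibd]
print(permutation_p_value(sq_diff, ibd))
```

The add-one correction means the smallest attainable p value is 1/(n_perm + 1), so n_perm has to be chosen with the intended significance threshold in mind.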

Point and interval estimates of marker location in radiation hybrid mapping. (5/292)

Radiation hybrid (RH) mapping is a powerful method for ordering loci on chromosomes and for estimating the distances between them. RH mapping is currently used to construct both framework maps, in which all markers are ordered with high confidence (e.g., 1,000:1 relative maximum likelihood), and comprehensive maps, which include markers with less-confident placement. To deal with uncertainty in the order and location of markers, marker positions may be estimated conditional on the most likely marker order, plausible intervals for nonframework markers may be indicated on a framework map, or bins of markers may be constructed. We propose a statistical method for estimating marker position that combines information from all plausible marker orders, gives a measure of uncertainty in location for each marker, and provides an alternative to the current practice of binning. Assuming that the prior distribution for the retention probabilities is uniform and that the marker loci are distributed independently and uniformly on an interval of specified length, we calculate the posterior distribution of marker position for each marker. The median or mean of this distribution provides a point estimate of marker location. An interval estimate of marker location may be constructed either by using the 100(alpha/2) and 100(1-alpha/2) percentiles of the distribution to form a 100(1-alpha)% posterior credible interval or by calculating the shortest 100(1-alpha)% posterior credible interval. These point and interval estimates take into account ordering uncertainty and do not depend on the assumption of a particular marker order. We evaluate the performance of the estimates on the basis of results from simulated data and illustrate the method with two examples.
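The two interval constructions can be compared on a small example. The sketch below assumes the posterior distribution of a marker's position is available as a list of draws, which is an assumption of this illustration and not something the abstract states. The function names are hypothetical, and the percentile rule is a simple order-statistic convention.

```python
import math


def equal_tailed_interval(samples, alpha=0.05):
    """Interval between the 100(alpha/2) and 100(1-alpha/2) percentiles."""
    data = sorted(samples)
    n = len(data)
    lower = data[int(math.floor((alpha / 2.0) * (n - 1)))]
    upper = data[int(math.ceil((1.0 - alpha / 2.0) * (n - 1)))]
    return lower, upper


def shortest_interval(samples, alpha=0.05):
    """Shortest window of sorted draws covering at least 100(1-alpha)% of them."""
    data = sorted(samples)
    n = len(data)
    # Number of draws the window must contain; the small epsilon guards
    # against floating-point noise pushing an exact integer upwards.
    k = int(math.ceil((1.0 - alpha) * n - 1e-9))
    start = min(range(n - k + 1), key=lambda i: data[i + k - 1] - data[i])
    return data[start], data[start + k - 1]


# Skewed toy "posterior": the draws are dense near 0 and sparse further out.
draws = [i ** 2 for i in range(100)]
print(equal_tailed_interval(draws, alpha=0.1))
print(shortest_interval(draws, alpha=0.1))
```

For a skewed posterior such as this toy one, the shortest interval shifts toward the dense region and is narrower than the equal-tailed interval; for a symmetric unimodal posterior the two nearly coincide.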

Heritability of cellular radiosensitivity: a marker of low-penetrance predisposition genes in breast cancer? (6/292)

Many inherited cancer-prone conditions show an elevated sensitivity to the induction of chromosome damage in cells exposed to ionizing radiation, indicative of defects in the processing of DNA damage. We earlier found that 40% of patients with breast cancer and 5%-10% of controls showed evidence of enhanced chromosomal radiosensitivity and that this sensitivity was not age related. We suggested that this could be a marker of cancer-predisposing genes of low penetrance. To further test this hypothesis, we have studied the heritability of radiosensitivity in families of patients with breast cancer. Of 37 first-degree relatives of 16 sensitive patients, 23 (62%) were themselves sensitive, compared with 1 (7%) of 15 first-degree relatives of four patients with normal responses. The distribution of radiosensitivities among the family members was trimodal, suggesting the presence of a limited number of major genes determining radiosensitivity. Segregation analysis of 95 family members showed clear evidence of heritability of radiosensitivity, with a single major gene accounting for 82% of the variance between family members. The two alleles combine in an additive (codominant) manner, giving complete heterozygote expression. A better fit was obtained to a model that includes a second, rarer gene with a similar, additive effect on radiosensitivity, but the data are clearly consistent with a range of models. Novel genes involved in predisposition to breast cancer can now be sought through linkage studies using this quantitative trait.

Replication of linkage studies of complex traits: an examination of variation in location estimates. (7/292)

In linkage studies, independent replication of positive findings is crucial in order to distinguish between true positives and false positives. Recently, the following question has arisen in linkage studies of complex traits: at what distance do we reject the hypothesis that two location estimates in a genomic region represent the same gene? Here we attempt to address this question. Sampling distributions for location estimates were constructed by computer simulation. The conditions for simulation were chosen to reflect features of "typical" complex traits, including incomplete penetrance, phenocopies, and genetic heterogeneity. Our findings, which bear on what is considered a replication in linkage studies of complex traits, suggest that, even with relatively large numbers of multiplex families, chance variation in the location estimate is substantial. In addition, we report evidence that, for the conditions studied here, the standard error of a location estimate is a function of the magnitude of the expected LOD score.

Intron-exon structures of eukaryotic model organisms. (8/292)

To investigate the distribution of intron-exon structures of eukaryotic genes, we have constructed a general exon database comprising all available intron-containing genes and exon databases from 10 eukaryotic model organisms: Homo sapiens, Mus musculus, Gallus gallus, Rattus norvegicus, Arabidopsis thaliana, Zea mays, Schizosaccharomyces pombe, Aspergillus, Caenorhabditis elegans and Drosophila. We purged redundant genes to avoid the possible bias brought about by redundancy in the databases. After discarding those questionable introns that do not contain correct splice sites, the final database contained 17 102 introns, 21 019 exons and 2903 independent or quasi-independent genes. On average, a eukaryotic gene contains 3.7 introns per kb of protein-coding region. The exon distribution peaks around 30-40 residues and most introns are 40-125 nt long. The variable intron-exon structures of the 10 model organisms reveal two interesting statistical phenomena, which cast light on some previous speculations. (i) Genome size seems to be correlated with total intron length per gene. For example, invertebrate introns are smaller than those of human genes, while yeast introns are shorter than invertebrate introns. However, this correlation is weak, suggesting that other factors besides genome size may also affect intron size. (ii) Introns smaller than 50 nt are significantly less frequent than longer introns, possibly resulting from a minimum intron size requirement for intron splicing.