Extension of multifactor dimensionality reduction for identifying multilocus effects in the GAW14 simulated data. (33/3144)

Multifactor dimensionality reduction (MDR) is a model-free approach that can identify gene x gene or gene x environment effects in a case-control study. Here we explore several modifications of the MDR method. We extended MDR to provide model selection without cross-validation and to use a chi-square statistic as an alternative to prediction error (PE). We also modified the permutation test to provide different levels of stringency. The extended MDR (EMDR) includes three permutation tests (fixed, non-fixed, and omnibus) to obtain p-values of multilocus models. The goal of this study was to compare the different approaches implemented in the EMDR method and evaluate their ability to identify genetic effects in the Genetic Analysis Workshop 14 simulated data. We used three replicates from the simulated family data, generating matched pairs from family triads. The results showed: 1) chi-square and PE statistics give nearly consistent results; 2) results of EMDR without cross-validation matched those of EMDR with 10-fold cross-validation; 3) the fixed permutation test reports false-positive results in data from loci unrelated to the disease, but the non-fixed and omnibus permutation tests perform well in preventing false positives, with the omnibus test being the most conservative. We conclude that the non-cross-validation test can provide accurate results with the advantage of high efficiency compared to 10-fold cross-validation, and the non-fixed permutation test provides a good compromise between power and false-positive rate.
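To make the two statistics in this abstract concrete, here is a minimal sketch of a single two-locus MDR step without cross-validation. It is not the authors' EMDR implementation: the function name `mdr_two_locus`, the genotype coding, and the use of the overall case:control ratio as the high-risk threshold are assumptions for illustration. Each multilocus genotype cell is labelled high or low risk, and the resulting 2x2 table (risk group by disease status) yields both a classification error (the no-cross-validation analogue of PE) and a Pearson chi-square statistic.

```python
from collections import Counter

def mdr_two_locus(genotypes, status):
    """Pool two-locus genotype cells into high/low risk and score the model.

    genotypes: list of (g1, g2) tuples, one per subject (hypothetical coding).
    status:    list of 1 (case) or 0 (control); both groups must be non-empty.
    Returns (high_risk_cells, classification_error, chi_square).
    """
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())

    # Assumed threshold: the overall case:control ratio (1.0 for matched pairs).
    threshold = n_case / n_ctrl

    # A cell is high risk when its case:control ratio exceeds the threshold.
    high_risk = {
        cell for cell in set(cases) | set(controls)
        if cases[cell] > threshold * controls[cell]
    }

    # 2x2 table: rows are high/low risk, columns are case/control.
    hr_cases = sum(cases[cell] for cell in high_risk)
    hr_ctrls = sum(controls[cell] for cell in high_risk)
    lr_cases = n_case - hr_cases
    lr_ctrls = n_ctrl - hr_ctrls
    n = n_case + n_ctrl

    # Misclassified subjects: high-risk controls and low-risk cases.
    classification_error = (hr_ctrls + lr_cases) / n

    # Pearson chi-square for a 2x2 table (no continuity correction).
    denom = (hr_cases + hr_ctrls) * (lr_cases + lr_ctrls) * n_case * n_ctrl
    chi_square = (
        n * (hr_cases * lr_ctrls - hr_ctrls * lr_cases) ** 2 / denom
        if denom else 0.0
    )
    return high_risk, classification_error, chi_square
```

In a full analysis, this scoring would be repeated over all candidate locus combinations, and the best model's statistic would be compared against a permutation distribution; the three permutation schemes named in the abstract are not reproduced here.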

Multifactor-dimensionality reduction versus family-based association tests in detecting susceptibility loci in discordant sib-pair studies. (34/3144)

Complex diseases are generally thought to be under the influence of multiple, and possibly interacting, genes. Many association methods have been developed to identify susceptibility genes assuming a single-gene disease model; these are referred to as single-locus methods. Multilocus methods consider joint effects of multiple genes and environmental factors. One commonly used method for family-based association analysis is implemented in FBAT. The multifactor-dimensionality reduction (MDR) method is a multilocus method that identifies multiple genetic loci associated with the occurrence of complex disease. Many studies of late-onset complex diseases employ a discordant sib-pair design. We compared FBAT and MDR in their ability to detect susceptibility loci using a discordant sib-pair dataset generated from the simulated data made available to participants in the Genetic Analysis Workshop 14. Using FBAT, we were able to identify the effect of one susceptibility locus. However, the finding was not statistically significant. We were not able to detect any of the interactions using this method. This is probably because the FBAT test is designed to find loci with major effects, not interactions. Using MDR, the best result we obtained identified two interactions. However, neither of these reached statistical significance. This is mainly due to the heterogeneity of the disease trait and noise in the data.

Analysis of genes for alcoholism using two-disease-locus models. (35/3144)

Using model-based two-locus methods for mapping genes, we analyzed the family data from the Collaborative Study on the Genetics of Alcoholism. Microsatellite data from 143 families ascertained through having three or more individuals affected with alcohol dependence were used for this investigation. Four regions showing evidence for linkage were identified using single-locus models from previous investigations. We investigated the genetic linkage, pattern of disease inheritance, and pair-wise genetic epistasis of these loci using the TLINKAGE program for two-disease-locus analysis.

Evaluating outlier loci and their effect on the identification of pedigree errors. (36/3144)

Homozygosity outlier loci, which show patterns of variation that are extremely divergent from the rest of the genome, can be evaluated by comparison of the homozygosity under Hardy-Weinberg proportions (the sum of the squares of allele frequencies) with the expected homozygosity under neutrality. Such outlier loci are potentially under selection (balancing selection or directional selection) when genome-wide effects (such as bottleneck and rapid population growth) are excluded. Outlier loci show skewed allele frequencies with respect to neutrality and may therefore affect the identification of pedigree errors. However, choosing neutral markers (excluding outlier loci) for the identification of pedigree errors has been neglected thus far. Our results showed that 4.1%, 5.5%, and 1.5% of the microsatellite markers, Illumina single-nucleotide polymorphisms (SNPs), and Affymetrix SNPs, respectively, on the autosomes appear to be under balancing selection (p or=40%) appear to be under balancing selection. Pedigree structure errors in 15 of 143 pedigrees were detected using microsatellite markers from the autosomes and/or selected SNPs from chromosomes 1 to 18 of the Illumina panel and/or selected SNPs from chromosomes 1 to 16 of the Affymetrix panel. Outlier loci did not make a major difference to the identification of pedigree errors. The Collaborative Study on the Genetics of Alcoholism data contain pedigree errors, some of which may be due to sample mix-up.
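The homozygosity under Hardy-Weinberg proportions mentioned in this abstract is the sum of squared allele frequencies at a locus. The following minimal sketch computes it from a list of observed alleles; the function name `hw_homozygosity` and the input format are assumptions for illustration, and the comparison against the expected homozygosity under neutrality is not shown.

```python
from collections import Counter

def hw_homozygosity(alleles):
    """Return the sum of squared allele frequencies at one locus.

    alleles: non-empty list of allele labels, two entries per diploid
    individual (hypothetical input format).
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())
```

For two equally frequent alleles this gives 0.5; a locus with a more even spread over many alleles gives a lower value, which is the pattern a test for balancing selection looks for.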

Detection of susceptibility loci by genome-wide linkage analysis. (37/3144)

The objective of this study is to evaluate the efficacy of a model-free linkage statistic for finding evidence of linkage using two different maps and to illustrate how the comparison of results from several populations might provide insight into the underlying genetic etiology of the disease of interest. The results obtained in terms of detection of the risk loci, threshold for declaring linkage, and power are very similar for a dense SNP map and a sparser microsatellite map. The populations differed in terms of family ascertainment and diagnosis criteria, leading to different power to detect the individual underlying disease loci. Our results for the individual replicates are consistent with the disease model used in the simulation.

Interval estimation of disease loci: development and applications of new linkage methods. (38/3144)

Three variants of the confidence set inference (CSI) procedure were proposed and applied to both the simulated and the Collaborative Study on the Genetics of Alcoholism (COGA) data. For each of the two applications, we first performed a preliminary genome scan study based on the microsatellite markers using the GENEHUNTER+ software to identify regions that potentially harbor disease loci. For each such region, we estimated the sibling identity-by-descent sharing probability distribution at the putative disease locus. Based on these estimated probabilities, the CSI procedures were employed to further localize the disease loci using the single-nucleotide polymorphism markers, leading to confidence intervals/regions for their locations. For our analysis with the simulated data, we had knowledge of the simulating models at the time we performed the analysis.

Comparison of single-nucleotide polymorphisms and microsatellites in inference of population structure. (39/3144)

Single-nucleotide polymorphisms (SNPs) are a class of attractive genetic markers for population genetic studies and for identifying genetic variations underlying complex traits. However, the usefulness and efficiency of SNPs in comparison to microsatellites in different scientific contexts, e.g., population structure inference or association analysis, still must be systematically evaluated through large empirical studies. In this article, we use the Collaborative Study on the Genetics of Alcoholism (COGA) data from Genetic Analysis Workshop 14 (GAW14) to compare the performance of microsatellites and SNPs across the whole human genome in the context of population structure inference. A total of 328 microsatellites and 15,840 SNPs are used to infer population structure in 236 unrelated individuals. We find that, on average, the informativeness of random microsatellites is four to twelve times that of random SNPs for various population comparisons, which is consistent with previous studies. Our results also indicate that for the combined set of microsatellites and SNPs, SNPs constitute the majority among the most informative markers and the use of these SNPs leads to better inference of population structure than the use of microsatellites. We also find that the inclusion of less informative markers may add noise and worsen the results.

Construction of the model for the Genetic Analysis Workshop 14 simulated data: genotype-phenotype relationships, gene interaction, linkage, association, disequilibrium, and ascertainment effects for a complex phenotype. (40/3144)

The Genetic Analysis Workshop 14 simulated dataset was designed with five aims. 1) To test the ability to find genes related to a complex disease (such as alcoholism). Such a disease may be given a variety of definitions by different investigators, may have associated endophenotypes that are common in the general population, and is likely to be not one disease but a heterogeneous collection of clinically similar, but genetically distinct, entities. 2) To observe the effect on genetic analysis and gene discovery of a complex set of gene x gene interactions. 3) To allow comparison of microsatellite vs. large-scale single-nucleotide polymorphism (SNP) data. 4) To allow testing of association to identify the disease gene and the effect of moderate marker x marker linkage disequilibrium. 5) To observe the effect of different ascertainment/disease definition schemes on the analysis. Data were distributed in two forms. Data distributed to participants contained about 1,000 SNPs and 400 microsatellite markers. Internet-obtainable data consisted of a finer 10,000 SNP map, which also contained data on controls. While disease characteristics and parameters were constant, four "studies" used varying ascertainment schemes based on differing beliefs about disease characteristics. One of the studies contained multiplex two- and three-generation pedigrees with at least four affected members. The simulated disease was a psychiatric condition with many associated behaviors (endophenotypes), almost all of which were genetic in origin. The underlying disease model contained four major genes and two modifier genes. The four major genes interacted with each other to produce three different phenotypes, which were themselves heterogeneous. The population parameters were calibrated so that the major genes could be discovered by linkage analysis in most datasets. The association evidence was more difficult to calibrate but was designed so that statistically significant association would be found in 50% of datasets.
We also simulated some marker x marker linkage disequilibrium around some of the genes and also in areas without disease genes. We tried two different methods to simulate the linkage disequilibrium.