The relationship between the Plasmodium falciparum parasite ratio in childhood and climate estimates of malaria transmission in Kenya. (49/394)

BACKGROUND: The risks of Plasmodium falciparum morbidity and mortality are considerably higher in areas supporting parasite prevalence ≥25% than in low-transmission areas supporting parasite prevalence below 25%. Recent descriptions of the health impacts of malaria in Africa are based upon categorical descriptions of a climate-driven fuzzy model of suitability (FCS) for stable transmission developed by the Mapping Malaria Risk in Africa collaboration (MARA). METHODS: An electronic and national search was undertaken to identify community-based parasite prevalence surveys in Kenya. Data from these surveys were matched using ArcView 3.2 to extract spatially congruent estimates of the FCS values generated by the MARA model. Levels of agreement between three classes of parasite prevalence used during recent continental burden estimations (0%, >0-<25% and ≥25%) and three classes of FCS (0, >0-<0.75 and ≥0.75) were tested using the kappa (k) statistic, and the two measures were also examined as continuous variables to define better levels of agreement. RESULTS: Two hundred and seventeen independent parasite prevalence surveys undertaken since 1980 were identified during the search. Overall agreement between the three classes of parasite prevalence and FCS was weak although significant (k = 0.367, p < 0.0001). The overall correlation between the FCS and the parasite ratio when considered as continuous variables was also positive (0.364, p < 0.001). The margins of error were in the stable, endemic (parasite ratio ≥25%) class with 42% of surveys represented by an FCS <0.75. Reducing the FCS value criterion to ≥0.6 improved the classification of stable, endemic parasite ratio surveys. Zero values of FCS were not adequate discriminators of zero parasite prevalence.
CONCLUSION: Using the MARA model to categorically distinguish populations at differing intensities of malaria transmission in Kenya may under-represent those who are exposed to stable, endemic transmission and over-represent those at no risk. The MARA approach to defining FCS values of suitability for stable transmission represents our only contemporary continental level map of malaria in Africa but there is a need to redefine Africa's population at risk in accordance with both climatic and non-climatic determinants of P. falciparum transmission intensity to provide a more informed approach to estimating the morbid and fatal consequences of infection across the continent.
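
The agreement test described in this abstract can be illustrated with a short sketch. It assumes the kappa statistic is the unweighted Cohen's kappa computed on a three-by-three cross-tabulation; the abstract does not say which variant was used. The class cut-offs come from the abstract, while the function names and the survey values are hypothetical and serve only as an illustration.

```python
def prevalence_class(parasite_ratio_pct):
    """Map a parasite ratio (%) to the classes 0%, >0-<25% and >=25%."""
    if parasite_ratio_pct == 0:
        return 0
    return 1 if parasite_ratio_pct < 25 else 2


def fcs_class(fcs, upper=0.75):
    """Map an FCS value to the classes 0, >0-<upper and >=upper."""
    if fcs == 0:
        return 0
    return 1 if fcs < upper else 2


def agreement_table(pairs, n_classes=3):
    """Cross-tabulate (prevalence class, FCS class) pairs."""
    table = [[0] * n_classes for _ in range(n_classes)]
    for a, b in pairs:
        table[a][b] += 1
    return table


def cohens_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when every observation falls in a single class.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical surveys, NOT data from the study: (parasite ratio %, FCS value)
surveys = [(0, 0.0), (0, 0.3), (12, 0.5), (18, 0.8),
           (40, 0.65), (55, 0.9), (30, 1.0), (5, 0.2)]
pairs = [(prevalence_class(pr), fcs_class(f)) for pr, f in surveys]
print(round(cohens_kappa(agreement_table(pairs)), 3))
```

Calling fcs_class with upper=0.6 reproduces the relaxed criterion mentioned in the results, so the same table can be rebuilt under either threshold and the two kappa values compared.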

Base-By-Base: single nucleotide-level analysis of whole viral genome alignments. (50/394)

BACKGROUND: With ever-increasing numbers of closely related virus genomes being sequenced, it has become desirable to compare two genomes at a level more detailed than gene content, because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools. RESULTS: A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to 1) rapidly identify and correct alignment errors in large, multiple genome alignments; and 2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions and whether these changes result in amino acid substitutions. Base-By-Base can connect to our MySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes or use information from text files. CONCLUSION: Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.
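
As a rough illustration of what a nucleotide-level comparison of two aligned genomes involves, the sketch below walks a pairwise alignment column by column and tabulates the differences. It is not Base-By-Base code (the tool itself is a Java application); the function name, the gap character and the example sequences are assumptions made for this example.

```python
def alignment_differences(seq_a, seq_b, gap="-"):
    """List the columns at which two aligned sequences differ.

    Returns tuples of (1-based column, base in A, base in B, kind), where
    kind describes sequence B relative to sequence A.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    pairs = zip(seq_a.upper(), seq_b.upper())
    for column, (a, b) in enumerate(pairs, start=1):
        if a == b:
            continue  # identical bases, or a gap in both rows
        if a == gap:
            kind = "insertion"
        elif b == gap:
            kind = "deletion"
        else:
            kind = "substitution"
        diffs.append((column, a, b, kind))
    return diffs


# Hypothetical aligned fragments, not taken from any real genome
for row in alignment_differences("ACGT-A", "ACCTTA"):
    print(row)
```

Deciding whether a difference falls in a promoter or coding region, or changes an amino acid, would additionally require gene annotations, which this sketch leaves out.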

Linear fuzzy gene network models obtained from microarray data by exhaustive search. (51/394)

BACKGROUND: Recent technological advances in high-throughput data collection allow for experimental study of increasingly complex systems on the scale of the whole cellular genome and proteome. Gene network models are needed to interpret the resulting large and complex data sets. Rationally designed perturbations (e.g., gene knock-outs) can be used to iteratively refine hypothetical models, suggesting an approach for high-throughput biological system analysis. We introduce an approach to gene network modeling based on a scalable linear variant of fuzzy logic: a framework with greater resolution than Boolean logic models, but which, while still semi-quantitative, does not require the precise parameter measurement needed for chemical kinetics-based modeling. RESULTS: We demonstrated our approach with exhaustive search for fuzzy gene interaction models that best fit transcription measurements by microarray of twelve selected genes regulating the yeast cell cycle. Applying an efficient, universally applicable data normalization and fuzzification scheme, the search converged to a small number of models that individually predict experimental data within an error tolerance. Because only gene transcription levels are used to develop the models, they include both direct and indirect regulation of genes. CONCLUSION: Biological relationships in the best-fitting fuzzy gene network models successfully recover direct and indirect interactions predicted from previous knowledge to result in transcriptional correlation. Fuzzy models fit on one yeast cell cycle data set robustly predict another experimental data set for the same system. Linear fuzzy gene networks and exhaustive rule search are the first steps towards a framework for an integrated modeling and experiment approach to high-throughput "reverse engineering" of complex biological systems.

Artificial intelligence in medicine. (52/394)

INTRODUCTION: Artificial intelligence is a branch of computer science capable of analysing complex medical data. Its potential to exploit meaningful relationships within a data set can be used in diagnosis, treatment and outcome prediction in many clinical scenarios. METHODS: Medline and internet searches were carried out using the keywords 'artificial intelligence' and 'neural networks (computer)'. Further references were obtained by cross-referencing from key articles. An overview of different artificial intelligence techniques is presented in this paper along with a review of important clinical applications. RESULTS: The proficiency of artificial intelligence techniques has been explored in almost every field of medicine. Artificial neural networks were the most commonly used analytical tool, whilst other artificial intelligence techniques such as fuzzy expert systems, evolutionary computation and hybrid intelligent systems have all been used in different clinical settings. DISCUSSION: Artificial intelligence techniques have the potential to be applied in almost every field of medicine. There is a need for further, appropriately designed clinical trials before these emergent techniques find application in the real clinical setting.

Estimating mutual information using B-spline functions--an improved similarity measure for analysing gene expression data. (53/394)

BACKGROUND: The information theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of the clustering of genes with similar patterns of expression it has been suggested as a general quantity of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size. RESULTS: In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: the significance, as a measure of the power of distinction from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets and the results are compared to those obtained using other similarity measures. The C++ source code of our algorithm is available for non-commercial use from [email protected] upon request. CONCLUSION: The utilisation of mutual information as a similarity measure enables the detection of non-linear correlations in gene expression datasets. Frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification, are thereby extended.
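
For orientation, the sketch below implements the simple binning estimator of mutual information, the baseline whose numerical errors motivate this abstract. It is not the B-spline method proposed by the authors. The equal-width binning, the default bin count and the function name are assumptions chosen for this illustration.

```python
import math
from collections import Counter


def binned_mutual_information(x, y, bins=4):
    """Estimate mutual information (in bits) using equal-width bins."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins
        if width == 0:
            return [0] * len(values)  # constant variable: a single bin
        # the maximum value is clamped into the last bin
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    marg_x, marg_y = Counter(bx), Counter(by)

    mi = 0.0
    for (i, j), count in joint.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (marg_x[i] * marg_y[j]))
    return mi


# Hypothetical expression profiles for two genes (illustrative values only)
gene_a = [0.1, 0.4, 0.9, 1.3, 0.2, 0.5, 1.0, 1.2]
gene_b = [1.2, 0.9, 0.4, 0.1, 1.3, 1.0, 0.5, 0.2]
print(binned_mutual_information(gene_a, gene_b))
```

Each sample contributes to exactly one bin, so with few samples the estimate depends strongly on where the bin edges fall, which is the weakness described in the background above.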

Reliability analysis of microarray data using fuzzy c-means and normal mixture modeling based classification methods. (54/394)

MOTIVATION: A serious limitation in microarray analysis is the unreliability of the data generated from low signal intensities. Such data may produce erroneous gene expression ratios and cause unnecessary validation or post-analysis follow-up tasks. Therefore, the elimination of unreliable signal intensities will enhance reproducibility and reliability of gene expression ratios produced from microarray data. In this study, we applied fuzzy c-means (FCM) and normal mixture modeling (NMM) based classification methods to separate microarray data into reliable and unreliable signal intensity populations. RESULTS: We compared the results of FCM classification with those of classification based on NMM. Both approaches were validated against reference sets of biological data consisting of only true positives and true negatives. We observed that both methods performed equally well in terms of sensitivity and specificity. Although a comparison of the computation times indicated that the fuzzy approach is computationally more efficient, other considerations support the use of NMM for the reliability analysis of microarray data. AVAILABILITY: The classification approaches described in this paper and sample microarray data are available as Matlab(TM) (The MathWorks Inc., Natick, MA) programs (m-files) and text files, respectively, at http://rc.kfshrc.edu.sa/bssc/staff/MusaAsyali/Downloads.asp. The programs can be run/tested on many different computer platforms where Matlab is available. CONTACT: [email protected].
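
A minimal sketch of the standard fuzzy c-means iteration on one-dimensional data may help make the FCM step concrete. It is written in Python, not taken from the authors' Matlab programs, and it assumes the textbook update rules with fuzzifier m = 2, a deterministic initialisation and made-up log-intensity values.

```python
def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, n_iter=100, tol=1e-9):
    """Textbook fuzzy c-means on 1-D data (n_clusters >= 2, m > 1).

    Returns (centres, memberships); memberships[k][i] is the degree to
    which values[k] belongs to cluster i, and each row sums to 1.
    """
    lo, hi = min(values), max(values)
    # Deterministic start (an assumption): centres evenly spaced over the range
    centres = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for v in values:
            dists = [abs(v - c) for c in centres]
            if 0.0 in dists:
                # The point sits exactly on one or more centres
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                       for d_i in dists]
            memberships.append(row)
        new_centres = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            weighted_sum = sum(w * v for w, v in zip(weights, values))
            new_centres.append(weighted_sum / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, memberships


# Hypothetical log signal intensities: a low (unreliable) and a high group
intensities = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
centres, memberships = fuzzy_c_means_1d(intensities)
print(centres)
```

A spot could then be labelled unreliable when its membership in the low-intensity cluster exceeds some cut-off; the abstract does not state the rule the authors applied, so any such cut-off is an assumption.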

A fuzzy guided genetic algorithm for operon prediction. (55/394)

MOTIVATION: The operon structure of the prokaryotic genome is a critical input for the reconstruction of regulatory networks at the whole genome level. As experimental methods for the detection of operons are difficult and time-consuming, efforts are being put into developing computational methods that can use available biological information to predict operons. METHOD: A genetic algorithm is developed to evolve a starting population of putative operon maps of the genome into progressively better predictions. Fuzzy scoring functions based on multiple criteria are used for assessing the 'fitness' of the newly evolved operon maps and guiding their evolution. RESULTS: The algorithm organizes the whole genome into operons. The fuzzy guided genetic algorithm-based approach makes it possible to use diverse biological information like genome sequence data, functional annotations and conservation across multiple genomes, to guide the organization process. This approach does not require any prior training with experimental operons. The predictions from this algorithm for Escherichia coli K12 and Bacillus subtilis are evaluated against experimentally discovered operons for these organisms. The accuracy of the method is evaluated using an ROC (receiver operating characteristic) analysis. The area under the ROC curve is around 0.9, which indicates excellent accuracy. CONTACT: [email protected].
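
The ROC evaluation mentioned above can be sketched as follows. The area under the ROC curve equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half, and the code computes it directly from that definition. The scores and labels are hypothetical (for instance, a score per adjacent gene pair and whether the pair is experimentally known to share an operon); the abstract does not describe the authors' exact evaluation set-up.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are truthy for positives and falsy for negatives.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical prediction scores and experimental labels
scores = [0.95, 0.80, 0.70, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for an illustration; a rank-based formulation would be preferable for large evaluation sets.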

Detecting clusters of different geometrical shapes in microarray gene expression data. (56/394)

MOTIVATION: Clustering has been used as a popular technique for finding groups of genes that show similar expression patterns under multiple experimental conditions. Many clustering methods have been proposed for clustering gene-expression data, including hierarchical clustering, k-means clustering and the self-organizing map (SOM). However, the conventional methods are limited in their ability to identify clusters of different shapes because they use a fixed distance norm when calculating the distance between genes. The fixed distance norm imposes a fixed geometrical shape on the clusters regardless of the actual data distribution. Thus, different distance norms are required for handling the different shapes of clusters. RESULTS: We present the Gustafson-Kessel (GK) clustering method for microarray gene-expression data. To detect clusters of different shapes in a dataset, we use an adaptive distance norm that is calculated from a fuzzy covariance matrix (F) of each cluster, in which the eigenstructure of F is used as an indicator of the shape of the cluster. Moreover, the GK method is less prone to falling into local minima than k-means and SOM because it makes decisions through the use of membership degrees of a gene to clusters. The algorithmic procedure is accomplished by the alternating optimization technique, which iteratively improves a sequence of sets of clusters until no further improvement is possible. To test the performance of the GK method, we applied the GK method and well-known conventional methods to three recently published yeast datasets, and compared the performance of each method using the Saccharomyces Genome Database annotations. The clustering results of the GK method are more significantly relevant to the biological annotations than those of the other methods, demonstrating its effectiveness and potential for clustering gene-expression data.
AVAILABILITY: The software was developed in Java and can be executed on any platform that runs a Java Virtual Machine (JVM). It is available from the authors upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at http://dragon.kaist.ac.kr/gk.
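
The adaptive distance norm at the heart of the GK method can be sketched for two-dimensional data. The code uses the standard Gustafson-Kessel formulation, in which the norm-inducing matrix is (rho * det F)^(1/n) * F^(-1) for an n-dimensional fuzzy covariance matrix F. It is written in Python, not taken from the authors' Java software, and the restriction to n = 2, the function names and the example values are assumptions for illustration.

```python
def fuzzy_covariance_2d(points, memberships, centre, m=2.0):
    """Fuzzy covariance matrix F of one cluster for 2-D points."""
    sxx = sxy = syy = total = 0.0
    for (x, y), u in zip(points, memberships):
        w = u ** m  # membership raised to the fuzzifier
        dx, dy = x - centre[0], y - centre[1]
        sxx += w * dx * dx
        sxy += w * dx * dy
        syy += w * dy * dy
        total += w
    return [[sxx / total, sxy / total], [sxy / total, syy / total]]


def gk_distance_sq(point, centre, cov, rho=1.0):
    """Squared GK distance (x - v)^T [ (rho * det F)^(1/2) F^-1 ] (x - v).

    Requires a non-singular covariance matrix (det F > 0).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    # Quadratic form with F^-1 = (1 / det) * [[d, -b], [-c, a]]
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return (rho * det) ** 0.5 * quad


# Hypothetical elongated cluster: wide along x, narrow along y
cov = [[4.0, 0.0], [0.0, 0.25]]
print(gk_distance_sq((2.0, 0.0), (0.0, 0.0), cov))  # along the long axis
print(gk_distance_sq((0.0, 2.0), (0.0, 0.0), cov))  # across the short axis
```

With an identity covariance matrix the expression reduces to the squared Euclidean distance, which is the fixed norm the abstract attributes to the conventional methods; a stretched covariance makes points along the cluster's long axis count as closer than points the same Euclidean distance away across it.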