Spatial and temporal patterns of imported malaria cases and local transmission in Trinidad. (73/16506)

Over a 30-year period (1968-1997), 213 malaria cases in Trinidad were investigated by the Trinidad and Tobago Ministry of Health. Using a global positioning system and a geographic information system, we mapped the precise location of all reported malaria cases, and associated them with breeding habitats of anopheline vectors. The majority of the cases (138, 63%) were individual imported cases around the big port cities. Plasmodium falciparum was the most common parasite, and Africa the most common source of imported cases. Two clusters of cases occurred: an introduced P. vivax outbreak associated with Anopheles aquasalis in 1990-1991, and an autochthonous focus of P. malariae associated with An. bellator and An. homunculus in 1994-1995. Application of a space-time statistic showed a significant clustering of P. malariae cases, and, to a lesser extent, of P. vivax cases, but not of P. falciparum cases. Based on potential for occurrence of local transmission, we are developing risk maps to determine surveillance priorities, outbreak potential, and necessary degree and spatial range of control activities following case detections.

Genetically distinct dog-derived and human-derived Sarcoptes scabiei in scabies-endemic communities in northern Australia. (74/16506)

Overcrowding is a significant factor contributing to endemic infection with Sarcoptes scabiei in human and animal populations. However, since scabies mites from different host species are indistinguishable morphologically, it is unclear whether people can be infected from scabies-infested animals. Molecular fingerprinting was done using three S. scabiei-specific single locus hypervariable microsatellite markers, with a combined total of 70 known alleles. Multilocus analysis of 712 scabies mites from human and dog hosts in Ohio, Panama and Aboriginal communities in northern Australia now shows that genotypes of dog-derived and human-derived scabies cluster by host species rather than by geographic location. Because of the apparent genetic separation between human scabies and dog scabies, control programs for human scabies in endemic areas do not require resources directed against zoonotic infection from dogs.

Transmission dynamics of tuberculosis in a high-incidence country: prospective analysis by PCR DNA fingerprinting. (75/16506)

We have prospectively analyzed the DNA fingerprints of Mycobacterium tuberculosis strains from a random sample of patients with newly diagnosed tuberculosis in Windhoek, Namibia. Strains from 263 smear-positive patients in whom tuberculosis was diagnosed during 1 year were evaluated, and the results were correlated with selected epidemiological and clinical data. A total of 163 different IS6110 fingerprint patterns were observed among the 263 isolates. Isolates from a high percentage of patients (47%) were found in 29 separate clusters, with a cluster defined as isolates with 100% matching patterns. The largest cluster included isolates from 39 patients. One predominant strain of M. tuberculosis caused 15% of cases of smear-positive pulmonary tuberculosis in Windhoek. That strain was also prevalent in the north of the country, suggesting that in contrast to other African countries with isolates with high levels of diversity in their DNA fingerprint patterns, only a restricted number of different strains significantly contribute to the tuberculosis problem in Namibia.
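
The cluster definition used in this abstract (isolates whose fingerprint patterns match 100%) can be illustrated with a minimal sketch. The representation of a pattern as a tuple of band sizes, the isolate names, and the function names are assumptions made for illustration; they are not taken from the study, and real band matching would need a tolerance for gel measurement error.

```python
from collections import defaultdict


def cluster_identical(patterns):
    """Group isolates whose fingerprint patterns are exactly identical.

    `patterns` maps an isolate name to its pattern (hypothetical format:
    a sequence of band sizes). Only groups with two or more isolates
    count as clusters.
    """
    groups = defaultdict(list)
    for isolate, pattern in patterns.items():
        groups[tuple(pattern)].append(isolate)
    return [sorted(members) for members in groups.values() if len(members) >= 2]


def clustered_fraction(patterns):
    """Fraction of all isolates that fall into some cluster."""
    clusters = cluster_identical(patterns)
    return sum(len(c) for c in clusters) / len(patterns)


# Toy data, invented for illustration only.
toy = {
    "iso1": (1.2, 3.4),
    "iso2": (1.2, 3.4),
    "iso3": (2.0,),
    "iso4": (5.5, 6.1),
}
clusters = cluster_identical(toy)
fraction = clustered_fraction(toy)
```

With the toy data, two of the four isolates share a pattern, so one cluster is found and half of the isolates are clustered.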

Simultaneous infection with two drug-susceptible Mycobacterium tuberculosis strains in an immunocompetent host. (76/16506)

An important assumption for DNA fingerprinting of Mycobacterium tuberculosis is that patients are infected with only one strain at a time. Nonetheless, we demonstrate a case of simultaneous infection with two drug-susceptible strains of M. tuberculosis in an immunocompetent patient by IS6110 restriction fragment length polymorphism and spoligotyping. Epidemiological data prove the patient's involvement in two independent clusters. Thus, double infections should be suspected with fingerprints showing divergent band intensities.

Large-scale clustering of cDNA-fingerprinting data. (77/16506)

Clustering is one of the main mathematical challenges in large-scale gene expression analysis. We describe a clustering procedure based on a sequential k-means algorithm with additional refinements that is able to handle high-throughput data on the order of hundreds of thousands of data items measured on hundreds of variables. The practical motivation for our algorithm is oligonucleotide fingerprinting (a method for simultaneous determination of expression level for every active gene of a specific tissue), although the algorithm can be applied as well to other large-scale projects like EST clustering and qualitative clustering of DNA-chip data. As a pairwise similarity measure between two p-dimensional data points, x and y, we introduce mutual information that can be interpreted as the amount of information about x in y, and vice versa. We show that for our purposes this measure is superior to commonly used metric distances, for example, Euclidean distance. We also introduce a modified version of mutual information as a novel method for validating clustering results when the true clustering is known. The performance of our algorithm with respect to experimental noise is shown by extensive simulation studies. The algorithm is tested on a subset of 2029 cDNA clones coming from 15 different genes from a cDNA library derived from human dendritic cells. Furthermore, the clustering of these 2029 cDNA clones is demonstrated when the entire set of 76,032 cDNA clones is processed.
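
As a rough illustration of mutual information as a pairwise similarity measure, the sketch below computes the standard empirical mutual information (in bits) between two equal-length vectors of discrete values. It assumes the fingerprint signals have already been discretized (for example into 0/1 hybridization calls); the abstract does not state how the authors discretize or normalize their data, so this is the textbook definition rather than their exact measure, and the function name is hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information (bits) of two discrete sequences.

    I(X;Y) = sum over (a, b) of p(a,b) * log2( p(a,b) / (p(a) * p(b)) ),
    with probabilities estimated from co-occurrence counts.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # c/n is p(a,b); c*n / (count_x[a]*count_y[b]) is p(a,b)/(p(a)p(b)).
        mi += (c / n) * log2(c * n / (count_x[a] * count_y[b]))
    return mi


identical = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Two identical balanced binary vectors share one bit of information, while the second pair is empirically independent and scores zero. Unlike Euclidean distance, the measure is symmetric in x and y and is unchanged by relabeling the values.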

Exploring expression data: identification and analysis of coexpressed genes. (78/16506)

Analysis procedures are needed to extract useful information from the large amount of gene expression data that is becoming available. This work describes a set of analytical tools and their application to yeast cell cycle data. The components of our approach are (1) a similarity measure that reduces the number of false positives, (2) a new clustering algorithm designed specifically for grouping gene expression patterns, and (3) an interactive graphical cluster analysis tool that allows user feedback and validation. We use the clusters generated by our algorithm to summarize genome-wide expression and to initiate supervised clustering of genes into biologically meaningful groups.

d2_cluster: a validated method for clustering EST and full-length cDNA sequences. (79/16506)

Several efforts are under way to condense single-read expressed sequence tags (ESTs) and full-length transcript data on a large scale by means of clustering or assembly. One goal of these projects is the construction of gene indices where transcripts are partitioned into index classes (or clusters) such that they are put into the same index class if and only if they represent the same gene. Accurate gene indexing facilitates gene expression studies and inexpensive and early partial gene sequence discovery through the assembly of ESTs that are derived from genes that have yet to be positionally cloned or obtained directly through genomic sequencing. We describe d2_cluster, an agglomerative algorithm for rapidly and accurately partitioning transcript databases into index classes by clustering sequences according to minimal linkage or "transitive closure" rules. We then evaluate the relative efficiency of d2_cluster with respect to other clustering tools. UniGene is chosen for comparison because of its high quality and wide acceptance. It is shown that although d2_cluster and UniGene produce results that are between 83% and 90% identical, the joining rate of d2_cluster is between 8% and 20% greater than that of UniGene. Finally, we present the first published rigorous evaluation of under- and over-clustering (in other words, of type I and type II errors) of a sequence clustering algorithm, although the existence of highly identical gene paralogs means that care must be taken in the interpretation of the type II error. Upper bounds for these d2_cluster error rates are estimated at 0.4% and 0.8%, respectively. In other words, the sensitivity and selectivity of d2_cluster are estimated to be >99.6% and >99.2%, respectively.
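
The "transitive closure" rule means that two sequences end up in the same index class whenever a chain of pairwise links connects them, even if the two are not directly similar. A minimal sketch of that rule with a union-find structure follows. It assumes the list of linked pairs has already been produced by some similarity test; the d2 scoring that d2_cluster uses to decide which pairs are linked is not implemented here, and the function and variable names are hypothetical.

```python
def transitive_closure_clusters(ids, linked_pairs):
    """Partition `ids` into classes under the transitive closure of links.

    `linked_pairs` is an iterable of (a, b) pairs judged similar by some
    external criterion (assumed given). Returns a sorted list of sorted
    clusters; unlinked ids form singleton clusters.
    """
    parent = {i: i for i in ids}

    def find(i):
        # Walk to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in linked_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values())


# Toy example: est1-est2 and est2-est3 are linked, so est1 and est3
# share a class even though they were never compared directly.
ests = ["est1", "est2", "est3", "est4", "est5"]
links = [("est1", "est2"), ("est2", "est3"), ("est4", "est5")]
classes = transitive_closure_clusters(ests, links)
```

This single-linkage behavior is also why a single spurious link can merge two unrelated classes, which is one way over-clustering can arise.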

A comprehensive approach to clustering of expressed human gene sequence: the sequence tag alignment and consensus knowledge base. (80/16506)

The expressed human genome is being sequenced and analyzed by disparate groups producing disparate data. The majority of the identified coding portion is in the form of expressed sequence tags (ESTs). The need to discover exonic representation and expression forms of full-length cDNAs for each human gene is frustrated by the partial and variable quality nature of this data delivery. A highly redundant human EST data set has been processed into integrated and unified expressed transcript indices that consist of hierarchically organized human transcript consensi reflecting gene expression forms and genetic polymorphism within an index class. The expression index and its intermediate outputs include cleaned transcript sequence, expression, and alignment information and a higher fidelity subset, SANIGENE. The STACK_PACK clustering system has been applied to dbEST release 121598 (GenBank version 110). Sixty-four percent of 1,313,103 Homo sapiens ESTs are condensed into 143,885 tissue level multiple sequence clusters; linking through clone-ID annotations produces 68,701 total assemblies, such that 81% of the original input set is captured in a STACK multiple sequence or linked cluster. Indexing of alignments by substituent EST accession allows browsing of the data structure and its cross-links to UniGene. STACK metaclusters consolidate a greater number of ESTs by a factor of 1.86 with respect to the corresponding UniGene build. Fidelity comparison with genome reference sequence AC004106 demonstrates consensus expression clusters that reflect significantly lower spurious repeat sequence content and capture alternate splicing within a whole body index cluster and three STACK v.2.3 tissue-level clusters. Statistics of a staggered release whole body index build of STACK v.2.0 are presented.