Panning for genes--A visual strategy for identifying novel gene orthologs and paralogs. (41/21774)

We have developed a rapid visual method for identifying novel members of gene families. Starting with an evolutionary tree, 20-50 protein query sequences for a gene family are selected from different branches of the tree. These query sequences are used to search the GenBank and expressed sequence tag (EST) DNA databases and their nightly updates using the tfastx3 or tfasty3 programs. The results of all 20-50 searches are collated and re-sorted to highlight EST or genomic sequences that share significant similarity with the query sequences. The statistical significance of each DNA/protein alignment is plotted, highlighting the portion of the query sequence that is present in the database sequence and the percent identity in the aligned region. The collated results for database sequences are linked using the WWW to the underlying scores and alignments; these links can also be used to perform additional searches to characterize the novel sequence further. With traditional "deep" scoring matrices (BLOSUM50) one can search for previously unrecognized families of large protein superfamilies. Alternatively, by using query sequences and EST libraries from the same species (e.g., human or mouse) together with "shallow" scoring matrices and filters that remove high-identity sequences, one can highlight new paralogs of previously described subfamilies. Using query sequences from the glutathione transferase superfamily, we identified two novel mammalian glutathione transferase families that were recognized previously only in plants. Using query sequences from known mammalian glutathione transferase subfamilies, we identified new candidate paralogs from the mouse class-mu, class-pi, and class-theta families.
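The collation and filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hit` record, the `collate` function, the thresholds, and the sample hits are all hypothetical, and the rule of dropping any database sequence that is near-identical to at least one query is only one plausible reading of "filters that remove high-identity sequences".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One query-versus-database alignment (hypothetical record layout)."""
    query: str       # protein query identifier
    subject: str     # EST or genomic database sequence identifier
    evalue: float    # expectation value reported by the search program
    identity: float  # percent identity in the aligned region


def collate(hits, max_evalue=1e-3, max_identity=None):
    """Merge hits from many searches into one list, best hit per subject.

    Subjects with any hit at or above max_identity are dropped entirely,
    on the assumption that they correspond to already-known genes.
    """
    known = set()
    if max_identity is not None:
        known = {h.subject for h in hits if h.identity >= max_identity}

    best = {}
    for h in hits:
        if h.subject in known or h.evalue > max_evalue:
            continue
        current = best.get(h.subject)
        if current is None or h.evalue < current.evalue:
            best[h.subject] = h

    # Most significant database sequences first.
    return sorted(best.values(), key=lambda h: h.evalue)


# Made-up example data, for illustration only.
hits = [
    Hit("GSTM1", "EST_A", 1e-30, 98.0),
    Hit("GSTM2", "EST_A", 1e-25, 85.0),
    Hit("GSTP1", "EST_B", 1e-8, 45.0),
    Hit("GSTT1", "EST_C", 0.5, 30.0),
]

all_significant = collate(hits)
candidate_paralogs = collate(hits, max_identity=90.0)
```

With these made-up hits, the unfiltered collation keeps EST_A and EST_B, while the high-identity filter leaves only EST_B as a candidate for a new family member.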

Y2K: the moment of truth. (42/21774)

It remains to be seen whether the world will move in time to fix the Y2K bug, or whether computers around the world will shut down when the clock strikes midnight on 31 December 1999. Y2K could have a serious impact on environmental facilities, particularly given the extent to which computer software and microchips are now involved in pollution control and environmental monitoring and protection systems.

CORA--topological fingerprints for protein structural families. (43/21774)

CORA is a suite of programs for multiply aligning and analyzing protein structural families to identify the consensus positions and capture their most conserved structural characteristics (e.g., residue accessibility, torsional angles, and global geometry as described by inter-residue vectors/contacts). Knowledge of these structurally conserved positions, which are mostly in the core of the fold, and of their properties significantly improves the identification and classification of newly determined relatives. Information is encoded in a consensus three-dimensional (3D) template, and relatives are found by a sensitive alignment method that employs a new scoring scheme based on conserved residue contacts. By encapsulating these critical "core" features, templates perform more reliably in recognizing distant structural relatives than searches with representative structures. Parameters for 3D-template generation and alignment were optimized for each structural class (mainly-alpha, mainly-beta, alpha-beta), using representative superfold families. For all families selected, the templates gave significant improvements in sensitivity and selectivity in recognizing distant structural relatives. Furthermore, since templates contain less than 70% of fold positions and compare fewer positions when aligning structures, scans are at least an order of magnitude faster than scans using selected structures. CORA was subsequently tested on eight other broad structural families from the CATH database. Diagnostic plots are generated automatically and provide qualitative assistance for classifying newly determined relatives. They are demonstrated here by application to the large globin-like fold family. CORA templates for both homologous superfamilies and fold families will be stored in CATH and used to improve the classification and analysis of newly determined structures.

Predicting the structures of 18 peptides using Geocore. (44/21774)

We describe an extensive test of Geocore, an ab initio peptide folding algorithm. We studied 18 short molecules for which there are structures in the Protein Data Bank; chains are up to 31 monomers long. Except for the very shortest peptides, an extremely simple energy function is sufficient to discriminate the true native state from more than 10^8 lowest energy conformations that are searched explicitly for each peptide. A high incidence of native-like structures is found within the best few hundred conformations generated by Geocore for each amino acid sequence. Predictions improve when the number of discrete phi/psi choices is increased.
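To give a sense of scale for the discrete phi/psi choices mentioned above, the following sketch counts conformations under the simplifying assumption that every residue independently takes one of k backbone states, so a chain of n residues has k^n combinations. This is illustrative arithmetic only, not Geocore's actual search procedure; the function names are hypothetical.

```python
def conformation_count(n_residues: int, n_choices: int) -> int:
    """Number of backbone conformations if each residue independently
    takes one of n_choices discrete phi/psi states (an assumption)."""
    return n_choices ** n_residues


def min_length_exceeding(limit: int, n_choices: int) -> int:
    """Smallest chain length whose conformation count exceeds limit."""
    n = 1
    while conformation_count(n, n_choices) <= limit:
        n += 1
    return n


# With 4 states per residue, the count passes 10^8 at 14 residues;
# with 6 states per residue, it does so already at 11 residues.
length_4_states = min_length_exceeding(10**8, 4)
length_6_states = min_length_exceeding(10**8, 6)
```

Under this assumption, adding phi/psi choices enlarges the space to be searched very quickly, which is the cost side of the improvement in predictions noted in the abstract.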

Factors limiting the performance of prediction-based fold recognition methods. (45/21774)

In the past few years, a new generation of fold recognition methods has been developed, in which the classical sequence information is combined with information obtained from secondary structure and, sometimes, accessibility predictions. The results are promising, indicating that this approach may compete with potential-based methods (Rost B et al., 1997, J Mol Biol 270:471-480). Here we present a systematic study of the different factors contributing to the performance of these methods, in particular when applied to the problem of fold recognition of remote homologues. Our results indicate that secondary structure and accessibility prediction methods have reached an accuracy level where they are not the major factor limiting the accuracy of fold recognition. The pattern degeneracy problem is confirmed as the major source of error of these methods. On the basis of these results, we study three different options to overcome these limitations: normalization schemes, mapping of the coil state into the different zones of the Ramachandran plot, and post-threading graphical analysis.

Protein structural topology: Automated analysis and diagrammatic representation. (46/21774)

The topology of a protein structure is a highly simplified description of its fold including only the sequence of secondary structure elements, and their relative spatial positions and approximate orientations. This information can be embodied in a two-dimensional diagram of protein topology, called a TOPS cartoon. These cartoons are useful for the understanding of particular folds and making comparisons between folds. Here we describe a new algorithm for the production of TOPS cartoons, which is more robust than those previously available, and has a much higher success rate. This algorithm has been used to produce a database of protein topology cartoons that covers most of the data bank of known protein structures.

Pulsatile influxes of H+, K+ and Ca2+ lag growth pulses of Lilium longiflorum pollen tubes. (47/21774)

Fluxes of H+, K+ and Ca2+ were measured with self-referencing ion-selective probes, near the plasma membrane of growing Lilium longiflorum pollen tubes. Measurements from three regions around short, steady-growing tubes showed small, steady influx of H+ over the distal 40 µm and a region of the tube within 50-100 µm of the grain with larger magnitude efflux from the grain. K+ fluxes were immeasurable in short tubes. Measurements of longer tubes that were growing in a pulsatile manner revealed a pulsatile influx of both H+ and K+ at the growing tip. The average fluxes at the cell surface during the peaks of the H+ and K+ pulses were 489 ± 81 and 688 ± 144 pmol cm^-2 s^-1, respectively. Growth was measured by tracking the pollen tips with a computer vision system that achieved a spatial resolution of approximately 1/10 pixel. The high spatial resolution enabled the detection of growth, and thus the changes in growth rates, with a temporal sampling rate of 1 frame/second. These data show that the H+ and K+ pulses have a phase lag of 103 ± 9 and 100 ± 11 degrees, respectively, with respect to the growth pulses. Calcium fluxes were also measured in growing tubes. During steady growth, the calcium influx was relatively steady. When pulsatile growth began, the basal Ca2+ influx decreased and a pulsatile component appeared, superimposed on the reduced basal Ca2+ flux. The peaks of the Ca2+ pulses at the cell surface averaged 38.4 ± 2.5 pmol cm^-2 s^-1. Longer tubes had large pulsatile Ca2+ fluxes with smaller baseline fluxes. The Ca2+ influx pulses had a phase lag of 123 ± 9 degrees with respect to the growth pulses.
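A phase lag in degrees, as reported above, relates a time delay to the oscillation period by 360 × delay / period. The sketch below shows one generic way to estimate such a lag from two series sampled at 1 frame/second, by finding the delay that maximizes their cross-correlation. It is not the authors' analysis: the function names, the 36-second period, and the 10-second delay are assumptions chosen only to make the example concrete.

```python
import math


def phase_lag_degrees(lag_seconds: float, period_seconds: float) -> float:
    """Convert a time delay into a phase lag in degrees."""
    return (360.0 * lag_seconds / period_seconds) % 360.0


def best_lag(reference, signal, max_lag: int) -> int:
    """Delay (in samples) at which signal best matches reference,
    found by maximizing the unnormalized cross-correlation."""
    window = len(reference) - max_lag

    def score(lag: int) -> float:
        return sum(reference[i] * signal[i + lag] for i in range(window))

    return max(range(max_lag + 1), key=score)


# Synthetic data (assumed values): growth rate oscillating with a 36 s
# period, and an ion flux following it with a 10 s delay.
period = 36
delay = 10
t = range(180)  # 180 frames at 1 frame/second
growth = [math.sin(2 * math.pi * s / period) for s in t]
flux = [math.sin(2 * math.pi * (s - delay) / period) for s in t]

lag = best_lag(growth, flux, max_lag=period)
phase = phase_lag_degrees(lag, period)
```

For this synthetic pair the recovered delay is 10 seconds, which corresponds to a phase lag of 100 degrees. Real flux and growth records are noisier and less regular, so a practical analysis would need detrending and averaging over many pulses.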

CD-ROM use by rural physicians. (48/21774)

A survey of 131 eastern Washington rural family physicians showed that 59.5% owned a personal computer with a CD-ROM drive. There was an inverse correlation between the physicians' years in practice and computer ownership: 10 years or less (80.6%), 11 to 20 years (72.2%), 21 to 30 years (55.6%), and more than 30 years (32.4%). Those physicians who owned a computer used their CD-ROM for entertainment (52.6%), medical textbooks (44.9%), literature searching software (25.6%), drug information (17.9%), continuing medical education (15.4%), and journals on CD-ROM (11.5%). Many rural doctors who owned computers felt that CD-ROM software helped them provide better patient care (46.8%) and kept them current on new information and techniques (48.4%). Indications for medical education, libraries and CD-ROM publishers are noted.