Evolutionary patterns from mass originations and mass extinctions.

The Fossil Record 2 database gives a stratigraphic range of most known animal and plant families. We have used it to plot the number of families extant through time and argue for an exponential fit, rather than a logistic one, on the basis of power spectra of the residuals from the exponential. The times of origins and extinctions, when plotted for all families of marine and terrestrial organisms over the last 600 Myr, reveal different origination and extinction peaks. This suggests that patterns of biological evolution are driven by its own internal dynamics as well as responding to upsets from external causes. Spectral analysis shows that the residuals from the exponential model of the marine system are more consistent with 1/f noise suggesting that self-organized criticality phenomena may be involved.
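The analysis described above (fit an exponential, then inspect the power spectrum of the residuals) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the fit is done by least squares in log space, the spectrum is a naive periodogram, and the function names `fit_exponential`, `periodogram`, and `spectral_slope` are hypothetical. A log-log spectral slope near -1 would be the signature of 1/f noise.

```python
import cmath
import math

def fit_exponential(t, n):
    """Fit n ~ a * exp(b * t) by ordinary least squares on log(n) (an assumption)."""
    logs = [math.log(v) for v in n]
    m = len(t)
    t_mean = sum(t) / m
    l_mean = sum(logs) / m
    sxy = sum((ti - t_mean) * (li - l_mean) for ti, li in zip(t, logs))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    a = math.exp(l_mean - b * t_mean)
    return a, b

def residuals(t, n, a, b):
    """Observed family counts minus the fitted exponential."""
    return [ni - a * math.exp(b * ti) for ti, ni in zip(t, n)]

def periodogram(x):
    """Power at frequency indices k = 1 .. len(x) // 2, using a naive DFT."""
    m = len(x)
    powers = []
    for k in range(1, m // 2 + 1):
        s = sum(x[j] * cmath.exp(-2j * math.pi * k * j / m) for j in range(m))
        powers.append(abs(s) ** 2 / m)
    return powers

def spectral_slope(powers):
    """Least-squares slope of log(power) against log(frequency index)."""
    pts = [(math.log(k), math.log(p))
           for k, p in enumerate(powers, start=1) if p > 0]
    x_mean = sum(x for x, _ in pts) / len(pts)
    y_mean = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pts)
    sxx = sum((x - x_mean) ** 2 for x, _ in pts)
    return sxy / sxx
```

With real data, `t` would hold evenly spaced time bins and `n` the number of families extant in each bin; the slope is then computed from `periodogram(residuals(t, n, a, b))`.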

Angiopoietin-3, a novel member of the angiopoietin family.

A cDNA clone encoding angiopoietin-3 protein (Ang3), a novel member of the angiopoietin family, was identified. Ang3 cDNA was cloned from a human aorta cDNA library. Ang3 is a 503 amino acid protein having 45.1% and 44.7% identity with human angiopoietin-1 and human angiopoietin-2, respectively. Ang3 mRNA is expressed in lung and cultured human umbilical vein endothelial cells (HUVECs). Ang3 mRNA expression in HUVECs was slightly decreased by vascular endothelial cell growth factor treatment, suggesting that the regulation of Ang3 mRNA expression is different from that of Ang2.

A novel inhibitor protein for Bombyx cysteine proteinase is homologous to propeptide regions of cysteine proteinases.

A cDNA clone for an inhibitor of Bombyx cysteine proteinase was isolated and sequenced. Active inhibitor proteins were expressed in Escherichia coli using the cDNA. The open reading frame of the cDNA encodes a 105-residue protein with a 19-residue signal sequence. The inhibitor has amino acid sequences homologous to several cysteine proteinases, but only to their propeptide sequences. The results suggest that some cysteine proteinase proregions may have evolved as autonomous modules and become inhibitor proteins for cysteine proteinases.

Recognizing the pleckstrin homology domain fold in mammalian phospholipase D using hidden Markov models.

Phospholipase D was first described in plant tissue but has recently been shown to occur in mammalian cells, where it is activated by cell surface receptors. Its mode of activation by receptors is unclear. Biochemical studies suggest that it may occur downstream of other effector proteins and that small GTP-dependent regulatory proteins may be involved. The sequence in a non-designated region of mammalian phospholipase D1 and D2 shows similarity to a structural domain that is present in signalling proteins that are regulated by protein kinases or heterotrimeric G-proteins. Mammalian phospholipase D has structural similarities with other lipid signalling phospholipases and thus may be regulated by receptors in an analogous fashion.

Functional promoter modules can be detected by formal models independent of overall nucleotide sequence similarity.

MOTIVATION: Gene regulation often depends on functional modules which feature a detectable internal organization. Overall sequence similarity of these modules is often insufficient for detection by general search methods like FASTA or even Gapped BLAST. However, it is of interest to evaluate whether modules, often known from experimental analysis of single sequences, are present in other regulatory sequences. RESULTS: We developed a new method (FastM) which combines a search algorithm for individual transcription factor binding sites (MatInspector) with a distance correlation function. FastM allows fast definition of a model of correlated binding sites derived from as little as a single promoter or enhancer. ModelInspector results are suitable for evaluation of the significance of the model. We used FastM to define a model for the experimentally verified NFkappaB/IRF1 regulatory module from the major histocompatibility complex (MHC) class I HLA-B gene promoter. Analysis of a test set of sequences as well as database searches with this model showed excellent correlation of the model with the biological function of the module. These results could not be obtained by searches using FASTA or Gapped BLAST, which are based on sequence similarity. We were also able to demonstrate association of a hypothetical GRE-GRE module with viral sequences based on analysis of several GenBank sections with this module. AVAILABILITY: The WWW version of FastM is accessible at: http://www.gsf.de/cgi-bin/fastm.pl and http://genomatix.gsf.de/cgi-bin/fastm2/fastm.pl
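The idea of a module as two binding sites held at a constrained distance can be illustrated with a toy search. This sketch is not FastM or MatInspector: it swaps the matrix-based site search for plain regular expressions, and the motifs, function names, and distance window are placeholders chosen only for illustration.

```python
import re

def find_sites(seq, pattern):
    """Start positions of all matches of a regex motif, overlapping ones included."""
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]

def find_modules(seq, first, second, min_dist, max_dist):
    """Pairs (i, j) where `second` starts min_dist..max_dist bases after `first` starts."""
    firsts = find_sites(seq, first)
    seconds = find_sites(seq, second)
    return [(i, j) for i in firsts for j in seconds
            if min_dist <= j - i <= max_dist]

# Placeholder motifs, not real transcription factor binding sites.
sequence = "AAGGGATTTTTTTTCACGTGAA"
print(find_modules(sequence, "GGGA", "CACGTG", 5, 20))
```

Two sequences with little overall similarity can both match such a model, which is why a similarity search alone may miss them.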

10-11 bp periodicities in complete genomes reflect protein structure and DNA folding.

MOTIVATION: Completely sequenced genomes allow for detection and analysis of the relatively weak periodicities of 10-11 basepairs (bp). Two sources contribute to such signals: correlations in the corresponding protein sequences due to the amphipathic character of alpha-helices and the folding of DNA (nucleosomal patterns, DNA supercoiling). Since the topological state of genomic DNA is of importance for its replication, recombination and transcription, there is an immediate interest in obtaining information about the supercoiled state from sequence periodicities. RESULTS: We show that correlations within proteins affect mainly the oscillations at distances below 35 bp. The long-ranging correlations up to 100 bp reflect primarily DNA folding. For the yeast genome these oscillations are consistent in detail with the chromatin structure. For eubacteria and archaea the periods deviate significantly from the 10.55 bp value for free DNA. These deviations suggest that while a period of 11 bp in bacteria reflects negative supercoiling, the significantly different period of thermophilic archaea close to 10 bp corresponds to positive supercoiling of thermophilic archaeal genomes. AVAILABILITY: Protein sets and C programs for the calculation of correlation functions are available on request from the authors (see http://itb.biologie.hu-berlin.de).
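A periodicity of this kind can be read off a distance-dependent correlation function. The sketch below is a simplified stand-in for the authors' C programs, which are not reproduced here: it assumes a single-base correlation C(k) = P(base at i and at i+k) - P(base)^2 and picks the best-matching period from a user-supplied list of candidates. The function names are hypothetical.

```python
import cmath
import math

def base_correlation(seq, base, max_dist):
    """C(k) = P(base at i and at i + k) - P(base)^2 for k = 1 .. max_dist."""
    hits = [1 if c == base else 0 for c in seq]
    n = len(hits)
    p = sum(hits) / n
    corr = []
    for k in range(1, max_dist + 1):
        joint = sum(hits[i] * hits[i + k] for i in range(n - k)) / (n - k)
        corr.append(joint - p * p)
    return corr

def dominant_period(corr, candidates):
    """Candidate period (in bp) whose sinusoid best matches the correlation profile."""
    def strength(period):
        return abs(sum(c * cmath.exp(-2j * math.pi * k / period)
                       for k, c in enumerate(corr, start=1)))
    return max(candidates, key=strength)

# Synthetic sequence with an exact 10 bp repeat, used only as a sanity check.
toy = ("A" + "C" * 9) * 50
profile = base_correlation(toy, "A", 100)
print(dominant_period(profile, [9.0, 9.5, 10.0, 10.5, 11.0]))
```

On genomic data the signal is much weaker than in this toy case, so long sequences and a fine grid of candidate periods would be needed to separate values such as 10, 10.55, and 11 bp.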

EDITtoTrEMBL: a distributed approach to high-quality automated protein sequence annotation.

SUMMARY: Many databases in molecular biology face the problem that the ever increasing rate of data production can no longer be handled by traditional methods, especially human curation. Therefore, a number of projects are currently investigating methods for automated sequence annotation. This paper describes the EBI's approach to this problem for protein sequences by integration of arbitrary analysis programs into a distributed and highly flexible environment. Our software framework allows an individual treatment of sequences depending on their particular properties, which is achieved through a high-level description of the preconditions and capabilities of analysing modules. This not only improves the overall performance of the annotation process, as unnecessary steps are avoided, but also enhances its quality since dependencies between different modules are taken into account. We have implemented a prototype and use it in the production of TrEMBL releases. AVAILABILITY: Upon request.
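One way to picture modules described by preconditions and capabilities is a small planner that runs a module only when the facts it needs are known and it can still contribute something new. This is a hypothetical sketch, not the EDITtoTrEMBL framework itself: the `Module` class, the `plan` function, and the fact names are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Module:
    name: str
    requires: frozenset  # facts that must already hold for the sequence
    provides: frozenset  # facts the module is able to establish

def plan(modules, known):
    """Order in which modules can run, starting from the initially known facts."""
    known = set(known)
    order = []
    pending = list(modules)
    progress = True
    while progress:
        progress = False
        for mod in list(pending):
            # Run only if preconditions hold and the module adds something new.
            if mod.requires <= known and not mod.provides <= known:
                order.append(mod.name)
                known |= mod.provides
                pending.remove(mod)
                progress = True
    return order

# Invented example modules and facts.
modules = [
    Module("topology", frozenset({"protein", "signal_peptide"}), frozenset({"topology"})),
    Module("orf_finder", frozenset({"dna"}), frozenset({"orf"})),
    Module("signal_peptide", frozenset({"protein"}), frozenset({"signal_peptide"})),
]
print(plan(modules, {"protein"}))
```

Modules whose preconditions never become true are simply skipped, which mirrors the point that unnecessary steps are avoided and dependencies between modules are respected.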

A novel method for automatic functional annotation of proteins.

MOTIVATION: To cope with the increasing amount of sequence data, reliable automatic annotation tools are required. The TrEMBL database contains, together with SWISS-PROT, nearly all publicly available protein sequences, but in contrast to SWISS-PROT only limited functional annotation. To improve this situation, we had to develop a method of automatic annotation that produces highly reliable functional prediction using the language and the syntax of SWISS-PROT. RESULTS: An algorithm was developed and successfully used for the automatic annotation of a test set of unknown proteins. The predicted information included description, function, catalytic activity, cofactors, pathway, subcellular location, quaternary structure, similarity to other proteins, active sites, and keywords. The algorithm showed a low coverage (10%), but a high specificity and reliability. AVAILABILITY: The results can be obtained by anonymous ftp from ftp.ebi.ac.uk/pub/databases/sp_tr_nrdb. The source code is available on request from the authors.