Biological information: making it accessible and integrated (and trying to make sense of it).
The availability of the genome sequences of human and mouse, human sequence variation data and other large genetic data sets will lead to a revolution in understanding of the human machine and the treatment of its diseases. The success of the international genome sequencing consortiums shows what can be achieved by well coordinated large scale public domain projects and the benefits of data access to all. It is already clear that the availability of this sequence is having a huge impact on research worldwide. Complete genome sequences provide a framework to pull all biological data together such that each piece has the potential to say something about biology as a whole. Biology is too complex for any organisation to have a monopoly of ideas or data, so research institutes around the world can all contribute to the collection and analysis of this data and to access to it. However, although it is possible for all this data to be accessible to all through the internet, the more organisations provide data or analysis separately, the harder it becomes for anyone to collect and integrate the results. To address these problems of integration of data, open standards for biological data exchange, such as the 'Distributed Annotation System' (DAS) (Dowell et al., 2001), are being developed, and bioinformatics as a whole is now being strongly driven by the open source software (OSS) model for collaborative software development (Hubbard and Birney, 1999). The leading provider of human genome annotation, the Ensembl project (http://www.ensembl.org), is entirely an OSS project and has been widely adopted by academic and commercial organisations alike (Hubbard et al., 2002). Accurate automatic annotation of features such as genes in vertebrate genomes currently relies on supporting evidence in the form of homologies to mRNAs, ESTs or protein. 
However, it appears that sufficient high quality experimentally curated annotation now exists to be used as a substrate for machine learning algorithms to create effective models of biological signal sequences (Down and Hubbard, 2002). Is there hope for ab initio prediction methods after all?
CEBS object model for systems biology data, SysBio-OM.
MOTIVATION: To promote a systems biology approach to understanding the biological effects of environmental stressors, the Chemical Effects in Biological Systems (CEBS) knowledge base is being developed to house data from multiple complex data streams in a systems friendly manner that will accommodate extensive querying from users. Unified data representation via a single object model will greatly aid in integrating data storage and management, and facilitate reuse of software to analyze and display data resulting from diverse differential expression or differential profile technologies. Data streams include, but are not limited to, gene expression analysis (transcriptomics), protein expression and protein-protein interaction analysis (proteomics) and changes in low molecular weight metabolite levels (metabolomics). RESULTS: To enable the integration of microarray gene expression, proteomics and metabolomics data in the CEBS system, we designed an object model, Systems Biology Object Model (SysBio-OM). The model is comprehensive and leverages other open source efforts, namely the MicroArray Gene Expression Object Model (MAGE-OM) and the Proteomics Experiment Data Repository (PEDRo) object model. SysBio-OM is designed by extending MAGE-OM to represent protein expression data elements (including those from PEDRo), protein-protein interaction and metabolomics data. SysBio-OM promotes the standardization of data representation and data quality by facilitating the capture of the minimum annotation required for an experiment. Such standardization refines the accuracy of data mining and interpretation. The open source SysBio-OM model, which can be implemented on varied computing platforms, is presented here. AVAILABILITY: A Unified Modeling Language (UML) depiction of the entire SysBio-OM is available at http://cebs.niehs.nih.gov/SysBioOM/. 
The Rational Rose object model package is distributed under an open source license that permits unrestricted academic and commercial use and is available at http://cebs.niehs.nih.gov/cebsdownloads. The database and interface are being built to implement the model and will be available for public use at http://cebs.niehs.nih.gov.
MathSBML: a package for manipulating SBML-based biological models.
MathSBML is a Mathematica package designed for manipulating Systems Biology Markup Language (SBML) models. It converts SBML models into Mathematica data structures and provides a platform for manipulating and evaluating these models. Once a model is read by MathSBML, it is fully compatible with standard Mathematica functions such as NDSolve (a differential-algebraic equations solver). MathSBML also provides an application programming interface for viewing and manipulating models; running numerical simulations; exporting SBML models; and converting SBML models into other formats, such as XPP, HTML and FORTRAN. By accessing the full breadth of Mathematica functionality, MathSBML is fully extensible to SBML models of any size or complexity. AVAILABILITY: Open Source (LGPL) at http://www.sbml.org and http://www.sf.net/projects/sbml
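The general idea of turning a reaction model into a data structure and handing it to a numerical solver can be illustrated with a minimal Python sketch. This is not MathSBML or its API, and it does not parse SBML: the dictionary layout, the single reaction A -> B with a mass-action rate constant of 1.0, and the function names `derivatives`, `rk4_step` and `simulate` are all assumptions made for illustration. A fixed-step Runge-Kutta integrator stands in for a full solver such as NDSolve.

```python
from math import exp

# Hypothetical in-memory model (not SBML): one reaction A -> B
# with an assumed mass-action rate law v = 1.0 * [A].
model = {
    "species": {"A": 1.0, "B": 0.0},  # initial amounts
    "reactions": [
        {"stoich": {"A": -1, "B": +1}, "rate": lambda s: 1.0 * s["A"]},
    ],
}

def derivatives(model, state):
    """Sum stoichiometry * rate over all reactions to get d(state)/dt."""
    dxdt = {name: 0.0 for name in state}
    for rxn in model["reactions"]:
        v = rxn["rate"](state)
        for name, coeff in rxn["stoich"].items():
            dxdt[name] += coeff * v
    return dxdt

def rk4_step(model, state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return {n: state[n] + h * k[n] for n in state}
    k1 = derivatives(model, state)
    k2 = derivatives(model, shifted(k1, dt / 2))
    k3 = derivatives(model, shifted(k2, dt / 2))
    k4 = derivatives(model, shifted(k3, dt))
    return {
        n: state[n] + dt / 6 * (k1[n] + 2 * k2[n] + 2 * k3[n] + k4[n])
        for n in state
    }

def simulate(model, t_end, dt=0.01):
    """Integrate from t = 0 to t_end with a fixed step size."""
    state = dict(model["species"])
    for _ in range(round(t_end / dt)):
        state = rk4_step(model, state, dt)
    return state

final = simulate(model, t_end=1.0)
# For this toy model the exact solution is A(t) = exp(-t).
error = abs(final["A"] - exp(-1.0))
```

Keeping the model as plain data, separate from the integrator, is what lets the same structure be simulated, inspected or exported; that separation is the only design point the sketch is meant to convey.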
CSB.DB: a comprehensive systems-biology database.
SUMMARY: The open access comprehensive systems-biology database (CSB.DB) presents the results of bio-statistical analyses on gene expression data in association with additional biochemical and physiological knowledge. The main aim of this database platform is to provide tools that support insight into life's complexity pyramid with a special focus on the integration of data from transcript and metabolite profiling experiments. The central part of CSB.DB, which we describe in this applications note, is a set of co-response databases that currently focus on the three key model organisms, Escherichia coli, Saccharomyces cerevisiae and Arabidopsis thaliana. CSB.DB gives easy access to the results of large-scale co-response analyses, which are currently based exclusively on the publicly available compendia of transcript profiles. By scanning for the best co-responses among changing transcript levels, CSB.DB allows users to infer hypotheses on the functional interaction of genes. These hypotheses are novel and not accessible through analysis of sequence homology. The database enables the search for pairs of genes and larger units of genes, which are under common transcriptional control. In addition, statistical tools are offered to the user, which allow validation and comparison of those co-responses that were discovered by gene queries performed on the currently available set of pre-selectable datasets. AVAILABILITY: All co-response databases can be accessed through the CSB.DB Web server (http://csbdb.mpimp-golm.mpg.de/).
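Scanning for the best co-responses among transcript profiles can be sketched as a pairwise correlation ranking. The sketch below is an illustration only and is not CSB.DB's implementation: the abstract does not name the statistic used, so the Pearson correlation is an assumption, and the gene names, expression values and the functions `pearson` and `rank_co_responses` are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def rank_co_responses(profiles):
    """Score every gene pair and sort by absolute correlation, strongest first."""
    scores = [
        (g1, g2, pearson(profiles[g1], profiles[g2]))
        for g1, g2 in combinations(sorted(profiles), 2)
    ]
    return sorted(scores, key=lambda item: abs(item[2]), reverse=True)

# Hypothetical transcript levels across four experiments.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 1.0, 3.5, 2.0],
}
ranked = rank_co_responses(profiles)
best_pair = ranked[0][:2]
```

Here `best_pair` is the gene pair whose profiles change most consistently together, which is the kind of candidate a co-response query would surface as a hypothesis of common transcriptional control.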
Comprehensive de novo structure prediction in a systems-biology context for the archaea Halobacterium sp. NRC-1.
BACKGROUND: Large fractions of all fully sequenced genomes code for proteins of unknown function. Annotating these proteins of unknown function remains a critical bottleneck for systems biology and is crucial to understanding the biological relevance of genome-wide changes in mRNA and protein expression, protein-protein and protein-DNA interactions. The work reported here demonstrates that de novo structure prediction is now a viable option for providing general function information for many proteins of unknown function. RESULTS: We have used Rosetta de novo structure prediction to predict three-dimensional structures for 1,185 proteins and protein domains (<150 residues in length) found in Halobacterium NRC-1, a widely studied halophilic archaeon. Predicted structures were searched against the Protein Data Bank to identify fold similarities and extrapolate putative functions. They were analyzed in the context of a predicted association network composed of several sources of functional associations, such as predicted protein interactions, predicted operons, phylogenetic profile similarity and domain fusion. To illustrate this approach, we highlight three cases where our combined procedure has provided novel insights into our understanding of chemotaxis, possible prophage remnants in Halobacterium NRC-1 and archaeal transcriptional regulators. CONCLUSIONS: Simultaneous analysis of the association network, coordinated mRNA level changes in microarray experiments and genome-wide structure prediction has allowed us to glean significant biological insights into the roles of several Halobacterium NRC-1 proteins of previously unknown function, and significantly reduce the number of proteins encoded in the genome of this haloarchaeon for which no annotation is available.
System-based proteomic analysis of the interferon response in human liver cells.
BACKGROUND: Interferons (IFNs) play a critical role in the host antiviral defense and are an essential component of current therapies against hepatitis C virus (HCV), a major cause of liver disease worldwide. To examine liver-specific responses to IFN and begin to elucidate the mechanisms of IFN inhibition of virus replication, we performed a global quantitative proteomic analysis in a human hepatoma cell line (Huh7) in the presence and absence of IFN treatment using the isotope-coded affinity tag (ICAT) method and tandem mass spectrometry (MS/MS). RESULTS: In three subcellular fractions from the Huh7 cells treated with IFN (400 IU/ml, 16 h) or mock-treated, we identified more than 1,364 proteins at a threshold that corresponds to less than 5% false-positive error rate. Among these, 54 were induced and 24 were repressed by IFN, each by more than two-fold. These IFN-regulated proteins represented multiple cellular functions including antiviral defense, immune response, cell metabolism, signal transduction, cell growth and cellular organization. To analyze this proteomics dataset, we utilized several systems-biology data-mining tools, including Gene Ontology via the GoMiner program and the Cytoscape bioinformatics platform. CONCLUSIONS: Integration of the quantitative proteomics with global protein interaction data using the Cytoscape platform led to the identification of several novel and liver-specific key regulatory components of the IFN response, which may be important in regulating the interplay between HCV, interferon and the host response to virus infection.
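The two-fold criterion used to call proteins induced or repressed can be sketched as a simple filter on abundance ratios. This is an assumed reading, not the authors' pipeline: the sketch takes each ratio to be IFN-treated over mock-treated abundance, treats "more than two-fold" as strictly greater than 2 or strictly less than 1/2, and uses hypothetical protein labels and values. The function name `classify_by_fold_change` is likewise illustrative.

```python
def classify_by_fold_change(ratios, fold=2.0):
    """Split proteins into induced and repressed sets by a symmetric fold cutoff.

    ratios maps a protein label to its treated/mock abundance ratio.
    """
    if fold <= 1.0:
        raise ValueError("fold must be greater than 1")
    induced = sorted(name for name, r in ratios.items() if r > fold)
    repressed = sorted(name for name, r in ratios.items() if r < 1.0 / fold)
    return induced, repressed

# Hypothetical treated/mock ratios, for illustration only.
ratios = {
    "protein_1": 3.2,  # above the cutoff: induced
    "protein_2": 0.4,  # below 1/2: repressed
    "protein_3": 1.1,  # essentially unchanged
    "protein_4": 2.0,  # exactly two-fold, so not "more than" two-fold
    "protein_5": 0.5,  # exactly half, so not repressed under this reading
}
induced, repressed = classify_by_fold_change(ratios)
```

Using a reciprocal cutoff for repression keeps the criterion symmetric on a log scale, so a doubling and a halving are treated as changes of equal size.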
Thematic review series: The pathogenesis of atherosclerosis. Toward a biological network for atherosclerosis.
The goal of systems biology is to define all of the elements present in a given system and to create an interaction network between these components so that the behavior of the system, as a whole and in parts, can be explained under specified conditions. The elements constituting the network that influences the development of atherosclerosis could be genes, pathways, transcript levels, proteins, or physiologic traits. In this review, we discuss how the integration of genetics and technologies such as transcriptomics and proteomics, combined with mathematical modeling, may lead to an understanding of such networks.
Modelling the dynamics of biosystems.
The need for a more formal handling of biological information processing with stochastic and mobile process algebras is addressed. Biology can benefit from this approach, yielding a better understanding of behavioural properties of cells, and computer science can benefit from this approach, obtaining new computational models inspired by nature.