Dragon Plant Biology Explorer. A text-mining tool for integrating associations between genetic and biochemical entities with genome annotation and biochemical terms lists. (73/562)

We introduce a tool for text mining, Dragon Plant Biology Explorer (DPBE), that integrates information on Arabidopsis (Arabidopsis thaliana) genes with their functions, based on gene ontologies and biochemical entity vocabularies, and presents the associations as interactive networks. The associations are based on (1) user-provided PubMed abstracts; (2) a list of Arabidopsis genes compiled by The Arabidopsis Information Resource; (3) user-defined combinations of four vocabulary lists based on the ones developed by the general, plant, and Arabidopsis GO consortia; and (4) three lists developed here based on metabolic pathways, enzymes, and metabolites derived from AraCyc, BRENDA, and other metabolism databases. We demonstrate how various combinations can be applied to the fields of (1) gene function and gene interaction analyses, (2) plant development, (3) biochemistry and metabolism, and (4) pharmacology of bioactive compounds. Furthermore, we show the suitability of DPBE for systems approaches by integration with "omics" platform outputs. Using a list of abiotic stress-related genes identified by microarray experiments, we show how this tool can be used to rapidly build an information base on the previously reported relationships. This tool complements the existing biological resources for systems biology by using text analysis to identify potentially novel associations between cellular entities based on genome annotation terms. Thus, it allows researchers to efficiently summarize existing information for a group of genes or pathways, so as to make better informed choices for designing validation experiments. Last, DPBE can help beginning researchers and graduate students summarize vast information in an unfamiliar area. DPBE is freely available for academic and nonprofit users at http://research.i2r.a-star.edu.sg/DRAGON/ME2/.

Monozygotic twin model reveals novel embryo-induced transcriptome changes of bovine endometrium in the preattachment period. (74/562)

Initiation and maintenance of pregnancy are critically dependent on an intact embryo-maternal communication in the preimplantation period. To get new insights into molecular mechanisms underlying this complex dialog, a holistic transcriptome study of endometrium samples from Day 18 pregnant vs. nonpregnant twin cows was performed. This genetically defined model system facilitated the identification of specific conceptus-induced changes of the endometrium transcriptome. Using a combination of subtracted cDNA libraries and cDNA array hybridization, 87 different genes were identified as upregulated in pregnant animals. Almost one half of these genes are known to be stimulated by type I interferons. For the ISG15ylation system, which is assumed to play an important role in interferon tau (IFNT) signaling, mRNAs of four potential components (IFITM1, IFITM3, HSXIAPAF1, and DTX3L) were found at increased levels in addition to ISG15 and UBE1L. These results were further substantiated by colocalization of these mRNAs in the endometrium of pregnant animals shown by in situ hybridization. A functional classification of the identified genes revealed several different biological processes involved in the preparation of the endometrium for the attachment and implantation of the embryo. Specifically, elevated transcript levels were found for genes involved in modulation of the maternal immune system, genes relevant for cell adhesion, and for remodeling of the endometrium. This first systematic study of maternal transcriptome changes in response to the presence of an embryo on Day 18 of pregnancy in cattle is an important step toward deciphering the embryo-maternal dialog using a systems biology approach.

Using citation data to improve retrieval from MEDLINE. (75/562)

OBJECTIVE: To determine whether algorithms developed for the World Wide Web can be applied to the biomedical literature in order to identify articles that are important as well as relevant. DESIGN AND MEASUREMENTS: A direct comparison of eight algorithms: simple PubMed queries, clinical queries (sensitive and specific versions), vector cosine comparison, citation count, journal impact factor, PageRank, and machine learning based on polynomial support vector machines. The objective was to prioritize important articles, defined as being included in a pre-existing bibliography of important literature in surgical oncology. RESULTS: Citation-based algorithms were more effective than noncitation-based algorithms at identifying important articles. The most effective strategies were simple citation count and PageRank, which on average identified over six important articles in the first 100 results compared to 0.85 for the best noncitation-based algorithm (p < 0.001). The authors saw similar differences between citation-based and noncitation-based algorithms at 10, 20, 50, 200, 500, and 1,000 results (p < 0.001). Citation lag affects performance of PageRank more than simple citation count. However, in spite of citation lag, citation-based algorithms remain more effective than noncitation-based algorithms. CONCLUSION: Algorithms that have proved successful on the World Wide Web can be applied to biomedical information retrieval. Citation-based algorithms can help identify important articles within large sets of relevant results. Further studies are needed to determine whether citation-based algorithms can effectively meet actual user information needs.
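
As a rough illustration of the two strategies the abstract reports as most effective, the sketch below ranks a toy set of articles by simple citation count and by PageRank. The citation graph, the damping factor of 0.85, the iteration count, and all function names are assumptions made for this example; it is not the authors' implementation and says nothing about their data.

```python
def citation_counts(citations):
    """Count how often each article is cited.

    `citations` maps an article ID to the list of article IDs it cites.
    """
    counts = {article: 0 for article in citations}
    for cited_list in citations.values():
        for cited in cited_list:
            counts[cited] = counts.get(cited, 0) + 1
    return counts


def pagerank(citations, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a citation graph."""
    nodes = set(citations)
    for cited_list in citations.values():
        nodes.update(cited_list)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            cited_list = citations.get(node, [])
            if cited_list:
                # Pass this article's rank on to the articles it cites.
                share = damping * rank[node] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
            else:
                # An article that cites nothing spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


def top_k(scores, k):
    """Return the k highest-scoring article IDs (ties broken by ID)."""
    return sorted(scores, key=lambda article: (-scores[article], article))[:k]


# Hypothetical citation graph: each article maps to the articles it cites.
toy_citations = {
    "A": ["C"],
    "B": ["C", "A"],
    "C": [],
    "D": ["C", "B"],
}

by_count = top_k(citation_counts(toy_citations), 2)
by_pagerank = top_k(pagerank(toy_citations), 2)
```

In a retrieval setting, either score would be used to reorder a set of results that a query has already judged relevant, which is the role the abstract describes for citation-based algorithms.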

Automatic extraction of candidate nomenclature terms using the doublet method. (76/562)

BACKGROUND: New terminology continuously enters the biomedical literature. How can curators identify new terms that can be added to existing nomenclatures? The most direct method, and one that has served well, involves reading the current literature. The scholarly curator adds new terms as they are encountered. Present-day scholars are severely challenged by the enormous volume of biomedical literature. Curators of medical nomenclatures need computational assistance if they hope to keep their terminologies current. The purpose of this paper is to describe a method of rapidly extracting new, candidate terms from huge volumes of biomedical text. The resulting lists of terms can be quickly reviewed by curators and added to nomenclatures, if appropriate. The candidate term extractor uses a variation of the previously described doublet coding method. The algorithm, which operates on virtually any nomenclature, derives from the observation that most terms within a knowledge domain are composed entirely of word combinations found in other terms from the same knowledge domain. Terms can be expressed as sequences of overlapping word doublets that have more specific meaning than the individual words that compose the term. The algorithm parses through text, finding contiguous sequences of word doublets that are known to occur somewhere in the reference nomenclature. When a sequence of matching word doublets is encountered, it is compared with whole terms already included in the nomenclature. If the doublet sequence is not already in the nomenclature, it is extracted as a candidate new term. Candidate new terms can be reviewed by a curator to determine if they should be added to the nomenclature. An implementation of the algorithm is demonstrated, using a corpus of published abstracts obtained through the National Library of Medicine's PubMed query service and using "The developmental lineage classification and taxonomy of neoplasms" as a reference nomenclature. 
RESULTS: A 31+ Megabyte corpus of pathology journal abstracts was parsed using the doublet extraction method. This corpus consisted of 4,289 records, each containing an abstract title. The total number of words included in the abstract titles was 50,547. New candidate terms for the nomenclature were automatically extracted from the titles of abstracts in the corpus. Total execution time on a desktop computer with CPU speed of 2.79 GHz was 2 seconds. The resulting output consisted of 313 new candidate terms, each consisting of concatenated doublets found in the reference nomenclature. Human review of the 313 candidate terms yielded a list of 285 terms approved by a curator. A final automatic removal of duplicate terms yielded a final list of 222 new terms (71% of the original 313 extracted candidate terms) that could be added to the reference nomenclature. CONCLUSION: The doublet method for automatically extracting candidate nomenclature terms can be used to quickly find new terms from vast amounts of text. The method can be immediately adapted for virtually any text and any nomenclature. An implementation of the algorithm, in the Perl programming language, is provided with this article.
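
The doublet method described above can be sketched in a few lines. This is a minimal Python illustration, not the Perl implementation supplied with the article: the tokenizer, the three-term nomenclature, the sample sentence, and the function names are all assumptions. It also ignores sentence boundaries, which a real extractor would probably respect.

```python
import re


def words(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_doublets(nomenclature):
    """Collect every adjacent word pair that occurs in any reference term."""
    doublets = set()
    for term in nomenclature:
        term_words = words(term)
        doublets.update(zip(term_words, term_words[1:]))
    return doublets


def candidate_terms(text, nomenclature):
    """Return runs of known doublets that are not already whole terms."""
    known_terms = {" ".join(words(term)) for term in nomenclature}
    doublets = build_doublets(nomenclature)
    tokens = words(text)
    candidates = []
    start = 0
    while start < len(tokens) - 1:
        # Extend the run for as long as each adjacent pair is a known doublet.
        end = start
        while end + 1 < len(tokens) and (tokens[end], tokens[end + 1]) in doublets:
            end += 1
        if end > start:
            phrase = " ".join(tokens[start:end + 1])
            if phrase not in known_terms and phrase not in candidates:
                candidates.append(phrase)
        start = end + 1
    return candidates


# Hypothetical reference nomenclature and input text.
nomenclature = ["squamous cell carcinoma", "carcinoma of lung", "renal cell adenoma"]
text = "We report a squamous cell carcinoma of lung and a renal cell carcinoma."

print(candidate_terms(text, nomenclature))
# ['squamous cell carcinoma of lung', 'renal cell carcinoma']
```

Both candidates are built entirely from word pairs already present in the nomenclature, yet neither is itself a listed term, so they would go to a curator for review.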

SLIM: an alternative Web interface for MEDLINE/PubMed searches - a preliminary study. (77/562)

BACKGROUND: With the rapid growth of medical information and the pervasiveness of the Internet, online search and retrieval systems have become indispensable tools in medicine. The progress of Web technologies can provide expert searching capabilities to non-expert information seekers. The objective of the project is to create an alternative search interface for MEDLINE/PubMed searches using JavaScript slider bars. SLIM, or Slider Interface for MEDLINE/PubMed searches, was developed with PHP and JavaScript. Interactive slider bars in the search form controlled search parameters such as limits, filters and MeSH terminologies. Connections to PubMed were made using the Entrez Programming Utilities (E-Utilities). Custom scripts were created to mimic the automatic term mapping process of Entrez. Page generation times for both local and remote connections were recorded. RESULTS: Alpha testing by developers showed SLIM to be functionally stable. Page generation times, used to simulate loading times, were recorded during the first week of alpha and beta testing. Average page generation times for the index page, previews and searches were 2.94 milliseconds, 0.63 seconds and 3.84 seconds, respectively. Eighteen physicians from the US, Australia and the Philippines participated in the beta testing and provided feedback through an online survey. Most users found the search interface user-friendly and easy to use. Information on MeSH terms and the ability to instantly hide and display abstracts were identified as distinctive features. CONCLUSION: SLIM can be an interactive time-saving tool for online medical literature research that improves user control and capability to instantly refine and refocus search strategies. With continued development and by integrating search limits, methodology filters, MeSH terms and levels of evidence, SLIM may be useful in the practice of evidence-based medicine.
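
To make the idea of translating interface controls into an E-Utilities request concrete, here is a small sketch in Python (SLIM itself was written in PHP and JavaScript). It only builds an ESearch URL and sends nothing over the network. The function name, its arguments, and the mapping from a "years back" control to the `reldate` parameter are assumptions for illustration and are not taken from SLIM.

```python
from urllib.parse import urlencode

# Base URL of the NCBI ESearch utility.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, mesh_term=None, publication_type=None,
                      years_back=None, max_results=20):
    """Turn hypothetical slider and filter settings into an ESearch URL."""
    clauses = [f"({query})"]
    if mesh_term:
        # [mh] restricts the term to the MeSH Terms field.
        clauses.append(f'"{mesh_term}"[mh]')
    if publication_type:
        # [pt] restricts the term to the Publication Type field.
        clauses.append(f'"{publication_type}"[pt]')
    params = {
        "db": "pubmed",
        "term": " AND ".join(clauses),
        "retmax": max_results,
    }
    if years_back is not None:
        # reldate counts days back from today, by publication date (pdat).
        params["datetype"] = "pdat"
        params["reldate"] = years_back * 365
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Example settings a user might choose in such an interface.
url = build_esearch_url(
    "stroke prevention",
    mesh_term="Hypertension",
    publication_type="Review",
    years_back=5,
)
```

Keeping the URL construction in one pure function makes it easy to check what a given combination of controls will ask PubMed for before any request is made.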

Hypertension in sub-Saharan African populations. (78/562)

BACKGROUND: Hypertension in sub-Saharan Africa is a widespread problem of immense economic importance because of its high prevalence in urban areas, its frequent underdiagnosis, and the severity of its complications. METHODS AND RESULTS: We searched PubMed and relevant journals for words in the title of this article. Among the major problems in making headway toward better detection and treatment are the limited resources of many African countries. Relatively recent environmental changes seem to be adverse. Mass migration from rural to periurban and urban areas probably accounts, at least in part, for the high incidence of hypertension in urban black Africans. In the remaining semirural areas, inroads in lifestyle changes associated with "civilization" may explain the apparently rising prevalence of hypertension. Overall, significant segments of the African population are still afflicted by severe poverty, famine, and civil strife, making the overall prevalence of hypertension difficult to determine. Black South Africans have a stroke rate twice as high as that of whites. Two lifestyle changes that are feasible and should help to stem the epidemic of hypertension in Africa are a decreased salt intake and decreased obesity, especially in women. CONCLUSIONS: Overall, differences from whites in etiology and therapeutic responses in sub-Saharan African populations are graded and overlapping rather than absolute. Further studies are needed on black Africans, who may (or may not) be genetically and environmentally different from black Americans and from each other in different parts of this vast continent.

PubMed Assistant: a biologist-friendly interface for enhanced PubMed search. (79/562)

MEDLINE is one of the most important bibliographical information sources for biologists and medical workers. Its PubMed interface supports Boolean queries, which are potentially expressive and exact. However, PubMed is also designed to support simplicity of use at the expense of query expressiveness and exactness. Many PubMed users have never tried explicit Boolean queries. We developed a Java program, PubMed Assistant, to make literature access easier in several ways. PubMed Assistant provides an interface that efficiently displays information about the citations and includes useful functions such as keyword highlighting, export to citation managers, clickable links to Google Scholar and others that are lacking in PubMed.

PSE: a tool for browsing a large amount of MEDLINE/PubMed abstracts with gene names and common words as the keywords. (80/562)

BACKGROUND: MEDLINE/PubMed (hereinafter called PubMed) is one of the most important literature databases for the biological and medical sciences, but it is impossible to read all related records due to the sheer size of the repository. We usually have to repeatedly enter keywords in a trial-and-error manner to extract useful records. Software that can reduce this laborious task is therefore required. RESULTS: We developed a web-based software tool, the PubMed Sentence Extractor (PSE), which parses a large number of PubMed abstracts, extracts and displays the sentences in which gene names and other keywords co-occur, and shows some information from EntrezGene records. The results link to whole abstracts and other resources such as Online Mendelian Inheritance in Man and Reference Sequence. While PSE operates at the sentence level when evaluating the existence of keywords, PubMed operates at the record level. Therefore, the relationship between the two keywords, a gene name and a common word, is more accurately captured by PSE than by PubMed. In addition, PSE shows the list of keywords and considers synonyms and variations of gene names. Through these functions, PSE would reduce the task of searching through records for gene information. CONCLUSION: We developed PSE in order to extract useful records efficiently from PubMed. This system has four advantages over a simple PubMed search: a reduction in the amount of literature collected, the display of keyword lists, the consideration of synonyms and variations of gene names, and links to external databases. We believe PSE is helpful in collecting the necessary literature efficiently in order to find research targets. PSE is freely available under the GPL licence as additional files to this manuscript.
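
The difference between record-level and sentence-level matching can be shown with a short sketch. It is not PSE's code: the naive sentence splitter, the example abstract, the synonym list, and the function names are assumptions chosen for illustration.

```python
import re


def split_sentences(abstract):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", abstract)
    return [part.strip() for part in parts if part.strip()]


def contains(text, term):
    """Case-insensitive whole-word match of `term` in `text`."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def cooccurrence_sentences(abstract, gene_names, keyword):
    """Return sentences that mention the keyword and any gene synonym."""
    hits = []
    for sentence in split_sentences(abstract):
        has_gene = any(contains(sentence, name) for name in gene_names)
        if has_gene and contains(sentence, keyword):
            hits.append(sentence)
    return hits


# Hypothetical abstract; "TP53" and "p53" are treated as synonyms.
abstract = (
    "TP53 is frequently mutated in tumours. "
    "Apoptosis was measured in all samples. "
    "Loss of p53 impairs apoptosis after DNA damage."
)
gene_names = ["TP53", "p53"]

print(cooccurrence_sentences(abstract, gene_names, "apoptosis"))
# ['Loss of p53 impairs apoptosis after DNA damage.']
```

A record-level search would match this abstract on the strength of the first two sentences alone, even though neither relates the gene to the keyword. Only the third sentence mentions both, and it is the only one the sentence-level filter returns.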