Gene indexing: characterization and analysis of NLM's GeneRIFs.
We present an initial analysis of the National Library of Medicine's (NLM) Gene Indexing initiative. Gene Indexing occurs at the time of indexing for all 4600 journals and over 500,000 articles added to PubMed/MEDLINE each year. Gene Indexing links articles about the basic biology of a gene or protein within eight model organisms to a specific record in the NLM's LocusLink database of gene products. The result is an entry called a Gene Reference Into Function (GeneRIF) within the LocusLink database. We analyzed the number of GeneRIFs produced in the first year of GeneRIF production: 27,645 GeneRIFs were produced, pertaining to 9126 loci across eight model organisms. 60% of these were associated with human genes and 27% with mouse genes. About 80% discuss genes with an established MeSH Heading or other MeSH term. We developed a prototype functional alerting system for researchers based on the GeneRIFs, and a strategy to find all of the literature related to genes. We conclude that the Gene Indexing initiative adds considerable value to the life sciences research community.
The dimensions of indexing.
Indexing of documents is an important strategy intended to make the literature more readily available to the user. Here we describe several dimensions of indexing that are important if indexing is to be optimal. These dimensions are coverage, predictability, and transparency. MeSH terms and text words are compared in MEDLINE with regard to these dimensions. Part of our analysis consists of applying AdaBoost with decision trees as the weak learners to estimate how reliably index terms are being assigned and how complex the criteria are by which they are being assigned. Our conclusions are that MeSH terms are more predictable and more transparent than text words.
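As a rough illustration of the AdaBoost step mentioned above, the following sketch boosts depth-1 decision trees (stumps) over binary word-presence features to predict whether an index term was assigned. The feature set, the toy data, the stump depth, and the number of rounds are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def stump_predict(x, feature, polarity):
    """Depth-1 tree: vote +polarity if the word is present, else -polarity."""
    return polarity if x[feature] == 1 else -polarity

def train_adaboost(X, y, rounds=3):
    """X: binary feature vectors; y: labels in {-1, +1}. Returns weighted stumps."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, feature, polarity)
    for _ in range(rounds):
        best = None
        # Pick the stump with the lowest weighted error on the current weights.
        for feature in range(len(X[0])):
            for polarity in (1, -1):
                err = sum(w for xi, yi, w in zip(X, y, weights)
                          if stump_predict(xi, feature, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, polarity)
        err, feature, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feature, polarity))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * yi * stump_predict(xi, feature, polarity))
                   for xi, yi, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(x, f, p) for alpha, f, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: columns are presence of three words in a citation;
# the label says whether a given index term was assigned (+1) or not (-1).
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
y = [1, 1, -1, -1, 1, -1]

ensemble = train_adaboost(X, y)
predictions = [predict(ensemble, x) for x in X]
```

In this toy case the label depends on a single word, so one stump already separates the data. An index term whose assignment needed many stumps to predict would correspond to the more complex assignment criteria the abstract refers to.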
Developing optimal search strategies for detecting clinically sound causation studies in MEDLINE.
BACKGROUND: Clinical end users of MEDLINE must be able to retrieve articles that are both scientifically sound and directly relevant to clinical practice. The use of methodologic search filters has been advocated to improve the accuracy of searching for such studies. These filters are available for the literature on therapy and diagnosis, but strategies for the literature on causation have been less well studied. OBJECTIVE: To determine the retrieval characteristics of methodologic terms in MEDLINE for identifying methodologically sound studies on causation. DESIGN: Comparison of methodologic search terms and phrases for the retrieval of citations in MEDLINE with a manual hand search of the literature (the gold standard) for 162 core health care journals. METHODS: 6 trained, experienced research assistants read all issues of 162 journals for the publishing year 2000. Each article was rated using purpose and quality indicators and categorized into clinically relevant original studies, review articles, general papers, or case reports. The original and review articles were then categorized as 'pass' or 'fail' for methodologic rigor in the areas of therapy/quality improvement, diagnosis, prognosis, causation, economics, clinical prediction, and review articles. Search strategies were developed for all categories including causation. MAIN OUTCOME MEASURES: Sensitivity, specificity, precision, and accuracy of the search strategies. RESULTS: 12% of studies classified as causation met basic criteria for scientific merit for testing clinical applications. Combinations of terms reached peak sensitivities of 93%. Compared with the best single term, multiple terms increased sensitivity for sound studies by 15.5% (absolute increase), but with some loss of specificity when sensitivity was maximized. Combining terms to optimize sensitivity and specificity achieved sensitivities and specificities both above 80%. CONCLUSION: The retrieval of causation studies cited in MEDLINE can be substantially enhanced by selected combinations of indexing terms and textwords.
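The four outcome measures named above follow from a 2x2 comparison of a search strategy against the hand-search gold standard. The sketch below uses the standard definitions; the function name and the counts are hypothetical and are not taken from the study.

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Standard diagnostic-test measures for a search strategy.

    tp: sound articles retrieved        fp: other articles retrieved
    fn: sound articles missed           tn: other articles not retrieved
    """
    return {
        "sensitivity": tp / (tp + fn),              # share of sound articles found
        "specificity": tn / (tn + fp),              # share of other articles excluded
        "precision": tp / (tp + fp),                # share of retrieved articles that are sound
        "accuracy": (tp + tn) / (tp + fp + fn + tn) # share of all articles classified correctly
    }

# Hypothetical counts for one strategy over 10,000 hand-searched articles.
metrics = retrieval_metrics(tp=90, fp=400, fn=10, tn=9500)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, sensitivity and specificity can both be high while precision stays low, because sound studies are a small fraction of the database.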
Developing optimal search strategies for detecting sound clinical prediction studies in MEDLINE.
BACKGROUND: The growing interest in the use of clinical prediction guides as an aid for helping clinicians make effective front-line decisions, together with the increasing emphasis on evidence-based practice, underscores the need for accurate identification of sound clinical prediction studies. Despite the growing use of clinical prediction guides, little work has been done on identifying optimal literature search filters for retrieving these types of studies. The current study extends our earlier work on developing optimal search strategies to include clinical prediction guides. OBJECTIVE: To develop optimal search strategies for detecting methodologically sound clinical prediction studies in MEDLINE in the publishing year 2000. DESIGN: Comparison of the retrieval performance of methodologic search strategies in MEDLINE with a manual review ("gold standard") of each article for each issue of 162 core health care journals for the year 2000. METHODS: 6 experienced research assistants who had been trained and intensively calibrated reviewed all issues of 162 journals for the publishing year 2000. Each article was classified for format, interest, purpose, and methodologic rigor. Search strategies were developed for all purpose categories, including studies of clinical prediction guides. MAIN OUTCOME MEASURES: The sensitivity (recall), specificity, precision, and accuracy of single search terms and combinations of search terms. RESULTS: 39% of original studies classified as a clinical prediction guide were methodologically sound. Combinations of terms reached peak sensitivities of 95%. Compared with the best single term, a three-term strategy increased sensitivity for sound studies by 17% (absolute increase), but with some loss of specificity when sensitivity was maximized. When search terms were combined to optimize sensitivity and specificity, these values reached or were close to 90%. CONCLUSION: Several search strategies can enhance the retrieval of sound clinical prediction studies.
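The trade-off reported above, where combining terms raises sensitivity at some cost in specificity, can be seen in a small sketch that ORs together the article sets retrieved by individual terms. The term names, article IDs, and gold-standard set are hypothetical, and combining with OR is an assumption about how such multi-term strategies are formed.

```python
def sensitivity(retrieved, sound):
    """Share of gold-standard sound articles that the strategy retrieves."""
    return len(retrieved & sound) / len(sound)

def specificity(retrieved, sound, all_ids):
    """Share of the remaining articles that the strategy correctly excludes."""
    not_sound = all_ids - sound
    return len(not_sound - retrieved) / len(not_sound)

# Hypothetical hand-searched collection of 20 articles, 5 of them sound.
all_ids = set(range(1, 21))
sound = {1, 2, 3, 4, 5}

# Hypothetical article sets retrieved by three single search terms.
retrieved_by_term = {
    "term_a": {1, 2, 3, 10},
    "term_b": {3, 4, 11, 12},
    "term_c": {5, 13},
}

# Best single term by sensitivity.
best_term = max(retrieved_by_term,
                key=lambda t: sensitivity(retrieved_by_term[t], sound))
best_sens = sensitivity(retrieved_by_term[best_term], sound)
best_spec = specificity(retrieved_by_term[best_term], sound, all_ids)

# Three-term strategy: an article is retrieved if any term matches (OR).
combined = set().union(*retrieved_by_term.values())
combined_sens = sensitivity(combined, sound)
combined_spec = specificity(combined, sound, all_ids)

print(f"best single term {best_term}: sens={best_sens:.2f} spec={best_spec:.2f}")
print(f"three terms ORed: sens={combined_sens:.2f} spec={combined_spec:.2f}")
```

Because a union can only add retrieved articles, sensitivity never drops when terms are ORed, while every extra false positive lowers specificity.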
Automated knowledge extraction for decision model construction: a data mining approach.
Combinations of Medical Subject Headings (MeSH) and Subheadings in MEDLINE citations may be used to infer relationships among medical concepts. To facilitate clinical decision model construction, we propose an approach to automatically extract semantic relations among medical terms from MEDLINE citations. We use the Apriori association rule mining algorithm to generate the co-occurrences of medical concepts, which are then filtered through a set of predefined semantic templates to instantiate useful relations. From such semantic relations, decision elements and possible relationships among them may be derived for clinical decision model construction. To evaluate the proposed method, we have conducted a case study in colorectal cancer management; preliminary results have shown that useful causal relations and decision alternatives can be extracted.
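The two-stage idea described above, frequent co-occurrence mining followed by template filtering, might look roughly like the sketch below. It implements the frequent-itemset part of Apriori over sets of headings and then keeps only pairs whose semantic types match a template. The citations, the semantic type assignments, the templates, and the "treats" label are all hypothetical; the abstract does not list the templates actually used.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

def instantiate(frequent, types, templates):
    """Keep frequent pairs whose (type, type) pattern matches a template."""
    relations = []
    for itemset in frequent:
        if len(itemset) != 2:
            continue
        a, b = sorted(itemset)
        for x, y in ((a, b), (b, a)):
            label = templates.get((types.get(x), types.get(y)))
            if label:
                relations.append((x, label, y))
    return sorted(relations)

# Hypothetical citations, each reduced to its set of headings.
citations = [
    {"Colorectal Neoplasms", "Colectomy", "Humans"},
    {"Colorectal Neoplasms", "Colectomy"},
    {"Colorectal Neoplasms", "Fluorouracil", "Humans"},
    {"Colorectal Neoplasms", "Fluorouracil"},
    {"Colectomy", "Humans"},
]
# Hypothetical semantic types and templates.
types = {"Colorectal Neoplasms": "Disease", "Colectomy": "Procedure",
         "Fluorouracil": "Drug", "Humans": "Organism"}
templates = {("Procedure", "Disease"): "treats", ("Drug", "Disease"): "treats"}

frequent = apriori(citations, min_support=2)
relations = instantiate(frequent, types, templates)
```

Frequent but uninformative pairs, such as any heading with "Humans", pass the support threshold yet match no template, which is the role the filtering stage plays.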
Toward (semi-)automatic generation of bio-medical ontologies.
The design and construction of domain-specific ontologies and taxonomies require the allocation of huge resources in terms of cost and time. These efforts are human-intensive, and we need to explore ways of minimizing human involvement and other resources. In the biomedical domain, we seek to leverage resources such as the UMLS Metathesaurus and NLP-based applications such as MetaMap, in conjunction with statistical clustering techniques, to (partially) automate the process. This is expected to help the team involved in developing MeSH and other biomedical taxonomies to identify gaps in the existing taxonomies, and to quickly bootstrap taxonomy generation for new research areas in biomedical informatics.
Visual mapping for medical concepts.
Concept relationships are traditionally defined in human-generated vocabulary lists such as the Medical Subject Headings (MeSH). This poster describes a prototype system that automatically generates concept relationships from the medical literature. The system is directly connected to the PubMed search engine. For any given medical concept, the system will generate two styles of visual maps from MEDLINE in real time. Users can use the maps to explore concept relationships or construct better search queries interactively.
Automated indexing of the Hazardous Substances Data Bank (HSDB).
The Hazardous Substances Data Bank (HSDB), produced and maintained by the National Library of Medicine (NLM), contains over 4600 records on potentially hazardous chemicals. To enhance information retrieval from HSDB, NLM has undertaken the development of an automated HSDB indexing protocol as part of its Indexing Initiative. The NLM Indexing Initiative investigates methods whereby automated indexing may partially or completely substitute for human indexing. The poster's purpose is to describe the HSDB Automated Indexing Project.