Maintaining a catalog of manually-indexed, clinically-oriented World Wide Web content. (9/302)

With no quality controls and a highly distributed means of posting information, finding high-quality, clinically-oriented content on the World Wide Web can be difficult. Maintaining a catalog of such information can be equally challenging. CliniWeb is a catalog of quality-filtered and clinically-oriented content on the Web designed to enhance access to such information. This paper describes a group of semi-automated tools that have been developed to maintain the CliniWeb database. One allows easier identification of content by utilizing Web crawling techniques from high-level pages. Another allows easier selection of content for inclusion and its indexing. A final one checks links to help keep the database current. These are augmented by general plans to adopt more detailed metadata and linkages into the medical literature.

Creating and indexing teaching files from free-text patient reports. (10/302)

Teaching files based on real patient data can enhance the education of students, staff and other colleagues. Although information retrieval systems can index free-text documents using keywords, these systems do not work well where content-bearing terms (e.g., anatomy descriptions) frequently appear. This paper describes a system that uses multi-word indexing terms to provide access to free-text patient reports. The utilization of multi-word indexing allows better modeling of the content of medical reports, thus improving retrieval performance. The method used to select indexing terms as well as early evaluation of retrieval performance are discussed.
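The idea of multi-word indexing can be illustrated with a minimal sketch. It assumes that indexing terms are simply contiguous word sequences of two or three tokens; the abstract does not say how the system actually selects its terms, so the functions `multiword_terms`, `build_index`, and `search` below are hypothetical names for illustration only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def multiword_terms(text, max_len=3):
    """Return every contiguous phrase of 2..max_len tokens (an assumed term-selection rule)."""
    tokens = tokenize(text)
    terms = set()
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            terms.add(" ".join(tokens[i:i + n]))
    return terms


def build_index(reports, max_len=3):
    """Map each multi-word term to the set of report ids that contain it."""
    index = defaultdict(set)
    for report_id, text in reports.items():
        for term in multiword_terms(text, max_len):
            index[term].add(report_id)
    return index


def search(index, phrase):
    """Look up a phrase as a single indexing term; single words are not indexed."""
    return sorted(index.get(" ".join(tokenize(phrase)), set()))


# Made-up report snippets, not taken from the paper.
reports = {
    "r1": "Mass in the left upper lobe.",
    "r2": "Left ventricle is enlarged; upper lobe clear.",
}
index = build_index(reports)
```

With this index, a query for "left upper lobe" matches only the report that contains that exact anatomical phrase, whereas a keyword index on "left", "upper", and "lobe" separately would match both reports.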

A large-scale evaluation of terminology integration characteristics. (11/302)

OBJECTIVE: To describe terminology integration characteristics of local specialty specific and general vocabularies in order to facilitate the appropriate inclusion and mapping of these terms into a large-scale terminology. METHODS: We compared the sensitivity, specificity, positive predictive value, and positive likelihood ratios for Automated Term Composition to correctly map 9050 local specialty specific (dermatology) terms and 4994 local general terms to UMLS using Metaphrase. Results were systematically combined among exact matches, semantic type filtered matches, and non-filtered matches. For the general set, an analysis of semantic type filtering was performed. RESULTS: Dermatology exact matches defined a sensitivity of 51% (57% for general terms) and a specificity of 86% (92% general terms). Including semantic type filtered matches increased sensitivity (75% dermatology; 88% general); as did inclusion of non-filtered matches (98% and 99%). These inclusions correspondingly decreased specificity (filtered: 82% and 74%; non-filtered: 52% and 32%). Positive predictive values for exact matches (93.0% dermatology, 97.6% general) were improved by small but significant (p < 0.001) margins by including filtered matches (95.1% dermatology, 98.4% general) but decreased with non-filtered matches (89.2% dermatology, 87.8% general). Adding additional semantic types to the filtering algorithm failed to improve the positive predictive value or the positive likelihood ratio of term mapping, in spite of a 2.3% improvement in sensitivity. CONCLUSIONS: Automated methods for mapping local "colloquial" terminologies to large-scale controlled health vocabulary systems are practical (ppv 95% dermatology, 98% general). Semantic type filtering improves specificity without sacrificing sensitivity and yields high positive predictive values in every set analyzed.
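For readers less familiar with the four measures named in the METHODS, the sketch below computes them from a 2x2 table using their standard definitions. The `Confusion` class, the `mapping_metrics` function, and the counts are illustrative assumptions; the abstract reports only percentages, not the underlying counts.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a 2x2 table of mapping outcomes (hypothetical container)."""
    tp: int  # a mapping was proposed and it was correct
    fp: int  # a mapping was proposed but it was wrong
    fn: int  # a correct mapping existed but was not proposed
    tn: int  # no correct mapping existed and none was proposed


def mapping_metrics(c):
    """Compute sensitivity, specificity, PPV, and the positive likelihood ratio."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    # LR+ = sensitivity / (1 - specificity); it is unbounded when specificity is 1.
    lr_pos = float("inf") if c.fp == 0 else sensitivity / (1 - specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "lr_pos": lr_pos,
    }


# Made-up counts for illustration only.
example = mapping_metrics(Confusion(tp=80, fp=10, fn=20, tn=90))
```

Note that PPV depends on how many of the evaluated terms truly have a correct target, which is one reason it can stay high even as specificity falls.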

Evaluation of the Information Sources Map. (12/302)

As part of preliminary studies for the development of a digital library, we have studied the possibility of using the UMLS Information Sources Map (ISM) database to provide the means to connect and map different terminologies, as well as to facilitate access to available information sources. The main issues discussed are the indexing of and connection to relevant online sources. We found the features of the ISM to be consistent with the need to support automated source selection and retrieval. However, attention should be paid to three aspects of the information: granularity, completeness, and accuracy. We found the ISM to be potentially useful; however, significant modifications will be required if the ISM is to be able to support automated source selection and retrieval.

Design and implementation of a national clinical trials registry. (13/302)

The authors have developed a Web-based system that provides summary information about clinical trials being conducted throughout the United States. The first version of the system, publicly available in February 2000, contains more than 4,000 records representing primarily trials sponsored by the National Institutes of Health. The impetus for this system has come from the Food and Drug Administration (FDA) Modernization Act of 1997, which mandated a registry of both federally and privately funded clinical trials "of experimental treatments for serious or life-threatening diseases or conditions." The system design and implementation have been guided by several principles. First, all stages of system development were guided by the needs of the primary intended audience, patients and other members of the public. Second, broad agreement on a common set of data elements was obtained. Third, the system was designed in a modular and extensible way, and search methods that take extensive advantage of the National Library of Medicine's Unified Medical Language System (UMLS) were developed. Finally, since this will be a long-term effort involving many individuals and organizations, the project is being implemented in several phases.

PathMaster: content-based cell image retrieval using automated feature extraction. (14/302)

OBJECTIVE: Currently, when cytopathology images are archived, they are typically stored with a limited text-based description of their content. Such a description inherently fails to quantify the properties of an image and refers to an extremely small fraction of its information content. This paper describes a method for automatically indexing images of individual cells and their associated diagnoses by computationally derived cell descriptors. This methodology may serve to better index data contained in digital image databases, thereby enabling cytologists and pathologists to cross-reference cells of unknown etiology or nature. DESIGN: The indexing method, implemented in a program called PathMaster, uses a series of computer-based feature extraction routines. Descriptors of individual cell characteristics generated by these routines are employed as indexes of cell morphology, texture, color, and spatial orientation. MEASUREMENTS: The indexing fidelity of the program was tested after populating its database with images of 152 lymphocytes/lymphoma cells captured from lymph node touch preparations stained with hematoxylin and eosin. Images of "unknown" lymphoid cells, previously unprocessed, were then submitted for feature extraction and diagnostic cross-referencing analysis. RESULTS: PathMaster listed the correct diagnosis as its first differential in 94 percent of recognition trials. In the remaining 6 percent of trials, PathMaster listed the correct diagnosis within the first three "differentials." CONCLUSION: PathMaster is a pilot cell image indexing program/search engine that creates an indexed reference of images. Use of such a reference may provide assistance in the diagnostic/prognostic process by furnishing a prioritized list of possible identifications for a cell of uncertain etiology.
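One common way to turn numeric cell descriptors into a prioritized list of "differentials" is nearest-neighbor ranking, sketched below. This is an assumption made for illustration: the abstract does not state which matching method PathMaster uses, and the function `rank_diagnoses`, the two-component feature vectors, and the labels are all hypothetical.

```python
import math


def rank_diagnoses(query, reference, top_k=3):
    """Rank distinct diagnoses by Euclidean distance from the query descriptor vector.

    `reference` is a list of (feature_vector, diagnosis) pairs for already indexed cells.
    """
    scored = sorted((math.dist(query, vec), dx) for vec, dx in reference)
    ranked = []
    for _, dx in scored:
        if dx not in ranked:  # keep only the closest occurrence of each diagnosis
            ranked.append(dx)
    return ranked[:top_k]


# Made-up descriptor vectors and labels, not data from the paper.
reference = [
    ((0.10, 0.20), "diagnosis A"),
    ((0.90, 0.80), "diagnosis B"),
    ((0.15, 0.25), "diagnosis A"),
    ((0.50, 0.50), "diagnosis C"),
]
differentials = rank_diagnoses((0.12, 0.22), reference)
```

In practice the descriptors would need to be scaled to comparable ranges before distances are meaningful, since morphology, texture, and color features are measured in different units.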

The NLM Indexing Initiative. (15/302)

The objective of NLM's Indexing Initiative (IND) is to investigate methods whereby automated indexing methods partially or completely substitute for current indexing practices. The project will be considered a success if methods can be designed and implemented that result in retrieval performance that is equal to or better than the retrieval performance of systems based principally on humanly assigned index terms. We describe the current state of the project and discuss our plans for the future.

Using UMLS semantics for classification purposes. (16/302)

The Unified Medical Language System (UMLS) contains semantic information about terms from various sources; each concept can be understood and located by its relationships to other concepts. We describe a method in which the semantic relationships between UMLS concepts are exploited for the purpose of classification. This method combines three existing components: 1) Mapping terms to UMLS concepts; 2) Restricting UMLS concepts to MeSH; and 3) Mapping MeSH terms to disease categories. When applied to the automatic classification of condition terms into broad disease categories in the Clinical Trials database, this method assigned relevant categories to 92% of the 1823 condition terms encountered. 135 (7%) failed to be classified and 14 (0.77%) were misclassified. The limits of this method are discussed, as well as the reuse of existing components, and the tuning required to achieve automatic classification.
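The three components can be pictured as a chain of lookups, as in the minimal sketch below. The lookup tables are tiny hand-made stand-ins and the concept identifiers are placeholders, not real UMLS identifiers; the actual components described in the paper involve term mapping and semantic relationships that a dictionary lookup does not capture.

```python
def classify(term, term_to_concept, concept_to_mesh, mesh_to_category):
    """Chain the three steps; return None when any step fails to find a mapping."""
    concept = term_to_concept.get(term.lower())   # 1) map the term to a concept
    if concept is None:
        return None
    mesh = concept_to_mesh.get(concept)           # 2) restrict the concept to MeSH
    if mesh is None:
        return None
    return mesh_to_category.get(mesh)             # 3) map the MeSH term to a category


# Hypothetical lookup tables for illustration only.
term_to_concept = {
    "heart attack": "CONCEPT_1",
    "myocardial infarction": "CONCEPT_1",
    "rare condition x": "CONCEPT_2",
}
concept_to_mesh = {"CONCEPT_1": "Myocardial Infarction"}
mesh_to_category = {"Myocardial Infarction": "Cardiovascular Diseases"}

category = classify("Heart attack", term_to_concept, concept_to_mesh, mesh_to_category)
```

A term can fall out at any of the three steps, which is one way to read the 135 terms that failed to be classified.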