A broad-coverage natural language processing system. (57/1313)

Natural language processing (NLP) systems that extract clinical information from textual reports have been shown to be effective for limited domains and for particular applications. Because an NLP system typically requires substantial resources to develop, it is beneficial if it is designed to be easily extensible to multiple domains and applications. This paper describes multiple extensions of an NLP system called MedLEE, which was originally developed for the domain of radiological reports of the chest but has subsequently been extended to mammography, discharge summaries, all of radiology, electrocardiography, echocardiography, and pathology.

Understanding systematic conceptual structures in polysemous medical terms. (58/1313)

Polysemy is a bottleneck for the demanding needs of semantic data management. We suggest the importance of a well-founded conceptual analysis for understanding some systematic structures underlying polysemy in the medical lexicon. We present some case studies, which exploit the methods (ontological integration and general theories) and tools (description logics and ontology libraries) of the ONIONS methodology defined elsewhere by the authors. This paper addresses one aspect (systematic metonymies) of the project we are involved in, which investigates the feasibility of building a large-scale ontology library of medicine that integrates the most important medical terminology banks.

A general method for sifting linguistic knowledge from structured terminologies. (59/1313)

Morphological knowledge is useful for medical language processing, information retrieval, and terminology or ontology development. We show how a large volume of morphological associations between words can be learnt from existing medical terminologies by taking advantage of the semantic relations already encoded between terms in these terminologies: synonymy, hierarchy and transversal relations. The proposed method relies on no a priori linguistic knowledge. Since it can work with different relations between terms, it can be applied to any structured terminology. Tested on SNOMED and ICD in French and English, it identifies fairly reliable morphological relations (precision > 90%) with good coverage (over 88% compared to the UMLS lexical variant generation program). For English words with a stem longer than 3 characters, recall reaches 98.8% for inflection and 94.7% for derivation.
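
The general idea of mining morphological associations from semantically related terms can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a simple heuristic: two different words taken from a pair of related terms are proposed as morphological variants when they share an initial substring of at least 4 characters (echoing the "stem longer than 3 characters" condition). The names `common_prefix_len`, `morphological_pairs`, and the `SYNONYMS` data are hypothetical and invented for illustration.

```python
from itertools import product


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared initial substring of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def morphological_pairs(related_terms, min_stem=4):
    """Propose word pairs that share a long prefix across related terms.

    related_terms: iterable of (term_a, term_b) pairs linked by some
    semantic relation in a terminology (for example synonymy).
    min_stem: minimum shared prefix length (assumed threshold).
    """
    pairs = set()
    for term_a, term_b in related_terms:
        words_a = term_a.lower().split()
        words_b = term_b.lower().split()
        for wa, wb in product(words_a, words_b):
            # Identical words carry no morphological information.
            if wa != wb and common_prefix_len(wa, wb) >= min_stem:
                pairs.add(tuple(sorted((wa, wb))))
    return pairs


# Hypothetical synonym pairs, invented for illustration only.
SYNONYMS = [
    ("aortic stenosis", "stenosis of aorta"),
    ("renal calculus", "kidney stone"),
]

print(morphological_pairs(SYNONYMS))  # {('aorta', 'aortic')}
```

The second pair yields nothing because its words share no prefix, which shows why such a heuristic needs no linguistic knowledge but also cannot find suppletive variants such as "renal" and "kidney".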

The content coverage and organizational structure of terminologies: the example of postoperative pain. (60/1313)

Concepts such as symptoms present specific representational challenges in the EMR. This is because concepts without clear boundaries and external referents such as physical objects can only be examined against other terminology-based concept representation systems. The truth or falsity of such concept representations is therefore relative to the terminology-based systems. Using the concept of acute postoperative pain as an example, we examined three terminology-based approaches to representing the concept. Widely varying coverage across existing clinical terminologies was evident, although the common clinical approach to reporting attributes of symptoms provided a useful organizational structure and should be examined in relation to developing terminology and information models.

Assessing thesaurus-based query expansion using the UMLS Metathesaurus. (61/1313)

OBJECTIVES: Assess query expansion using thesaurus relationships and definitions in the UMLS Metathesaurus for improving searching performance. METHODS: The queries from a MEDLINE test collection (OHSUMED) were expanded using synonym, hierarchical, and related term information as well as term definitions from the UMLS Metathesaurus. Documents were retrieved from a word-statistical retrieval system and assessed for recall and precision based on relevance judgments from the test collection. RESULTS: All types of query expansion degraded aggregate retrieval performance as measured by recall and precision, although 38.6% of the queries with synonym expansion and up to 29.7% of the queries with hierarchical expansion showed improvement. CONCLUSIONS: Thesaurus-based query expansion causes a decline in retrieval performance generally but improves it in specific instances. Further research must focus on identifying instances where performance improves and how it can be exploited by real users.
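
The trade-off measured in this study can be illustrated with a small Python sketch of synonym-based query expansion and its evaluation. It is a toy under stated assumptions, not the study's setup: retrieval here is a plain Boolean "any term matches" lookup rather than a word-statistical ranking system, and the thesaurus, documents, and relevance judgments (`THESAURUS`, `DOCS`, `RELEVANT`) are hypothetical stand-ins for the UMLS Metathesaurus and OHSUMED.

```python
def expand_query(terms, thesaurus):
    """Append thesaurus synonyms of each query term, keeping order."""
    expanded = list(terms)
    for term in terms:
        for synonym in thesaurus.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def retrieve(query_terms, docs):
    """Return ids of documents containing at least one query term."""
    wanted = set(query_terms)
    return {
        doc_id
        for doc_id, text in docs.items()
        if wanted & set(text.lower().split())
    }


def precision_recall(retrieved, relevant):
    """Precision = TP / retrieved, recall = TP / relevant."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data, invented for illustration only.
THESAURUS = {"heart": ["cardiac"]}
DOCS = {
    1: "heart failure treatment",
    2: "cardiac failure outcomes",
    3: "cardiac muscle cell culture",
}
RELEVANT = {1, 2}  # assumed relevance judgments for the query "heart"

baseline = precision_recall(retrieve(["heart"], DOCS), RELEVANT)
expanded = precision_recall(
    retrieve(expand_query(["heart"], THESAURUS), DOCS), RELEVANT
)
print(baseline)  # (1.0, 0.5)
print(expanded)  # recall rises to 1.0, precision falls to 2/3
```

In this toy case expansion finds the missed relevant document but also pulls in an irrelevant one, the kind of mixed effect that can help individual queries while hurting aggregate performance.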

Terminology Query Language: a server interface for concept-oriented terminology systems. (62/1313)

Designers of medical computing applications increasingly require terminology support for their systems. Yet, terminology systems today lack standard methodologies for providing terminology support. This invariably means increased implementation time and expense for system developers who need to use terminologies in their applications. We introduce Terminology Query Language (TQL), a simple query language interface to server implementations of concept-oriented terminologies. TQL is a declarative, set-based query language built on a generic entity-relationship (E/R) schema. TQL defines a common query-based mechanism for accessing terminology information from one or more terminology servers over a network connection.

Discovering missed synonymy in a large concept-oriented Metathesaurus. (63/1313)

The Unified Medical Language System (UMLS) [1, 2] Metathesaurus is concept-oriented; its goal is to unite all names with identical meaning in a single Concept. The names come from its constituent vocabularies or "sources"--a wide variety of biomedical terminologies including many controlled vocabularies and classifications used in patient records, administrative health data, bibliographic, research, full-text, and expert systems. Many offer little definitional information, and many are not themselves concept-oriented, so identifying synonymy is a challenging semantic task [3]. The rapidly increasing size of the Metathesaurus makes the task daunting, demanding effective computational support; there are more than 1.5 million names for 730,000 concepts in the January 2000 release. Vocabularies are added and updated using sophisticated lexical matching, selective algorithms, and expert review [4, 5, 6]. Yet the result is imperfect; we have discovered and corrected missed synonymy in approximately 1% of previously released concepts each year. This paper reviews general methods for finding missed synonymy and describes several specific novel approaches which we have found effective.

A method for the automated mapping of laboratory results to LOINC. (64/1313)

LOINC is emerging as the standard for laboratory result names, and there is great interest in mapping legacy terms from laboratory systems to it. However, the mapping task is non-trivial, requiring significant resource commitment and a good understanding of the LOINC identifying attributes for the laboratory result names. Because the number of results in a laboratory system may range from around 500 to 2000 or more, manual, one-by-one matching, even with the aid of the RELMA matching tool provided by LOINC, is time-consuming and laborious. Moreover, human variation may introduce mapping inconsistencies or errors. Through our experience mapping the results from a variety of laboratory systems to LOINC, an automated mapping method has been developed and is described in this paper. This method allows data from the laboratory information system to be provided in a manner familiar to the submitting technician, and makes use of parsing and logic rules, combined with synonyms, attribute relationships and mapping frequency data, to perform automated matching to LOINC.