Using WordNet to improve the mapping of data elements to UMLS for data sources integration. (33/73)

Each biomedical system has its own way of naming the pieces of information it contains, i.e., of defining its data elements (DEs). Integrating DEs facilitates the integration of biomedical resources. However, the mapping of DEs to the UMLS is ambiguous in many cases, when any correspondence is found at all. We propose to evaluate the potential contribution of a more general terminology: WordNet. Our method is based on synonyms, definitions, and structural properties of the terminologies. We applied it to a set of 474 DEs extracted from eleven biomedical sources. We show that WordNet can improve the direct mapping of DEs to UMLS when used to validate and disambiguate UMLS direct mappings. WordNet can also help identify indirect mappings of DEs to the UMLS.

Maintaining mappings from source systems in a local health information infrastructure. (34/73)

We developed a program to assist in managing changes in source system observation terms. The program returns candidate matches based on approximate string comparator scores. A preliminary evaluation of the tool for managing radiology term updates demonstrates its usefulness by identifying exact matches for 61% of new terms and high probability matches for another 25% of new terms.
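The idea of returning candidate matches ranked by an approximate string comparator score can be illustrated with a minimal sketch. It assumes Python's standard `difflib.SequenceMatcher` as the comparator, which is a stand-in and not necessarily the comparator the authors used; the function name `candidate_matches`, the 0.8 threshold, and the sample radiology terms are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(term):
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def candidate_matches(new_term, known_terms, threshold=0.8, limit=5):
    """Return (score, term) pairs for known terms similar to new_term.

    Scores come from difflib's ratio (0.0 to 1.0); the threshold is an
    illustrative assumption, not a value taken from the abstract.
    """
    target = normalize(new_term)
    scored = []
    for term in known_terms:
        score = SequenceMatcher(None, target, normalize(term)).ratio()
        if score >= threshold:
            scored.append((score, term))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:limit]


# Hypothetical dictionary of existing observation terms.
known = [
    "Chest X-ray 2 views",
    "CT abdomen with contrast",
    "MRI brain without contrast",
]

exact = candidate_matches("chest x-ray 2 views", known)
near = candidate_matches("CT abdomen w contrast", known)
none_found = candidate_matches("zzz", known)
```

A score of 1.0 after normalization corresponds to an exact match; lower scores above the threshold would be the candidates a human reviewer confirms or rejects.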

A UMLS-based spell checker for natural language processing in vaccine safety. (35/73)

BACKGROUND: The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on the concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. METHODS: We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. RESULTS: We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74-75), 100% (95% CI: 100-100), and 47% (95% CI: 46-48), respectively. CONCLUSION: We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. The UMLS served as a domain-specific source of dictionary terms against which to compare potentially misspelled words in the corpus. The prototype's sensitivity was comparable to that of currently available tools, but its specificity was much higher. The slow processing speed may be improved by trimming the tool down to its most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest.
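The four correction steps named in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the lexicon is a tiny hypothetical word list rather than the UMLS Specialist Lexicon or WordNet, candidate generation uses `difflib.get_close_matches` from the Python standard library, disambiguation is reduced to a frequency lookup, and the function name `correct_text` is invented for this example. The abstract does not say how the authors implemented each step.

```python
from difflib import get_close_matches


def correct_text(text, lexicon, frequencies=None, cutoff=0.75):
    """Correct whitespace-separated words against a lexicon.

    Punctuation handling is omitted to keep the sketch short.
    """
    frequencies = frequencies or {}
    vocabulary = sorted(lexicon)  # sorted for deterministic candidate order
    corrected = []
    for word in text.lower().split():
        # Step 1, error detection: a word absent from the lexicon is suspect.
        if word in lexicon:
            corrected.append(word)
            continue
        # Step 2, word list generation: collect similar lexicon entries.
        candidates = get_close_matches(word, vocabulary, n=5, cutoff=cutoff)
        if not candidates:
            corrected.append(word)  # nothing plausible, leave unchanged
            continue
        # Step 3, disambiguation: prefer the most frequent candidate;
        # ties keep difflib's similarity order.
        best = max(candidates, key=lambda c: frequencies.get(c, 0))
        # Step 4, error correction: substitute the chosen candidate.
        corrected.append(best)
    return " ".join(corrected)


# Hypothetical lexicon standing in for a domain dictionary.
lexicon = {"fever", "swelling", "injection", "site", "at", "and"}
cleaned = correct_text("fevr and sweling at injection site", lexicon)
untouched = correct_text("xyzzy", lexicon)
```

Leaving a word unchanged when no candidate clears the cutoff favors specificity over sensitivity, which is one plausible way to obtain the kind of trade-off reported above.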

An XML model of an enhanced data dictionary to facilitate the exchange of pre-existing clinical research data in international studies. (36/73)

Pre-existing clinical research data sets exchanged in international epidemiology research often lack the elements needed to assess their suitability for use in multi-region meta-analyses or other clinical studies. While the missing information is generally known to local investigators, it is not contained in the files exchanged between sites. Instead, such content must be solicited by the study coordinating center through a series of lengthy phone and electronic communications: an informal process whose reproducibility and accuracy decay over time. This report describes a set of supplemental information needed to assess whether clinical research data from diverse research sites are truly comparable, and what metadata ("data about the data") should be preserved when a data set is archived for future use. We propose a structured Extensible Markup Language (XML) model that captures this information. The authors hope this model will be a first step towards preserving the metadata associated with clinical research data sets, thereby improving the quality of international data exchange, data archiving, and merged-data research using data collected in many different countries, languages and care settings.
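To make the notion of an XML-encoded data dictionary entry concrete, the sketch below builds one entry with Python's standard `xml.etree.ElementTree`. Every element name, attribute name, and sample value here is a hypothetical placeholder; the abstract does not list the fields of the authors' model, so this shows only the general shape such metadata might take.

```python
import xml.etree.ElementTree as ET


def build_variable_entry(name, label, units, language, site, method):
    """Build one data dictionary entry as an XML element.

    The schema (variable/label/units/language/site/collectionMethod)
    is an assumption made for illustration only.
    """
    variable = ET.Element("variable", attrib={"name": name})
    ET.SubElement(variable, "label").text = label
    ET.SubElement(variable, "units").text = units
    ET.SubElement(variable, "language").text = language
    ET.SubElement(variable, "site").text = site
    ET.SubElement(variable, "collectionMethod").text = method
    return variable


entry = build_variable_entry(
    name="cd4_count",
    label="CD4 cell count at enrollment",
    units="cells/uL",
    language="en",
    site="Site A",
    method="flow cytometry",
)
xml_text = ET.tostring(entry, encoding="unicode")

# Round trip: a receiving site can parse the entry and read its metadata.
parsed = ET.fromstring(xml_text)
units = parsed.find("units").text
```

Carrying units, language, and collection method alongside each variable is the sort of context that would otherwise have to be requested from local investigators.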

Representation of clinical laboratory terminology in the Unified Medical Language System. (37/73)

The Unified Medical Language System (UMLS) was examined to determine its coverage of clinical laboratory terminology in use at the Columbia-Presbyterian Medical Center (CPMC). The Metathesaurus (Meta-1) contains exact matches for 30% of 1460 CPMC laboratory terms and near matches for an additional 42%, with better coverage of atomic-level concepts ("substance" terms) than complex ones (tests and panels). The Semantic Network includes types for representing laboratory procedures (2), measured substances (at least 56) and sampled substances (at least 14), but no type to represent specimens. Few of the UMLS semantic relationships are applicable to the CPMC vocabulary. These results have implications for the utility of the UMLS for linking clinical databases to electronic medical information sources.

Synonym set extraction from the biomedical literature by lexical pattern discovery. (38/73)


Normalizing biomedical terms by minimizing ambiguity and variability. (39/73)


Assessment of disease named entity recognition on a corpus of annotated sentences. (40/73)
