Clustering WHO-ART terms using semantic distance and machine learning algorithms. (41/155)

WHO-ART was developed by the WHO collaborating centre for international drug monitoring in order to code adverse drug reactions. We assume that computation of semantic distance between WHO-ART terms may be an efficient way to group related medical conditions in the WHO database in order to improve signal detection. Our objective was to develop a method for clustering WHO-ART terms according to the proximity of their meanings. Our material comprises 758 WHO-ART terms. A formal definition was acquired for each term as a list of elementary concepts belonging to SNOMED International axes and characterized by modifier terms in some cases. Clustering was implemented as a terminology service on a J2EE server. Two different unsupervised machine learning algorithms (K-Means, Pvclust) clustered WHO-ART terms according to a previously described semantic distance operator. Pvclust grouped 51% of WHO-ART terms. K-Means grouped 100% of WHO-ART terms, but 25% of clusters were heterogeneous with k = 180 clusters and 6% of clusters were heterogeneous with k = 32 clusters. Clustering algorithms combined with semantic distance could suggest potential groupings of WHO-ART terms that need validation according to the user's requirements.
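
To make the idea of grouping terms by semantic distance concrete, here is a minimal sketch under stated assumptions. It does not reproduce the study's semantic distance operator, K-Means, or Pvclust: it swaps in Jaccard distance over sets of elementary concepts and a simple single-linkage threshold grouping. The term definitions, the function names, and the threshold are all hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |intersection| / |union|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def group_terms(definitions, threshold):
    """Single-linkage grouping: merge any two terms closer than `threshold`."""
    parent = {term: term for term in definitions}

    def find(term):
        # Follow parent pointers to the group representative (path halving).
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for t1, t2 in combinations(definitions, 2):
        if jaccard_distance(definitions[t1], definitions[t2]) < threshold:
            parent[find(t1)] = find(t2)

    clusters = {}
    for term in definitions:
        clusters.setdefault(find(term), set()).add(term)
    # Sort for a deterministic result order.
    return sorted(clusters.values(), key=lambda c: sorted(c))

# Hypothetical definitions: each term is a set of elementary concepts.
definitions = {
    "hepatitis": {"liver", "inflammation"},
    "hepatic necrosis": {"liver", "necrosis"},
    "gastritis": {"stomach", "inflammation"},
    "rash": {"skin", "eruption"},
}
groups = group_terms(definitions, threshold=0.7)
print(groups)
```

In this toy data, "hepatic necrosis" and "gastritis" share no concept yet land in the same group because each is close to "hepatitis". That chaining effect is one way a cluster can turn out heterogeneous, which is why suggested groupings would still need review.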

Mining cross-terminology links in the UMLS. (42/155)

OBJECTIVE: To explore link mining approaches over transitive relationship paths in the Unified Medical Language System (UMLS). The goal is to classify relevant and 'interesting' cross-terminology links/paths for integration of Electronic Health Records (EHRs) and information resources. METHODS: We present approaches for using the link semantics as learning features, sampling the UMLS to create training examples, and ranking the classified links. We use the clinical query and MEDLINE pairs in the OHSUMED dataset to extract 'gold-links' between SNOMED-CT and MeSH respectively, and compare them against corresponding two-step transitive links generated from the UMLS. RESULTS: (a) a 75.7% increase in reachable MeSH concepts with two-step links compared with direct one-step links; (b) 94.08% recall after link classification. CONCLUSION: Using link mining with the UMLS is a promising approach for inter-terminology translation; further research is needed to handle the exponential link growth.
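
As a rough illustration of the two-step idea, the sketch below composes two hypothetical mapping tables through an intermediate concept and measures recall against a small set of gold links. All identifiers and data are made up for illustration and are not drawn from the UMLS or OHSUMED; the study's feature extraction, classification, and ranking steps are not reproduced.

```python
from collections import defaultdict

def two_step_links(first_hop, second_hop):
    """Compose source->intermediate and intermediate->target links."""
    reachable = defaultdict(set)
    for source, intermediates in first_hop.items():
        for mid in intermediates:
            reachable[source] |= second_hop.get(mid, set())
    return dict(reachable)

def recall(candidate_links, gold_links):
    """Fraction of gold (source, target) pairs present among the candidates."""
    if not gold_links:
        return 0.0
    found = sum(1 for s, t in gold_links if t in candidate_links.get(s, set()))
    return found / len(gold_links)

# Hypothetical toy data: source concepts, intermediate concepts, target concepts.
source_to_mid = {"S1": {"C1", "C2"}, "S2": {"C3"}}
mid_to_target = {"C1": {"M1"}, "C2": {"M2"}, "C3": {"M3"}}
gold = [("S1", "M1"), ("S1", "M2"), ("S2", "M4")]

links = two_step_links(source_to_mid, mid_to_target)
print(links)
print(recall(links, gold))
```

Each extra hop multiplies the number of candidate paths, which is the growth problem the conclusion refers to; a real pipeline would need to filter or rank candidates rather than keep every composed link.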

Ontology-based annotation and query of tissue microarray data. (43/155)

The Stanford Tissue Microarray Database (TMAD) is a repository of data amassed by a consortium of pathologists and biomedical researchers. The TMAD data are annotated with multiple free-text fields, specifying the pathological diagnoses for each tissue sample. These annotations are spread out over multiple text fields and are not structured according to any ontology, making it difficult to integrate this resource with other biological and clinical data. We developed methods to map these annotations to the NCI Thesaurus and the SNOMED-CT ontologies. Using these two ontologies we can effectively represent about 80% of the annotations in a structured manner. This mapping offers the ability to perform ontology-driven querying of the TMAD data. We also found that 40% of annotations can be mapped to terms from both ontologies, providing the potential to align the two ontologies based on experimental data. Our approach provides the basis for a data-driven ontology alignment by mapping annotations of experimental data.
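
The two percentages reported above are of the kind produced by simple set arithmetic over mapping results. The sketch below shows that arithmetic on toy data; the annotation strings, the ontology labels "NCIT" and "SNOMEDCT", and the function name are hypothetical, and the actual mapping method is not reproduced.

```python
def coverage_summary(mappings, ontologies=("NCIT", "SNOMEDCT")):
    """mappings: annotation -> set of ontology labels it was mapped to."""
    total = len(mappings)
    if total == 0:
        return {"either": 0.0, "both": 0.0}
    required = set(ontologies)
    either = sum(1 for onts in mappings.values() if onts & required)
    both = sum(1 for onts in mappings.values() if required <= onts)
    return {"either": either / total, "both": both / total}

# Hypothetical toy data: which ontologies each free-text annotation mapped to.
mappings = {
    "adenocarcinoma": {"NCIT", "SNOMEDCT"},
    "clear cell ca": {"NCIT"},
    "lymph node, normal": {"SNOMEDCT"},
    "unreadable note": set(),
}
summary = coverage_summary(mappings)
print(summary)
```

Annotations counted under "both" are the ones that could serve as evidence for aligning the two ontologies, since each links a term in one ontology to a term in the other.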

Coverage of clinical trials tasks in existing ontologies. (44/155)

Clinical research trials involve multiple, often simultaneous processes and corresponding data that collectively involve a diverse group of stakeholders. As efforts are ongoing to enable computable clinical trials and harmonize clinical research data, an ontology targeting the domain of clinical research is essential. As part of a larger project to develop a Clinical Trials scheduling and tracking application, the domain coverage of the UMLS and two component ontologies (SNOMED CT and the NCI Thesaurus) was evaluated in the context of common clinical trial tasks and events. In total, 102 unique activities were abstracted from 20 protocols, representing a variety of domains, and manually mapped to the target ontologies. Coverage ranged from 84% for UMLS to 32% for the NCI Thesaurus.

Improving a UMLS based allergy list for use in live electronic medical record systems. (45/155)

The SNOMED allergy subset available through the UMLS has a variety of deficits that are substantial barriers to use in live clinical practice. The authors describe a method of enhancing a UMLS-based allergy list by combining concepts from other terminologies found within the UMLS. This method resulted in a three-fold increase in the coverage of the allergy list compared to the standard SNOMED allergy subset.

Representing natural-language case report form terminology using Health Level 7 Common Document Architecture, LOINC, and SNOMED-CT: lessons learned. (46/155)

Clinicians and biomedical research investigators ordinarily use natural language when describing biomedical concepts and constructs, even in the context of highly structured case report forms. We describe work in progress and lessons learned in translating complex natural-language concepts on case report forms into machine-readable format using the HL7 CDA, LOINC, and SNOMED-CT standards.

Mapping SNOMED-CT concepts to MeSH concepts. (47/155)

In clinical and research communities there is a high demand for efficient mapping of concepts between terminology sources. We have developed and implemented a successful mapping strategy of SNOMED-CT to MeSH concepts using Apelon's TermWorks, a mapping tool based on Microsoft Excel. This poster illustrates guidelines development and testing, project implementation, and a plan for maintenance and version control.

Generalizability of hybrid search algorithms to map multiple biomedical vocabulary domains. (48/155)

Hybrid text matching algorithms similar to those used for DNA sequencing were developed by 3M Health Information Systems to map a noisy legacy codeset to the 3M Healthcare Data Dictionary (3M HDD). Applying these techniques to map other biomedical vocabularies was briefly introduced in an earlier paper describing the algorithms. We now present results from successfully utilizing them to map different vocabularies across multiple biomedical domains, proving their generalizability.