Development and validation of a computerized South Asian Names and Group Recognition Algorithm (SANGRA) for use in British health-related studies. (9/196)

BACKGROUND: Studies on ethnic variations in health have played an important role in aetiological and health services research. Most routine datasets, however, do not include information on ethnicity. South Asians, one of the largest minority ethnic groups in Britain, have distinctive names that also allow differentiation of the main sub-groups with their important differences in health-related exposures and disease risks. METHODS: A computerized name recognition algorithm (SANGRA) was developed incorporating directories of South Asian first names and surnames together with their religious and linguistic origin. SANGRA was validated using health-related data with self-ascribed information on ethnicity. RESULTS: SANGRA was successful in recognizing South Asian origin in reference datasets, with sensitivity of 89-96 per cent, specificity of 94-98 per cent, positive predictive value (PPV) of 80-89 per cent and negative predictive value (NPV) of 98-99 per cent. Religious origin was correctly assigned in the majority of cases: sensitivity, specificity and PPV were 94 per cent, 91 per cent and 90 per cent for Hindus; 90 per cent, 99 per cent and 98 per cent for Muslims; and 76 per cent, 99 per cent and 94 per cent for Sikhs. SANGRA correctly identified 76 per cent Gujerati and 70 per cent Punjabi names, although only 62 per cent of Gujerati names were sufficiently distinct to be allocated to the Gujerati-only category and only 53 per cent Punjabi names were allocated to the Punjabi-only category. However, specificity and PPV were high for both languages (respectively 97 per cent and 93 per cent for Gujerati, and 99 per cent and 97 per cent for Punjabi). CONCLUSIONS: SANGRA provides a practical and valid method of ascertaining South Asian origin by name and, to a lesser degree of accuracy, of differentiating between the main religious and linguistic subgroups living in Britain. 
This algorithm will be useful in health-related studies where information on self-ascribed ethnicity is not available or is of a limited nature.
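The four validation measures reported above can be computed from a 2x2 table of algorithm output against self-ascribed ethnicity. The following is a minimal sketch only: the function name `diagnostic_metrics` and the counts are hypothetical and are not taken from the SANGRA study.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV from a 2x2 table.

    tp: classified South Asian by name, self-ascribed South Asian
    fp: classified South Asian by name, self-ascribed other
    fn: classified other by name, self-ascribed South Asian
    tn: classified other by name, self-ascribed other
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all true cases
        "specificity": tn / (tn + fp),  # true negatives among all non-cases
        "ppv": tp / (tp + fp),          # true positives among all positive calls
        "npv": tn / (tn + fn),          # true negatives among all negative calls
    }


# Hypothetical counts for illustration only (not SANGRA data).
metrics = diagnostic_metrics(tp=90, fp=10, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

PPV and NPV depend on how common South Asian names are in the reference dataset, so they can differ between datasets even when sensitivity and specificity stay the same.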

Cognitive rehabilitation of naming deficits following viral meningo-encephalitis. (10/196)

OBJECTIVE: This case study describes the neuropsychological assessment and cognitive rehabilitation of a patient who developed word retrieval deficits for objects and people's names, following an episode of viral meningo-encephalitis. It shows the implementation and outcome of two techniques adapted to the patient's individual characteristics and context, providing a more ecologically valid approach. METHODS: In the first technique, "verbal semantic association", the patient was required to describe what she knew about an object as a strategy to help her retrieve its name. In the second one, "face-name association", she was taught to apply a visual-imagery technique in order to retrieve relevant people's names. RESULTS: Following the implementation of these procedures there was a decrease in the number of episodes of failure to retrieve objects and people's names in her everyday life context. CONCLUSION: The improvement found in the patient's ability to retrieve words is discussed in terms of the utility of cognitive rehabilitation programmes and cognitive models of language processing.

Spanish personal name variations in national and international biomedical databases: implications for information retrieval and bibliometric studies. (11/196)

OBJECTIVES: The study sought to investigate how Spanish names are handled by national and international databases and to identify mistakes that can undermine the usefulness of these databases for locating and retrieving works by Spanish authors. METHODS: The authors sampled 172 articles published by authors from the University of Granada Medical School between 1987 and 1996 and analyzed the variations in how each of their names was indexed in Science Citation Index (SCI), MEDLINE, and Indice Medico Espanol (IME). The number and types of variants that appeared for each author's name were recorded and compared across databases to identify inconsistencies in indexing practices. We analyzed the relationship between variability (number of variants of an author's name) and productivity (number of items the name was associated with as an author), the consequences for retrieval of information, and the most frequent indexing structures used for Spanish names. RESULTS: The proportion of authors who appeared under more than one name was 48.1% in SCI, 50.7% in MEDLINE, and 69.0% in IME. Productivity correlated directly with variability: more than 50% of the authors listed on five to ten items appeared under more than one name in any given database, and close to 100% of the authors listed on more than ten items appeared under two or more variants. Productivity correlated inversely with retrievability: as the number of variants for a name increased, the number of items retrieved under each variant decreased. For the most highly productive authors, the number of items retrieved under each variant tended toward one. The most frequent indexing methods varied between databases. In MEDLINE and IME, names were indexed correctly as "first surname second surname, first name initial middle name initial" (if present) in 41.7% and 49.5% of the records, respectively.
However, in SCI, the most frequent methods were "first surname, first name initial second name initial" (48.0% of the records) and "first surname and second surname run together, first name initial" (18.3%). CONCLUSIONS: Retrievability on the basis of author's name was poor in all three databases. Each database uses accurate indexing methods, but these methods fail to result in consistency or coherence for specific entries. The likely causes of inconsistency are: (1) use by authors of variants of their names during their publication careers, (2) lack of authority control in all three databases, (3) the use of an inappropriate indexing method for Spanish names in SCI, (4) authors' inconsistent behaviors, and (5) possible editorial interventions by some journals. We offer some suggestions as to how to avert the proliferation of author name variants in the databases.

Do "Shufflebottoms" bottom shuffle? (12/196)

AIMS: To investigate anecdotal evidence that the name "Shufflebottom" originates from the dominantly inherited characteristic of bottom shuffling. METHODS: A questionnaire-based retrospective study to determine the incidence of bottom shuffling and age of first walking among those named "Shufflebottom" and a control population of those named "Walker". RESULTS: There was no statistically significant difference in incidence of bottom shuffling or age at first walking between the two groups. The incidence of bottom shuffling (21.4%) was generally higher than has been described previously and Walkers were more likely to walk later than Shufflebottoms. CONCLUSION: Shufflebottoms are no more likely to bottom shuffle than other children. The origin of the surname as representing this physical characteristic cannot be confirmed.

Identification of patient name references within medical documents using semantic selectional restrictions. (13/196)

De-identification of a patient's personal data from medical records is a protective legal requirement imposed before medical documents can be used for research purposes or transferred to other healthcare providers (e.g., teachers, students, tele-consultations). This de-identification process is tedious if performed manually, and is known to be quite faulty in direct search and replace strategies [9]. In this paper, we report on the identification step of this process. The proposed algorithm is based on estimating the fitness of candidate patient name references to a set of semantic selectional restrictions. The semantic restrictions place tight contextual requirements upon candidate words in the report text and are determined automatically from a manually tagged corpus of training reports. Maximum entropy classifiers are used to provide a probabilistic measure of the belief of a given candidate token to a given semantic restriction. We report on the design and preliminary evaluation of the system within the domain of pediatric urology.

A successful technique for removing names in pathology reports using an augmented search and replace method. (14/196)

The ability to access large amounts of de-identified clinical data would facilitate epidemiologic and retrospective research. Previously described de-identification methods require knowledge of natural language processing or have not been made available to the public. We take advantage of the fact that the vast majority of proper names in pathology reports occur in pairs. In rare cases where one proper name is by itself, it is preceded or followed by an affix that identifies it as a proper name (Mrs., Dr., PhD). We created a tool based on this observation using substitution methods that was easy to implement and was largely based on publicly available data sources. We compiled a Clinical and Common Usage Word (CCUW) list as well as a fairly comprehensive proper name list. Despite the large overlap between these two lists, we were able to refine our methods to achieve accuracy similar to previous attempts at de-identification. Our method found 98.7% of 231 proper names in the narrative sections of pathology reports. Three single proper names were missed out of 1001 pathology reports (0.3%, no first name/last name pairs). It is unlikely that identification could be implied from this information. We will continue to refine our methods, specifically working to improve the quality of our CCUW and proper name lists to obtain higher levels of accuracy.
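The pairing observation described above could be sketched roughly as follows. This is an illustrative assumption, not the authors' tool: the name list, the affix list, the `[NAME]` placeholder and the function `scrub` are all hypothetical, and the sketch omits the CCUW list entirely. A word from the name list is replaced only when it sits next to another listed name or next to an affix.

```python
# Hypothetical, tiny stand-ins for a proper-name list and a name-marking affix list.
PROPER_NAMES = {"john", "smith", "mary", "jones", "brown"}
AFFIXES = {"mr.", "mrs.", "ms.", "dr.", "md", "phd"}


def is_name(token: str) -> bool:
    """True if the token, ignoring surrounding punctuation, is on the name list."""
    return token.strip(".,;:").lower() in PROPER_NAMES


def is_affix(token: str) -> bool:
    """True if the token is a title or degree that marks a neighbouring name."""
    return token.strip(",;:").lower() in AFFIXES


def scrub(text: str) -> str:
    """Replace listed names that occur in pairs or beside an affix with [NAME]."""
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        prev_tok = tokens[i - 1] if i > 0 else ""
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        has_context = (
            is_name(prev_tok) or is_name(next_tok)
            or is_affix(prev_tok) or is_affix(next_tok)
        )
        # Punctuation attached to a replaced token is dropped in this sketch.
        out.append("[NAME]" if is_name(tok) and has_context else tok)
    return " ".join(out)


print(scrub("Reviewed by Dr. Brown with John Smith present"))
print(scrub("brown tissue noted"))
```

Requiring a neighbouring name or affix is what lets an ambiguous word such as "brown" survive in ordinary clinical text while still being removed when it follows "Dr.".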

Automatic extraction of gene and protein synonyms from MEDLINE and journal articles. (15/196)

Genes and proteins are often associated with multiple names, and more names are added as new functional or structural information is discovered. Because authors often alternate between these synonyms, information retrieval and extraction benefit from identifying these synonymous names. We have developed a method to automatically extract synonymous gene and protein names from MEDLINE and journal articles. We first identified patterns authors use to list synonymous gene and protein names. We developed SGPE (for synonym extraction of gene and protein names), a software program that recognizes the patterns and extracts candidate synonymous terms from MEDLINE abstracts and full-text journal articles. SGPE then applies a sequence of filters that automatically screen out those terms that are not gene and protein names. We evaluated our method to have an overall precision of 71% on both MEDLINE and journal articles, and 90% precision on the more suitable full-text articles alone.
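Pattern-based extraction of this kind might look like the sketch below. The abstract does not list the patterns SGPE uses, so the two patterns here ("X (also known as Y)" and "X, also called Y") are assumptions chosen for illustration, as are the names `PATTERNS` and `extract_synonym_pairs`. The filtering stage that screens out non-gene terms is not shown.

```python
import re

# Assumed shape of a candidate name: starts with a letter, then letters, digits or hyphens.
NAME = r"[A-Za-z][A-Za-z0-9-]*"

# Hypothetical synonym-listing patterns; SGPE's actual patterns are not given in the abstract.
PATTERNS = [
    re.compile(rf"({NAME})\s*\((?:also known as|also called|aka)\s+({NAME})\)"),
    re.compile(rf"({NAME}),?\s+also (?:known as|called)\s+({NAME})"),
]


def extract_synonym_pairs(text: str) -> list[tuple[str, str]]:
    """Return candidate (name, synonym) pairs in order of discovery, without duplicates."""
    pairs: list[tuple[str, str]] = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            pair = (match.group(1), match.group(2))
            if pair not in pairs:
                pairs.append(pair)
    return pairs


sample = "The protein p53 (also known as TP53) binds DNA. Cdc2, also called CDK1, is a kinase."
print(extract_synonym_pairs(sample))
```

The pairs returned here are only candidates; a real pipeline would still need a filtering step like the one the abstract describes to discard terms that are not gene or protein names.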

Dynamics of the hippocampus during encoding and retrieval of face-name pairs. (16/196)

The medial temporal lobe (MTL) is critical in forming new memories, but how subregions within the MTL carry out encoding and retrieval processes in humans is unknown. Using new high-resolution functional magnetic resonance imaging (fMRI) acquisition and analysis methods, we identified mnemonic properties of different subregions within the hippocampal circuitry as human subjects learned to associate names with faces. The cornu ammonis (CA) fields 2 and 3 and the dentate gyrus were active relative to baseline only during encoding, and this activity decreased as associations were learned. Activity in the subiculum showed the same temporal decline, but primarily during retrieval. Our results demonstrate that subdivisions within the hippocampus make distinct contributions to new memory formation.