A computational screen for methylation guide snoRNAs in yeast.
Small nucleolar RNAs (snoRNAs) are required for ribose 2'-O-methylation of eukaryotic ribosomal RNA. Many of the genes for this snoRNA family have remained unidentified in Saccharomyces cerevisiae, despite the availability of a complete genome sequence. Probabilistic modeling methods akin to those used in speech recognition and computational linguistics were used to computationally screen the yeast genome and identify 22 methylation guide snoRNAs, snR50 to snR71. Gene disruptions and other experimental characterization confirmed their methylation guide function. In total, 51 of the 55 ribose methylated sites in yeast ribosomal RNA were assigned to 41 different guide snoRNAs.
Influence of sampling on estimates of clustering and recent transmission of Mycobacterium tuberculosis derived from DNA fingerprinting techniques.
The availability of DNA fingerprinting techniques for Mycobacterium tuberculosis has led to attempts to estimate the extent of recent transmission in populations, using the assumption that groups of tuberculosis patients with identical isolates ("clusters") are likely to reflect recently acquired infections. It is never possible to include all cases of tuberculosis in a given population in a study, and the proportion of isolates found to be clustered will depend on the completeness of the sampling. Using stochastic simulation models based on real and hypothetical populations, the authors demonstrate the influence of incomplete sampling on the estimates of clustering obtained. The results show that as the sampling fraction increases, the proportion of isolates identified as clustered also increases and the variance of the estimated proportion clustered decreases. Cluster size is also important: the underestimation of clustering for any given sampling fraction is greater, and the variability in the results obtained is larger, for populations with small clusters than for those with the same number of individuals arranged in large clusters. A considerable amount of caution should be used in interpreting the results of studies on clustering of M. tuberculosis isolates, particularly when sampling fractions are small.
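The sampling effect described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' model: it assumes a hypothetical population in which every case belongs to a cluster of known size, draws a simple random sample without replacement, and counts an isolate as clustered when at least one other sampled isolate shares its fingerprint. The function names and parameters are illustrative only.

```python
import random
from collections import Counter


def proportion_clustered(sampled_labels):
    """Fraction of sampled isolates sharing a fingerprint with another sampled isolate."""
    if not sampled_labels:
        return 0.0
    counts = Counter(sampled_labels)
    clustered = sum(n for n in counts.values() if n >= 2)
    return clustered / len(sampled_labels)


def simulate_sampling(cluster_sizes, sampling_fraction, n_reps=2000, seed=0):
    """Return (mean, variance) of the estimated proportion clustered.

    cluster_sizes: true size of each cluster in the hypothetical population.
    sampling_fraction: share of all cases included in the study (0 < f <= 1).
    """
    rng = random.Random(seed)
    # One fingerprint label per case; cases in the same cluster share a label.
    population = [label for label, size in enumerate(cluster_sizes) for _ in range(size)]
    k = max(1, round(sampling_fraction * len(population)))
    estimates = [proportion_clustered(rng.sample(population, k)) for _ in range(n_reps)]
    mean = sum(estimates) / n_reps
    variance = sum((e - mean) ** 2 for e in estimates) / (n_reps - 1)
    return mean, variance


if __name__ == "__main__":
    # 100 cases arranged as 50 pairs versus 10 clusters of 10; all cases are truly clustered.
    for sizes in ([2] * 50, [10] * 10):
        for fraction in (0.2, 0.5, 0.8):
            mean, variance = simulate_sampling(sizes, fraction)
            print(sizes[0], fraction, round(mean, 3), round(variance, 5))
```

In both hypothetical populations the true proportion clustered is 1.0, so any shortfall in the simulated mean is the underestimation caused by incomplete sampling.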
Capture-recapture models including covariate effects.
Capture-recapture methods are used to estimate the incidence of a disease, using a multiple-source registry. Usually, log-linear methods are used to estimate population size, assuming that not all sources of notification are dependent. Where there are categorical covariates, a stratified analysis can be performed. The multinomial logit model has occasionally been used. In this paper, the authors compare log-linear and logit models with and without covariates, and use simulated data to compare estimates from different models. The crude estimate of population size is biased when the sources are not independent. Analyses adjusting for covariates produce less biased estimates. In the absence of covariates, or where all covariates are categorical, the log-linear model and the logit model are equivalent. The log-linear model cannot include continuous variables. To minimize potential bias in estimating incidence, covariates should be included in the design and analysis of multiple-source disease registries.
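The bias of a crude estimate, and the benefit of stratifying on a categorical covariate, can be seen in the simplest two-source case. The sketch below uses the classical two-source (Lincoln-Petersen) estimator rather than the log-linear or logit models compared in the paper, and the counts are invented expected values for two hypothetical strata whose notification probabilities differ.

```python
def lincoln_petersen(n1, n2, m):
    """Two-source population estimate: n1 and n2 cases per source, m cases in both."""
    if m <= 0:
        raise ValueError("the estimator is undefined when no case appears in both sources")
    return n1 * n2 / m


# Hypothetical strata, each with 1000 true cases (an assumption for illustration).
# Stratum A: each source notifies 80% of cases independently -> 800, 800, 640 in both.
# Stratum B: each source notifies 20% of cases independently -> 200, 200, 40 in both.
strata = {"A": (800, 800, 640), "B": (200, 200, 40)}

# Stratified analysis: estimate within each covariate level, then add the estimates.
stratified_total = sum(lincoln_petersen(*counts) for counts in strata.values())

# Crude analysis: pool the counts and ignore the covariate.
pooled = [sum(values) for values in zip(*strata.values())]
crude_total = lincoln_petersen(*pooled)

if __name__ == "__main__":
    print("stratified:", stratified_total)
    print("crude:", crude_total)
```

Within each stratum the sources are independent, but pooling makes them appear positively dependent, because a case notified by one source is more likely to come from the high-coverage stratum. The crude estimate therefore falls below the true total of 2000 cases, while the stratified estimate recovers it.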
Sequence specificity, statistical potentials, and three-dimensional structure prediction with self-correcting distance geometry calculations of beta-sheet formation in proteins.
A statistical analysis of a representative data set of 169 known protein structures was used to analyze the specificity of residue interactions between spatially neighboring strands in beta-sheets. Pairwise potentials were derived from the frequency of residue pairs in nearest, second nearest, and third nearest contacts across neighboring beta-strands compared to the expected frequency of residue pairs in a random model. A pseudo-energy function based on these statistical pairwise potentials recognized native beta-sheets among possible alternative pairings. The native pairing was found within the three lowest energies in 73% of the cases in the training data set and in 63% of beta-sheets in a test data set of 67 proteins, which were not part of the training set. The energy function was also used to detect tripeptides, which occur frequently in beta-sheets of native proteins. The majority of native partners of tripeptides were distributed in a low energy range. Self-correcting distance geometry (SECODG) calculations using distance constraint sets derived from possible low-energy pairings of beta-strands uniquely identified the native pairing of the beta-sheet in pancreatic trypsin inhibitor (BPTI). These results will be useful for predicting the structure of proteins from their amino acid sequence as well as for the design of proteins containing beta-sheets.
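The abstract does not give the functional form of the potentials. A common convention for knowledge-based potentials, assumed here, is the negative logarithm of the observed-to-expected pair frequency, with the random model taken as independent pairing of residues at their overall frequencies. The sketch below applies that convention to invented cross-strand contact pairs. The pseudocount and all names are illustrative and not taken from the paper.

```python
import math
from collections import Counter


def pair_potentials(pairs, pseudocount=1.0):
    """Pseudo-energies -ln(observed / expected) for unordered residue pairs.

    pairs: iterable of (residue, residue) contacts across neighboring strands.
    Expected frequencies assume residues pair independently (the random model).
    """
    pairs = [tuple(sorted(p)) for p in pairs]
    pair_counts = Counter(pairs)
    residue_counts = Counter(r for p in pairs for r in p)
    n_pairs = len(pairs)
    n_residues = 2 * n_pairs
    residues = sorted(residue_counts)

    potentials = {}
    for i, a in enumerate(residues):
        for b in residues[i:]:
            fa = residue_counts[a] / n_residues
            fb = residue_counts[b] / n_residues
            # An unordered pair of two different residues can arise in two orders.
            expected = fa * fb if a == b else 2.0 * fa * fb
            # The pseudocount pulls unseen pairs toward the random model instead of log(0).
            observed = (pair_counts[(a, b)] + pseudocount * expected) / (n_pairs + pseudocount)
            potentials[(a, b)] = -math.log(observed / expected)
    return potentials


# Invented toy contacts: hydrophobic pairs and a salt-bridge pair are over-represented.
toy_contacts = [("V", "V")] * 6 + [("V", "I")] * 3 + [("K", "E")] * 4 + [("K", "K")]
toy_potentials = pair_potentials(toy_contacts)

if __name__ == "__main__":
    for pair, energy in sorted(toy_potentials.items(), key=lambda item: item[1]):
        print(pair, round(energy, 2))
```

Pairs seen more often than the random model predicts receive negative pseudo-energies, and pairs seen less often receive positive ones. A candidate strand pairing could then be scored by summing the values for its contacts.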
Pair potentials for protein folding: choice of reference states and sensitivity of predicted native states to variations in the interaction schemes.
We examine the similarities and differences between two widely used knowledge-based potentials, which are expressed as contact matrices (consisting of 210 elements) that give a scale for interaction energies between the naturally occurring amino acid residues. These are the Miyazawa-Jernigan contact interaction matrix M and the potential matrix S derived by Skolnick J et al., 1997, Protein Sci 6:676-688. Although the correlation between the two matrices is good, there is a relatively large dispersion between the elements. We show that when Thr is chosen as a reference solvent within the Miyazawa and Jernigan scheme, the dispersion between the M and S matrices is reduced. The resulting interaction matrix B gives hydrophobicities that are in very good agreement with experiment. The small dispersion between the S and B matrices, which arises due to differing reference states, is shown to have a dramatic effect on the predicted native states of lattice models of proteins. These findings and other arguments are used to suggest that for reliable predictions of protein structures, pairwise additive potentials are not sufficient. We also establish that optimized protein sequences can tolerate relatively large random errors in the pair potentials. We conjecture that three-body interactions may be needed to predict the folds of proteins in a reliable manner.
Cloning, overexpression, purification, and physicochemical characterization of a cold shock protein homolog from the hyperthermophilic bacterium Thermotoga maritima.
Thermotoga maritima (Tm) expresses a 7 kDa monomeric protein whose 18 N-terminal amino acids show 81% identity to N-terminal sequences of cold shock proteins (Csps) from Bacillus caldolyticus and Bacillus stearothermophilus. There were only trace amounts of the protein in Thermotoga cells grown at 80 degrees C. Therefore, to perform physicochemical experiments, the gene was cloned in Escherichia coli. A DNA probe was produced by PCR from genomic Tm DNA with degenerate primers developed from the known N-terminus of TmCsp and the known C-terminus of CspB from Bacillus subtilis. Southern blot analysis of genomic Tm DNA allowed the production of a partial gene library, which was used as a template for PCRs with gene- and vector-specific primers to identify the complete DNA sequence. As reported for other csp genes, the 5' untranslated region of the mRNA was anomalously long; it contained the putative Shine-Dalgarno sequence. The coding part of the gene contained 198 bp, i.e., 66 amino acids. The sequence showed 61% identity to CspB from B. caldolyticus and high similarity to all other known Csps. Computer-based homology modeling allowed the conclusion that TmCsp represents a beta-barrel similar to CspB from B. subtilis and CspA from E. coli. As indicated by spectroscopic analysis, analytical gel permeation chromatography, and mass spectrometry, overexpression of the recombinant protein yielded authentic TmCsp with a molecular weight of 7,474 Da. This was in agreement with the results of analytical ultracentrifugation confirming the monomeric state of the protein. The temperature-induced equilibrium transition at 87 degrees C exceeds the maximum growth temperature of Tm and represents the highest transition temperature (Tm value) reported for Csps so far.
pKa calculations for class A beta-lactamases: influence of substrate binding.
Beta-Lactamases are responsible for bacterial resistance to beta-lactams and are thus of major clinical importance. However, the identity of the general base involved in their mechanism of action is still unclear. Two candidate residues, Glu166 and Lys73, have been proposed to fulfill this role. Previous studies support the proposal that Glu166 acts during the deacylation, but there is no consensus on the possible role of this residue in the acylation step. Recent experimental data and theoretical considerations indicate that Lys73 is protonated in the free beta-lactamases, showing that this residue is unlikely to act as a proton abstractor. On the other hand, it has been proposed that the pKa of Lys73 would be dramatically reduced upon substrate binding and would thus be able to act as a base. To check this hypothesis, we performed continuum electrostatic calculations for five wild-type and three beta-lactamase mutants to estimate the pKa of Lys73 in the presence of substrates, both in the Henri-Michaelis complex and in the tetrahedral intermediate. In all cases, the pKa of Lys73 was computed to be above 10, showing that it is unlikely to act as a proton abstractor, even when a beta-lactam substrate is bound in the enzyme active site. The pKa of Lys234 is also raised in the tetrahedral intermediate, thus confirming a probable role of this residue in the stabilization of the tetrahedral intermediate. The influence of the beta-lactam carboxylate on the pKa values of the active-site lysines is also discussed.
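The link between a computed pKa above 10 and the conclusion that Lys73 is unlikely to act as a base follows from the Henderson-Hasselbalch relation, since only the neutral (deprotonated) amine can abstract a proton. The short sketch below computes that fraction. The pH of 7 is an assumption chosen for illustration and is not stated in the abstract.

```python
def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of a titratable group in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    # A lysine with pKa 10 at an assumed pH of 7: about one molecule in a thousand is neutral.
    print(fraction_deprotonated(10.0, 7.0))
    # Each additional pKa unit lowers the neutral fraction roughly tenfold.
    print(fraction_deprotonated(11.0, 7.0))
```

The fraction reaches one half only when the pH equals the pKa, which is why a large downward pKa shift on substrate binding would have been needed for Lys73 to serve as the general base.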
Simplified methods for pKa and acid pH-dependent stability estimation in proteins: removing dielectric and counterion boundaries.
Much computational research aimed at understanding ionizable group interactions in proteins has focused on numerical solutions of the Poisson-Boltzmann (PB) equation, incorporating protein exclusion zones for solvent and counterions in a continuum model. Poor agreement with measured pKas and pH-dependent stabilities for a (protein, solvent) relative dielectric boundary of (4,80) has led to the adoption of an intermediate (20,80) boundary. It is now shown that a simple Debye-Hückel (DH) calculation, removing both the low dielectric and counterion exclusion regions associated with protein, is equally effective in general pKa calculations. However, a broad-based discrepancy with measured pH-dependent stabilities is maintained in the absence of ionizable group interactions in the unfolded state. A simple model is introduced for these interactions, with a significantly improved match to experiment that suggests a potential utility in predicting and analyzing the acid pH-dependence of protein stability. The methods are applied to the relative pH-dependent stabilities of the pore-forming domains of colicins A and N. The results relate generally to the well-known preponderance of surface ionizable groups with solvent-mediated interactions. Although numerical PB solutions do not currently have a significant advantage for overall pKa estimations, development based on consideration of microscopic solvation energetics in tandem with the continuum model could combine the large ΔpKas of a subset of ionizable groups with the overall robustness of the DH model.
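A Debye-Hückel treatment of the kind described reduces each charge-charge interaction to a screened Coulomb term in a uniform dielectric. The sketch below is a generic illustration, not the paper's implementation. It assumes point charges, a uniform relative dielectric of 80, a 1:1 electrolyte at 25 degrees C (Debye length of roughly 3.04 Å divided by the square root of the molar ionic strength, a value that strictly corresponds to a dielectric near 78.5), and conversion of energy to pKa units by dividing by 2.303 RT. The function names are hypothetical.

```python
import math

COULOMB_KJ_ANGSTROM_PER_MOL = 1389.35  # e^2 / (4 pi eps0), in kJ mol^-1 Angstrom
RT_KJ_PER_MOL = 8.314462e-3 * 298.15   # thermal energy at 25 degrees C


def debye_length_angstrom(ionic_strength_molar):
    """Approximate Debye length in water at 25 degrees C for a 1:1 electrolyte."""
    if ionic_strength_molar <= 0:
        return math.inf  # no screening
    return 3.04 / math.sqrt(ionic_strength_molar)


def dh_energy_kj_per_mol(q1, q2, distance_angstrom, ionic_strength_molar=0.15, dielectric=80.0):
    """Screened Coulomb interaction between two point charges (charges in units of e)."""
    coulomb = COULOMB_KJ_ANGSTROM_PER_MOL * q1 * q2 / (dielectric * distance_angstrom)
    screening = math.exp(-distance_angstrom / debye_length_angstrom(ionic_strength_molar))
    return coulomb * screening


def energy_in_pka_units(energy_kj_per_mol):
    """Magnitude of the pKa shift equivalent to an interaction energy."""
    return abs(energy_kj_per_mol) / (math.log(10.0) * RT_KJ_PER_MOL)


if __name__ == "__main__":
    # Hypothetical ion pair: a lysine (+1) and a carboxylate (-1) separated by 5 Angstrom.
    energy = dh_energy_kj_per_mol(+1, -1, 5.0)
    print(round(energy, 2), "kJ/mol;", round(energy_in_pka_units(energy), 2), "pKa units")
```

With a solvent-like dielectric and salt screening, a single surface ion pair contributes well under one pKa unit. That is consistent with the abstract's remark that numerical PB solutions would need extra microscopic solvation terms to capture the large ΔpKas of a subset of groups.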