Carotenoid biosynthesis in cyanobacteria: structural and evolutionary scenarios based on comparative genomics.

Carotenoids are widely distributed pigments in nature, and their biosynthetic pathway has been studied extensively in various organisms. Recent access to a very large amount of cyanobacterial genomic data has given rise to a novel approach, comparative genomics. The putative enzymes involved in carotenoid biosynthesis in cyanobacteria were identified with similarity-based tools, and the biosynthetic pathway was reconstructed from these enzymes. Interestingly, nearly all cyanobacteria share a very similar pathway for synthesizing beta-carotene, the exception being Gloeobacter violaceus PCC 7421. The enzymes of the upstream pathway (crtE-B-P-Qb-L) are more conserved than those acting later (crtW-R). In addition, many carotenoid biosynthesis enzymes exhibit structural and functional diversity; examples from the zeta-carotene desaturase, lycopene cyclase and carotene ketolase families are described in this article. When these crt genes were mapped onto the cyanobacterial genomes, their organization varied greatly among species. In all cases they are dispersed throughout the chromosome, in contrast to the adjacent arrangement of the crt gene cluster in other eubacteria. Moreover, in unicellular cyanobacteria each step of the carotenogenic pathway is usually catalyzed by a single gene product, whereas multiple ketolase genes are found in filamentous cyanobacteria. These increased numbers of crt genes and their correlation with ecological adaptation are discussed.
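The comparative-genomics step described above amounts to checking, genome by genome, which pathway genes are present and how many copies there are. The sketch below assumes each genome is already reduced to a list of crt gene names; the genome labels, their gene lists, and the rule that every name starting with "crtW" counts as a ketolase copy are hypothetical illustrations, not data or criteria from the article.

```python
# Upstream steps named in the abstract (crtE-B-P-Qb-L), in pathway order.
UPSTREAM = ["crtE", "crtB", "crtP", "crtQb", "crtL"]


def missing_upstream(genes):
    """Return upstream pathway genes absent from a genome's crt gene list."""
    present = set(genes)
    return [g for g in UPSTREAM if g not in present]


def ketolase_copies(genes):
    """Count ketolase genes; assumes (hypothetically) they are all named crtW*."""
    return sum(1 for g in genes if g.startswith("crtW"))


# Hypothetical gene inventories for one unicellular and one filamentous genome.
genomes = {
    "unicellular_example": ["crtE", "crtB", "crtP", "crtQb", "crtL", "crtR", "crtW"],
    "filamentous_example": ["crtE", "crtB", "crtP", "crtQb", "crtL", "crtR",
                            "crtW", "crtW2", "crtW3"],
}

for name, genes in genomes.items():
    print(name, missing_upstream(genes), ketolase_copies(genes))
```

A real analysis would start from similarity-search hits rather than clean gene names, so the presence test would depend on the chosen score or E-value cutoff.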

A family of human microRNA genes from miniature inverted-repeat transposable elements.

While hundreds of novel microRNA (miRNA) genes have been discovered in the last few years alone, the origin and evolution of these non-coding regulatory sequences remain largely obscure. In this report, we demonstrate that members of a recently discovered family of human miRNA genes, hsa-mir-548, are derived from Made1 transposable elements. Made1 elements are short miniature inverted-repeat transposable elements (MITEs), which consist of two 37 base pair (bp) terminal inverted repeats that flank 6 bp of internal sequence. Thus, Made1 elements are nearly perfect palindromes, and when expressed as RNA they form highly stable hairpin loops. Apparently, these Made1-related structures are recognized by the RNA interference enzymatic machinery and processed to form 22-nucleotide mature miRNA sequences. Consistent with their origin from MITEs, hsa-mir-548 genes are primate-specific and have many potential paralogs in the human genome. There are more than 3,500 putative hsa-mir-548 target genes; analysis of their expression profiles and functional affinities suggests cancer-related regulatory roles for hsa-mir-548. Taken together, the characteristics of Made1 elements, and of MITEs in general, point to a specific mechanism for generating numerous small regulatory RNAs and target sites throughout the genome. The lineage-specific nature of MITEs could also allow the generation of novel regulatory phenotypes related to species diversification. Finally, we propose that MITEs may represent an evolutionary link between siRNAs and miRNAs.
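Why a Made1 element folds into a hairpin can be checked directly: the left terminal inverted repeat should equal the reverse complement of the right one. The sketch below uses the arm and spacer lengths from the abstract (37 bp and 6 bp), but the sequence itself is synthetic, not a real Made1 element, and the match fraction is an illustrative measure rather than the authors' method.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tir_match_fraction(seq, arm_len):
    """Fraction of positions where the left arm pairs with the right arm.

    A value of 1.0 means the two terminal inverted repeats are perfectly
    complementary, so the transcript can fold back on itself into a hairpin.
    """
    left = seq[:arm_len]
    right_rc = reverse_complement(seq[-arm_len:])
    matches = sum(a == b for a, b in zip(left, right_rc))
    return matches / arm_len


# Synthetic Made1-like layout: 37 bp arm + 6 bp spacer + reverse-complemented arm.
arm = ("ACGGT" * 8)[:37]
element = arm + "TTAGCA" + reverse_complement(arm)
print(len(element), tir_match_fraction(element, 37))
```

Because only the two arms are compared, the 6 bp spacer is free to be non-palindromic, which is why such elements are described as nearly, rather than perfectly, palindromic.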

Evidence for active maintenance of inverted repeat structures identified by a comparative genomic approach.

Inverted repeats occur in both prokaryotic and eukaryotic genomes. Usually they are short, and some have important functions in various biological processes. Long inverted repeats, however, are rare and can cause genome instability. Analyses of the C. elegans genome identified long, nearly perfect inverted repeat sequences involving both divergently and convergently oriented homologous gene pairs and complete intergenic sequences. Comparisons with the orthologous regions from the genomes of C. briggsae and C. remanei show that the inverted repeat structures are often far more conserved than the underlying sequences. This observation implies that an active mechanism maintains the inverted repeat nature of these sequences.
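The distinction between conserving a sequence and conserving a structure can be made concrete with two numbers: identity between orthologous arms across species, and complementarity between the two arms within each species. In the sketch below the sequences are invented for illustration (they are not C. elegans or C. briggsae data); the second species carries substitutions in both arms that keep them complementary.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def identity(a, b):
    """Fraction of identical positions over the shorter of two aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / min(len(a), len(b))


def arm_complementarity(left, right):
    """How well the left arm pairs with the right arm (1.0 = perfect inverted repeat)."""
    return identity(left, reverse_complement(right))


# Hypothetical orthologous inverted repeats in two species.
left_sp1 = "ACGTACGTAC"
right_sp1 = reverse_complement(left_sp1)
left_sp2 = "ACCTAGGTTC"                    # 3 substitutions relative to species 1
right_sp2 = reverse_complement(left_sp2)   # compensating changes keep the structure

print("cross-species arm identity:", identity(left_sp1, left_sp2))
print("within-species complementarity:",
      arm_complementarity(left_sp1, right_sp1),
      arm_complementarity(left_sp2, right_sp2))
```

In this toy case sequence identity drops to 0.7 while both species keep a perfect inverted repeat, the pattern the abstract interprets as evidence for active maintenance.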

Large-scale comparative genomic ranking of taxonomically restricted genes (TRGs) in bacterial and archaeal genomes.

BACKGROUND: Lineage-specific, or taxonomically restricted genes (TRGs), especially those that are species and strain-specific, are of special interest because they are expected to play a role in defining exclusive ecological adaptations to particular niches. Despite this, they are relatively poorly studied and little understood, in large part because many are still orphans or only have homologues in very closely related isolates. This lack of homology confounds attempts to establish the likelihood that a hypothetical gene is expressed and, if so, to determine the putative function of the protein. METHODOLOGY/PRINCIPAL FINDINGS: We have developed "QIPP" ("Quality Index for Predicted Proteins"), an index that scores the "quality" of a protein based on non-homology-based criteria. QIPP can be used to assign a value between zero and one to any protein based on comparing its features to other proteins in a given genome. We have used QIPP to rank the predicted proteins in the proteomes of Bacteria and Archaea. This ranking reveals that there is a large amount of variation in QIPP scores, and identifies many high-scoring orphans as potentially "authentic" (expressed) orphans. There are significant differences in the distributions of QIPP scores between orphan and non-orphan genes for many genomes and a trend for less well-conserved genes to have lower QIPP scores. CONCLUSIONS: The implication of this work is that QIPP scores can be used to further annotate predicted proteins with information that is independent of homology. Such information can be used to prioritize candidates for further analysis. Data generated for this study can be found in the OrphanMine at http://www.genomics.ceh.ac.uk/orphan_mine.
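One way to produce a homology-independent score between zero and one by comparing a protein with the rest of its genome is to average per-feature empirical percentiles. The sketch below is a hypothetical illustration of that idea, not the published QIPP formula: the feature names (`length`, `codon_adaptation`) and the assumption that larger values look more like an expressed protein are placeholders.

```python
from bisect import bisect_right


def percentile_rank(values, v):
    """Empirical CDF: fraction of genome-wide values that are <= v."""
    ordered = sorted(values)
    return bisect_right(ordered, v) / len(ordered)


def quality_index(protein, genome, features):
    """Mean per-feature percentile of `protein` against all proteins in `genome`.

    Each feature is assumed to be oriented so that larger values look more like
    an authentic, expressed protein. The result always lies in [0, 1].
    """
    ranks = [percentile_rank([p[f] for p in genome], protein[f]) for f in features]
    return sum(ranks) / len(ranks)


# Hypothetical per-protein features for a tiny four-protein "genome".
genome = [
    {"length": 100, "codon_adaptation": 0.5},
    {"length": 200, "codon_adaptation": 0.7},
    {"length": 300, "codon_adaptation": 0.9},
    {"length": 400, "codon_adaptation": 0.6},
]
score = quality_index(genome[2], genome, ["length", "codon_adaptation"])
print(score)
```

Scoring each protein relative to its own genome, rather than against fixed thresholds, keeps scores comparable across genomes with different composition.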

Clinical implementation of chromosomal microarray analysis: summary of 2513 postnatal cases.

BACKGROUND: Array Comparative Genomic Hybridization (a-CGH) is a powerful molecular cytogenetic tool for detecting genomic imbalances and studying disease mechanisms and pathogenesis. We report our experience with the clinical implementation of this high-resolution human genome analysis, referred to as Chromosomal Microarray Analysis (CMA). METHODS AND FINDINGS: CMA was performed clinically on 2513 postnatal samples from patients referred with a variety of clinical phenotypes. The initial 775 samples were studied using CMA array version 4 (V4), and the remaining 1738 samples were analyzed with CMA version 5 (V5), which has expanded genomic coverage. Overall, CMA identified clinically relevant genomic imbalances in 8.5% of patients: 7.6% using V4 and 8.9% using V5. Among 117 cases referred for additional investigation of a known cytogenetically detectable rearrangement, CMA identified the majority (92.5%) of the genomic imbalances. Importantly, abnormal CMA findings were observed in 5.2% of patients (98/1872) with normal karyotype/FISH results, and V5, with its expanded genomic coverage, achieved a higher detection rate in this category than V4. Among cases without available cytogenetic results, abnormal CMA results were detected in 8.0% (42/524); again, V5 showed an increased ability to detect abnormalities. The improved diagnostic potential of CMA is illustrated by 90 cases with 51 cryptic microdeletions and 39 predicted apparent reciprocal microduplications in 13 specific chromosomal regions associated with 11 known genomic disorders. In addition, CMA identified copy number variations (CNVs) of uncertain significance in 262 probands; parental studies usually facilitated their clinical interpretation. Of these, 217 were interpreted as familial variants and 11 were determined to be de novo; the remaining 34 await parental studies to resolve their clinical significance.
CONCLUSIONS: This large set of clinical results demonstrates the significantly improved sensitivity of CMA for the detection of clinically relevant genomic imbalances and highlights the need for comprehensive genetic counseling to facilitate accurate clinical correlation and interpretation.
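The percentages and counts reported for this cohort can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the abstract; in particular, the overall 8.5% rate should equal the sample-weighted mean of the V4 (775 samples, 7.6%) and V5 (1738 samples, 8.9%) rates.

```python
def pct(num, den):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * num / den, 1)


# Subgroup detection rates.
normal_karyotype_rate = pct(98, 1872)
no_cytogenetics_rate = pct(42, 524)

# Overall rate as the sample-weighted mean of the per-version rates.
v4_n, v4_rate = 775, 0.076
v5_n, v5_rate = 1738, 0.089
overall_rate = round(100 * (v4_n * v4_rate + v5_n * v5_rate) / (v4_n + v5_n), 1)

print(normal_karyotype_rate, no_cytogenetics_rate, overall_rate)
```

Because the per-version rates are themselves rounded, this check confirms consistency to one decimal place, not exact patient counts.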

Epigenetic natural variation in Arabidopsis thaliana.

Cytosine methylation of repetitive sequences is widespread in plant genomes, occurring in both symmetric (CpG and CpNpG) and asymmetric sequence contexts. We used the methylation-dependent restriction enzyme McrBC to profile methylated DNA on tiling microarrays of Arabidopsis Chromosome 4 in two distinct ecotypes, Columbia and Landsberg erecta. We also used comparative genome hybridization to profile copy number polymorphisms. Repeated sequences and transposable elements (TEs), especially long terminal repeat retrotransposons, are densely methylated, but one third of genes also have low but detectable methylation in their transcribed regions. While TEs are almost always methylated, genic methylation is highly polymorphic, with half of all methylated genes being methylated in only one of the two ecotypes. A survey of loci in 96 Arabidopsis accessions revealed a similar degree of methylation polymorphism. Within-gene methylation is heritable but is lost at a high frequency in segregating F(2) families. Promoter methylation is rare, and gene expression is not generally affected by differences in DNA methylation. Small interfering RNAs are preferentially associated with methylated TEs but not with methylated genes, indicating that most genic methylation is not guided by small interfering RNAs. This may account for the instability of gene methylation, if an occasional failure of maintenance methylation cannot be corrected by other means.
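The statement that half of all methylated genes are methylated in only one of the two ecotypes is a set calculation: genes in the symmetric difference divided by genes methylated in either ecotype. The gene identifiers below are hypothetical placeholders, chosen so that the example reproduces a 50% polymorphism rate.

```python
def methylation_polymorphism(col, ler):
    """Fraction of genes methylated in either ecotype that are methylated in only one."""
    union = col | ler
    if not union:
        return 0.0
    return len(col ^ ler) / len(union)


# Hypothetical sets of methylated genes in Columbia and Landsberg erecta.
columbia = {"geneA", "geneB", "geneC"}
landsberg = {"geneA", "geneB", "geneD"}

print(methylation_polymorphism(columbia, landsberg))
```

The denominator is the union rather than either ecotype alone, so a gene counts once no matter in which background it is methylated.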

ADaCGH: A parallelized web-based application and R package for the analysis of aCGH data.

BACKGROUND: Copy number alterations (CNAs) in genomic DNA have been associated with complex human diseases, including cancer. One of the most common techniques for detecting CNAs is array-based comparative genomic hybridization (aCGH). The availability of aCGH platforms and the need to identify CNAs have resulted in a wealth of methodological studies. METHODOLOGY/PRINCIPAL FINDINGS: ADaCGH is an R package and a web-based application for the analysis of aCGH data. It implements eight methods for detecting CNAs (gains and losses of genomic DNA), including all of the best-performing ones from two recent reviews (CBS, GLAD, CGHseg, HMM). For improved speed, we use parallel computing (via MPI). Additional information (GO terms, PubMed citations, KEGG and Reactome pathways) is available for individual genes and for sets of genes with altered copy numbers. CONCLUSIONS/SIGNIFICANCE: ADaCGH represents a qualitative increase in the standards of these types of applications: a) all of the best-performing algorithms are included, not just one or two; b) we do not limit ourselves to providing a thin layer of CGI on top of existing Bioconductor packages, but instead carefully use parallelization, examining different schemes, and achieve significant decreases in user waiting time (by factors of up to 45x); c) we have added functionality not currently available in some methods, to adapt to recent recommendations (e.g., merging of segmentation results in wavelet-based and CGHseg algorithms); d) we incorporate redundancy, fault tolerance and checkpointing, which are unique among web-based, parallelized applications; e) all of the code is available under open-source licenses, allowing others to build upon, copy, and adapt it for other software projects.
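The segmentation methods listed above all partition a log2-ratio profile along the genome into stretches of constant copy number. The sketch below is a deliberately simplified least-squares binary segmentation, written in Python for illustration. It is not CBS, GLAD, CGHseg or the HMM approach, and it is not ADaCGH code (which is R). The `min_gain` threshold is in squared-error units and is a hypothetical default; real data would need a noise-scaled threshold.

```python
from statistics import mean


def best_split(x):
    """Return (index, gain) for the split that most reduces the sum of squared errors."""
    n = len(x)
    total = sum(x)
    best_i, best_gain = None, 0.0
    left = 0.0
    for i in range(1, n):
        left += x[i - 1]
        m1 = left / i
        m2 = (total - left) / (n - i)
        # Reduction in SSE from splitting the segment into x[:i] and x[i:].
        gain = i * (n - i) / n * (m1 - m2) ** 2
        if gain > best_gain:
            best_i, best_gain = i, gain
    return best_i, best_gain


def binary_segmentation(x, min_gain=1.0, min_size=2, offset=0):
    """Recursively split a log-ratio profile; returns (start, end, mean) segments."""
    i, gain = best_split(x)
    if i is None or gain < min_gain or i < min_size or len(x) - i < min_size:
        return [(offset, offset + len(x), mean(x))]
    return (binary_segmentation(x[:i], min_gain, min_size, offset)
            + binary_segmentation(x[i:], min_gain, min_size, offset + i))


# Noise-free toy profile: normal, a single-copy gain, normal again.
profile = [0.0] * 10 + [1.0] * 10 + [0.0] * 10
print(binary_segmentation(profile))
```

Each recursive call works on an independent sub-profile, which is one reason segmentation of many chromosomes or arrays lends itself to the kind of parallel execution the abstract describes.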

Detection of novel amplicons in prostate cancer by comprehensive genomic profiling of prostate cancer cell lines using oligonucleotide-based arrayCGH.

BACKGROUND: The purpose of this study was to demonstrate the feasibility of a longmer oligonucleotide microarray platform for profiling gene copy number alterations in prostate cancer cell lines and for rapidly identifying novel candidate genes that may play a role in carcinogenesis. METHODS/RESULTS AND FINDINGS: Genome-wide screening for regions of genetic gain and loss in nine prostate cancer cell lines (PC3, DU145, LNCaP, CWR22, and derived sublines) was carried out using comparative genomic hybridization on a 35,000-feature oligonucleotide microarray (arrayCGH). Compared with conventional chromosomal CGH, arrayCGH detected more deletions and more small regions of gain, particularly in pericentromeric regions and regions next to the telomeres. To validate the high resolution of arrayCGH, we further analyzed a small amplicon of 1.7 Mb at 9p13.3 that was found in CWR22 and CWR22-Rv1. Increased copy number was confirmed by fluorescence in situ hybridization using the BAC clone RP11-165H19 from the amplified region, which comprises two genes, interleukin 11 receptor alpha (IL11-RA) and dynactin 3 (DCTN3). Using quantitative real-time PCR (qPCR), we demonstrated that IL11-RA showed a higher copy number gain than DCTN3 in the cell lines, suggesting that IL11-RA is the amplification target. Screening of 20 primary prostate carcinomas by qPCR revealed an IL11-RA copy number gain in 75% of the tumors analyzed. A gain of DCTN3 was found in only two cases, both of which also showed a gain of IL11-RA. CONCLUSIONS/SIGNIFICANCE: ArrayCGH using longmer oligonucleotide microarrays is feasible for high-resolution analysis of chromosomal imbalances. Characterization of a small gained region at 9p13.3 in prostate cancer cell lines and primary prostate cancer samples by fluorescence in situ hybridization and quantitative PCR has revealed the interleukin 11 receptor alpha gene as a candidate target of amplification, with an amplification frequency of 75% in prostate carcinomas.
Frequent amplification of IL11-RA in prostate cancer is a potential mechanism of IL11-RA overexpression in this tumor type.
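The abstract does not state how qPCR copy number was calculated. A common approach is the comparative Ct (2^-ΔΔCt) method, sketched below under the assumption of roughly 100% amplification efficiency for both the target gene and a reference locus, with a normal sample as calibrator. The Ct values are hypothetical and are not taken from the study.

```python
def relative_copy_number(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Comparative Ct (2^-ddCt) ratio of target copies in a sample vs. a calibrator.

    Assumes ~100% PCR efficiency, so each cycle doubles the product.
    A ratio of 1.0 means no change; 2.0 means twice as many target copies.
    """
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the target amplifies one cycle earlier in the tumor.
ratio = relative_copy_number(24.0, 25.0, 25.0, 25.0)
print(ratio)
```

If amplification efficiencies differ noticeably between target and reference assays, an efficiency-corrected model would be needed instead of the plain power of two.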