Serghei Mangul, PhD, joins the faculty of the Titus Department of Clinical Pharmacy as assistant professor of pharmacy after completing a postdoctoral fellowship at UCLA's Institute for Quantitative and Computational Biosciences. Before that, he was a visiting scholar at Harvard Medical School after earning his PhD in bioinformatics at Georgia State University.
His work combines expertise in computational biology and bioscience to help close the digital divide that can prevent life scientists from maximizing the potential of data-driven investigation.
Mangul's research focuses on improving the techniques of bioinformatics — computational analysis of biological data — to better understand the mechanisms of disease. The mission of his lab is to design, develop and apply novel and robust data-driven, computational approaches that will accelerate the diffusion of genomics and biomedical data into translational research and education.
While at UCLA, Mangul helped create a plan for training research faculty in developing nations on how to enhance their bioinformatics programs through cloud computing and big data analysis. Nature Biotechnology published the concept paper, and Mangul's team also developed an online resource guide.
He received his PhD in bioinformatics from Georgia State University.
Georgia State University: Ph.D., Bioinformatics
Selected Media Appearances (1)
Team proposes plan to use bioinformatics, open data to boost science in developing countries
“A computer and a high-speed internet connection are all the infrastructure that’s required for good bioinformatics studies, and these resources are often already at universities in lower-income countries,” said study co-author Serghei Mangul, a UCLA postdoctoral scholar in computer science who specializes in biosciences...
Selected Articles (5)
Meng How Tan, Qin Li, Raghuvaran Shanmugam, Robert Piskol, Jennefer Kohler, Amy N Young, Kaiwen Ivy Liu, Rui Zhang, Gokul Ramaswami, Kentaro Ariyoshi, Ankita Gupte, Liam P Keegan, Cyril X George, Avinash Ramu, Ni Huang, Elizabeth A Pollina, Dena S Leeman, Alessandra Rustighi, YP Sharon Goh, Ajay Chawla, Giannino Del Sal, Gary Peltz, Anne Brunet, Donald F Conrad, Charles E Samuel, Mary A O’Connell, Carl R Walkley, Kazuko Nishikura, Jin Billy Li, GTEx Consortium
Adenosine-to-inosine (A-to-I) RNA editing is a conserved post-transcriptional mechanism mediated by ADAR enzymes that diversifies the transcriptome by altering selected nucleotides in RNA molecules [1]. Although many editing sites have recently been discovered [2–7], the extent to which most sites are edited and how the editing is regulated in different biological contexts are not fully understood [8–10]. Here we report dynamic spatiotemporal patterns and new regulators of RNA editing, discovered through an extensive profiling of A-to-I RNA editing in 8,551 human samples (representing 53 body sites from 552 individuals) from the Genotype-Tissue Expression (GTEx) project and in hundreds of other primate and mouse samples. We show that editing levels in non-repetitive coding regions vary more between tissues than editing levels in repetitive regions. Globally, ADAR1 is the primary editor of repetitive sites and ADAR2 is the primary editor of non-repetitive coding sites, whereas the catalytically inactive ADAR3 predominantly acts as an inhibitor of editing. Cross-species analysis of RNA editing in several tissues revealed that species, rather than tissue type, is the primary determinant of editing levels, suggesting stronger cis-directed regulation of RNA editing for most sites, although the small set of conserved coding sites is under stronger trans-regulation. In addition, we curated an extensive set of ADAR1 and ADAR2 targets and showed that many editing sites display distinct tissue-specific regulation by the ADAR enzymes in vivo. Further analysis of the GTEx data revealed several potential regulators of editing, such as AIMP2, which reduces editing in muscles by enhancing the degradation of the ADAR proteins. Collectively, our work provides insights into the complex cis- and trans-regulation of A-to-I editing.
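Editing levels of the kind compared above are commonly quantified from RNA-seq as the fraction of reads showing G at a genomic A position, since sequencers read inosine as guanosine. The sketch below assumes per-tissue A and G read counts at a single site have already been tallied and uses the population standard deviation as a simple stand-in for "variation between tissues". It is an illustration only, not the study's pipeline; `editing_level`, `between_tissue_spread` and the toy counts are hypothetical.

```python
from statistics import pstdev
from typing import Optional


def editing_level(a_count: int, g_count: int) -> Optional[float]:
    """Fraction of reads carrying G (inosine) at an edited A position.

    Returns None when no reads cover the site, since the level is undefined.
    """
    total = a_count + g_count
    if total == 0:
        return None
    return g_count / total


def between_tissue_spread(counts_by_tissue: dict[str, tuple[int, int]]) -> float:
    """Population standard deviation of a site's editing level across tissues.

    Tissues with zero coverage at the site are skipped; with no covered
    tissues the spread is reported as 0.0.
    """
    levels = [
        editing_level(a, g)
        for a, g in counts_by_tissue.values()
        if a + g > 0
    ]
    return pstdev(levels) if levels else 0.0
```

With hypothetical counts, a site edited at 80% in one tissue and 10% in another has a spread of 0.35, while a site edited at 50% and 45% has a spread of 0.025, mirroring the coding versus repetitive contrast described in the abstract.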
Serghei Mangul, Nicholas C Wu, Nicholas Mancuso, Alex Zelikovsky, Ren Sun, Eleazar Eskin
Next-generation sequencing technologies sequence viruses with ultra-deep coverage, thus promising to revolutionize our understanding of the underlying diversity of viral populations. While the sequencing coverage is high enough that even rare viral variants are sequenced, the presence of sequencing errors makes it difficult to distinguish between rare variants and sequencing errors. In this article, we present a method to overcome the limitations of sequencing technologies and assemble a diverse viral population that allows for the detection of previously undiscovered rare variants. The proposed method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The proposed protocol is able to eliminate sequencing errors by using individual barcodes attached to the sequencing fragments. Highly accurate data in combination with deep coverage allow VGA to assemble rare variants. VGA uses an expectation–maximization algorithm to estimate abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method is able to accurately assemble an HIV viral population and detect rare variants previously undetectable due to sequencing errors. VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method that scales to millions of sequencing reads.
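The barcode idea can be illustrated with a minimal majority-vote sketch. Reads that carry the same barcode derive from the same original fragment, so a base that disagrees with most copies at a position is more likely a sequencing error than a true variant. This assumes reads arrive as (barcode, sequence) pairs that are already aligned and of equal length within each barcode group; `barcode_consensus` and its `min_reads` threshold are hypothetical and not taken from the VGA protocol.

```python
from collections import Counter, defaultdict


def barcode_consensus(reads: list[tuple[str, str]], min_reads: int = 3) -> dict[str, str]:
    """Collapse reads sharing a barcode into one majority-vote sequence.

    `reads` holds (barcode, sequence) pairs; sequences within a barcode group
    are assumed to be pre-aligned and of equal length. Groups with fewer than
    `min_reads` reads are dropped because a vote over so few copies is unreliable.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)

    consensus: dict[str, str] = {}
    for barcode, seqs in groups.items():
        if len(seqs) < min_reads:
            continue
        # zip(*seqs) yields one tuple of bases per alignment column;
        # the most common base in each column becomes the consensus base.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus
```

For example, three copies `ACGT`, `ACGT` and `ACTT` under one barcode collapse to `ACGT`, discarding the single G-to-T discrepancy as a likely error, while a barcode seen only once is dropped under the default threshold.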
Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
Marius Nicolae, Serghei Mangul, Ion I Măndoiu, Alex Zelikovsky
Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging. In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/ . Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
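The core expectation-maximization idea behind isoform quantification can be sketched in a few lines: reads compatible with several isoforms are split among them in proportion to the current abundance estimates, and the estimates are then updated from those fractional counts. This is a simplified illustration, not the IsoEM implementation: it assumes each read's compatibility is supplied as a weight standing in for P(read | isoform) (for instance, a likelihood derived from the implied insert size), and it omits isoform-length normalization and base-quality or strand handling. `em_isoform_abundance` is a hypothetical name.

```python
def em_isoform_abundance(
    read_weights: list[dict[str, float]],
    n_iter: int = 200,
    tol: float = 1e-10,
) -> dict[str, float]:
    """Estimate relative isoform abundances by expectation-maximization.

    Each read is a dict {isoform: weight} with positive weights standing in
    for P(read | isoform); isoforms absent from a read's dict are incompatible.
    """
    isoforms = sorted({iso for weights in read_weights for iso in weights})
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}  # uniform start
    for _ in range(n_iter):
        counts = dict.fromkeys(isoforms, 0.0)
        # E-step: split each read across its compatible isoforms in
        # proportion to theta[isoform] * P(read | isoform).
        for weights in read_weights:
            denom = sum(theta[iso] * w for iso, w in weights.items())
            for iso, w in weights.items():
                counts[iso] += theta[iso] * w / denom
        # M-step: new abundance = expected fraction of reads per isoform.
        new_theta = {iso: c / len(read_weights) for iso, c in counts.items()}
        converged = max(abs(new_theta[i] - theta[i]) for i in isoforms) < tol
        theta = new_theta
        if converged:
            break
    return theta
```

As a toy usage example, 6 reads unique to isoform A, 2 unique to B and 8 compatible with both converge to abundances of 0.75 for A and 0.25 for B: the shared reads are apportioned in the same 3:1 ratio as the unique ones.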
Irina Astrovskaya, Bassam Tork, Serghei Mangul, Kelly Westbrooks, Ion Măndoiu, Peter Balfe, Alex Zelikovsky
RNA viruses infecting a host usually exist as a set of closely related sequences, referred to as quasispecies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but unfortunately standard assembly software was originally designed for single genome assembly and cannot be used to simultaneously assemble and estimate the abundance of multiple closely related quasispecies sequences.