Lukasz Kurgan, Ph.D. - VCU College of Engineering. Engineering East Hall, Room E4268, Richmond, VA, US

Lukasz Kurgan, Ph.D.

Robert J. Mattauch Endowed Professor and Vice Chair of Computer Science | VCU College of Engineering

Engineering East Hall, Room E4268, Richmond, VA, UNITED STATES

Data scientist specializing in high-throughput structural bioinformatics of proteins & small RNAs.










Lukasz Kurgan received his M.Sc. degree (with honors) in Automation and Robotics from AGH University of Science and Technology (Poland) in 1999 and his Ph.D. degree in Computer Science from the University of Colorado at Boulder in 2003. He joined the University of Alberta in 2003, where he received tenure in 2007 and was promoted to the rank of Professor in 2013. He moved to Virginia Commonwealth University in 2016 as the Robert J. Mattauch Endowed Professor of Computer Science.

Industry Expertise (5)



Computer Software



Areas of Expertise (8)

Structural Bioinformatics

Intrinsically Disordered Proteins

Protein-ligand (drug) interactions

Computer-aided molecular modeling

Big Data Analysis

Drug Repurposing

Drug Repositioning

Structural Genomics

Accomplishments (7)

Member of Faculty Opinions (professional)


Inducted as member of the "Big Data & Analytics" section of the "Bioinformatics, Biomedical Informatics & Computational Biology" area.

Author of the winning flDPnn algorithm of the international Critical Assessment of Protein Intrinsic Disorder Prediction (CAID) challenge (professional)


CAID is a worldwide competition that identifies the most accurate methods for predicting intrinsically disordered protein regions. The results were recently published in Nature Methods (https://www.nature.com/articles/s41592-021-01117-3), followed by a commentary article in the same journal that highlights our win (https://www.nature.com/articles/s41592-021-01123-5).

Fellow of the Kosciuszko Foundation Collegium of Eminent Scientists (professional)


With citation for "outstanding achievements and contributions to the Polish scientific community."

Fellow of the American Institute for Medical and Biological Engineering (AIMBE) (professional)


With citation for "outstanding contributions to structural bioinformatics, focusing on protein-ligand and protein-nucleic acids interactions and computational characterization of intrinsic disorder."

Senior Member of ACM (professional)


Elevated to senior member of the Association for Computing Machinery.

Gold Medal of Stanislaw Staszic (professional)


Recognition for outstanding academic achievements during undergraduate and Master's studies.

Outstanding Graduate Student Award (professional)


Outstanding Graduate Student Award for the M.Sc. thesis.

Education (2)

University of Colorado at Boulder: Ph.D., Computer Science 2003

AGH University of Science and Technology (Poland): M.Sc., Automation and Robotics 1999

Affiliations (2)

  • Professor Department of Computer Science Virginia Commonwealth University
  • Adjunct Professor Department of Electrical and Computer Engineering University of Alberta

Media Appearances (7)

Computer science research team gains international recognition for method that accurately predicts intrinsic disorder in proteins

VCU news  online


A computer science research team from VCU Engineering won an international challenge for their novel method of predicting intrinsically disordered proteins. Kurgan's award-winning method now appears in the journal Nature Communications (https://www.nature.com/articles/s41467-021-24773-7). The editors of Nature Communications also placed Kurgan's article on the Editor's Highlights page, which features a small selection of articles the editorial team believes to be particularly interesting or important.


VCU professors join elite bioengineering institute

Commonwealth Times  online


Three professors were inducted into the American Institute for Medical and Biological Engineering (AIMBE) at a formal ceremony on April 9, 2018. Kurgan was nominated for his work in structural bioinformatics, using computer programs to study the structures of proteins and DNA.


VCU's Kurgan supercomputer programs help biologists to speed up hypothesis generation to understand proteins

Supercomputing Online News  online


“We have manually curated but understand less than 1 percent of these proteins, and right now there’s over 80 million to solve,” said Kurgan, a Qimonda-endowed professor and data scientist. “A program can solve these proteins faster than a single human and can help researchers speed up hypothesis generation.”

Lukasz Kurgan, Ph.D., with a picture of an intrinsically disordered protein behind him. Kurgan has developed bioinformatics computer programs that help determine the functions of these proteins.


Bioinformatics computer programs help biologists understand intrinsically disordered proteins

Phys.org  online


To help shed light on the workings of proteins, Virginia Commonwealth University researcher Lukasz Kurgan, Ph.D., vice chair of the Computer Science Department in the School of Engineering, has developed a series of bioinformatics programs to assist biologists in developing insights into the functions of intrinsically disordered proteins. This group of proteins lacks a fixed structure, which means they are totally or partially flexible and amorphous. Read more at: https://phys.org/news/2017-07-bioinformatics-biologists-intrinsically-disordered-proteins.html


Unravelling the Complexity of Proteins

ScienceDaily  online


Interview concerning our collaborative project in structural genomics that investigates structural coverage of proteins.


Crystallography for Complete Proteomes

BioTechniques  online


Interview concerning our collaborative project in structural genomics that investigates structural coverage of proteins.


Zeroing In...

Drug Design and Development  online


Interview concerning a collaborative project on the development of new technologies to find and characterize drug targets.


Research Grants (7)

Integrated prediction of intrinsic disorder and disorder functions with modular multi-label deep learning

NSF $500,000


Proteins are remarkable biological machines. Hundreds of millions of protein sequences have been decoded over the last two decades, creating a significant knowledge gap: we do not know what most of them do. A common way to decipher protein functions relies on the sequence-to-structure-to-function paradigm, where protein function is inferred from the protein structure, which in turn is derived from the sequence. However, recent research has identified a large family of intrinsically disordered proteins that lack a stable structure under physiological conditions and therefore cannot be characterized using structure-based approaches. These proteins are particularly abundant in eukaryotes and are involved in the pathogenesis of numerous human diseases. The discovery of intrinsically disordered proteins has prompted the development of a new generation of computational methods that predict the presence of intrinsic disorder directly from protein sequences. The recently completed Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment has shown that these methods are fast and provide accurate results. However, while intrinsic disorder can be readily and accurately identified in protein sequences, its function remains a mystery. This proposal will conceptualize, design, implement, test and deploy an innovative machine learning method that provides highly accurate and integrated predictions of disorder and disorder functions directly from protein sequences. The team will use this method to produce functional annotations of disorder on an unprecedented scale of tens of millions of proteins, addressing the knowledge gap for this protein family. In the long run, this project will advance understanding of fundamental biological processes and related human health issues in the context of intrinsically disordered proteins.
This project will also train STEM students and researchers via high-school outreach and multidisciplinary teaching and mentoring of undergraduate and graduate students and postdoctoral researchers, producing highly skilled researchers who are sought after by industry and academia.


High-throughput annotation of cellular functions of intrinsic disorder in proteins

NSF $500,000


One of the fundamental problems in molecular biology is to decipher the functions of millions of uncharacterized protein sequences that are rapidly generated by high-throughput genome sequencing. The sequence-to-structure-to-function paradigm has been used for decades to determine the functions of proteins. However, recent research has broadened this paradigm by adding new players: proteins with intrinsic disorder (ID). They are highly abundant and cannot be characterized with the currently used structure-driven approach. While there are many widely used computational methods that accurately predict ID in protein sequences, methods for predicting the many functions of ID are lacking. This project will develop a family of novel, accurate, and high-throughput computational methods that predict all major functions of ID in protein sequences. It will produce putative functional annotations on an unprecedented scale of thousands of species, addressing the rapid accumulation of raw sequence data and helping to accelerate scientific discovery. These results will advance our understanding of fundamental biological processes and human health, given the high prevalence of ID in human diseases and the attractiveness of proteins with ID as drug targets.


High-throughput characterization, prediction, and applications of protein disorder

NSERC $170,000


For years, scientists were convinced that proteins must fold into precise, rigid molecules in order to function correctly. This view is now changing. Intrinsically disordered proteins have at least some disordered (also called unfolded or highly flexible) parts, and many of them carry out their function without ever fully folding into a rigid molecule. Disorder is highly abundant in nature, and its prevalence has been shown in several human diseases. However, the characterization of protein disorder is lagging behind the rapidly growing number of known proteins. Experimental annotation of disorder is time-consuming and difficult, and thus computational methods that predict disorder from protein sequences have emerged as a viable alternative to bridge the annotation gap and to investigate disorder. Although the quality of these predictors continues to rise, more accurate methods, and novel methods that address specific characteristics of disorder, are urgently needed. Moreover, there is a pressing need to understand and characterize disorder in various proteomes and functional classes of proteins. To this end, our objectives include (1) development of a comprehensive computational platform for accurate, fast, and multi-objective prediction of disorder; and (2) applications and experimental validation of disorder predictions. This work facilitates a more complete understanding of protein disorder, principles of protein folding, and molecular mechanisms of protein function. Our methods provide a cost- and time-effective solution to guide experimentalists, and they are crucial for modern research and development in several areas, including rational drug design, structural genomics, and systems biology.


Early prediction of patient-related and radiological outcomes in patients with recent-onset inflammatory polyarthritis (EPA) using established and novel independent predictors

CIHR $765,175


Early inflammatory polyarthritis (EPA) describes recent-onset disease with signs of inflammation in at least 3 peripheral joints. It typically starts between 40 and 55 years of age, affects up to 5% of adults over their lifetime, and results in persistent inflammatory arthritis in close to 2% (30-50% of EPA). EPA patients are clinically very similar at onset, and their prognosis remains ill-defined and frequently poor, despite the availability of effective medications and the use of remission-targeted strategies. The lack of effective baseline prognostic markers to identify patients who need these interventions is partly responsible for missing the window of opportunity for treatment in many patients. Based on previous observations that individuals segregate into poor and good in vitro activators of bone cells called osteoclasts (OC), we propose to identify characteristics of OC precursors and of OCs formed in vitro from patients' blood cells, and to correlate these characteristics with severe joint damage. As rheumatoid arthritis (RA) patients have short-for-age telomeres (i.e., DNA sequences at the ends of chromosomes), we propose to determine whether short telomeres at baseline (and rapidly shortening telomeres soon thereafter) are independent predictors of severe RA-like disease in EPA patients. We will also define the role of ultrasound joint evaluation in patients who do not have bone erosions on X-rays to predict which ones will develop severe joint damage. Finally, we will look at variants of immune-related genes and at psychosocial characteristics (e.g., depression, coping strategies, pain perception) that may predict poor pain improvement and poor outcomes. The combination of these prognostic markers will lead to a prognostic tool that may guide early treatment (both biomedical and psychosocial) targeted to those patients most likely to benefit (cost-saving), and avoid unnecessary exposure to expensive and potentially toxic drugs when these are not needed.


Molecular-level prediction and mitigation of side effects of tubulin-targeting cancer therapy drugs

Alberta Cancer Foundation $50,000


Adverse drug reactions (ADRs) incur high societal costs due to drug-related mortality and undesirable side effects, and they lead to failures in the late stages of drug development. Virtually all contemporary cancer therapy drugs, including clinically successful compounds like paclitaxel and vinblastine, trigger frequent and often severe ADRs. However, the corresponding molecular-level mechanisms are usually unclear or unknown. We will implement an automated computational platform for the discovery of protein-drug interactions and apply this platform to several important anticancer agents to investigate the molecular-level mechanisms underlying their known physiological side effects. The outcomes of this work would help in designing novel variants of drugs that reduce or eliminate ADRs and would assist in developing more effective preventive measures.

Computational intelligence based platform for prediction and characterization of binding sites in proteins

NSERC $85,000


Proteins are nanoscale machines that catalyze chemical reactions (enzymes), form the cytoskeleton (tubulin), perform transport functions (hemoglobin), etc. Knowledge of the tertiary structure of proteins is of pivotal importance to understanding and manipulating a protein's biochemical and cellular functions. Protein activity is often triggered by the binding of various molecules or ions (referred to as ligands) to binding sites on the protein's surface. For instance, several cancer drugs bind to the tubulin protein, alter its function, and as a result block cell division. Cost-effective rational drug design, which is used to find such drugs, requires knowledge of the protein surface, which is deduced from the tertiary structure, to find and characterize the binding sites. The goal of this proposal is to build an integrated, high-throughput, in silico framework for the prediction and characterization of protein binding sites from the primary protein sequence, with an intermediate step of tertiary structure prediction. In contrast to expensive and time-consuming experimental work, which simply cannot test thousands of competing hypotheses (as we will do in our in silico research), the proposed research represents an important, cost-effective and practicable step towards determining and characterizing binding sites from protein sequences, which can be used to cut the costs of "wet lab" experiments. This research targets specific, important applications such as rational drug design, which aims to find cures for many major human diseases.

Role of osteoclastogenesis and osteoclast activation in joint destruction in degenerative and inflammatory joint diseases

CIHR $1,012,500


In rheumatoid arthritis (RA), resorption by osteoclasts causes local and systemic bone loss, leading to collapse of joint surfaces and difficulties in replacing joints with implants. We hypothesize that enhanced osteoclast differentiation and/or activity contribute to joint destruction. To test this hypothesis, peripheral blood mononuclear cells from a transverse cohort of RA patients will be used to determine whether the osteoclastogenic capacity and resorptive activity of the resulting differentiated osteoclasts correlate with disease severity. Understanding the molecular mediators of enhanced osteoclast functionality could identify novel prognostic biomarkers and therapeutic targets to control joint destruction and systemic osteoporosis in rheumatoid arthritis.


Courses (10)

CMSC 435 Introduction to Data Science

Virginia Commonwealth University

CMSC 635 Knowledge Discovery and Data Mining

Virginia Commonwealth University

ECE 321 Software Requirements Engineering

University of Alberta

ENCMP 100 Computer Programming for Engineers

University of Alberta

EE 280 Introduction to Digital Logic Design

University of Alberta

CMPE 310 Applying Software Engineering Practices Project

University of Alberta

ECE 625 Data Analysis and Knowledge Discovery

University of Alberta

ECE 625 Advanced Data Analysis and Decision Making

University of Alberta

CSC 4811 Computer Security

University of Colorado at Denver

CSC 5728 Software Engineering

University of Colorado at Denver

Selected Articles (18)

Intrinsic Disorder in Human RNA-Binding Proteins

Journal of Molecular Biology


Although RNA-binding proteins (RBPs) are known to be enriched in intrinsic disorder, no previous analysis has focused on RBPs interacting with specific RNA types. We fill this gap with a comprehensive analysis of the putative disorder in RBPs binding to six common RNA types: messenger RNA (mRNA), transfer RNA (tRNA), small nuclear RNA (snRNA), non-coding RNA (ncRNA), ribosomal RNA (rRNA), and internal ribosome RNA (irRNA). We also analyze the amount of putative intrinsic disorder in the RNA-binding domains (RBDs) and non-RNA-binding-domain regions (non-RBD regions). Consistent with previous studies, we show that in comparison with the human proteome, RBPs are significantly enriched in disorder. However, closer examination finds significant enrichment in predicted disorder for the mRNA-, rRNA- and snRNA-binding proteins, while the proteins that interact with ncRNA and irRNA are not enriched in disorder, and the tRNA-binding proteins are significantly depleted in disorder. We show a consistent pattern of significant disorder enrichment in the non-RBD regions coupled with low levels of disorder in RBDs, which suggests that disorder is relatively rarely utilized in the RNA-binding regions. Our analysis of the non-RBD regions suggests that disorder harbors posttranslational modification sites and is involved in putative interactions with DNA. Importantly, we use experimental data from DisProt and independent data from Pfam to validate the above observations, which rely on disorder predictions. This study provides new insights into the distribution of disorder across proteins that bind different RNA types and the functional role of disorder in the regions where it is enriched.


flDPnn: Accurate intrinsic disorder prediction with putative propensities of disorder functions

Nature Communications


Identification of intrinsic disorder in proteins relies in large part on computational predictors, which therefore need to be highly accurate. Since intrinsic disorder carries out a broad range of cellular functions, it is desirable to couple disorder and disorder function predictions. We report a computational tool, flDPnn, that provides accurate, fast and comprehensive disorder and disorder function predictions from protein sequences. The recent Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment and results on other test datasets demonstrate that flDPnn offers accurate predictions of disorder, fully disordered proteins and four common disorder functions. These predictions are substantially better than the results of existing disorder predictors and methods that predict functions of disorder. Ablation tests reveal that the high predictive performance stems from the innovative ways flDPnn derives sequence profiles and encodes inputs. flDPnn’s webserver is available at http://biomine.cs.vcu.edu/servers/flDPnn/


DescribePROT: database of amino acid-level protein structure and function predictions

Nucleic Acids Research


We present DescribePROT, a database of predicted amino acid-level descriptors of protein structure and function. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by amino acid sequence, UniProt accession number, or entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors, and can also be downloaded in structured formats at the protein, proteome and whole-database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.


IDPology of the living cell: intrinsic disorder in the subcellular compartments of the human cell

Cellular and Molecular Life Sciences


Intrinsic disorder can be found in the proteomes of all kingdoms of life and in viruses, and is particularly prevalent in eukaryotes. We conduct a comprehensive analysis of intrinsic disorder in human proteins while mapping them onto 24 compartments of the human cell. In agreement with previous studies, we show that human proteins are significantly enriched in disorder relative to a generic protein set that represents the protein universe. In fact, the fraction of proteins with long disordered regions and the average protein-level disorder content in the human proteome are about 3 times higher than in the protein universe. Furthermore, levels of intrinsic disorder in the majority of human subcellular compartments significantly exceed the average disorder content in the protein universe. Relative to the overall amount of disorder in the human proteome, proteins localized in the nucleus and cytoskeleton have significantly increased amounts of disorder, measured by both high disorder content and the presence of multiple long intrinsically disordered regions. We empirically demonstrate that, on average, human proteins are assigned to 2.3 subcellular compartments, with proteins localized to few subcellular compartments being more disordered than proteins localized to many compartments. Functionally, the disordered proteins localized in the most disorder-enriched subcellular compartments are primarily responsible for interactions with nucleic acids and protein partners. This is the first time disorder has been comprehensively mapped onto the human cell. Our observations add a missing piece to the puzzle of functional disorder and its organization inside the cell.


DEPICTER: Intrinsic Disorder and Disorder Function Prediction Server

Journal of Molecular Biology


Computational predictions of intrinsic disorder and its functions are instrumental in facilitating annotation of the millions of unannotated proteins. However, access to these predictors is fragmented and requires substantial effort to find them and to collect and combine their results. The DEPICTER (DisorderEd PredictIon CenTER) server provides first-of-its-kind centralized access to 10 popular disorder and disorder function predictors that cover protein and nucleic acid binding, linkers, and moonlighting regions. It automates the prediction process, runs user-selected methods on the server side, visualizes the results, and outputs all predictions in a consistent and easy-to-parse format. DEPICTER also includes two accurate consensus predictors of disorder and disordered protein binding. Empirical tests on an independent (low similarity) benchmark dataset reveal that the computational tools included in DEPICTER generate accurate predictions that are significantly better than the results secured using sequence alignment. The DEPICTER server is freely available at http://biomine.cs.vcu.edu/servers/DEPICTER/.


Taxonomic Landscape of the Dark Proteomes: Whole-Proteome Scale Interplay Between Structural Darkness, Intrinsic Disorder, and Crystallization Propensity



The growth rate of the protein sequence universe dramatically exceeds the speed of expansion of the protein structure universe, generating an immense dark proteome that includes proteins with unknown structure. A whole-proteome-scale analysis of 5.4 million proteins from 987 proteomes in the three domains of life and viruses is performed to systematically dissect the interplay between structural coverage, degree of putative intrinsic disorder, and predicted propensity for structure determination. Archaean and Bacterial proteomes are found to have relatively high structural coverage and low amounts of disorder, whereas Eukaryotic and Viral proteomes are characterized by a broad spread of structural coverage and higher disorder levels. The analysis reveals that dark proteomes (i.e., proteomes containing high fractions of proteins with unknown structure) have significantly elevated amounts of intrinsic disorder and are predicted to be difficult to solve structurally. Although the majority of dark proteomes are of viral origin, many dark viral proteomes have at least modest crystallization propensity and only a handful of them are enriched in intrinsic disorder. The disorder, structural coverage, and propensity for structure determination are mapped onto a novel proteome-level sequence similarity network to analyze the interplay of these characteristics in the taxonomic landscape.


DRNApred, fast sequence-based method that accurately predicts and discriminates DNA- and RNA-binding residues

Nucleic Acids Research


Protein-DNA and protein-RNA interactions are part of many diverse and essential cellular functions, and yet most of them remain to be discovered and characterized. Recent research shows that sequence-based predictors of DNA-binding residues accurately find these residues but also cross-predict many RNA-binding residues as DNA-binding, and vice versa. Most of these methods are also relatively slow, prohibiting applications on the whole-genome scale. We describe a novel sequence-based method, DRNApred, which accurately and in high-throughput predicts and discriminates between DNA- and RNA-binding residues. DRNApred was designed using a new dataset with both DNA- and RNA-binding proteins, regression that penalizes cross-predictions, and a novel two-layered architecture. DRNApred outperforms state-of-the-art predictors of DNA- or RNA-binding residues on a benchmark test dataset by substantially reducing the cross-predictions and predicting arguably higher-quality false positives that are located near the native binding residues. Moreover, it also more accurately predicts DNA- and RNA-binding proteins. Application on the human proteome confirms that DRNApred reduces the cross-predictions among the native nucleic acid binders. Also, the novel putative DNA/RNA-binding proteins that it predicts share similar subcellular locations and residue charge profiles with the known native binding proteins. The DRNApred webserver is freely available at http://biomine.cs.vcu.edu/servers/DRNApred/.


Disordered Nucleiome: Abundance of Intrinsic Disorder in the DNA- and RNA-binding Proteins in 1121 Species from Eukaryota, Bacteria and Archaea



Intrinsically disordered proteins (IDPs) are abundant in various proteomes, where they play numerous important roles and complement the biological activities of ordered proteins. Among the functions assigned to IDPs are interactions with nucleic acids. However, such assignments are often made based on the guilt-by-association principle. The validity of extending these correlations to all nucleic acid binding proteins has never been analyzed on a large scale across all domains of life. To fill this gap, we perform a comprehensive computational analysis of the abundance of intrinsic disorder and intrinsically disordered domains in nucleiomes (∼548 000 nucleic acid binding proteins) of 1121 species from Archaea, Bacteria and Eukaryota. A nucleiome is the whole complement of proteins involved in interactions with nucleic acids. We show that, relative to other proteins in the corresponding proteomes, the DNA-binding proteins have significantly increased disorder content and are significantly enriched in disordered domains in Eukaryotes but not in Archaea and Bacteria. The RNA-binding proteins are significantly enriched in disordered domains in Bacteria, Archaea and Eukaryota, while the overall abundance of disorder in these proteins is significantly increased in Bacteria, Archaea, animals and fungi. The high abundance of disorder in nucleiomes supports the notion that nucleic acid binding proteins often require intrinsic disorder for their functions and regulation.


Molecular Recognition Features (MoRFs) in Three Domains of Life

Molecular BioSystems


Intrinsically disordered proteins and protein regions offer numerous advantages in the context of protein-protein interactions when compared to structured proteins and domains. These advantages include the ability to interact with multiple partners, to fold into different conformations when bound to different partners, and to undergo disorder-to-order transitions concomitant with their functional activity. Molecular recognition features (MoRFs) are widespread elements located in disordered regions that undergo a disorder-to-order transition upon binding to their protein partners. We characterize the abundance, composition, and functions of MoRFs and their association with disordered regions in 868 species spread across Eukaryota, Bacteria and Archaea. We found that although disorder is substantially elevated in Eukaryota, MoRFs have similar abundance and amino acid composition across the three domains of life. The abundance of MoRFs is highly correlated with the amount of intrinsic disorder in Bacteria and Archaea but only modestly correlated in Eukaryota. Proteins with MoRFs have significantly more disorder, and MoRFs are present in many disordered regions, with Eukaryota having more MoRF-free disordered regions. MoRF-containing proteins are enriched in the ribosome, nucleus, nucleolus and microtubule and are involved in translation, protein transport, protein folding, and interactions with DNA. Our insights into the nature and function of MoRFs enhance our understanding of the mechanisms underlying disorder-to-order transitions and protein-protein recognition and interactions.

PDID: Database of Molecular-level Putative Protein-drug Interactions in the Structural Human Proteome



Many drugs interact with numerous proteins besides their intended therapeutic targets, and a substantial portion of these interactions is yet to be elucidated. The Protein-Drug Interaction Database (PDID) addresses the incompleteness of these data by providing access to putative protein-drug interactions that cover the entire structural human proteome. PDID covers 9652 structures from 3746 proteins and houses 16 800 putative interactions generated from close to 1.1 million accurate, all-atom structure-based predictions for several dozen popular drugs. The predictions were generated with three modern methods: ILbind, SMAP and eFindSite. They are accompanied by propensity scores that quantify the likelihood of interactions and by the coordinates of the putative locations of the binding drugs in the corresponding protein structures. PDID complements current databases that focus on curated interactions, as well as the BioDrugScreen database, which relies on docking to find putative interactions. Moreover, we also include experimentally curated interactions that are linked to their sources: DrugBank, BindingDB and the Protein Data Bank. Our database can be used to facilitate studies related to the polypharmacology of drugs, including drug repurposing and explaining drug side effects.

A Comprehensive Comparative Review of Sequence Based Predictors of DNA and RNA Binding Residues

Briefings in Bioinformatics


Motivated by the pressing need to characterize protein-DNA and protein-RNA interactions on a large scale, we review a comprehensive set of 30 computational methods for high-throughput prediction of RNA- or DNA-binding residues from protein sequences. We summarize these predictors from several significant perspectives, including their design, outputs and availability. We perform an empirical assessment of the methods that offer web servers, using a new benchmark data set characterized by a more complete annotation that includes binding residues transferred from the same or similar proteins. We show that predictors of DNA-binding (RNA-binding) residues offer relatively strong predictive performance, but they are unable to properly separate DNA- from RNA-binding residues. We design and empirically assess several types of consensuses and demonstrate that machine learning (ML)-based approaches provide improved predictive performance when compared with the individual predictors of DNA-binding residues or RNA-binding residues. We also formulate and execute a first-of-its-kind study that targets the combined prediction of DNA- and RNA-binding residues. We design and test three types of consensuses for this prediction and conclude that this novel approach, which relies on an ML design, provides better predictive quality than individual predictors when tested on the prediction of DNA- and RNA-binding residues individually. It also substantially improves discrimination between these two types of nucleic acids. Our results suggest that the development of a new generation of predictors would benefit from using training data sets that combine both RNA- and DNA-binding proteins, designing new inputs that specifically target either DNA- or RNA-binding residues, and pursuing combined prediction of DNA- and RNA-binding residues.
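The simplest kind of consensus is a majority vote over per-residue predictions. The sketch below assumes that each predictor outputs one binary label per residue (1 = binding); the function name `majority_vote`, the input format and the example predictions are hypothetical. This voting scheme is only a baseline for illustration and is not the ML-based consensus evaluated in the review.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-residue binary predictions (1 = binding, 0 = non-binding)
    from several predictors; a residue is labeled binding only when more
    than half of the predictors label it as binding (ties give 0)."""
    if not predictions:
        raise ValueError("need at least one predictor")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")
    n = len(predictions)
    # Count votes per residue and keep residues with a strict majority.
    return [1 if 2 * sum(p[i] for p in predictions) > n else 0 for i in range(length)]


if __name__ == "__main__":
    # Hypothetical outputs of three predictors for a 5-residue fragment.
    p1 = [1, 1, 0, 0, 1]
    p2 = [1, 0, 0, 1, 1]
    p3 = [0, 1, 0, 0, 1]
    print(majority_vote([p1, p2, p3]))  # [1, 1, 0, 0, 1]
```

An ML-based consensus would instead feed the individual predictors' outputs (for example, their scores) into a trained classifier, which can learn to weight predictors differently rather than counting votes equally.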

High-throughput prediction of RNA, DNA and protein binding regions mediated by intrinsic disorder

Nucleic Acids Research


Intrinsically disordered proteins and regions (IDPs and IDRs) lack stable 3D structure under physiological conditions in vitro, are common in eukaryotes, and facilitate interactions with RNA, DNA and proteins. Current methods for the prediction of IDPs and IDRs do not provide insights into their functions, except for a handful of methods that address the prediction of protein-binding regions. We report DisoRDPbind, a first-of-its-kind computational method for high-throughput prediction of RNA-, DNA- and protein-binding residues located in IDRs from protein sequences. DisoRDPbind is implemented using a runtime-efficient multi-layered design that utilizes information extracted from the physicochemical properties of amino acids, sequence complexity, putative secondary structure and disorder, and sequence alignment. Empirical tests demonstrate that it provides accurate predictions that are competitive with other predictors of disorder-mediated protein binding regions and complementary to the methods that predict RNA- and DNA-binding residues annotated based on crystal structures. Application to the Homo sapiens, Mus musculus, Caenorhabditis elegans and Drosophila melanogaster proteomes reveals that the RNA- and DNA-binding proteins predicted by DisoRDPbind complement and overlap with the corresponding known binding proteins collected from several sources. Also, the number of putative protein-binding regions predicted by DisoRDPbind correlates with the promiscuity of proteins in the corresponding protein–protein interaction networks.

Comprehensive overview and assessment of computational prediction of microRNA targets in animals

Briefings in Bioinformatics


2014 MicroRNAs (miRNAs) are short endogenous noncoding RNAs that bind to target mRNAs, usually resulting in degradation and translational repression. Identification of miRNA targets is crucial for deciphering functional roles of the numerous miRNAs that are rapidly generated by sequencing efforts. Computational prediction methods are widely used for high-throughput generation of putative miRNA targets. We review a comprehensive collection of 38 miRNA sequence-based computational target predictors in animals that were developed over the past decade. Our in-depth analysis considers all significant perspectives including the underlying predictive methodologies with focus on how they draw from the mechanistic basis of the miRNA–mRNA interaction. We also discuss ease of use, availability, impact of the considered predictors and the evaluation protocols that were used to assess them. We are the first to comparatively and comprehensively evaluate seven representative methods when predicting miRNA targets at the duplex and gene levels. The gene-level evaluation is based on three benchmark data sets that rely on different ways to annotate targets including biochemical assays, microarrays and pSILAC. We offer practical advice on selection of appropriate predictors according to certain properties of miRNA sequences, characteristics of a specific application and desired levels of predictive quality. We also discuss future work related to the design of new models, data quality, improved usability, need for standardized evaluation and ability to predict mRNA expression changes.

Exceptionally abundant exceptions: comprehensive characterization of intrinsic disorder in all domains of life

Cellular and Molecular Life Sciences


Recent years have witnessed increased interest in intrinsically disordered proteins and regions. These proteins and regions are abundant and possess unique structural features and a broad functional repertoire that complements ordered proteins. However, modern studies of the abundance and functions of intrinsically disordered proteins and regions are relatively limited in the size and scope of their analysis. To fill this gap, we performed a broad and detailed computational analysis of over 6 million proteins from 59 archaeal, 471 bacterial, 110 eukaryotic and 325 viral proteomes. We used arguably more accurate consensus-based disorder predictions and, for the first time, comprehensively characterized intrinsic disorder at the proteomic and protein levels from all significant perspectives, including abundance, cellular localization, functional roles, evolution, and impact on structural coverage. We show that intrinsic disorder is more abundant and has a unique profile in eukaryotes. We map disorder into archaeal, bacterial and eukaryotic cells and demonstrate that it is preferentially located in some cellular compartments. Functional analysis that considers over 1,200 annotations shows that certain functions are exclusively implemented by intrinsically disordered proteins and regions, and that some of them are specific to certain domains of life. We reveal that disordered regions are often targets for various post-translational modifications, primarily in eukaryotes and viruses. Using a phylogenetic tree for 14 eukaryotic and 112 bacterial species, we analyzed the relations between disorder, sequence conservation and evolutionary speed. We provide a complete analysis that clearly shows that intrinsic disorder is exceptionally and uniquely abundant in each domain of life.

Human structural proteome-wide characterization of Cyclosporine A targets



Off-target interactions of the popular immunosuppressant Cyclosporine A (CSA) with several proteins besides its molecular target, cyclophilin A, are implicated in the activation of signaling pathways that lead to numerous side effects of this drug. Using the structural human proteome and ILbind, a novel algorithm for inverse ligand binding prediction, we determined a comprehensive set of 100+ putative partners of CSA. We empirically show that the predictive quality of ILbind for this compound is better than that of other available predictors. We linked the putative target proteins, which include many new partners of CSA, with cellular functions, canonical pathways and toxicities that are typical for patients who take this drug. We used complementary approaches (molecular docking, molecular dynamics, surface plasmon resonance binding analysis and enzymatic assays) to validate and characterize three novel CSA targets: calpain 2, caspase 3 and p38 MAP kinase 14. The three targets are involved in apoptotic pathways, are interconnected, and are implicated in nephrotoxicity.

Interplay Between the Oxidoreductase PDIA6 and microRNA-322 Controls the Response to Disrupted Endoplasmic Reticulum Calcium Homeostasis

Science Signaling


The disruption of the energy or nutrient balance triggers endoplasmic reticulum (ER) stress, a process that mobilizes various strategies, collectively called the unfolded protein response (UPR), which reestablish homeostasis of the ER and cell. Activation of the UPR stress sensor IRE1α (inositol-requiring enzyme 1α) stimulates its endoribonuclease activity, leading to the generation of the mRNA encoding the transcription factor XBP1 (X-box binding protein 1), which regulates the transcription of genes encoding factors involved in controlling the quality and folding of proteins. We found that the activity of IRE1α was regulated by the ER oxidoreductase PDIA6 (protein disulfide isomerase A6) and the microRNA miR-322 in response to disruption of ER Ca2+ homeostasis. PDIA6 interacted with IRE1α and enhanced IRE1α activity as monitored by phosphorylation of IRE1α and XBP1 mRNA splicing, but PDIA6 did not substantially affect the activity of other pathways that mediate responses to ER stress. ER Ca2+ depletion and activation of store-operated Ca2+ entry reduced the abundance of the microRNA miR-322, which increased PDIA6 mRNA stability and, consequently, IRE1α activity during the ER stress response. In vivo experiments with mice and worms showed that the induction of ER stress correlated with decreased miR-322 abundance, increased PDIA6 mRNA abundance, or both. Together, these findings demonstrated that ER Ca2+, PDIA6, IRE1α, and miR-322 function in a dynamic feedback loop modulating the UPR under conditions of disrupted ER Ca2+ homeostasis.

Disordered Proteinaceous Machines

Chemical Reviews


Intrinsic disorder is abundant in proteins involved in signaling and regulatory processes, where disorder-mediated protein interactions enable transient signaling complexes. Beyond signaling, intrinsic disorder also provides various benefits for the organization of large protein assemblages. In addition to transient signaling complexes, there are numerous stable protein complexes (oligomers) that represent the functional form of proteinaceous machines. Functional disorder in these machines can be of two distinct types: (i) internal, for the assembly and movement of the different parts, and (ii) external, for interaction with regulators. The goal of this Review is to show that intrinsic disorder impacts the function and assembly of proteinaceous machines. The first half of this Review considers general aspects of the involvement of intrinsic disorder in the assembly and function of protein complexes, whereas the second half presents illustrative examples of pliable proteinaceous machines.

Resilience of death: intrinsic disorder in proteins involved in the programmed cell death

Cell Death and Differentiation


It is now recognized that intrinsically disordered proteins (IDPs), which do not have unique 3D structures as a whole or in noticeable parts, constitute a significant fraction of any given proteome. IDPs are characterized by an astonishing structural and functional diversity that underlies their ability to act as universal regulators of various cellular pathways. Programmed cell death (PCD) is one of the most intricate cellular processes, in which the cell uses specialized cellular machinery and intracellular programs to kill itself. This cell-suicide mechanism enables metazoans to control cell numbers and to eliminate cells that threaten the animal’s survival. PCD includes several specific modules, such as apoptosis, autophagy, and programmed necrosis (necroptosis). These modules are not only tightly regulated but also intimately interconnected, and they are jointly controlled via a complex set of protein–protein interactions. To understand the role of intrinsic disorder in controlling and regulating PCD, several large sets of PCD-related proteins across 28 species were analyzed using a wide array of modern bioinformatics tools. This study indicates that the intrinsic disorder phenomenon has to be taken into consideration to generate a complete picture of the interconnected processes, pathways, and modules that determine the essence of PCD. We demonstrate that proteins involved in the regulation and execution of PCD possess a substantial amount of intrinsic disorder. We annotate the functional roles of disorder across and within the apoptosis, autophagy, and necroptosis processes. Disordered regions are shown to be implicated in a number of crucial functions, such as protein–protein interactions and interactions with other partners, including nucleic acids and other ligands; they are also enriched in post-translational modification sites and are characterized by specific evolutionary patterns. We mapped the disorder into an integrated network of PCD pathways and into the interactomes of selected proteins that are involved in the p53-mediated apoptotic signaling pathway.
