Lukasz Kurgan received his M.Sc. degree (with honors) in Automation and Robotics from AGH University of Science and Technology (Poland) in 1999 and a Ph.D. degree in Computer Science from the University of Colorado at Boulder in 2003. He joined the University of Alberta in 2003, where he received tenure in 2007 and was promoted to the rank of Professor in 2013. He moved to Virginia Commonwealth University in 2016 as the Qimonda Endowed Professor of Computer Science.
Areas of Expertise (9)
Intrinsically Disordered Proteins
Computer-aided molecular modeling
Big Data Analysis
Fellow of the Kosciuszko Foundation Collegium of Eminent Scientists (professional)
With citation for "outstanding achievements and contributions to the Polish scientific community."
Fellow of the American Institute for Medical and Biomedical Engineering (AIMBE) (professional)
With citation for "outstanding contributions to structural bioinformatics, focusing on protein-ligand and protein-nucleic acids interactions and computational characterization of intrinsic disorder."
Senior Member of ACM (professional)
Elevated to Senior Member of the Association for Computing Machinery.
Gold Medal of Stanislaw Staszic (professional)
Recognition for outstanding academic achievements in undergraduate and Master's studies.
Outstanding Graduate Student Award (professional)
Outstanding Graduate Student Award for the M.Sc. thesis.
University of Colorado at Boulder: Ph.D., Computer Science 2003
AGH University of Science and Technology (Poland): M.Sc., Automation and Robotics 1999
- Professor Department of Computer Science Virginia Commonwealth University
- Adjunct Professor Department of Electrical and Computer Engineering University of Alberta
Media Appearances (5)
VCU's Kurgan supercomputer programs help biologists to speed up hypothesis generation to understand proteins
Supercomputing Online News online
“We have manually curated but understand less than 1 percent of these proteins, and right now there’s over 80 million to solve,” said Kurgan, a Qimonda-endowed professor and data scientist. “A program can solve these proteins faster than a single human and can help researchers speed up hypothesis generation.”
Bioinformatics computer programs help biologists understand intrinsically disordered proteins
To help shed light on the workings of proteins, Virginia Commonwealth University researcher Lukasz Kurgan, Ph.D., vice chair of the Computer Science Department in the School of Engineering, has developed a series of bioinformatics programs to assist biologists in developing insights into the functions of intrinsically disordered proteins. This group of proteins lacks a fixed structure, which means they are totally or partially flexible and amorphous.
Read more at: https://phys.org/news/2017-07-bioinformatics-biologists-intrinsically-disordered-proteins.html#jCp
Unravelling the Complexity of Proteins
Interview concerning our collaborative project in structural genomics that investigates structural coverage of proteins.
Crystallography for Complete Proteomes
Interview concerning our collaborative project in structural genomics that investigates structural coverage of proteins.
Drug Design and Development online
Interview concerning a collaborative project on the development of new technologies to find and characterize drug targets.
Research Grants (6)
High-throughput annotation of cellular functions of intrinsic disorder in proteins
One of the fundamental problems in molecular biology is to decipher the functions of millions of uncharacterized protein sequences that are rapidly generated by high-throughput genome sequencing. The sequence-to-structure-to-function paradigm was used for decades to determine functions of proteins. However, recent research has broadened this paradigm by adding new players, proteins with intrinsic disorder (ID). They are highly abundant and cannot be solved with the currently used structure-driven approach. While there are many widely used computational methods that accurately predict ID in protein sequences, methods for the prediction of the many functions of ID are lacking. This project will develop a family of novel, accurate, and high-throughput computational methods that predict all major functions of ID in protein sequences. It will produce putative functional annotations on an unprecedented scale of thousands of species, addressing the problem of the high rate of acquisition of raw sequence data and contributing to an increase in the rate of scientific discovery. These results will advance our understanding of fundamental biological processes and human health given the high prevalence of ID in human diseases and the attractiveness of proteins with ID as drug targets.
High-throughput characterization, prediction, and applications of protein disorder
For years, scientists were convinced that proteins must fold into precise, rigid molecules to allow proteins to function correctly. This view is changing now. The intrinsically disordered proteins have at least some disordered (also called unfolded/highly flexible) parts and many of them carry out their function without ever fully folding into a rigid molecule. The disorder is highly abundant in nature and its prevalence was shown in several human diseases. However, the characterization of protein disorder is lagging behind the rapidly growing number of known proteins. Experimental annotations of disorder are time consuming and difficult and thus computational methods that predict disorder from protein sequences have emerged as a viable alternative to bridge the annotation gap and to investigate the disorder. Although the quality of these predictors continues to rise, more accurate methods and novel methods that address specific characteristics of disorder are urgently needed. Moreover, there is a pressing need to understand and characterize disorder in various proteomes and functional classes of proteins. To this end, our objectives include (1) development of a comprehensive computational platform for accurate, fast, and multi-objective prediction of disorder; and (2) applications and experimental validation of disorder predictions. This work facilitates a more complete understanding of the protein disorder, principles of protein folding, and molecular mechanisms of protein function. Our methods provide a cost and time effective solution to guide experimentalists, and they are crucial for modern research and development in several areas, including rational drug design, structural genomics, and systems biology.
Early prediction of patient-related and radiological outcomes in patients with recent-onset inflammatory polyarthritis (EPA) using established and novel independent predictors
Early inflammatory polyarthritis (EPA) describes recent-onset disease with signs of inflammation in at least 3 peripheral joints. It typically starts between 40 and 55 years of age, affects up to 5% of adults over their lifetime, and results in persistent inflammatory arthritis in close to 2% (30-50% of EPA). EPA patients are clinically very similar at onset, and their prognosis remains ill-defined and frequently poor, despite the availability of effective medications and the use of remission-targeted strategies. The lack of effective prognostic markers at baseline to identify patients needing these interventions is in part responsible for missing the window of opportunity for treatment in many patients. Based on previous observations that individuals segregate into poor and good in vitro activators of bone cells called osteoclasts (OC), we propose to identify characteristics of OC precursors and of OCs formed in vitro from patients' blood cells to correlate these characteristics with severe joint damage. As RA patients have short-for-age telomeres (i.e. DNA sequences at the ends of chromosomes), we propose to determine whether short telomeres at baseline (and rapidly shortening telomeres soon thereafter) are independent predictors of severe RA-like disease in EPA patients. We will also define the role of ultrasound joint evaluation in patients who do not have bone erosions on X-rays to predict which ones will develop severe joint damage. Finally, we will look at variants of immune-related genes and at the psychosocial characteristics (e.g. depression, coping strategies, pain perception) that may predict poor pain improvement and poor outcomes. The combination of these prognostic markers will lead to a prognostic tool that may guide early treatment (both biomedical and psychosocial) targeted to those patients most likely to benefit (cost-saving) and avoid unnecessary exposure to expensive and potentially toxic drugs when these are not needed.
Molecular-level prediction and mitigation of side effects of tubulin-targeting cancer therapy drugs
Alberta Cancer Foundation $50,000
Adverse drug reactions (ADRs) incur high societal costs due to drug-related mortality and undesirable side effects, and they lead to failures in the late stages of drug development. Virtually all contemporary cancer therapy drugs, including clinically successful compounds like paclitaxel and vinblastine, trigger frequent and often severe ADRs. However, the corresponding molecular-level mechanisms are usually unclear or unknown. We will implement an automated computational platform developed for the discovery of protein-drug interactions and apply this platform to several important anticancer agents to investigate the molecular-level mechanisms underlying the known physiological side effects. The outcomes of this work would help in designing novel variants of drugs that reduce or eliminate the ADRs and would assist in developing more effective preventive measures.
Computational intelligence based platform for prediction and characterization of binding sites in proteins
Proteins are nano-scale machines that catalyze chemical reactions (enzymes), form the cytoskeleton (tubulin), perform transporting functions (hemoglobin), etc. Knowledge of the tertiary structure of proteins is of pivotal importance to the understanding and manipulation of proteins' biochemical and cellular functions. Protein activity is often triggered by binding of various molecules/ions (referred to as ligands) to binding sites on the protein's surface. For instance, several cancer drugs bind to the tubulin protein, alter its function and as a result block cell division. Cost-effective rational drug design, which is used to find such drugs, requires knowledge of the protein surface, which is deduced from the tertiary structure, to find and characterize the binding sites. The goal of this proposal is to build an integrated, high-throughput, in silico framework for prediction and characterization of protein binding sites based on the primary protein sequence with an intermediate step of performing tertiary structure prediction. In contrast to expensive and time-consuming experimental work, which simply cannot test thousands of competing hypotheses (as we will do in our in silico research), the proposed research represents an important, cost-effective and practicable step towards determining and characterizing binding sites from protein sequences, which can be used to cut the costs of "wet lab" experiments. This research targets specific, important applications such as rational drug design that aims to find cures for many major human diseases.
Role of osteoclastogenesis and osteoclast activation in joint destruction in degenerative and inflammatory joint diseases
In rheumatoid arthritis (RA), resorption by osteoclasts causes local and systemic bone loss leading to collapse of joint surfaces and difficulties in replacing joints with implants. We hypothesize that enhanced osteoclast differentiation and/or activity contribute to joint destruction. To test this hypothesis, peripheral blood mononuclear cells from a transverse cohort of RA patients will be used to determine if the osteoclastogenic capacity and resorptive activity of the resulting differentiated osteoclasts correlate with disease severity. Understanding the molecular mediators of enhanced osteoclast functionality could potentially identify novel prognostic biomarkers and therapeutic targets to control joint destruction and systemic osteoporosis in rheumatoid arthritis.
CMSC 435 Introduction to Data Science
Virginia Commonwealth University
CMSC 635 Knowledge Discovery and Data Mining
Virginia Commonwealth University
ECE 321 Software Requirements Engineering
University of Alberta
ENCMP 100 Computer Programming for Engineers
University of Alberta
EE 280 Introduction to Digital Logic Design
University of Alberta
CMPE 310 Applying Software Engineering Practices Project
University of Alberta
ECE 625 Data Analysis and Knowledge Discovery
University of Alberta
ECE 625 Advanced Data Analysis and Decision Making
University of Alberta
CSC 4811 Computer Security
University of Colorado at Denver
CSC 5728 Software Engineering
University of Colorado at Denver
Selected Articles (11)
Intrinsically disordered proteins (IDPs) are abundant in various proteomes, where they play numerous important roles and complement biological activities of ordered proteins. Among functions assigned to IDPs are interactions with nucleic acids. However, often, such assignments are made based on the guilt-by-association principle. The validity of the extension of these correlations to all nucleic acid binding proteins has never been analyzed on a large scale across all domains of life. To fill this gap, we perform a comprehensive computational analysis of the abundance of intrinsic disorder and intrinsically disordered domains in nucleiomes (∼548 000 nucleic acid binding proteins) of 1121 species from Archaea, Bacteria and Eukaryota. A nucleiome is the whole complement of proteins involved in interactions with nucleic acids. We show that relative to other proteins in the corresponding proteomes, the DNA-binding proteins have significantly increased disorder content and are significantly enriched in disordered domains in Eukaryotes but not in Archaea and Bacteria. The RNA-binding proteins are significantly enriched in the disordered domains in Bacteria, Archaea and Eukaryota, while the overall abundance of disorder in these proteins is significantly increased in Bacteria, Archaea, animals and fungi. The high abundance of disorder in nucleiomes supports the notion that the nucleic acid binding proteins often require intrinsic disorder for their functions and regulation.
Intrinsically disordered proteins and protein regions offer numerous advantages in the context of protein-protein interactions when compared to the structured proteins and domains. These advantages include ability to interact with multiple partners, to fold into different conformations when bound to different partners, and to undergo disorder-to-order transitions concomitant with their functional activity. Molecular recognition features (MoRFs) are widespread elements located in disordered regions that undergo disorder-to-order transition upon binding to their protein partners. We characterize abundance, composition, and functions of MoRFs and their association with the disordered regions across 868 species spread across Eukaryota, Bacteria and Archaea. We found that although disorder is substantially elevated in Eukaryota, MoRFs have similar abundance and amino acid composition across the three domains of life. The abundance of MoRFs is highly correlated with the amount of intrinsic disorder in Bacteria and Archaea but only modestly correlated in Eukaryota. Proteins with MoRFs have significantly more disorder and MoRFs are present in many disordered regions, with Eukaryota having more MoRF-free disordered regions. MoRF-containing proteins are enriched in the ribosome, nucleus, nucleolus and microtubule and are involved in translation, protein transport, protein folding, and interactions with DNAs. Our insights into the nature and function of MoRFs enhance our understanding of the mechanisms underlying the disorder-to-order transition and protein-protein recognition and interactions.
Many drugs interact with numerous proteins besides their intended therapeutic targets and a substantial portion of these interactions is yet to be elucidated. Protein-Drug Interaction Database (PDID) addresses incompleteness of these data by providing access to putative protein-drug interactions that cover the entire structural human proteome. PDID covers 9652 structures from 3746 proteins and houses 16 800 putative interactions generated from close to 1.1 million accurate, all-atom structure-based predictions for several dozens of popular drugs. The predictions were generated with three modern methods: ILbind, SMAP and eFindSite. They are accompanied by propensity scores that quantify likelihood of interactions and coordinates of the putative location of the binding drugs in the corresponding protein structures. PDID complements the current databases that focus on the curated interactions and the BioDrugScreen database that relies on docking to find putative interactions. Moreover, we also include experimentally curated interactions which are linked to their sources: DrugBank, BindingDB and Protein Data Bank. Our database can be used to facilitate studies related to polypharmacology of drugs including repurposing and explaining side effects of drugs.
Motivated by the pressing need to characterize protein-DNA and protein-RNA interactions on a large scale, we review a comprehensive set of 30 computational methods for high-throughput prediction of RNA- or DNA-binding residues from protein sequences. We summarize these predictors from several significant perspectives including their design, outputs and availability. We perform empirical assessment of methods that offer web servers using a new benchmark data set characterized by a more complete annotation that includes binding residues transferred from the same or similar proteins. We show that predictors of DNA-binding (RNA-binding) residues offer relatively strong predictive performance but they are unable to properly separate DNA- from RNA-binding residues. We design and empirically assess several types of consensuses and demonstrate that machine learning (ML)-based approaches provide improved predictive performance when compared with the individual predictors of DNA-binding residues or RNA-binding residues. We also formulate and execute a first-of-its-kind study that targets combined prediction of DNA- and RNA-binding residues. We design and test three types of consensuses for this prediction and conclude that this novel approach that relies on ML design provides better predictive quality than individual predictors when tested on prediction of DNA- and RNA-binding residues individually. It also substantially improves discrimination between these two types of nucleic acids. Our results suggest that development of a new generation of predictors would benefit from using training data sets that combine both RNA- and DNA-binding proteins, designing new inputs that specifically target either DNA- or RNA-binding residues and pursuing combined prediction of DNA- and RNA-binding residues.
Intrinsically disordered proteins and regions (IDPs and IDRs) lack stable 3D structure under physiological conditions in vitro, are common in eukaryotes, and facilitate interactions with RNA, DNA and proteins. Current methods for prediction of IDPs and IDRs do not provide insights into their functions, except for a handful of methods that address predictions of protein-binding regions. We report a first-of-its-kind computational method, DisoRDPbind, for high-throughput prediction of RNA, DNA and protein binding residues located in IDRs from protein sequences. DisoRDPbind is implemented using a runtime-efficient multi-layered design that utilizes information extracted from physicochemical properties of amino acids, sequence complexity, putative secondary structure and disorder and sequence alignment. Empirical tests demonstrate that it provides accurate predictions that are competitive with other predictors of disorder-mediated protein binding regions and complementary to the methods that predict RNA- and DNA-binding residues annotated based on crystal structures. Application in Homo sapiens, Mus musculus, Caenorhabditis elegans and Drosophila melanogaster proteomes reveals that RNA- and DNA-binding proteins predicted by DisoRDPbind complement and overlap with the corresponding known binding proteins collected from several sources. Also, the number of the putative protein-binding regions predicted with DisoRDPbind correlates with the promiscuity of proteins in the corresponding protein–protein interaction networks.
MicroRNAs (miRNAs) are short endogenous noncoding RNAs that bind to target mRNAs, usually resulting in degradation and translational repression. Identification of miRNA targets is crucial for deciphering functional roles of the numerous miRNAs that are rapidly generated by sequencing efforts. Computational prediction methods are widely used for high-throughput generation of putative miRNA targets. We review a comprehensive collection of 38 miRNA sequence-based computational target predictors in animals that were developed over the past decade. Our in-depth analysis considers all significant perspectives including the underlying predictive methodologies with focus on how they draw from the mechanistic basis of the miRNA–mRNA interaction. We also discuss ease of use, availability, impact of the considered predictors and the evaluation protocols that were used to assess them. We are the first to comparatively and comprehensively evaluate seven representative methods when predicting miRNA targets at the duplex and gene levels. The gene-level evaluation is based on three benchmark data sets that rely on different ways to annotate targets including biochemical assays, microarrays and pSILAC. We offer practical advice on selection of appropriate predictors according to certain properties of miRNA sequences, characteristics of a specific application and desired levels of predictive quality. We also discuss future work related to the design of new models, data quality, improved usability, need for standardized evaluation and ability to predict mRNA expression changes.
Recent years witnessed increased interest in intrinsically disordered proteins and regions. These proteins and regions are abundant and possess unique structural features and a broad functional repertoire that complements ordered proteins. However, modern studies on the abundance and functions of intrinsically disordered proteins and regions are relatively limited in size and scope of their analysis. To fill this gap, we performed a broad and detailed computational analysis of over 6 million proteins from 59 archaea, 471 bacterial, 110 eukaryotic and 325 viral proteomes. We used arguably more accurate consensus-based disorder predictions, and for the first time comprehensively characterized intrinsic disorder at proteomic and protein levels from all significant perspectives, including abundance, cellular localization, functional roles, evolution, and impact on structural coverage. We show that intrinsic disorder is more abundant and has a unique profile in eukaryotes. We map disorder into archaea, bacterial and eukaryotic cells, and demonstrate that it is preferentially located in some cellular compartments. Functional analysis that considers over 1,200 annotations shows that certain functions are exclusively implemented by intrinsically disordered proteins and regions, and that some of them are specific to certain domains of life. We reveal that disordered regions are often targets for various post-translational modifications, but primarily in the eukaryotes and viruses. Using a phylogenetic tree for 14 eukaryotic and 112 bacterial species, we analyzed relations between disorder, sequence conservation and evolutionary speed. We provide a complete analysis that clearly shows that intrinsic disorder is exceptionally and uniquely abundant in each domain of life.
Off-target interactions of a popular immunosuppressant Cyclosporine A (CSA) with several proteins besides its molecular target, cyclophilin A, are implicated in the activation of signaling pathways that lead to numerous side effects of this drug. Using structural human proteome and a novel algorithm for inverse ligand binding prediction, ILbind, we determined a comprehensive set of 100+ putative partners of CSA. We empirically show that predictive quality of ILbind is better compared with other available predictors for this compound. We linked the putative target proteins, which include many new partners of CSA, with cellular functions, canonical pathways and toxicities that are typical for patients who take this drug. We used complementary approaches (molecular docking, molecular dynamics, surface plasmon resonance binding analysis and enzymatic assays) to validate and characterize three novel CSA targets: calpain 2, caspase 3 and p38 MAP kinase 14. The three targets are involved in the apoptotic pathways, are interconnected and are implicated in nephrotoxicity.
The disruption of the energy or nutrient balance triggers endoplasmic reticulum (ER) stress, a process that mobilizes various strategies, collectively called the unfolded protein response (UPR), which reestablish homeostasis of the ER and cell. Activation of the UPR stress sensor IRE1α (inositol-requiring enzyme 1α) stimulates its endoribonuclease activity, leading to the generation of the mRNA encoding the transcription factor XBP1 (X-box binding protein 1), which regulates the transcription of genes encoding factors involved in controlling the quality and folding of proteins. We found that the activity of IRE1α was regulated by the ER oxidoreductase PDIA6 (protein disulfide isomerase A6) and the microRNA miR-322 in response to disruption of ER Ca2+ homeostasis. PDIA6 interacted with IRE1α and enhanced IRE1α activity as monitored by phosphorylation of IRE1α and XBP1 mRNA splicing, but PDIA6 did not substantially affect the activity of other pathways that mediate responses to ER stress. ER Ca2+ depletion and activation of store-operated Ca2+ entry reduced the abundance of the microRNA miR-322, which increased PDIA6 mRNA stability and, consequently, IRE1α activity during the ER stress response. In vivo experiments with mice and worms showed that the induction of ER stress correlated with decreased miR-322 abundance, increased PDIA6 mRNA abundance, or both. Together, these findings demonstrated that ER Ca2+, PDIA6, IRE1α, and miR-322 function in a dynamic feedback loop modulating the UPR under conditions of disrupted ER Ca2+ homeostasis.
Intrinsic disorder is abundant in proteins involved in signaling and regulatory processes, where disorder-mediated protein interactions enable transient signaling complexes. On the other hand, intrinsic disorder provides various benefits for organization of large protein assemblages. In addition to the transient signaling complexes, there are numerous stable protein complexes (oligomers) that represent a functional form of proteinaceous machines. Functional disorder could be of two distinct types: (i) internal for assembly and movement of the different parts and (ii) external for interaction with regulators. The goal of this Review is to show that intrinsic disorder impacts the function and assembly of the proteinaceous machines. The first half of this Review considers some general aspects related to the involvement of intrinsic disorder in assembly and function of the protein complexes, whereas the second half is dedicated to the representation of some illustrative examples of pliable proteinaceous machines.
It is recognized now that intrinsically disordered proteins (IDPs), which do not have unique 3D structures as a whole or in noticeable parts, constitute a significant fraction of any given proteome. IDPs are characterized by an astonishing structural and functional diversity that defines their ability to be universal regulators of various cellular pathways. Programmed cell death (PCD) is one of the most intricate cellular processes where the cell uses specialized cellular machinery and intracellular programs to kill itself. This cell-suicide mechanism enables metazoans to control cell numbers and to eliminate cells that threaten the animal’s survival. PCD includes several specific modules, such as apoptosis, autophagy, and programmed necrosis (necroptosis). These modules are not only tightly regulated but also intimately interconnected and are jointly controlled via a complex set of protein–protein interactions. To understand the role of the intrinsic disorder in controlling and regulating the PCD, several large sets of PCD-related proteins across 28 species were analyzed using a wide array of modern bioinformatics tools. This study indicates that the intrinsic disorder phenomenon has to be taken into consideration to generate a complete picture of the interconnected processes, pathways, and modules that determine the essence of the PCD. We demonstrate that proteins involved in regulation and execution of PCD possess substantial amount of intrinsic disorder. We annotate functional roles of disorder across and within apoptosis, autophagy, and necroptosis processes. Disordered regions are shown to be implemented in a number of crucial functions, such as protein–protein interactions, interactions with other partners including nucleic acids and other ligands, are enriched in post-translational modification sites, and are characterized by specific evolutionary patterns. We mapped the disorder into an integrated network of PCD pathways and into the interactomes of selected proteins that are involved in the p53-mediated apoptotic signaling pathway.