Education, Licensure and Certification (3)
Ph.D.: Computer Science & Engineering, University of Notre Dame 2016
M.S.: Computer Science & Engineering, University of Notre Dame 2015
B.S.: Computer Science/Mathematics, Eckerd College 2010
Areas of Expertise (6)
Computer Science
Machine Learning
Data Science
Genomics
Data Structures and Algorithms
Bioinformatics
Accomplishments (4)
GAANN Fellowship
2012-2014
OpenMM Visiting Scholar
2012
Outstanding Graduate TA Award
Kaneb Center, 2012
Sigma Xi, The Scientific Research Honor Society
Nominated for Associate Membership, 2010; nominated for Full Membership, 2020
Affiliations (6)
- Institute of Electrical and Electronics Engineers (IEEE) : Member
- American Society for Engineering Education (ASEE) : Member
- Sigma Xi: Full Member
- Council on Undergraduate Research (CUR): Member
- Society for the Study of Evolution (SSE): Member
- Society for Molecular Biology and Evolution (SMBE): Member
Event and Speaking Appearances (5)
Exploring Mechanisms of Molecular Evolution and Their Representations in PCA
43rd Annual IEEE Computer Software and Applications Conference (COMPSAC), Milwaukee, WI, 2019
Detecting and Localizing Inversions with SNPs
12th Annual Arthropod Genomics Symposium, Manhattan, KS, 2019
Real-World Lessons in Machine Learning Applied to Spam Classification
Milwaukee Big Data Meetup, Milwaukee, WI, 2017
Population Genetics without Population Labels
11th Annual Arthropod Genomics Symposium, Urbana-Champaign, IL, 2018
Feature Ranking as an Alternative to FST
10th Annual Arthropod Genomics Symposium, Notre Dame, IN, 2017
Research Grants (2)
Hearing Patient's Voice: Contextual Phenotyping of Patient Narratives and Clinical Data using ML & NLP
CTSI Pilot Grant, $50,000 ($10,000 subcontract to MSOE)
Submitted 2019. Awarded. Co-PI.
CRII: III: RUI: Association Testing and Inversion Detection without Reference Genomes
National Science Foundation $174,231
Submitted 2019; awarded 2020. PI.
Selected Publications (6)
Detecting inversions with PCA in the presence of population structure
Public Library of Science ONE
Nowling, R.J., Manke, K.R., Emrich, S.J.
2020
Chromosomal inversions can lead to reproductive isolation and adaptation in insects such as Drosophila melanogaster and the non-model malaria vector Anopheles gambiae. Inversions can be detected and characterized using principal component analysis (PCA) of single nucleotide polymorphisms (SNPs). To aid in developing such methods, we formed a new benchmark derived from three publicly available insect data sets. We then used this benchmark to perform an extended validation of our software for inversion analysis (Asaph). Through that process, we identified and characterized several problematic test cases that are liable to misinterpretation and can help guide PCA-based inversion detection. Lastly, we re-analyzed the 2R chromosome arm of 150 An. gambiae and An. coluzzii samples and observed two inversions (2Rc and 2Rd) that were previously known but had not been annotated in these particular individuals. The resulting benchmark data set and methods will be useful for future inversion detection based solely on SNP data.
Adjusted Likelihood-ratio Test for Variants with Unknown Genotypes
Journal of Bioinformatics and Computational Biology
Nowling, R.J., Emrich, S.J.
2018
Association tests performed with the Likelihood-Ratio Test (LR Test) can be an alternative to FST, which is often used in population genetics to find variants of interest. Because the LR Test has several properties that could make it preferable to FST, we propose a novel approach for modeling unknown genotypes in highly similar species. To show the effectiveness of this LR Test approach, we apply it to single-nucleotide polymorphisms (SNPs) associated with the recent speciation of the malaria vectors Anopheles gambiae and Anopheles coluzzii and compare to …
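The basic likelihood-ratio idea can be illustrated with the standard, unadjusted G-test of whether two populations share the same genotype frequencies at a single SNP. This is a minimal Python sketch under stated assumptions: genotypes are fully observed and counted as 0, 1, or 2 copies of an allele, and the function name `genotype_lr_test` is hypothetical. It does not implement the adjusted test or the modeling of unknown genotypes described above.

```python
import math


def genotype_lr_test(counts_a, counts_b):
    """Standard likelihood-ratio (G) test on a 2 x 3 genotype count table.

    counts_a, counts_b: observed counts of genotypes 0, 1, 2 in each population.
    Returns (G statistic, p-value). Hypothetical helper for illustration only.
    """
    table = [list(counts_a), list(counts_b)]
    total = sum(map(sum, table))
    col_totals = [sum(col) for col in zip(*table)]
    g = 0.0
    for row in table:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            # Expected count if both populations share genotype frequencies.
            expected = row_total * col_total / total
            if observed > 0:  # 0 * log(0) is taken as 0
                g += 2.0 * observed * math.log(observed / expected)
    # Assumes all three genotype classes are observed, so df = (2-1)*(3-1) = 2;
    # the chi-square(2) upper tail has the closed form exp(-x / 2).
    return g, math.exp(-g / 2.0)


# Usage: identical genotype frequencies give G = 0 and p = 1.
print(genotype_lr_test([25, 50, 25], [25, 50, 25]))
print(genotype_lr_test([81, 18, 1], [1, 18, 81]))
```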
Detecting Chromosomal Inversions from Dense SNPs by Combining PCA and Association Tests
Proceedings of the 2018 ACM International Conference on Bioinformatics
Nowling, R.J., Emrich, S.J.
2018
Principal Component Analysis (PCA) of dense single nucleotide polymorphism (SNP) data has wide-ranging applications in population genetics, including detection of chromosomal inversions. SNPs associated with each PC can be identified through single-SNP association tests performed between SNP genotypes and PC coordinates; this approach has several advantages over thresholding loading factors or sparse PCA methods.
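The combination of PCA and single-SNP association tests can be sketched in Python with NumPy: compute PC coordinates from a centered genotype matrix, then test each SNP against one PC's coordinates. This is an illustration under stated assumptions, not the authors' implementation: the association test is a swapped-in correlation trend statistic (n times r squared, compared to a chi-square distribution with 1 degree of freedom), and the function names and simulated inversion data are hypothetical.

```python
import math

import numpy as np


def pca_coordinates(genotypes, n_components=2):
    """Project samples onto the top principal components.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts.
    """
    centered = genotypes - genotypes.mean(axis=0)
    # For the centered matrix, columns of U scaled by S are the PC coordinates.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def snp_pc_association(genotypes, pc_coords):
    """Test every SNP for association with one PC (stand-in trend test)."""
    n = genotypes.shape[0]
    g = genotypes - genotypes.mean(axis=0)
    pc = pc_coords - pc_coords.mean()
    g_norm = np.sqrt((g ** 2).sum(axis=0))
    pc_norm = np.sqrt((pc ** 2).sum())
    # Monomorphic SNPs have zero variance; give them r = 0 instead of 0/0.
    safe_norm = np.where(g_norm > 0, g_norm, 1.0)
    r = np.where(g_norm > 0, (g.T @ pc) / (safe_norm * pc_norm), 0.0)
    stats = n * r ** 2
    # Chi-square(1) upper tail: P(X > x) = erfc(sqrt(x / 2)).
    pvalues = np.array([math.erfc(math.sqrt(x / 2.0)) for x in stats])
    return stats, pvalues


# Hypothetical data: 60 samples with inversion karyotypes 0/1/2, 40 SNPs that
# track the karyotype inside the inversion, and 160 unlinked background SNPs.
rng = np.random.default_rng(0)
karyotype = np.repeat([0, 1, 2], 20)
inverted = np.tile(karyotype[:, None], (1, 40))
background = rng.binomial(2, 0.5, size=(60, 160))
genotypes = np.hstack([inverted, background]).astype(float)

pcs = pca_coordinates(genotypes)
stats, pvalues = snp_pc_association(genotypes, pcs[:, 0])
top = np.argsort(pvalues)[:40]  # indices of the 40 most strongly associated SNPs
```

In this simulation the inversion dominates the genotype variance, so PC1 tracks karyotype and the 40 inverted SNPs should receive the smallest p-values; real data with population structure is harder.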
Stable Feature Ranking with Logistic Regression Ensembles
IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Nowling, R.J., Emrich, S.J.
2017
Beyond automated classification, supervised machine-learning models can be interpreted to find which features or combinations of features distinguish sets of classes. Logistic Regression (LR) is an example of a model well-suited for human interpretation. Unfortunately, results from feature ranking with LR may not be reliable and reproducible for the same dataset. We demonstrate that stability and consistency can be achieved via ensembles ("LR ensembles"). As a specific example of the real-world utility of our associated framework, we apply LR ensembles to single-nucleotide polymorphisms (SNPs) associated with the recent speciation of the malaria vectors Anopheles gambiae and Anopheles coluzzii and compare with the more common univariate metric FST.
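The ensemble idea can be sketched in Python with NumPy: train many logistic regression models, average the absolute values of their weights, and rank features by that average. This is a hedged sketch, not the authors' framework: bootstrap resampling, the plain gradient-descent solver, its hyperparameters, and the function names are assumptions made for illustration.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, epochs=200, l2=1e-3):
    """Fit L2-regularized logistic regression by batch gradient descent."""
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(epochs):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y = 1)
        err = prob - y
        w -= lr * (X.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w


def ensemble_feature_ranking(X, y, n_models=30, seed=0):
    """Rank features by mean |weight| across bootstrap-trained LR models."""
    rng = np.random.default_rng(seed)
    # Standardize columns so weight magnitudes are comparable across features.
    std = X.std(axis=0)
    Xs = (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)
    n = Xs.shape[0]
    importance = np.zeros(Xs.shape[1])
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap resample (an assumption)
        importance += np.abs(fit_logistic(Xs[idx], y[idx]))
    importance /= n_models
    return np.argsort(-importance), importance


# Hypothetical data: 200 samples in two groups; SNPs 0-2 differ between the
# groups, while the other 27 SNPs have the same frequency in both.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
informative = rng.binomial(2, np.where(y[:, None] == 1, 0.9, 0.1), size=(200, 3))
noise = rng.binomial(2, 0.5, size=(200, 27))
X = np.hstack([informative, noise]).astype(float)
ranking, importance = ensemble_feature_ranking(X, y)
```

Averaging over many models trades extra training time for rankings that change less from one run to the next; rerunning with a different `seed` is a quick way to check that the top-ranked features agree.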
A Domain-Driven, Generative Data Model for Big Pet Store
IEEE International Conference on Big Data and Cloud Computing (BDCloud)
Nowling, R.J., Vyas, J.
2014
Generating large amounts of semantically rich data for testing big data workflows is paramount for scalable performance benchmarking and quality assurance in modern machine-learning and analytics workloads. The most obvious use case for such a generative algorithm is in conjunction with a big data application blueprint, which can be used by developers (to test their emerging big data solutions) as well as end users (as a starting point for validating infrastructure installations, building novel applications, and learning analytics methods). We present a new domain-driven, generative data model for Big Pet Store, a big data application blueprint for the Hadoop ecosystem included in the Apache Bigtop distribution. We describe the model and demonstrate its ability to generate semantically rich data at variable scale, ranging from a single machine to a large cluster. We validate the model by using the generated data to answer questions about customer locations and purchasing habits for a fictional targeted advertising campaign, a common business use case.
Long Timestep Molecular Dynamics on the Graphical Processing Unit
Journal of Chemical Theory and Computation
Sweet, J.C., Nowling, R.J., Cickovski, T., Sweet, C.R., Pande, V.S., Izaguirre, J.A.
2013
Molecular dynamics (MD) simulations now play a key role in many areas of theoretical chemistry, biology, physics, and materials science. In many cases, such calculations are significantly limited by the massive amount of computer time needed to perform calculations of interest. Herein, we present long timestep molecular dynamics (LTMD), a method to significantly speed up MD simulations. In particular, we discuss new methods to calculate the needed terms in LTMD as well as issues germane to a graphical processing unit (GPU) implementation. The resulting code, implemented in the OpenMM MD library, can achieve a 6-fold speedup, leading to MD simulations on the order of 5 μs/day using implicit solvent models.