Biography
David A. Bader is a Distinguished Professor and founder of the Department of Data Science and inaugural Director of the Institute for Data Science at New Jersey Institute of Technology.
Dr. Bader is a Fellow of the IEEE, ACM, AAAS, and SIAM; a recipient of the IEEE Sidney Fernbach Award; and the 2022 Innovation Hall of Fame inductee of the University of Maryland’s A. James Clark School of Engineering. He advises the White House, most recently on the National Strategic Computing Initiative (NSCI) and Future Advanced Computing Ecosystem (FACE).
Bader is a leading expert in solving global grand challenges in science, engineering, computing, and data science. His interests lie at the intersection of high-performance computing and real-world applications, including cybersecurity, massive-scale analytics, and computational genomics. He has co-authored over 300 scholarly papers and has received best paper awards from ISC, IEEE HPEC, and IEEE/ACM SC. Dr. Bader has served as a lead scientist in several DARPA programs, including High Productivity Computing Systems (HPCS) with IBM, Ubiquitous High Performance Computing (UHPC) with NVIDIA, Anomaly Detection at Multiple Scales (ADAMS), Power Efficiency Revolution For Embedded Computing Technologies (PERFECT), Hierarchical Identify Verify Exploit (HIVE), and Software-Defined Hardware (SDH).
Dr. Bader is Editor-in-Chief of the ACM Transactions on Parallel Computing and previously served as Editor-in-Chief of the IEEE Transactions on Parallel and Distributed Systems. He serves on the leadership team of the Northeast Big Data Innovation Hub as the inaugural chair of its Seed Fund Steering Committee. ROI-NJ recognized Bader as a technology influencer on its inaugural 2021 list and again on its 2022 list.
In 2012, Bader was the inaugural recipient of the University of Maryland's Electrical and Computer Engineering Distinguished Alumni Award. In 2014, he received the Outstanding Senior Faculty Research Award from Georgia Tech. Bader has also served as Director of the Sony-Toshiba-IBM Center of Competence for the Cell Broadband Engine Processor and Director of an NVIDIA GPU Center of Excellence.
In 1998, Bader built the first Linux supercomputer, which led to the high-performance computing (HPC) revolution; Hyperion Research estimates that the Linux supercomputing approach Bader pioneered has generated over $100 trillion in economic value over the past 25 years.
Areas of Expertise (6)
Graph Analytics
Massive-Scale Analytics
High-Performance Computing
Data Science
Applications in Cybersecurity
Computational Genomics
Accomplishments (8)
Inductee into the University of Maryland's A. James Clark School of Engineering Innovation Hall of Fame
2022
NVIDIA AI Lab (NVAIL) Award
2019
Invited attendee to the White House’s National Strategic Computing Initiative (NSCI) Anniversary Workshop
2019
Facebook AI System Hardware/Software Co-Design Research Award
2019
Named a member of "People to Watch" by HPCwire
2014
The first recipient of the University of Maryland's Electrical and Computer Engineering Distinguished Alumni Award
2012
Named a member of "People to Watch" by HPCwire
2012
Selected by Sony, Toshiba, and IBM to direct the first Center of Competence for the Cell Processor
2006
Education (3)
University of Maryland: Ph.D., Electrical and Computer Engineering 1996
Lehigh University: M.S., Electrical Engineering 1991
Lehigh University: B.S., Computer Engineering 1990
Affiliations (4)
- AAAS Fellow
- IEEE Fellow
- SIAM Fellow
- ACM Fellow
Media Appearances (8)
This New AI Brain Decoder Could Be A Privacy Nightmare, Experts Say
Lifewire online
2023-05-08
The technique offers promise for stroke patients but could be invasive.
Common password mistakes you're making that could get you hacked
CBS News online
2023-03-03
It's hard to memorize passwords as you juggle dozens of apps — whether you're logging in to stream your favorite show, view your medical records, check your savings account balance or more, you'll want to avoid unwanted prying eyes.
The Democratization of Data Science Tools with Dr. David Bader
To the Point Cybersecurity podcast online
2023-09-19
He takes a deep dive into the opportunity to democratize data science tools and the free tool that he and Mike Merrill spent the last several years building, which is open to the public on the Bears-R-Us GitHub page.
Academic Data Science Alliance Picks Up Steam
Datanami online
2022-11-22
Universities looking for resources to build their data science curriculums and degree programs have a new resource at their disposal in the form of the Academic Data Science Alliance. Founded just prior to the pandemic, the ADSA survived COVID and now it’s working to foster a community of data science leaders at universities across North America and Europe...
‘Weaponised app’: Is Egypt spying on COP27 delegates’ phones?
Al Jazeera online
2022-11-12
Cybersecurity concerns have been raised at the United Nations’ COP27 climate talks over an official smartphone app that reportedly has carte blanche to monitor locations, private conversations and photographs. About 35,000 people are expected to attend the two-week climate conference in Egypt, and the app has been downloaded more than 10,000 times on Google Play, including by officials from France, Germany and Canada...
Your Hard Drive May One Day Use Diamonds for Storage
Lifewire online
2022-05-03
Diamonds could one day be used to store vast amounts of information. Researchers are trying to use the strange effects of quantum mechanics to hold information. However, experts say don’t expect a quantum hard drive in your PC anytime soon.
Big Data Career Notes: July 2019 Edition
Datanami online
2019-07-16
The New Jersey Institute of Technology has announced that it will establish a new Institute for Data Science, directed by Distinguished Professor David Bader. Bader recently joined NJIT’s Ying Wu College of Computing from Georgia Tech, where he was chair of the School of Computational Science and Engineering within the College of Computing. Bader was recognized as one of HPCwire’s People to Watch in 2014.
David Bader to Lead New Institute for Data Science at NJIT
Inside HPC online
2019-07-10
Professor David Bader will lead the new Institute for Data Science at the New Jersey Institute of Technology. Focused on cutting-edge interdisciplinary research and development in all areas pertinent to digital data, the institute will bring existing research centers in big data, medical informatics and cybersecurity together to conduct both basic and applied research.
Event Appearances (3)
Massive-scale Analytics
13th International Conference on Parallel Processing and Applied Mathematics (PPAM), Białystok, Poland
2019-09-09
Predictive Analytics from Massive Streaming Data
44th Annual GOMACTech Conference: Artificial Intelligence & Cyber Security: Challenges and Opportunities for the Government, Albuquerque, NM
2019-03-26
Massive-Scale Analytics Applied to Real-World Problems
2018 Platform for Advanced Scientific Computing (PASC) Conference, Basel, Switzerland
2018-07-04
Research Focus (2)
NVIDIA AI Lab (NVAIL) for Scalable Graph Algorithms
2019-08-05
Graph algorithms represent some of the most challenging known problems in computer science for modern processors. These algorithms contain far more memory accesses per unit of computation than traditional scientific computing. Access patterns are not known until execution time and are heavily dependent on the input data set. Graph algorithms vary widely in the amount of spatial and temporal locality that is usable by modern architectures. In today’s rapidly evolving world, graph algorithms are used to make sense of large volumes of data from news reports, distributed sensors, and lab test equipment, among other sources connected to worldwide networks. As data is created and collected, dynamic graph algorithms make it possible to compute highly specialized and complex relationship metrics over the entire web of data in near-real time, reducing the latency between data collection and the capability to take action. With this partnership with NVIDIA, we collaborate on the design and implementation of scalable graph algorithms and graph primitives that will bring new capabilities to the broader community of data scientists. Leveraging existing open frameworks, this effort will improve the experience of graph data analysis using GPUs by improving tools for analyzing graph data, speeding up graph traversal using optimized data structures, and accelerating computations with better runtime support for dynamic work stealing and load balancing.
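The traversal concerns described above can be made concrete with a small sketch. Below is a minimal level-synchronous BFS over a CSR (compressed sparse row) graph in Python; this is illustrative only, not code from the NVIDIA collaboration, and all names are invented. The frontier-at-a-time pattern is what typically maps onto GPU thread parallelism, and the CSR arrays stand in for the optimized traversal data structures mentioned above.

```python
# Minimal level-synchronous BFS over a CSR graph (illustrative sketch only).
def bfs_levels(row_ptr, col_idx, source):
    """Return the BFS level of every vertex reachable from `source`.

    row_ptr, col_idx: CSR arrays; row_ptr has n+1 entries.
    """
    n = len(row_ptr) - 1
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        next_frontier = []
        for u in frontier:                          # frontier vertices can expand in parallel
            for j in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[j]
                if level[v] == -1:                  # first visit wins
                    level[v] = depth + 1
                    next_frontier.append(v)
        frontier = next_frontier
        depth += 1
    return level

# Example: the path graph 0-1-2-3 in CSR form.
row_ptr = [0, 1, 3, 5, 6]
col_idx = [1, 0, 2, 1, 3, 2]
print(bfs_levels(row_ptr, col_idx, 0))  # [0, 1, 2, 3]
```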
Facebook AI Systems Hardware/Software Co-Design research award on Scalable Graph Learning Algorithms
2019-05-10
Deep learning has boosted the machine learning field at large and created significant increases in the performance of tasks including speech recognition, image classification, object detection, and recommendation. It has opened the door to complex tasks, such as self-driving and super-human image recognition. However, the important techniques used in deep learning, e.g., convolutional neural networks, are designed for Euclidean data types and do not directly apply to graphs. This problem is solved by embedding graphs into a lower-dimensional Euclidean space, generating a regular structure. There is also prior work on applying convolutions directly on graphs and using sampling to choose neighbor elements. Systems that use this technique are called graph convolution networks (GCNs). GCNs have proven to be successful at graph learning tasks like link prediction and graph classification. Recent work has pushed the scale of GCNs to billions of edges, but significant work remains to extend learned graph systems beyond recommendation systems with specific structure and to support big data models such as streaming graphs. This project will focus on developing scalable graph learning algorithms and implementations that open the door for learned graph models on massive graphs. We plan to approach this problem in two ways. First, developing a scalable, high-performance graph learning system based on existing GCN algorithms, like GraphSAGE, by improving the workflow on shared-memory NUMA machines, balancing computation between threads, optimizing data movement, and improving memory locality. Second, we will investigate graph learning algorithm-specific decompositions and develop new strategies for graph learning that can inherently scale well while maintaining high accuracy. This includes traditional partitioning; more generally, we consider breaking the problem into smaller pieces which, when solved, yield a solution to the bigger problem. We will explore decomposition results from graph theory, for example, forbidden graphs and the Embedding Lemma, and determine how to apply such results to the field of graph learning. We will investigate whether these decompositions could assist in a dynamic graph setting.
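As a rough illustration of the GCN/GraphSAGE-style aggregation described above, the sketch below implements a single mean-aggregation layer with neighbor sampling in NumPy. It is not the project's implementation; the graph, weights, and names are toy values chosen for illustration.

```python
# Toy sketch of one GraphSAGE-style layer: sample neighbors, mean-aggregate their
# features, concatenate with the vertex's own features, and apply a learned projection.
# Illustrative only; weights are random and no training loop is shown.
import numpy as np

rng = np.random.default_rng(0)

def sage_layer(features, adj, W, num_samples=3):
    """features: (n, d) array; adj: dict vertex -> list of neighbors; W: (2d, d_out)."""
    n, d = features.shape
    out = np.zeros((n, W.shape[1]))
    for v in range(n):
        neigh = adj.get(v, [])
        if neigh:
            sampled = rng.choice(neigh, size=min(num_samples, len(neigh)), replace=False)
            agg = features[sampled].mean(axis=0)
        else:
            agg = np.zeros(d)
        h = np.concatenate([features[v], agg])   # self features || aggregated neighbor features
        out[v] = np.maximum(h @ W, 0.0)          # ReLU nonlinearity
    norms = np.linalg.norm(out, axis=1, keepdims=True) + 1e-12
    return out / norms                           # L2-normalize embeddings, as in GraphSAGE

features = rng.normal(size=(4, 8))
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
W = rng.normal(size=(16, 4))
print(sage_layer(features, adj, W).shape)  # (4, 4)
```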
Research Grants (6)
Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes
DARPA/NVIDIA $25,000,000
2010-06-01
Goal: Develop highly parallel, security-enabled, power-efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks.
Center for Adaptive Supercomputing Software for Multithreaded Architectures (CASS-MT): Analyzing Massive Social Networks
Department of Defense $24,000,000
2008-08-01
Exascale Streaming Data Analytics for social networks: understanding communities, intentions, population dynamics, pandemic spread, transportation and evacuation.
Proactive Detection of Insider Threats with Graph Analysis at Multiple Scales (PRODIGAL), under Anomaly Detection at Multiple Scales (ADAMS)
DARPA $9,000,000
2011-05-01
This paper reports on insider threat detection research, during which a prototype system (PRODIGAL) was developed and operated as a testbed for exploring a range of detection and analysis methods. The data and test environment, system components, and the core method of unsupervised detection of insider threat leads are presented to document this work and benefit others working in the insider threat domain...
Challenge Applications and Scalable Metrics (CHASM) for Ubiquitous High Performance Computing
DARPA $7,500,000
2010-06-01
Develop highly parallel, security-enabled, power-efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks.
SHARP: Software Toolkit for Accelerating Graph Algorithms on HIVE Processors
DARPA $6,760,425
2017-04-23
The aim of SHARP is to enable platform-independent implementations of fast, scalable, approximate, static and streaming graph algorithms. SHARP will develop a software toolkit for seamless acceleration of graph analytics (GA) applications, for a first-of-its-kind collection of graph processors...
GRATEFUL: GRaph Analysis Tackling power EFficiency, Uncertainty, and Locality
DARPA $2,929,819
2012-10-19
Think of the perfect embedded computer. Think of a computer so energy-efficient that it can last 75 times longer than today’s systems. Researchers at Georgia Tech are helping the Defense Advanced Research Projects Agency (DARPA) develop such a computer as part of an initiative called Power Efficiency Revolution for Embedded Computing Technologies, or PERFECT. “The program is looking at how do we come to a new paradigm of computing where running time isn’t necessarily the constraint, but how much power and battery that we have available is really the new constraint,” says David Bader, executive director of high-performance computing at the School of Computational Science and Engineering. If the project is successful, it could result in computers far smaller and orders of magnitude more efficient than today’s machines. It could also mean that the computer mounted tomorrow on an unmanned aircraft or ground vehicle, or even worn by a soldier, would use less energy than a larger device, while still being as powerful. Georgia Tech’s part in the DARPA-led PERFECT effort is called GRATEFUL, which stands for Graph Analysis Tackling power-Efficiency, Uncertainty and Locality. Headed by Bader and co-investigator Jason Riedy, GRATEFUL focuses on algorithms that would process vast stores of data and turn it into a graphical representation in the most energy-efficient way possible.
Articles (8)
Cybersecurity Challenges in the Age of Generative AI
CTOTech Magazine, David Bader
2023-11-20
Cybersecurity professionals will not only have to discover malicious events at the time of occurrence, but also proactively implement preventative measures before an attack. For these professionals, the significant challenge will be protecting against new behaviors and methods that they are not yet familiar with.
What CISOs need to know to mitigate quantum computing risks
Security, David Bader
2023-06-03
Quantum technologies harness the laws of quantum mechanics to solve complex problems beyond the capabilities of classical computers. Although quantum computing can one day lead to positive and transformative solutions for complex global issues, the development of these technologies also poses a significant and emerging threat to cybersecurity infrastructure for organizations.
Tailoring parallel alternating criteria search for domain specific MIPs: Application to maritime inventory routing
Computers & Operations Research, Lluís-Miquel Munguía, Shabbir Ahmed, David A. Bader, George L. Nemhauser, Yufen Shao, Dimitri J. Papageorgiou
2019
Parallel Alternating Criteria Search (PACS) relies on the combination of computer parallelism and Large Neighborhood Searches to attempt to deliver high-quality solutions to any generic Mixed-Integer Program (MIP) quickly. While general-purpose primal heuristics are widely used due to their universal application, they are usually outperformed by domain-specific heuristics when optimizing a particular problem class.
High-Performance Phylogenetic Inference
Bioinformatics and Phylogenetics, David A. Bader, Kamesh Madduri
2019
Software tools based on the maximum likelihood method and Bayesian methods are widely used for phylogenetic tree inference. This article surveys recent research on parallelization and performance optimization of state-of-the-art tree inference tools. We outline advances in shared-memory multicore parallelization, optimizations for efficient Graphics Processing Unit (GPU) execution, as well as large-scale distributed-memory parallelization.
Numerically approximating centrality for graph ranking guarantees
Journal of Computational Science, Eisha Nathan, Geoffrey Sanders, David A. Bader
2018
Many real-world datasets can be represented as graphs. Using iterative solvers to approximate graph centrality measures allows us to obtain a ranking vector on the nodes of the graph, consisting of a number for each vertex in the graph identifying its relative importance. In this work the centrality measures we use are Katz Centrality and PageRank. Given an approximate solution, we use the residual to accurately estimate how much of the ranking matches the ranking given by the exact solution.
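To make this concrete, the sketch below approximates Katz centrality with a simple fixed-point iteration and records the residual norm of the linear system after each sweep. It assumes a small dense NumPy adjacency matrix and is illustrative only, not the authors' code.

```python
# Illustrative sketch: approximate Katz centrality by fixed-point iteration on
# (I - alpha * A^T) x = beta * 1, tracking the residual norm after each sweep.
# alpha must be smaller than 1 / spectral_radius(A) for the iteration to converge.
import numpy as np

def katz_iterative(A, alpha=0.1, beta=1.0, tol=1e-8, max_iter=1000):
    n = A.shape[0]
    b = beta * np.ones(n)
    x = np.zeros(n)
    residuals = []
    for _ in range(max_iter):
        x = alpha * (A.T @ x) + b                 # one fixed-point sweep
        r = b - (x - alpha * (A.T @ x))           # residual of the linear system
        residuals.append(np.linalg.norm(r))
        if residuals[-1] < tol:
            break
    return x, residuals

# Small undirected example graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x, res = katz_iterative(A)
print(np.argsort(-x))   # vertices ranked by approximate Katz centrality
```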
Ranking in dynamic graphs using exponential centrality
International Conference on Complex Networks and their Applications, Eisha Nathan, James Fairbanks, David Bader
2017
Many large datasets from several fields of research such as biology or society can be represented as graphs. Additionally in many real applications, data is constantly being produced, leading to the notion of dynamic graphs. A heavily studied problem is identification of the most important vertices in a graph. This can be done using centrality measures, where a centrality metric computes a numerical value for each vertex in the graph.
Scalable and High Performance Betweenness Centrality on the GPU [Best Student Paper Finalist]
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, A. McLaughlin, D. A. Bader
2014-11-01
Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is betweenness centrality, which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high computational cost that prevents the examination of large graphs of interest. Prior GPU implementations suffer from large local data structures and inefficient graph traversals that limit scalability and performance. Here we present several hybrid GPU implementations, providing good performance on graphs of arbitrary structure rather than just scale-free graphs as was done previously. We achieve up to 13x speedup on high-diameter graphs and an average of 2.71x speedup overall over the best existing GPU algorithm. We observe near-linear speedup and performance exceeding tens of GTEPS when running betweenness centrality on 192 GPUs.
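GPU implementations of betweenness centrality, including the hybrid ones described here, are generally built on Brandes' algorithm. For illustration only, the sketch below is a plain single-threaded Brandes baseline for unweighted graphs; it is not the paper's code.

```python
# Single-threaded sketch of Brandes' algorithm for betweenness centrality on an
# unweighted graph (illustrative baseline only).
from collections import deque

def betweenness(adj):
    """adj: dict vertex -> list of neighbors. Returns dict vertex -> centrality."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}; sigma[s] = 1      # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        preds = {v: [] for v in adj}
        order = []
        q = deque([s])
        while q:                                        # BFS from s
            u = q.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    q.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):                       # accumulate pair dependencies
            for u in preds[w]:
                delta[u] += (sigma[u] / sigma[w]) * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # For undirected graphs, each pair is counted twice; halve the scores if desired.
    return bc

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # path graph: interior vertices score highest
print(betweenness(adj))
```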
STINGER: High performance data structure for streaming graphs [Best Paper Award]
IEEE Conference on High Performance Extreme Computing, D. Ediger, R. McColl, J. Riedy, D. A. Bader
2012-09-01
The current research focus on “big data” problems highlights the scale and complexity of analytics required and the high rate at which data may be changing. In this paper, we present our high performance, scalable and portable software, Spatio-Temporal Interaction Networks and Graphs Extensible Representation (STINGER), that includes a graph data structure that enables these applications. Key attributes of STINGER are fast insertions, deletions, and updates on semantic graphs with skewed degree distributions. We demonstrate a process of algorithmic and architectural optimizations that enable high performance on the Cray XMT family and Intel multicore servers. Our implementation of STINGER on the Cray XMT processes over 3 million updates per second on a scale-free graph with 537 million edges.
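As a toy illustration of the update pattern STINGER is designed for, the sketch below keeps a dictionary-of-dictionaries adjacency with timestamped, weighted edges and cheap insertions and deletions. This is not STINGER's actual blocked, concurrency-oriented layout; the class and names are invented for illustration.

```python
# Toy streaming-graph structure with fast edge insertions, deletions, and updates.
# Illustrative only; STINGER itself uses a blocked layout tuned for the Cray XMT
# family and Intel multicore servers.
class StreamingGraph:
    def __init__(self):
        self.adj = {}            # vertex -> {neighbor: (weight, last_modified)}

    def insert_edge(self, u, v, weight, timestamp):
        """Insert or update an undirected, weighted, timestamped edge."""
        self.adj.setdefault(u, {})[v] = (weight, timestamp)
        self.adj.setdefault(v, {})[u] = (weight, timestamp)

    def delete_edge(self, u, v):
        self.adj.get(u, {}).pop(v, None)
        self.adj.get(v, {}).pop(u, None)

    def degree(self, u):
        return len(self.adj.get(u, {}))

g = StreamingGraph()
g.insert_edge("a", "b", weight=1.0, timestamp=1)
g.insert_edge("a", "c", weight=2.0, timestamp=2)
g.delete_edge("a", "b")
print(g.degree("a"))   # 1
```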