Areas of Expertise
Information and Computer Science
Parallel Discrete-Event Simulation
Massively Parallel Systems
Modeling and Simulation Systems
Massively Parallel Processing
Systems and Network Modeling
Chris Carothers is a Professor in the Computer Science Department at Rensselaer Polytechnic Institute. His research interests are in massively parallel systems, with a focus on modeling and simulation systems of all sorts. Prof. Carothers is an NSF CAREER award winner and is currently active in the DOE Exascale Co-Design Program, working on designs for next-generation exascale storage systems, as well as the NSF PetaApps Program and the Army Research Laboratory's Mobile Network Modeling Institute.
Georgia Institute of Technology: Ph.D., Computer Science 1997
Georgia Institute of Technology: M.S., Computer Science 1996
Georgia Institute of Technology: B.S., Information and Computer Science 1991
Media Appearances
U.S. Military Sees Future in Neuromorphic Computing
The Next Platform online
The novel architectures story is still taking shape for 2017 when it comes to machine learning, hyperscale, supercomputing, and other areas.
IBM’s supercomputer Watson gets a roommate at RPI
Albany Business Review online
IBM chose Rensselaer Polytechnic Institute to house one of the most powerful supercomputers in the world, which will allow businesses to analyze massive amounts of data.
The major theme of research investigated here is how neuromorphic computing might impact future designs of supercomputer systems. This report provides both a summary and detailed experimental research results for the five core research thrusts (CRTs) covered in this research project.
Caitlin J Ross, Christopher D Carothers, Misbah Mubarak, Robert B Ross, Jianping Kelvin Li, Kwan-Liu Ma
Scalability of parallel discrete-event simulation (PDES) systems is key to their use in modeling complex networks at high fidelity. In particular, intranode scalability is important due to the prevalence of many-core systems, but MPI communication between cores on the same node is known to have drawbacks (e.g., software overheads). We have extended the ROSS optimistic PDES framework to create memory pools shared by MPI processes on the same node in order to reduce on-node MPI overhead. We perform experiments to compare the performance of shared memory ROSS with pure MPI ROSS on two different systems. For the experiments, we use several models that exhibit a variety of characteristics to understand the conditions where shared memory can benefit the simulation. In general, higher remote event rates mean that simulations are more likely to benefit from using shared memory, but this may also be due in part to improved rollback behavior.
Prasanna Date, Christopher D Carothers, James A Hendler, Malik Magdon-Ismail
Today's petascale supercomputers are composed of tens of thousands of compute nodes. Failures on these massive machines are a growing problem, as the time for a single compute node to fail is shrinking. Ideally, the job scheduler would like the capability to predict node failures ahead of time in order to minimize the impact of node failures on overall job throughput. However, due to the tight power constraints of future systems, the online modeling of real-time error data must be accomplished using as little power as possible. To this end, the IBM TrueNorth Neurosynaptic System is used to create a Spiking Neural Network (SNN) model of supercomputer failure data, and the classification accuracy of this model is compared to other Machine Learning (ML) and Deep Learning (DL) techniques. It is observed that the TrueNorth failure classification model yields a training accuracy of 99.41%, a validation accuracy of 98.12%, and a testing accuracy of 99.80%, outperforming the other machine learning and deep learning approaches. Moreover, the TrueNorth SNN consumes five orders of magnitude less power than the other ML/DL approaches during the testing phase. Additionally, it is observed that all ML/DL approaches investigated as part of this study are able to produce accurate models of the supercomputer system failure data.
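The spiking models referenced above are built from neurons that integrate input over time and emit a binary spike when a threshold is crossed. The sketch below shows a single leaky integrate-and-fire (LIF) neuron, the basic unit of this model class; it is not IBM's TrueNorth programming interface, and the leak, threshold, and input values are made-up illustration parameters.

```python
def lif_spikes(inputs, leak=0.9, threshold=1.0):
    """Return a binary spike train for a sequence of input currents.

    A leaky integrate-and-fire neuron: the membrane potential decays by
    `leak` each step, accumulates the input, and fires (emitting 1 and
    resetting to zero) once it reaches `threshold`.
    """
    v = 0.0
    spikes = []
    for i in inputs:
        v = leak * v + i       # integrate input with leaky decay
        if v >= threshold:     # fire when the potential crosses threshold
            spikes.append(1)
            v = 0.0            # reset membrane potential after a spike
        else:
            spikes.append(0)
    return spikes

print(lif_spikes([0.4, 0.4, 0.4, 0.0, 1.2]))  # -> [0, 0, 1, 0, 1]
```

Because each neuron only does a multiply, an add, and a compare per step, and communicates in single-bit spikes, hardware built around this model (such as TrueNorth) can run at very low power, which is the property the abstract exploits.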