Cody Buntain's research focuses on the intersection of data science and the social sciences, developing and adapting advanced computational techniques to address critical political and social issues, with an emphasis on crisis communication, social movements, and political participation.
He applies data science, particularly social media analysis, to study how people engage socially and politically, especially during disasters and periods of social unrest.
Areas of Expertise (8)
Online Political Engagement
Best Paper Award, IEEE SmartCloud
SIGIR Student Grant
Best Paper Honorable Mention, #Microposts2016
UMD HCIL Conference Award, 2014 and 2015
International Conference Student Awards, 2014 and 2015
Computer Science Department Gannon Award
University of Maryland, College Park: Ph.D., Computer Science 2015
University of Alabama in Huntsville: M.S., Computer Science 2010
University of Alabama in Huntsville: B.S., Computer Science, Math 2007
- Intelligence Postdoctoral Fellowship
- OSDC-PIRE Fellowship
Media Appearances (1)
Meet the Troll Hunters
NBC New York (TV)
I-Team interviews NJIT's Cody Buntain about tracking political trolls online.
Event Appearances (4)
#HandsOffMyADA: A Twitter Response to the ADA Education and Reform Act
ACM Conference on Human Factors in Computing Systems, Glasgow, Scotland
Analyzing a Fake News Authorship Network
2019 iConference, College Park, MD
Learning Information Types in Social Media for Crises: TREC-IS
Twenty-Seventh Text Retrieval Conference, Gaithersburg, MD
#pray4victims: Consistencies In Response To Disaster on Twitter
21st ACM Conference on Computer-Supported Cooperative Work and Social Computing, Jersey City, NY
Research Grants (4)
Automated Program Analysis in Cybersecurity
Five Directions' primary goal during the APAC (Automated Program Analysis for Cybersecurity) program was to measure the effectiveness and efficiency of the research and development (R&D) teams in detecting malware in Android applications. To achieve this goal, experiments were designed to test the tools being developed by the R&D teams, pitting those research tools against malicious Android applications created by the Adversarial Challenge (AC) teams. The results of these experiments were then compared to the performance of a separate Control Team that analyzed the same malicious applications using existing tools and techniques. This analysis provided a method of evaluating the performance of each R&D team as well as the overall performance of the APAC program.
Deception Studio: Attacker Characterization and Dynamic Relocation
Department of Defense - Air Force/Pikewerks $745,519
2011 - Phase II Deception Studio (DS) is a learning, behavior-based defense system for ensuring service availability and trust. DS's learning capabilities include attack detection, prediction, and attribution, and it can react to attacks in real time by shaping an adversary's perception and creating an illusion capable of manipulating his planning processes. Responses are deployed in a targeted fashion, allowing DS to respond proportionately to the attack without inflicting hard penalties on valid users. Such responses can be both deceptive and active, extending the protection boundary of the system and forcing attackers to react to ever-changing conditions. DS can further preserve the availability of critical services by moving them out-of-band during ongoing attacks, dynamically migrating an attacker into a decoy environment, or degrading his access while maintaining availability for legitimate users. Before employing such responses, DS includes technology to heal critical services from infection and can also bring this healing technology to bear on compromised systems, returning them to the pool of usable systems. Deception Studio represents the state of the art in active, behavior-based attack detection and prevention systems, imbuing systems with the ability to remain operational, available, and trustworthy through even the most targeted attacks.
Imbuing Trust in Untrusted Hardware to Improve Protections
Department of Defense - Air Force/Pikewerks $97,559
2010 The Pikewerks InTrust system is a two-stage system designed to detect malicious implants or alterations in COTS hardware and firmware. It is meant to be used during both the integration/pre-deployment and deployment stages, first establishing trust and then maintaining that trust during fielding. The pre-deployment test platform will make use of invasive testing and analysis techniques to ensure no unauthorized information leakage is occurring and no embedded malware exists. However, since many of these tests are heuristic-based and a number of malicious hardware modifications may have zero footprint until activation, it is possible that some alterations or implants will get past the pre-deployment analysis. As such, InTrust's second-stage hardware sensors and firmware analysis mechanisms are designed to be embedded into fielded COTS platforms to detect tampering, attempts at modification, and the side effects of a triggered alteration. Further, once a tamper or modification attempt is detected, InTrust employs the Malicious Hardware Shield (MHS) to seal off regions of memory from direct access by unauthorized devices. InTrust can then integrate with existing Pikewerks environmental key generation to prevent unauthorized exposure of CPI/CT.
Deception Studio: Attacker Characterization and Dynamic Relocation
Department of Defense - Air Force / Pikewerks $99,961
2009 - Phase I One of the most significant weaknesses facing modern software protection solutions is the reliance on static policies and rule sets established from "known" attack methods at the time of development. In reality, attacks are not static; they adapt over time and evolve to defeat protections as those protections are made public. Pikewerks proposes to address these weaknesses by developing a system, referred to as Deception Studio, that characterizes and appropriately reacts to attackers in real time. As has been successfully implemented in traditional warfare, it will strive to shape the attacker's perception and create an illusion capable of manipulating their planning process. This concept is based on the combat operations process defined by John Boyd, referred to as Observe, Orient, Decide, and Act (OODA). Deception Studio will characterize the attack and tailor defenses based on what is observed.
Articles (6)
Characterizing Gender Differences in Misogynistic and Antisocial Microblog Posts
Online Harassment
2018 This chapter presents an observational study into the genders of authors posting abusive misogynistic insults and hate speech on Twitter. We first characterize the different uses of potentially abusive and misogynistic expletives in Twitter using a novel diversity-based sampling strategy and use Amazon’s Mechanical Turk (MTurk) crowdsourcing platform to construct a labeled dataset of abusive, misogynistic insults.
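The abstract mentions a diversity-based sampling strategy for characterizing expletive use. The chapter's actual strategy is not detailed here, but the general idea of diversity-based sampling can be sketched as a greedy token-coverage sampler: repeatedly choose the document that adds the most vocabulary not yet seen in the sample. The function name and example documents below are illustrative assumptions, not the paper's method.

```python
def diversity_sample(documents, k):
    """Greedy diversity-based sampler: repeatedly pick the document that
    introduces the most tokens not yet covered by the sample.
    Illustrative sketch only; the chapter's actual strategy may differ."""
    covered = set()
    sample = []
    remaining = list(documents)
    for _ in range(min(k, len(remaining))):
        # Score each candidate by how many new tokens it would contribute
        best = max(remaining, key=lambda d: len(set(d.lower().split()) - covered))
        sample.append(best)
        covered |= set(best.lower().split())
        remaining.remove(best)
    return sample

docs = [
    "you are awful and stupid",
    "you are awful",
    "what a lovely day outside",
]
print(diversity_sample(docs, 2))
```

With these toy documents, the sampler skips the second document because it adds no new vocabulary beyond the first, preferring the topically distinct third one.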
SMIDGen: An Approach for Scalable, Mixed-Initiative Dataset Generation from Online Social Networks
HCIL Tech Reports
Matthew Louis Mauriello, Cody Buntain, Brenna McNally, Sapna Bagalkotkar, Samuel Kushnir, Jon E Froehlich
2018 Recent qualitative studies have begun using large amounts of Online Social Network (OSN) data to study how users interact with technologies. However, current approaches to dataset generation are manual, time-consuming, and can be difficult to reproduce. To address these issues, we introduce SMIDGen: a hybrid manual + computational approach for enhancing the replicability and scalability of data collection from OSNs to support qualitative research.
Sampling Social Media: Supporting Information Retrieval from Microblog Data Resellers with Text, Network, and Spatial Analysis
Proceedings of the 51st Hawaii International Conference on System Sciences, 2018
Cody Buntain, Erin McGrath, and Brandon Behlendorf
2018 This paper presents a computationally assisted method for scaling researcher expertise to large, online social media datasets in which access is constrained and costly. Developed collaboratively between social and computer science researchers, this method is designed to be flexible, scalable, cost-effective, and to reduce bias in data collection. Online responses to six case studies covering elections and election-related violence in Sub-Saharan African countries are explored using Twitter, a popular online microblogging platform. Results show: 1) automated query expansion can help researchers mitigate bias, 2) machine learning models combining textual, social, temporal, and geographic features in social media data perform well in filtering data unrelated to the target event, and 3) these results are achievable while minimizing fee-based queries by bootstrapping with readily available Twitter samples.
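The automated query expansion mentioned in result (1) can be illustrated in miniature: starting from seed keywords, mine a free sample of documents for terms that frequently co-occur with the seeds, then add the top co-occurring terms to the query. The function name, tokenization, and sample tweets below are assumptions for illustration, not the paper's actual pipeline.

```python
from collections import Counter

def expand_query(seed_terms, documents, top_k=5):
    """Expand a set of seed keywords with terms that frequently
    co-occur with them in a sample of documents (e.g., tweets).
    A minimal sketch, not the paper's actual expansion method."""
    seeds = {t.lower() for t in seed_terms}
    cooccur = Counter()
    for doc in documents:
        tokens = {tok.strip(".,!?#@").lower() for tok in doc.split()}
        if tokens & seeds:  # document mentions at least one seed term
            cooccur.update(tokens - seeds)
    return [term for term, _ in cooccur.most_common(top_k)]

sample_tweets = [
    "Election results delayed amid protest in the capital",
    "Protest turns violent after election announcement",
    "New phone released today",
]
print(expand_query(["election"], sample_tweets, top_k=3))
```

Here "protest" surfaces as the strongest expansion candidate because it co-occurs with the seed in both relevant tweets; in practice, stopword filtering and weighting (e.g., PMI) would replace the raw counts.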
Automatically Identifying Fake News in Popular Twitter Threads
IEEE
Cody Buntain, Jennifer Golbeck
Information quality in social media is an increasingly important issue, but web-scale data hinders experts' ability to assess and correct much of the inaccurate content, or "fake news," present in these platforms. This paper develops a method for automating fake news detection on Twitter by learning to predict accuracy assessments in two credibility-focused Twitter datasets: CREDBANK, a crowdsourced dataset of accuracy assessments for events in Twitter, and PHEME, a dataset of potential rumors in Twitter and journalistic assessments of their accuracies. We apply this method to Twitter content sourced from BuzzFeed's fake news dataset and show models trained against crowdsourced workers outperform models based on journalists' assessment and models trained on a pooled dataset of both crowdsourced workers and journalists. All three datasets, aligned into a uniform format, are also publicly available. A feature analysis then identifies features that are most predictive for crowdsourced and journalistic accuracy assessments, results of which are consistent with prior work. We close with a discussion contrasting accuracy and credibility and why models of non-experts outperform models of journalists for fake news detection in Twitter.
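The core supervised setup described above — training a classifier on accuracy-labeled tweets and applying it to unseen content — can be sketched with a toy text classifier. The Naive Bayes model, feature choice, and example tweets below are illustrative stand-ins; the paper's actual features and models are not reproduced here.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Train a toy multinomial Naive Bayes text classifier on
    accuracy-labeled documents. Illustrative stand-in only."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict_nb(model, doc):
    """Return the most probable class (0 or 1) for a new document."""
    word_counts, class_counts, vocab = model
    scores = {}
    for y in (0, 1):
        total = sum(word_counts[y].values())
        score = math.log(class_counts[y] / sum(class_counts.values()))
        for w in doc.lower().split():
            # Laplace smoothing over the shared vocabulary
            score += math.log((word_counts[y][w] + 1) / (total + len(vocab)))
        scores[y] = score
    return max(scores, key=scores.get)

# Hypothetical labels: 1 = assessed accurate, 0 = assessed inaccurate
tweets = [
    "official report confirms road closures after the storm",
    "sources say the mayor secretly fled the city",
    "city agency posts verified shelter locations",
    "unbelievable rumor they are hiding the outbreak",
]
labels = [1, 0, 1, 0]
model = train_nb(tweets, labels)
print(predict_nb(model, "verified shelter locations posted by the city"))
```

In the paper's setting, the labels would come from CREDBANK's crowdsourced assessments or PHEME's journalistic ones, and the feature set would extend well beyond bag-of-words.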
Powers and Problems of Integrating Social Media Data with Public Health and Safety
Data for Good Exchange
Cody Buntain, Jennifer Golbeck, & Gary LaFree
2015 Social media sites like Twitter provide readily accessible sources of large-volume, high-velocity data streams, now referred to as “Big Data.” While private companies have already made great strides in leveraging these social media sources, many public organizations and government agencies could reap significant benefits from these resources. Care must be exercised in this integration, however, as huge data sets come with their own intrinsic issues.
Trust transfer between contexts
Journal of Trust Management
Cody Buntain, Jennifer Golbeck
2015 This paper explores whether trust, developed in one context, transfers into another, distinct context and, if so, attempts to quantify the influence this prior trust exerts. Specifically, we investigate the effects of artificially stimulated prior trust as it transfers across disparate contexts and whether this prior trust can compensate for negative objective information. To study such transfer, we leveraged Berg's investment game to stimulate varying degrees of trust between a human and a set of automated agents.