Huajin Wang - Carnegie Mellon University. Pittsburgh, PA, US

Huajin Wang

Senior Librarian | Carnegie Mellon University


Huajin Wang leads innovative initiatives that help to create a culture change towards a more open and reproducible research landscape.


Huajin Wang is a Senior Librarian and co-director for the Open Science & Data Collaborations program at Carnegie Mellon University Libraries. In that role, she leads innovative initiatives that help create a culture change towards a more open and reproducible research landscape through tools, training, community building and collaboration across disciplinary boundaries. As a researcher, she has led many successful research projects and collaborated with biologists, clinicians, information professionals and data scientists on interdisciplinary research topics. Her current research interests include open science methodology and assessment, AI-readiness of research data and secondary reuse of biomedical data. She is the chair and co-PI for the NSF-funded Artificial Intelligence for Data Discovery and Reuse (AIDR) conference and co-chairs the annual Open Science Symposium.

Areas of Expertise (4)

Data Collaboration

Open Science

Biomedical Data

AI-Readiness of Research Data

Media Appearances (1)

Libraries Convene Community of Scholars to Tackle Data Challenges

Carnegie Mellon University  online


"With the recent advances in machine learning and AI, it is possible to train computers to find optimal solutions to a problem, such as integrating different datasets and extracting metadata," said Huajin Wang, a CMU librarian and conference chair. "We created AIDR 2019 because it's about time that people working in a variety of disciplines come together to benefit from diverse expertise, and address these mutual challenges together, using the power of AI."







Case Study - Carnegie Mellon University



Industry Expertise (2)

Library and Information Management


Education (3)

Carnegie Mellon University: (non-degree program), Machine Learning

Shandong University: B.S., Microbiology

University of Alberta: Ph.D., Cell Biology

Event Appearances (5)

An End-to-end Open Science and Data Collaborations Program

(2021) Coalition for Networked Information (CNI)  Virtual

The rising importance of open science and open data

(2021) Mid-Atlantic Chapter of the Medical Library Association Annual Meeting  Virtual

Data Discovery and Reuse: AI Solutions & the Human Factor

(2020) National Information Standards Organization (NISO) Plus Conference  Baltimore, MD

Building Community and Support for Open Science at Carnegie Mellon University

(2018) Coalition for Networked Information (CNI) Fall 2018 Membership Meeting  Washington, DC

AI for Data Reuse - Tools, Challenges, and Opportunities

(2019) Reproducibility and Data Reuse in Life Science, SciLifeLab Data Centre  Uppsala, Sweden

Articles (5)

Implementation and assessment of an end-to-end Open Science & Data Collaborations program


2022 As research becomes more interdisciplinary, fast-paced, data-intensive, and collaborative, there is an increasing need to share data and other research products in accordance with Open Science principles. In response to this need, we created an Open Science & Data Collaborations (OSDC) program at the Carnegie Mellon University Libraries that provides Open Science tools, training, collaboration opportunities, and community-building events to support Open Research and Open Science adoption. This program presents a unique end-to-end model for Open Science programs because it extends open science support beyond open repositories and open access publishing to the entire research lifecycle.


Partitioning of MLX-family transcription factors to lipid droplets regulates metabolic gene expression

Molecular Cell

2020 Lipid droplets (LDs) store lipids for energy and are central to cellular lipid homeostasis. The mechanisms coordinating lipid storage in LDs with cellular metabolism are unclear but relevant to obesity-related diseases. Here we utilized genome-wide screening to identify genes that modulate lipid storage in macrophages, a cell type involved in metabolic diseases. Among ∼550 identified screen hits is MLX, a basic helix-loop-helix leucine-zipper transcription factor that regulates metabolic processes. We show that MLX and glucose-sensing family members MLXIP/MondoA and MLXIPL/ChREBP bind LDs via C-terminal amphipathic helices.


The Evolution of Information Literacy Outcomes in Interdisciplinary Undergraduate Science Courses

Issues in Science and Technology Librarianship

2019 The ACRL Framework for Information Literacy presents opportunities for moving beyond ‘one-shot’ information literacy sessions and creating a more scaffolded and embedded approach for instruction. We collaborated with faculty at Carnegie Mellon University to create Framework-inspired information literacy learning objectives for first-year and third-year science undergraduates and are continuously refining the objectives as the curriculum continues to evolve. This article describes our learning objective design and refinement process, challenges encountered, and ideas on how to create opportunities for embedding information literacy into a curriculum. We also share our full activity lesson plans and assessment tool.


Functional contribution of the spastic paraplegia-related triglyceride hydrolase DDHD2 to the formation and content of lipid droplets


2018 Deleterious mutations in the serine lipase DDHD2 are a causative basis of complex hereditary spastic paraplegia (HSP, subtype SPG54) in humans. We recently found that DDHD2 is a principal triglyceride hydrolase in the central nervous system (CNS) and that genetic deletion of this enzyme in mice leads to ectopic lipid droplet (LD) accumulation in neurons throughout the brain. Nonetheless, how HSP-related mutations in DDHD2 relate to triglyceride metabolism and LD formation remains poorly understood. Here, we have characterized a set of HSP-related mutations in DDHD2 and found that they disrupt triglyceride hydrolase activity in vitro and impair the capacity of DDHD2 to protect cells from LD accumulation following exposure to free fatty acid, an outcome that was also observed with a DDHD2-selective inhibitor.


Seipin is required for converting nascent to mature lipid droplets


2016 How proteins control the biogenesis of cellular lipid droplets (LDs) is poorly understood. Using Drosophila and human cells, we show here that seipin, an ER protein implicated in LD biology, mediates a discrete step in LD formation—the conversion of small, nascent LDs to larger, mature LDs. Seipin forms discrete and dynamic foci in the ER that interact with nascent LDs to enable their growth. In the absence of seipin, numerous small, nascent LDs accumulate near the ER and most often fail to grow. Those that do grow prematurely acquire lipid synthesis enzymes and undergo expansion, eventually leading to the giant LDs characteristic of seipin deficiency.
