Yi Chen is a professor and the Henry J. Leir Chair in Healthcare in the Martin Tuchman School of Management, with a joint appointment in the Ying Wu College of Computing at New Jersey Institute of Technology (NJIT). Prior to joining NJIT, she was an associate professor at Arizona State University. She received her Ph.D. degree in computer science from the University of Pennsylvania in 2005 and her B.S. from Central South University in 1999. She and her research group develop cutting-edge database, data mining and machine learning techniques with applications in business, health care and the web.
Some of her projects include information discovery on big data, social media mining for health care, computational advertising, social computing, workflow management and information integration. She has served on the organization and program committees for prestigious conferences, including SIGMOD, VLDB, ICDE, CIKM and SIGIR; as an associate editor for TKDE, DAPD, PVLDB, INFORMS Journal on Computing, ECRA, and Journal of Healthcare Informatics Research; and as a general chair for SIGMOD 2012. She also served as the inaugural director of the Ph.D. program in business data science at NJIT.
Chen is a recipient of a Peter Chen Big Data Young Researcher Award, Excellence in Research Prize (NJIT), Outstanding Faculty Researcher in Computer Science and Engineering (ASU), Google Research Award, IBM Faculty Award and an NSF CAREER Award. Her research is funded by NSF, Leir Charitable Foundations, Google, IBM, Science Foundation Arizona and the Department of Defense.
Areas of Expertise (8)
Excellence in Research, MTSM, NJIT
Peter Chen Big Data Young Researcher Award
Henry J. Leir Chair in Healthcare, Leir Charitable Foundations
Leir Best Paper Award, Second Prize
University of Pennsylvania: PhD, Computer Science 2005
University of Pennsylvania: MS, Computer Science 2000
Central South University: BS, Computer Science 1999
Research Focus (1)
Healthcare Impact Of Economic and Financial Crises
Currently focusing on the State of New Jersey, we are investigating the impact of economic crises and other economic effects, such as technological unemployment, on the health of NJ residents as revealed in hospital admissions and other data.
Research Grants (1)
CAREER: Analyzing and Exploiting Meta-information for Keyword Search on Semi-structured Data
National Science Foundation $384,342
The goal of this research project is to provide high-quality keyword search results on semi-structured data in XML format. To address the challenge of handling inherent ambiguity in keyword search, fundamental techniques and an effective search engine are developed that exploit the meta-information in the data in order to infer user search intention and to achieve high search quality. The project includes novel research on the following key areas: (1) Query Result Generation: identifying relevant nodes in XML data and composing atomic and intact query results, each of which represents an object of the inferred user search goal; (2) Query Result Presentation: developing techniques for result ranking, snippet generation, and result clustering, in order to help users quickly find the most relevant results; (3) Advanced Queries and Data Models: supporting expressive search options and handling XML data with rich constraints; and (4) Efficiency: developing techniques for performance optimization, including indexes, materialized views, and top-k query processing. Furthermore, an axiomatic evaluation framework is initiated for formally reasoning about XML keyword search strategies. The success of the project will advance the state of the art of keyword search on XML data, enhance the research and education infrastructure in this area, and have broader impacts on both the general public and scientific communities for information discovery. This research is integrated with education through curriculum enhancement, student advising, workshops, and outreach programs. Publications, software and course materials resulting from this project will be disseminated via the project website (http://www.public.asu.edu/~ychen127/xseek/).
Ackerman, Brian, & Wang, Chong, & Chen, Yi
Recommender systems are changing the way that people find information, products, and even other people. This paper studies the problem of leveraging the context of the items presented to the user in a user/system interaction session to improve the recommender system's ranking prediction. We propose a novel model that incorporates the opportunity cost of giving up the other items in the session and computes session‐specific relevance values for items for context‐aware recommendation. The model can work on a variety of different problem settings, with emphasis on implicit user feedback, as it supports varying levels of ordinal relevance. Experimental evaluation demonstrates the advantages of our new model with respect to ranking quality.
Wang, Chong, & Zhao, Shuai, & Kalra, Achir, & Borcea, Cristian, & Chen, Yi
Display advertising is the most important revenue source for publishers in the online publishing industry. The ad pricing standards are shifting to a new model in which ads are paid only if they are viewed. Consequently, an important problem for publishers is to predict the probability that an ad at a given page depth will be shown on a user's screen for a certain dwell time. This paper proposes deep learning models based on Long Short-Term Memory (LSTM) to predict the viewability of any page depth for any given dwell time. The main novelty of our best model consists in the combination of bi-directional LSTM networks, encoder-decoder structure, and residual connections. The experimental results over a dataset collected from a large online publisher demonstrate that the proposed LSTM-based sequential neural networks outperform the comparison methods in terms of prediction performance.
Wang, Chong, & Zhao, Shuai, & Kalra, Achir, & Borcea, Cristian, & Chen, Yi
Half of online display ads are not rendered viewable because the users do not scroll deep enough or spend sufficient time at the page depth where the ads are placed. In order to increase marketing efficiency and ad effectiveness, there is a strong demand for viewability prediction from both advertisers and publishers. This paper aims to predict the dwell time for a given (user, page, depth) triplet based on historic data collected by publishers. This problem is difficult because of user behavior variability and data sparsity. To solve it, we propose predictive models based on Factorization Machines and Field‐aware Factorization Machines in order to overcome the data sparsity issue and provide flexibility to add auxiliary information such as the visible area of a user's browser. In addition, we leverage the prior dwell time behavior of the user within the current page view, that is, time series information, to further improve the proposed models. Experimental results using data from a large web publisher demonstrate that the proposed models outperform comparison models. Also, the results show that adding time series information further improves the performance.
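For readers unfamiliar with Factorization Machines, the following is a minimal sketch of the standard second-order scoring function, not the authors' implementation. It assumes a feature vector x built from a (user, page, depth) triplet; the function names, the feature layout, and the example weights are hypothetical. The field-aware variant and the time series features mentioned in the abstract are not shown.

```python
from itertools import combinations
from typing import List


def fm_predict(x: List[float], w0: float, w: List[float], V: List[List[float]]) -> float:
    """Second-order Factorization Machine score.

    x  : feature vector (length n), e.g. an encoding of user, page, depth (assumed layout)
    w0 : global bias
    w  : per-feature linear weights (length n)
    V  : n x k matrix of latent factors, one k-dimensional row per feature
    """
    n = len(x)
    k = len(V[0]) if V else 0
    linear = sum(w[i] * x[i] for i in range(n))
    # Pairwise term computed in O(n * k) using the identity
    # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x: List[float], w0: float, w: List[float], V: List[List[float]]) -> float:
    """Same score, computed directly over all feature pairs (for checking)."""
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i, j in combinations(range(n), 2)
    )
    return w0 + linear + pairwise


# Hypothetical example values, chosen only for illustration.
x = [1.0, 0.0, 1.0, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.0]
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4], [-0.2, 0.5]]
score = fm_predict(x, w0, w, V)
```

Because each feature interacts through a shared latent vector rather than a dedicated pairwise weight, the model can score feature combinations that never co-occur in the training data, which is how this model family copes with sparsity.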
Chen, Yi, & Li, Zhengzheng, & Barkaoui, K., & Wu, N., & Zhou, Mengchu
This work proposes a novel structure in Petri nets, namely data inhibitor arcs, and their application to the optimal supervisory control of Petri nets. A data inhibitor arc is an arc from a place to a transition labeled with a set of integers. A transition is disabled by a data inhibitor arc if the number of tokens in the place is in the set of integers labeled on it. Its formal definitions and properties are given. Then, we propose a method to design an optimal Petri net supervisor with data inhibitor arcs to prevent a system from reaching illegal markings with respect to control specifications. Two techniques are developed to reduce the supervisor structure by compressing the number of control places. Finally, a number of examples are used to illustrate the proposed approaches and experimental results show that they can obtain optimal Petri net supervisors for the net models that cannot be optimally controlled by pure net supervisors. A significant result is that the proposed approach can always lead to an optimal supervisor with only one control place for bounded Petri nets on the premise that such a supervisor exists.
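The enabling rule described above can be sketched in a few lines. This is an illustrative reading of the abstract's definition, not the paper's formalism: the `Transition` class, the dictionary-based marking, and the example net are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass
class Transition:
    name: str
    inputs: Dict[str, int] = field(default_factory=dict)    # place -> ordinary arc weight
    outputs: Dict[str, int] = field(default_factory=dict)   # place -> ordinary arc weight
    # place -> set of token counts that disable this transition
    data_inhibitors: Dict[str, FrozenSet[int]] = field(default_factory=dict)


def is_enabled(marking: Dict[str, int], t: Transition) -> bool:
    """Return True if t may fire under the given marking."""
    # Ordinary rule: every input place holds at least the arc weight.
    for place, weight in t.inputs.items():
        if marking.get(place, 0) < weight:
            return False
    # Data inhibitor rule: t is disabled if the token count of the
    # source place is in the integer set labeled on the arc.
    for place, forbidden in t.data_inhibitors.items():
        if marking.get(place, 0) in forbidden:
            return False
    return True


def fire(marking: Dict[str, int], t: Transition) -> Dict[str, int]:
    """Return the marking reached by firing t; data inhibitor arcs move no tokens."""
    if not is_enabled(marking, t):
        raise ValueError(f"transition {t.name} is not enabled")
    new_marking = dict(marking)
    for place, weight in t.inputs.items():
        new_marking[place] = new_marking.get(place, 0) - weight
    for place, weight in t.outputs.items():
        new_marking[place] = new_marking.get(place, 0) + weight
    return new_marking


# Hypothetical example: t1 is blocked whenever control place pc holds 2 or 3 tokens.
t1 = Transition(
    name="t1",
    inputs={"p1": 1},
    outputs={"p2": 1},
    data_inhibitors={"pc": frozenset({2, 3})},
)
```

In this sketch a single control place can forbid several non-contiguous token counts at once, which hints at why such arcs can express control specifications that ordinary arcs cannot.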
Liu, Yunzhong, & Shi, Jinhe, & Chen, Yi
Adverse Drug Reactions (ADRs) have become a serious health problem and even a leading cause of death in the United States. Pre‐marketing clinical trials and traditional post‐marketing surveillance using voluntary and spontaneous report systems are insufficient for ADR detection. On the other hand, online health forums provide valuable evidence on a large scale and in a timely fashion through the active participation of patients, caregivers, and doctors. In this article, we present a patient‐centered and experience‐aware mining framework for effective ADR discovery using online health forum data. Our experimental evaluation with both an official ADR knowledge base and human‐annotated ground truth verifies the effectiveness of the proposed method for ADR discovery.