The Rise of Artificial Intelligence in Educational Measurement: Opportunities and Ethical Challenges
Chinese/English Journal of Educational Measurement and Evaluation, 2024
The integration of artificial intelligence (AI) in educational measurement has transformed assessment methods, allowing for automated scoring, swift content analysis, and personalized feedback through machine learning and natural language processing. These advancements provide valuable insights into student performance while also enhancing the overall assessment experience. However, the implementation of AI in education also raises significant ethical concerns regarding validity, reliability, transparency, fairness, and equity. Issues such as algorithmic bias and the opacity of AI decision-making processes risk perpetuating inequalities and distorting assessment outcomes. In response, various stakeholders, including educators, policymakers, and testing organizations, have developed guidelines to ensure the ethical use of AI in education. The National Council on Measurement in Education’s Special Interest Group on AI in Measurement and Education (AIME) is dedicated to establishing ethical standards and advancing research in this area. In this paper, a diverse group of AIME members examines the ethical implications of AI-powered tools in educational measurement, explores significant challenges such as automation bias and environmental impact, and proposes solutions to ensure AI’s responsible and effective use in education.
Exploring the Long-Term Effects of the Statewide Implementation of an Automated Writing Evaluation System on Students’ State Test ELA Performance
International Journal of Artificial Intelligence in Education, 2024
Automated writing evaluation (AWE) is an artificial intelligence (AI)-empowered educational technology designed to assist writing instruction and improve students’ writing proficiency. The present study adopted a quasi-experimental design using the inverse probability of treatment weighting method to explore the long-term effects of an AWE system known as Utah Compose on students’ state test English Language Arts (ELA) performance. The participants included 134,425 students in Grades 4–10 in Utah from school year 2015 to 2018. Findings showed AWE’s cumulative benefit to students’ ELA performance, but those cumulative effects grew at a diminishing rate each year and plateaued after three years of implementation. This study is the largest evaluation of AWE effects to date in terms of both its sample size and the duration of investigation. The findings regarding AWE’s cumulative effects on students’ state test ELA performance, which is a distal outcome at the state level, have significant implications for policy and practice regarding large-scale AWE implementation.
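The abstract names inverse probability of treatment weighting (IPTW) but does not walk through it. A minimal sketch of the idea, with invented toy data and assumed propensity scores (nothing here reflects the study’s actual model or estimates): each treated unit is weighted by 1/e and each control unit by 1/(1 − e), where e is the propensity score, and the weighted group means are compared.

```python
# Hedged sketch of inverse probability of treatment weighting (IPTW).
# All data below are toy values for illustration, not study estimates.

def iptw_ate(treatment, outcome, propensity):
    """Unstabilized IPTW estimate of the average treatment effect."""
    num_t = den_t = num_c = den_c = 0.0
    for t, y, e in zip(treatment, outcome, propensity):
        if t == 1:
            w = 1.0 / e            # treated units weighted by 1/e
            num_t += w * y
            den_t += w
        else:
            w = 1.0 / (1.0 - e)    # controls weighted by 1/(1 - e)
            num_c += w * y
            den_c += w
    # Difference of weighted outcome means across the two groups
    return num_t / den_t - num_c / den_c

treatment = [1, 1, 0, 0, 1, 0]                  # 1 = used AWE (hypothetical)
outcome   = [3.0, 2.5, 1.0, 1.5, 3.5, 2.0]      # toy ELA scores
prop      = [0.8, 0.6, 0.3, 0.4, 0.7, 0.5]      # assumed propensity scores
print(round(iptw_ate(treatment, outcome, prop), 3))  # → 1.417
```

In practice the propensity scores would themselves be estimated (e.g., by logistic regression on pretreatment covariates) rather than supplied directly.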
Writing Motivation and Ability Profiles and Transition During a Technology-Based Writing Intervention
Frontiers in Psychology, 2023
We identified writing motivation and ability profiles and transition paths of 2,487 U.S. middle-school students participating in an automated writing evaluation (AWE) intervention using MI Write. Four motivation and ability profiles emerged from a latent transition analysis with self-reported writing self-efficacy, attitudes toward writing, and writing ability measures: Low, Low/Mid, Mid/High, and High. Most students started the school year in the Low/Mid (38%) and Mid/High (30%) profiles. Only 11% of students started the school year in the High profile. Between 50 and 70% of students maintained the same profile in the Spring. Approximately 30% of students were likely to move one profile higher in the Spring.
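The latent transition analysis above reports, among other things, the proportion of students who stayed in or moved between profiles from fall to spring. The descriptive core of such a transition table can be sketched as follows; the profile labels follow the abstract, but the student records are invented for illustration.

```python
# Hedged sketch: empirical fall-to-spring profile-transition proportions.
# Profile names follow the abstract; the records themselves are toy data.
from collections import Counter

def transition_table(fall, spring):
    """Proportion of students moving from each fall profile to each spring profile."""
    counts = Counter(zip(fall, spring))   # (fall, spring) pair frequencies
    totals = Counter(fall)                # students per fall profile
    return {(f, s): counts[(f, s)] / totals[f] for (f, s) in counts}

fall   = ["Low/Mid", "Low/Mid", "Low/Mid", "Mid/High", "Mid/High"]
spring = ["Low/Mid", "Mid/High", "Low/Mid", "Mid/High", "High"]
table = transition_table(fall, spring)
print(round(table[("Low/Mid", "Low/Mid")], 2))  # → 0.67
```

A full latent transition analysis additionally models the profiles themselves as latent classes with measurement error; the table here only summarizes observed labels.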
Examining Human and Automated Ratings of Elementary Students’ Writing Quality: A Multivariate Generalizability Theory Application
American Educational Research Journal, 2022
We used multivariate generalizability theory to examine the reliability of hand-scoring and automated essay scoring (AES) and to identify how these scoring methods could be used in conjunction to optimize writing assessment. Students (n = 113) included subsamples of struggling writers and non-struggling writers in Grades 3–5 drawn from a larger study. Students wrote six essays across three genres. All essays were hand-scored by four raters and an AES system called Project Essay Grade (PEG). Both scoring methods were highly reliable, but PEG was more reliable for non-struggling students, while hand-scoring was more reliable for struggling students.
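Generalizability theory summarizes rater reliability through coefficients built from variance components. A minimal sketch for a simple persons × raters design, with invented variance components (not the study’s estimates): the G coefficient uses relative error (rater-by-person residual only), while the phi coefficient also charges the rater main effect as error.

```python
# Hedged sketch of G-theory reliability coefficients for a p x r design.
# Variance components below are made up for illustration.

def g_and_phi(var_person, var_rater, var_residual, n_raters):
    """G (norm-referenced) and phi (criterion-referenced) coefficients."""
    rel_error = var_residual / n_raters               # relative error variance
    abs_error = (var_rater + var_residual) / n_raters # absolute error variance
    g   = var_person / (var_person + rel_error)
    phi = var_person / (var_person + abs_error)
    return g, phi

# Assumed components: person 0.50, rater 0.05, residual 0.20; four raters,
# matching the abstract's four hand-scorers.
g, phi = g_and_phi(0.50, 0.05, 0.20, 4)
print(round(g, 3), round(phi, 3))  # → 0.909 0.889
```

The study’s multivariate design (genres as fixed multivariate outcomes) is richer than this univariate sketch, but the error-partitioning logic is the same.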
Upper-Elementary Students’ Metacognitive Knowledge about Writing and Its Relationship to Writing Outcomes across Genres
The Elementary School Journal, 2022
This study investigated fourth and fifth graders’ metacognitive knowledge about writing and its relationship to writing performance to help identify areas that might be leveraged when designing effective writing instruction. Students’ metacognitive knowledge was probed using a 30-minute informative writing prompt requiring students to teach their reader how to be a good writer (i.e., a metawriting task). The metawriting task was coded for eight dimensions of metacognitive knowledge. Students’ writing performance was assessed via additional 30-minute prompts—two narrative, one informative, two persuasive—and evaluated for quality and length using automated essay scoring.
Integrating goal-setting and automated feedback to improve writing outcomes: a pilot study
Innovation in Language Learning and Teaching, 2022
Purpose
This study presents results from a pilot intervention that integrated self-regulation through reflection and goal setting with automated writing evaluation (AWE) technology to improve students’ writing outcomes.
Methods
We employed a single-group pretest-posttest design. All students in Grades 5–8 (N = 56) from one urban, all-female public charter middle school completed pretest and posttest measures of writing beliefs and writing performance. Between pretest and posttest, students completed monthly goal-setting activities via a Qualtrics survey and monthly persuasive writing practice via prompts completed within an AWE system.
Investigating the promise of automated writing evaluation for supporting formative writing assessment at scale
Assessment in Education: Principles, Policy & Practice, 2022
We investigated the promise of a novel approach to formative writing assessment at scale that involved an automated writing evaluation (AWE) system called MI Write. Specifically, we investigated elementary teachers’ perceptions and implementation of MI Write and changes in students’ writing performance in three genres from Fall to Spring associated with this implementation. Teachers in Grades 3–5 (n = 14) reported that MI Write was usable and acceptable, useful, and desirable; however, teachers tended to implement MI Write in a limited manner. Multilevel repeated measures analyses indicated that students in Grades 3–5 (n = 570) tended not to increase their performance from Fall to Spring except for third graders in all genres and fourth graders’ narrative writing.