This study investigates AI's role in educational assessment by comparing AI-provided and teacher-provided scores of scientific models hand-drawn by Multilingual Language Learners (MLLs) in elementary classrooms. We used Convolutional Neural Networks (CNNs) for scoring and compared the AI's scores with those of experienced teachers. The results show moderate agreement (Kappa = 0.326): the AI favored mid-range scores, while teachers used a broader range of scores. This suggests that the AI's consistency may miss the interpretive nuances that teachers offer. The study emphasizes careful AI integration to support the diverse assessment needs of MLLs, and it notes two limitations: a small sample size and the opacity of the AI's scoring rationale. Our findings advocate combining AI's analytical strengths with teacher expertise to make educational assessments more equitable and effective.
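For readers unfamiliar with the agreement statistic, the following is a minimal sketch of how a Kappa value can be computed from two sets of scores. It assumes the unweighted Cohen's Kappa, which may differ from the variant used in the study (the abstract does not say whether weighting was applied). The function name `cohen_kappa` and the score lists are hypothetical and purely illustrative; they are not the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's Kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: fraction of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: probability that both raters pick the same category
    # if each scored independently according to their own score frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores on a 0-3 rubric (illustrative only, not from the study).
# The "AI" list clusters in the mid-range; the "teacher" list spans the full scale.
teacher_scores = [0, 1, 2, 3, 1, 2, 0, 3]
ai_scores = [1, 1, 2, 2, 1, 2, 1, 2]

kappa = cohen_kappa(teacher_scores, ai_scores)
print(f"Kappa = {kappa:.3f}")  # Kappa = 0.333
```

In this made-up example the two raters agree on half of the items, but a quarter of that agreement would be expected by chance alone, so Kappa is about 0.333. The example also shows how a rater that concentrates on mid-range scores can agree with a broader-ranging rater on the middle categories while never matching at the extremes.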