Culturally and linguistically “Blind” or Biased? Challenges for AI Assessment of Models with Multiple Language Students
This study investigates AI's role in educational assessment by comparing AI-provided and teacher-provided scores of hand-drawn scientific models created by Multilingual Language Learners (MLLs) in elementary classrooms. Using Convolutional Neural Networks (CNNs) for scoring, we aligned AI assessments with those of experienced teachers. The results show moderate agreement (Kappa = 0.326): the AI favored mid-range scores, while teachers used a broader range of the score scale. This suggests that the AI's consistency may miss the interpretive nuances teachers offer.
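To make the reported agreement statistic concrete, the following is a minimal sketch of how a kappa value between two raters can be computed. It assumes the unweighted Cohen's kappa; the abstract does not state which kappa variant was used. The function name `cohen_kappa`, the 0-3 score scale, and the score lists are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")

    n = len(rater_a)

    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each score, the product of the two raters'
    # marginal proportions, summed over all scores.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical score for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores on an assumed 0-3 scale (illustrative only, not study data).
# The AI list clusters in the mid-range; the teacher list spans the full scale.
teacher_scores = [0, 1, 2, 3, 1, 2, 0, 3, 2, 1]
ai_scores = [1, 1, 2, 2, 1, 2, 1, 2, 2, 1]

kappa = cohen_kappa(teacher_scores, ai_scores)
print(f"kappa = {kappa:.3f}")
```

In this made-up example, every disagreement comes from items the teacher scored at the extremes of the scale, which is the kind of pattern described above. Because scores are ordinal, a weighted kappa that penalizes near-misses less than distant disagreements would be a reasonable alternative.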

