The Impact of Misspelled Words on Automated Computer Scoring: A Case Study of Scientific Explanations

Title: The Impact of Misspelled Words on Automated Computer Scoring: A Case Study of Scientific Explanations
Publication Type: Journal Article
Year of Publication: 2016
Authors: Ha, M, Nehm, RH
Journal: Journal of Science Education and Technology
Type of Article: Journal Article
ISSN: 1059-0145, 1573-1839
Keywords: Computer scoring, Open-ended assessment, Misspelled words, Machine learning, Misclassification, Computers, Assessment, lexical analysis, AACR
Abstract: Automated computerized scoring systems (ACSSs) are being increasingly used to analyze text in many educational settings. Nevertheless, the impact of misspelled words (MSW) on scoring accuracy remains to be investigated in many domains, particularly jargon-rich disciplines such as the life sciences. Empirical studies confirm that MSW are a pervasive feature of human-generated text and that despite improvements, spell-check and auto-replace programs continue to be characterized by significant errors. Our study explored four research questions relating to MSW and text-based computer assessments: (1) Do English language learners (ELLs) produce equivalent magnitudes and types of spelling errors as non-ELLs? (2) To what degree do MSW impact concept-specific computer scoring rules? (3) What impact do MSW have on computer scoring accuracy? and (4) Are MSW more likely to impact false-positive or false-negative feedback to students? We found that although ELLs produced twice as many MSW as non-ELLs, MSW were relatively uncommon in our corpora. The MSW in the corpora were found to be important features of the computer scoring models. Although MSW did not significantly or meaningfully impact computer scoring efficacy across nine different computer scoring models, MSW had a greater impact on the scoring algorithms for naïve ideas than key concepts. Linguistic and concept redundancy in student responses explains the weak connection between MSW and scoring accuracy. Lastly, we found that MSW tend to have a greater impact on false-positive feedback. We discuss the implications of these findings for the development of next-generation science assessments.



This material is based upon work supported by the National Science Foundation (DUE grants: 1438739, 1323162, 1347740, 0736952 and 1022653). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.