Academic Journal

A Comparative Study Between Human Raters and ChatGPT in L2 Speaking Performance Assessment

영어교육연구, Vol. 35, No. 4

This study compared L2 learners' oral performance scores assigned by human raters with those assigned by ChatGPT across multiple scoring categories, and also examined score consistency within each rater group. A total of 39 university students participated, completing speaking tasks that included a self-introduction and two picture descriptions. The scoring data from both human raters and ChatGPT were analyzed using ANOVA and the Multi-Facets Rasch Model (MFRM). The ANOVA results revealed significant mean differences between the two rater groups across all three speaking tasks and all scoring categories (content/organization, vocabulary, collocation use, grammar), except for collocation use in the self-introduction task. While participants' vocabulary and collocation use showed gradual improvement across the speaking tasks, substantial gains in content/organization and grammar were not observed, owing to time limitations. Furthermore, the MFRM results indicated that ChatGPT was more sensitive than the human raters in discerning score patterns across task types and scoring categories. Overall, human raters consistently assigned higher scores across all task types. Although the two groups scored differently, each maintained internal consistency in scoring, indicating agreement in their respective evaluation standards. Pedagogical implications concerning the potential advantages of integrating AI-based technology into performance-based assessments are also discussed.
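The abstract reports rater-group comparisons via one-way ANOVA. As a minimal illustration only (this is not the authors' actual analysis, and the score values below are hypothetical), the one-way F statistic comparing two rater groups can be computed in plain Python:

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    all_vals = [v for g in groups for v in g]
    grand_mean = sum(all_vals) / len(all_vals)
    # Between-group sum of squares: size-weighted deviations of group means
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviations from each group's own mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical holistic scores for the same speaking task,
# one list per rater group (illustrative values only)
human_scores = [4.5, 4.0, 4.5, 5.0, 4.0]
chatgpt_scores = [3.5, 3.0, 4.0, 3.5, 3.0]
f_stat = one_way_anova_f(human_scores, chatgpt_scores)
```

A large F relative to the F distribution with (1, 8) degrees of freedom would indicate a significant mean difference between the rater groups, mirroring the pattern the study reports (human raters scoring consistently higher). In practice, `scipy.stats.f_oneway` performs the same computation and also returns the p-value.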

I. INTRODUCTION

II. LITERATURE REVIEW

III. METHODS

IV. RESULTS

V. DISCUSSION AND CONCLUSION

REFERENCES
