Academic Journal

Comparison of Human and ChatGPT on Holistic and Analytic Scoring of College EFL Opinion Writing

영미어문학 제158호 (Studies in English Language & Literature, No. 158)

Writing evaluation can be carried out using different approaches, such as holistic and analytic scoring, and by different raters, including humans and machines. This study investigated inter-rater reliability in holistic and analytic scoring by human raters and ChatGPT, as well as the associations between the average scores assigned by the two rater types. Data consisted of one-paragraph opinion essays (n = 196) written by 28 South Korean college freshmen, which were scored by two human raters and by ChatGPT across two trials. Inter-rater reliability, independent-samples t-tests, and correlations were computed. Results indicated that both humans and ChatGPT demonstrated substantial consistency, with humans showing greater reliability in holistic judgments, while ChatGPT exhibited stronger reliability in the analytic categories of language and organization. Holistic scores did not differ significantly between the two rater types; however, humans tended to be stricter in content and organization, whereas ChatGPT was stricter in language and mechanics. Strong correlations across all scores further suggested that humans and ChatGPT produced comparable rank-order evaluations. Overall, these findings support the theoretical validity of incorporating AI into writing assessment and highlight its pedagogical potential as a complement to human raters in EFL writing evaluation.
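The analyses named in the abstract (a consistency index between raters, an independent-samples t-test on mean scores, and a correlation) can be sketched as follows. This is a minimal illustration with hypothetical scores, not the study's actual data or exact procedure; the variable names and the use of Pearson's r as the consistency measure are assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical example (not the study's data): holistic scores for the
# same ten essays from a human rater and from ChatGPT.
human = np.array([4, 3, 5, 2, 4, 3, 5, 4, 2, 3], dtype=float)
chatgpt = np.array([4, 3, 4, 2, 5, 3, 5, 4, 3, 3], dtype=float)

# Rank-order agreement between the two raters: Pearson correlation.
r, r_p = stats.pearsonr(human, chatgpt)

# Mean-level difference between the two raters: independent-samples t-test.
t, t_p = stats.ttest_ind(human, chatgpt)

print(f"Pearson r = {r:.2f} (p = {r_p:.3f})")
print(f"t = {t:.2f} (p = {t_p:.3f})")
```

A high r with a non-significant t would match the abstract's pattern for holistic scores: comparable rank orders without a significant difference in means.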

1. Introduction

2. Literature Review

3. Methods

4. Results

5. Discussion and Conclusion

References
