Investigating the Feasibility of Generic Scoring Models of E-rater for TOEFL iBT Independent Writing Tasks
- Pan-Korea English Teachers Association (팬코리아영어교육학회)
- English Language Teaching (영어교육연구)
- Vol. 28, No. 1
- 2016.03, pp. 101-122 (22 pages)
The current study reports the findings from Phase 2 of a larger research study undertaken to investigate the feasibility of using generic scoring models for e-rater in the context of scoring essays for independent writing tasks for TOEFL CBT and TOEFL iBT. In Phase 1, six different variants of generic and hybrid scoring models of e-rater were created based on transformed writing data from three different samples of TOEFL CBT prompts (n1 = 20, n2 = 20, n3 = 40) with the help of ETS (Educational Testing Service) staff and then evaluated on a separate sample of seven TOEFL CBT prompts (Lee, 2016). In the present investigation, these six generic/hybrid models were used, along with prompt-specific models, to score a total of 3,126 essays written for two TOEFL iBT independent writing tasks from a field study, and their performance was evaluated. Results of the analysis showed that (a) there were relatively small score variations among different automated scoring models and (b) similar levels of score agreement were achieved between the human-human rater pair and various human-automated rater pairs, although the prompt-specific model behaved most similarly to the human raters. In terms of criterion-related validity of scores, the human rater scores turned out to be somewhat better indicators of test-takers’ overall ESL (English as a Second Language) language proficiency than the automated scores in general. Nevertheless, the comparative advantage in validity of human rater scores (over automated scores) seemed to diminish significantly when more direct writing measures, such as scores for TOEFL CBT independent writing tasks, were used as criterion measures.
I. INTRODUCTION
II. MODEL BUILDING FOR AUTOMATED ESSAY SCORING
III. METHOD
IV. RESULT AND DISCUSSION
V. SUMMARY AND CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES