Academic Journal

Investigating the Feasibility of Generic Scoring Models of E-rater for TOEFL iBT Independent Writing Tasks

The current study reports the findings from Phase 2 of a larger research study undertaken to investigate the feasibility of using generic scoring models for e-rater to score essays for the independent writing tasks of TOEFL CBT and TOEFL iBT. In Phase 1, six different variants of generic and hybrid scoring models of e-rater were created with the help of ETS (Educational Testing Service) staff. They were based on transformed writing data from three different samples of TOEFL CBT prompts (n1 = 20, n2 = 20, n3 = 40) and were then evaluated on a separate sample of seven TOEFL CBT prompts (Lee, 2016). In the present investigation, these six generic/hybrid models were used, along with prompt-specific models, to score a total of 3,126 essays written for two TOEFL iBT independent writing tasks from a field study, and their performance was evaluated. The analysis showed that (a) score variations among the different automated scoring models were relatively small, and (b) the human-human rater pair and the various human-automated rater pairs achieved similar levels of score agreement, although the prompt-specific model behaved most similarly to the human raters. In terms of the criterion-related validity of scores, the human rater scores were, in general, somewhat better indicators of test-takers’ overall ESL (English as a Second Language) proficiency than the automated scores. Nevertheless, the comparative advantage in validity of human rater scores over automated scores appeared to diminish significantly when more direct writing measures, such as scores for TOEFL CBT independent writing tasks, were used as criterion measures.
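The abstract compares "score agreement" between rater pairs and the criterion-related validity of scores, but it does not name the statistics used. The sketch below is a hypothetical illustration, not the study's code. It uses exact and adjacent agreement, quadratic weighted kappa, and Pearson correlation, which are common choices in automated essay scoring evaluations. It assumes integer essay scores on a 0 to 5 scale and automated scores already rounded to integers. All function names and the sample scores are invented for illustration.

```python
"""Hypothetical rater-agreement statistics (illustrative sketch only)."""
from math import sqrt
from typing import Sequence


def _check(a: Sequence[int], b: Sequence[int]) -> None:
    # Both raters must score the same, non-empty set of essays.
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and of equal length")


def exact_agreement(a, b) -> float:
    """Proportion of essays on which the two raters gave identical scores."""
    _check(a, b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def adjacent_agreement(a, b) -> float:
    """Proportion of essays on which the two scores differ by at most one point."""
    _check(a, b)
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


def pearson_r(a, b) -> float:
    """Pearson correlation, e.g. between essay scores and a criterion measure."""
    _check(a, b)
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def quadratic_weighted_kappa(a, b, min_score: int = 0, max_score: int = 5) -> float:
    """Cohen's kappa with quadratic disagreement weights on an integer scale."""
    _check(a, b)
    if max_score <= min_score:
        raise ValueError("max_score must exceed min_score")
    k = max_score - min_score + 1
    n = len(a)
    # Observed joint distribution of (rater A score, rater B score).
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        if not (min_score <= x <= max_score and min_score <= y <= max_score):
            raise ValueError("score outside the assumed scale")
        observed[x - min_score][y - min_score] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            num += weight * observed[i][j]                        # observed disagreement
            den += weight * row_totals[i] * col_totals[j] / n    # chance disagreement
    if den == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1 - num / den


if __name__ == "__main__":
    # Made-up scores for six essays; NOT data from the study.
    human1 = [3, 4, 2, 5, 3, 4]
    human2 = [3, 4, 3, 5, 2, 4]
    erater = [3, 3, 2, 5, 3, 4]  # hypothetical automated scores, rounded
    for label, other in [("human-human", human2), ("human-e-rater", erater)]:
        print(label,
              round(exact_agreement(human1, other), 3),
              round(adjacent_agreement(human1, other), 3),
              round(quadratic_weighted_kappa(human1, other), 3),
              round(pearson_r(human1, other), 3))
```

The same statistics are computed for each rater pair, so a human-human row can be compared directly with a human-automated row. Correlating either set of scores with an external measure, such as a separate writing score, gives a simple criterion-related validity coefficient.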

I. INTRODUCTION

II. MODEL BUILDING FOR AUTOMATED ESSAY SCORING

III. METHOD

IV. RESULTS AND DISCUSSION

V. SUMMARY AND CONCLUSIONS

ACKNOWLEDGEMENTS

REFERENCES