This study aims to identify practical optimal conditions for the pronunciation assessment used in evaluating Korean speaking ability, based on Generalizability Theory. An initial review of the concept of speaking performance assessment led to the assumption that interactions among tasks, raters, and examinees affect speaking performance evaluation. Based on this assumption, an analysis applying Generalizability Theory was conducted for these three facets. To this end, 33 examinees took computer-based tests in three categories of speaking performance evaluation, and six raters then scored the pronunciation scale both holistically and analytically. Using the rating results, a G study was conducted on tasks, raters, and examinees, followed by a D study.
First, the results of the G study indicate that the examinees' abilities account for the largest variance component. The fact that this rating method reflects differences in the test takers' abilities above all else attests to the validity of the test. In the D study, the reliability of analytic rating proved relatively high compared with holistic rating. Although increasing the number of raters from one to two raised the reliability coefficient significantly, adding further raters produced only small gains. Likewise, increasing the number of tasks from one to two raised the reliability coefficient significantly, while any further increase yielded only marginal improvement. In this respect, these results should be useful in weighing the practicality of speaking performance assessment.
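The diminishing returns observed in the D study follow directly from the standard generalizability coefficient for a fully crossed person x task x rater design, in which the rater- and task-related error terms shrink in proportion to the number of conditions sampled. The sketch below illustrates this with made-up variance components (placeholders, not the estimates from this study):

```python
# Hypothetical D-study projection for a crossed p x t x r
# (person x task x rater) design. All variance components here
# are illustrative assumptions, not values from the study.

def g_coefficient(var_p, var_pt, var_pr, var_ptr_e, n_tasks, n_raters):
    """Generalizability coefficient for relative decisions:
    person variance divided by person variance plus relative
    error variance, where interaction terms are averaged over
    the numbers of tasks and raters."""
    rel_error = (var_pt / n_tasks
                 + var_pr / n_raters
                 + var_ptr_e / (n_tasks * n_raters))
    return var_p / (var_p + rel_error)

# Assumed variance components (person, person x task,
# person x rater, residual) -- purely for illustration.
var_p, var_pt, var_pr, var_ptr_e = 1.00, 0.30, 0.10, 0.40

for n_raters in (1, 2, 3, 4):
    coef = g_coefficient(var_p, var_pt, var_pr, var_ptr_e,
                         n_tasks=2, n_raters=n_raters)
    print(f"raters={n_raters}: G = {coef:.3f}")
```

With these placeholder values, the jump from one rater to two is large, while each additional rater adds progressively less, mirroring the pattern the D study reports for both raters and tasks.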