Improving Speech Act Annotation Based on Human-AI Model Agreement

사람과 AI 모델 간의 화행 주석 개선 방향 연구

Basic Article Information

Type
Academic journal
Author
Youngsook Song (Kyung Hee University), Cho Won Ik (Seoul National University)
Journal
The Society for Korean Language & Literary Research, 어문연구(語文硏究), Vol. 52, No. 1 (KCI Accredited Journal)
Published
2024.3
Pages
71-92 (22 pages)

Abstract

This study aims to improve the quality of speech act annotation by analyzing annotation agreement between human annotators and the GPT-4 model and by identifying the speech act categories that the GPT-4 model frequently misclassifies. To this end, we selected a total of 33,138 sentences from the National Institute of Korean Language’s Messenger corpus (2022). The sentences were annotated by human annotators and by the GPT-4 model, and classification accuracy was evaluated using the RoBERTa model. The accuracy score for the human-annotated messenger corpus is 85.50, while the accuracy score for the cases in which humans and the GPT-4 model assigned identical labels is 96.32. The speech act category that the GPT-4 model most frequently misclassifies is “sarcasm/humor.” This is likely because highly frequent expressions such as “ㅋㅋㅋ” (lol) in the messenger corpus lead the model to label even “Statement” utterances as “sarcasm/humor.” For example, a human annotator classified the utterance “I have snacks and ramen. ㅋㅋㅋ” as “Statement”, whereas the GPT-4 model classified it as “sarcasm/humor.” In an experiment conducted using only the sentences on which the model and human annotators agreed, the accuracy score reached 95.32. Rather than merely reporting annotator agreement for newly proposed corpora, it is necessary to distinguish speech acts that are difficult to annotate from those that are relatively easy. Furthermore, within these categories, it is essential to select the sentences that are hard to annotate automatically and to focus research on these more challenging cases.
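
The agreement-based filtering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the row layout (sentence, human label, model label), the two placeholder sentences, and the “Question” label are assumptions made for illustration. Only the “I have snacks and ramen. ㅋㅋㅋ” example and its two labels come from the abstract.

```python
from collections import Counter


def split_by_agreement(rows):
    """Split (sentence, human_label, model_label) rows into agreed and disagreed lists."""
    agreed, disagreed = [], []
    for sentence, human, model in rows:
        target = agreed if human == model else disagreed
        target.append((sentence, human, model))
    return agreed, disagreed


def confusion_counts(disagreed):
    """Count (human_label, model_label) pairs among the disagreements."""
    return Counter((human, model) for _, human, model in disagreed)


rows = [
    # Example taken from the abstract.
    ("I have snacks and ramen. ㅋㅋㅋ", "Statement", "sarcasm/humor"),
    # Hypothetical placeholder rows; the "Question" label is an assumption.
    ("Hypothetical sentence A", "Statement", "Statement"),
    ("Hypothetical sentence B", "Question", "Question"),
]

agreed, disagreed = split_by_agreement(rows)
agreement_rate = len(agreed) / len(rows)
most_confused = confusion_counts(disagreed).most_common(1)
```

Under these assumptions, `agreed` would serve as the subset used for the agreement-only experiment, and `most_confused` would point to the label pair on which the model departs from the human annotators most often.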
