Basic Thesis Information

Document type
Thesis (degree dissertation)
Author information

이동준 (Seoul National University, Seoul National University Graduate School)

Publication year
2018
Copyright
Seoul National University theses are protected by copyright.


Abstract · Keywords

Word embedding is a strategy that maps each word to a single vector in a continuous vector space. It is the starting point of most natural language processing tasks and greatly affects their performance. Word2vec and GloVe are among the most popular and widely used word embedding models. However, these models are unable to learn either the shared structure of words or sub-word meanings, which is a serious limitation for morphologically rich languages such as Korean.
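To make the limitation concrete, here is a toy illustration (not taken from the thesis; the vocabulary and dimensions are hypothetical): a word-level embedding table gives each surface form its own independent row, so inflections of the same Korean stem share no parameters and can only be related indirectly, through co-occurrence statistics.

```python
# Toy illustration (hypothetical vocabulary): a word-level embedding
# table stores one independent vector per surface form, so "먹었다" (ate)
# and "먹는다" (eats) share nothing despite the common stem 먹- ("eat").
import numpy as np

rng = np.random.default_rng(0)
vocab = ["먹었다", "먹는다", "먹고"]          # three inflections of 먹-
emb = {w: rng.normal(size=8) for w in vocab}  # unrelated, unshared rows

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The similarity between related inflections is arbitrary: the model has
# no mechanism to exploit the shared morpheme itself.
print(round(cos(emb["먹었다"], emb["먹는다"]), 3))
```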
In this paper, we propose a new model that extends the skip-gram model to learn sub-word information. The model defines each word vector as the sum of its morpheme vectors and thereby learns vectors for the morphemes themselves. To test the effectiveness of our embedding, we conducted a word similarity test and a word analogy test. Furthermore, by feeding our trained vectors into an existing text classification model, we measured how much the performance actually improved.
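Below is a minimal sketch of the core idea as the abstract states it: the input vector of a word is the sum of its morpheme vectors, trained with a skip-gram objective so that every constituent morpheme receives gradient updates. The segmentation, vocabulary sizes, negative-sampling objective, and hyperparameters here are assumptions for illustration, not the thesis's actual implementation.

```python
# Sketch of a morpheme-based skip-gram (assumed: negative sampling, a
# hypothetical morpheme segmentation, and placeholder sizes/learning rate).
import numpy as np

rng = np.random.default_rng(42)
dim = 50

# Hypothetical segmentation: word -> list of its morphemes.
segment = {
    "먹었다": ["먹", "었", "다"],
    "갔다":   ["가", "았", "다"],
}
morphemes = sorted({m for ms in segment.values() for m in ms})
m_idx = {m: i for i, m in enumerate(morphemes)}

M_in = rng.normal(scale=0.1, size=(len(morphemes), dim))  # morpheme (input) vectors
W_out = rng.normal(scale=0.1, size=(1000, dim))           # context (output) vectors

def word_vector(word):
    """Input vector of a word = sum of its morpheme vectors."""
    return M_in[[m_idx[m] for m in segment[word]]].sum(axis=0)

def sgns_step(word, ctx_id, neg_ids, lr=0.025):
    """One skip-gram-with-negative-sampling update; the gradient w.r.t.
    the summed word vector is propagated to every constituent morpheme."""
    v = word_vector(word)
    grad_v = np.zeros(dim)
    for c, label in [(ctx_id, 1.0)] + [(n, 0.0) for n in neg_ids]:
        p = 1.0 / (1.0 + np.exp(-v @ W_out[c]))  # sigmoid(v . u_c)
        g = p - label                            # d loss / d (v . u_c)
        grad_v += g * W_out[c]
        W_out[c] -= lr * g * v
    for m in segment[word]:            # shared morphemes get updates
        M_in[m_idx[m]] -= lr * grad_v  # from every word containing them

sgns_step("먹었다", ctx_id=3, neg_ids=[10, 57, 420])
```

Because frequent morphemes such as 다 appear in many words, their vectors accumulate updates from every word that contains them; this parameter sharing is exactly what the word-level skip-gram model lacks.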

Table of Contents

I. Introduction
II. Related Works
2.1 Skip-gram model [1]
2.2 Limitations of the skip-gram model
2.3 Existing study on Korean word embedding
III. Morpheme-based word embedding model
IV. Experiments
4.1 Training Corpus and Implementation Details
4.2 Evaluation Methods
4.3 Intrinsic Evaluation
4.3.1 Word Similarity Test
4.3.2 Word Analogy Test
4.3.3 Morpheme Vectors Visualization
4.4 Extrinsic Evaluation
V. Conclusion
References
Appendix
Abstract (in Korean)
