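Paper information

Document type: Academic journal
Authors: Hyun Cho (Kookmin University), Yeojin Chung (Kookmin University)
Journal: Journal of the Korean Data and Information Science Society, Vol. 31, No. 1 (The Korean Data and Information Science Society)
Publication date: January 2020
Pages: 209-220 (12 pages)
DOI: 10.7465/jkdi.2020.31.1.209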

Abstract

Compared to clustering numerical data, clustering algorithms for categorical data have not been extensively studied, particularly for data with high-cardinality attributes. When categorical attributes have a large number of levels, clustering algorithms tend to suffer from the curse of dimensionality. In this study, we verified that good clustering performance can be achieved in the presence of categorical attributes by combining clustering algorithms typically applied to numerical data with word embedding methods. Using word embedding methods that were originally developed for natural language processing, the levels of categorical attributes can be represented in a vector space, where the resulting embedding vectors reflect the relationships between frequently appearing categories. We utilized Word2vec, GloVe, and fastText for category embedding. We also applied K-means and the Gaussian mixture model to cluster the embedded data. The clustering performance of the proposed methods was compared with that of typical clustering algorithms for categorical data, namely K-modes and robust clustering using links (ROCK). In a simulation study and in experiments on real-life examples, the Gaussian mixture model with GloVe performed best, especially as the number of observations and the complexity of the data increased.
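
The pipeline the abstract describes (embed category levels in a vector space, then cluster with a numerical algorithm) can be illustrated with a minimal sketch. This is not the authors' implementation: as an assumption made to keep the example dependency-free, a log co-occurrence count vector stands in for Word2vec, GloVe, or fastText, and a plain K-means stands in for the clustering step. The function names (`embed_levels`, `embed_rows`, `kmeans`) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def embed_levels(rows):
    """Map each category level to a vector of log co-occurrence counts.

    Assumption: this count-based embedding is only a simple stand-in for
    Word2vec, GloVe, or fastText, which the abstract names.
    """
    # Prefix each level with its attribute index so that equal strings
    # in different attributes remain distinct tokens.
    tokens = [[f"{j}={v}" for j, v in enumerate(row)] for row in rows]
    vocab = sorted({t for row in tokens for t in row})
    counts = Counter()
    for row in tokens:
        for a, b in combinations(row, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    vectors = {t: [math.log1p(counts[(t, u)]) for u in vocab] for t in vocab}
    return tokens, vectors


def embed_rows(tokens, vectors):
    """Represent each observation as the mean of its level vectors."""
    return [
        [sum(col) / len(row) for col in zip(*(vectors[t] for t in row))]
        for row in tokens
    ]


def dist2(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain Lloyd iterations with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: dist2(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: three categorical attributes per observation.
rows = [
    ("seoul", "sedan", "gasoline"),
    ("seoul", "sedan", "hybrid"),
    ("incheon", "sedan", "gasoline"),
    ("incheon", "sedan", "hybrid"),
    ("busan", "truck", "diesel"),
    ("busan", "truck", "lpg"),
    ("ulsan", "truck", "diesel"),
    ("ulsan", "truck", "lpg"),
]

tokens, vectors = embed_levels(rows)
points = embed_rows(tokens, vectors)
labels = kmeans(points, k=2)
print(labels)
```

In this toy data the levels of the first four rows never co-occur with those of the last four, so the two groups occupy disjoint dimensions and separate cleanly. Real data would need a learned embedding of lower dimension than the vocabulary, which is the role the abstract assigns to Word2vec, GloVe, and fastText.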

Table of contents

Abstract
1. Introduction
2. Backgrounds
3. Clustering with category embedding
4. Simulation experiments
5. Real data example
6. Conclusion
References

UCI(KEPA) : I410-ECN-0101-2020-041-000379850