Semantic Based Tweet Clustering and Summarization Exploiting Social Folksonomy

허지욱

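Document type: Thesis

Author: 허지욱 (Hanyang University, Hanyang University Graduate School)

Advisor: 이동호

Year of publication: 2016

Copyright: Hanyang University theses are protected by copyright.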

Research history of this thesis (2)

2018

Semantic-Based K-Means Clustering for Microblogs Exploiting Folksonomy

허지욱, JIPS (Journal of Information Processing Systems), 2018.01, journal article

2016

Semantic Based Tweet Clustering and Summarization Exploiting Social Folksonomy

허지욱, 2016.01, thesis

Abstract and Keywords

With the rapid growth of the Internet and smart multimedia devices, users can obtain general and common information from folksonomy systems and social network services such as Wikipedia, Flickr, del.icio.us, Twitter, and Facebook. However, users must manually review all of the retrieved documents without any assistance from search engines, which requires considerable time and effort. Therefore, it is necessary to analyze and refine this content and then cluster it according to the interests of the user.
In this thesis, we present a novel semantic based tweet clustering and summarization system that uses TagClusters, a form of collective intelligence from Flickr, to support the analysis and scoring of words and sentences on Twitter, one of the social network services. The proposed system consists of 1) a semantic based tweet clustering algorithm and 2) semantic based tweet summarization that exploits folksonomy and user influence analysis.
For semantic based tweet clustering, we propose a semantic based K-means clustering algorithm for clustering a large volume of tweets. It measures not only the similarity between data represented in the vector space model but also the semantic similarity between data by exploiting TagClusters. A major challenge is that tweets are often too short and informal to provide sufficient information. Previous clustering algorithms handle multimedia that provides plenty of data for analysis, such as documents, images, and videos, but they are not sufficient for SNS data, which lacks contextual information.
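As a rough illustration of combining a vector space similarity with a TagCluster-based semantic similarity, the sketch below blends cosine similarity over raw tokens with Jaccard overlap over tokens expanded by related tags. It is only a sketch under stated assumptions: the `TAG_CLUSTERS` dictionary is a hypothetical stand-in for Flickr TagClusters, and the Jaccard measure and the `alpha` weight are illustrative choices, not the formulation used in the thesis.

```python
import math
from collections import Counter

# Hypothetical stand-in for Flickr TagClusters: tag -> set of related tags.
TAG_CLUSTERS = {
    "iphone": {"apple", "smartphone", "ios"},
    "galaxy": {"samsung", "smartphone", "android"},
}

def expand(tokens, clusters=TAG_CLUSTERS):
    """Return the token set enlarged with related tags (assumed expansion rule)."""
    expanded = set(tokens)
    for tok in tokens:
        expanded |= clusters.get(tok, set())
    return expanded

def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(a, b):
    """Jaccard overlap between two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def combined_similarity(t1, t2, alpha=0.5):
    """Blend lexical and expanded-tag similarity; alpha is an assumed weight."""
    return alpha * cosine(t1, t2) + (1 - alpha) * jaccard(expand(t1), expand(t2))
```

With this toy data, two tweets that share no words (`["iphone"]` and `["galaxy"]`) still receive a non-zero similarity because both expand to include `smartphone`, which is the kind of effect semantic expansion is meant to provide for short texts.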
For semantic based tweet summarization, we designed a novel document summarization system called FoDoSu that employs the TagClusters used by Flickr, a folksonomy system, to detect key sentences in multiple documents. Documents contain many proper nouns and newly coined words, such as the names of people and products. It is hard to analyze the semantics of these words using WordNet because it does not cover them. For this reason, we use Flickr TagClusters instead of WordNet when analyzing the semantics of proper nouns and newly coined words. The proposed method consists of a word analysis step and a sentence analysis step. In the word analysis step, we create a word frequency table for analyzing the semantics and contributions of words using the LiteHITS algorithm, a modified version of the HITS algorithm. Then, by exploiting TagClusters, we analyze the semantic relationships between words in the word frequency table. In the sentence analysis step, we create a summary of multiple documents by analyzing the importance of each word and its semantic relatedness to others. To extract the most meaningful tweets in each cluster, we also propose a new tweet summarization technique that analyzes Twitter user information to measure the influence of users and exploits our document summarization method. Finally, experimental results show the effectiveness of the tweet summarization technique.
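To give a feel for HITS-style word and sentence scoring, the sketch below runs the standard HITS mutual-reinforcement iteration on a word-sentence bipartite graph. The abstract does not describe how LiteHITS modifies HITS, so this is plain HITS used as a stand-in; the function name `hits_word_scores` and the iteration count are assumptions made for illustration.

```python
import math

def hits_word_scores(sentences, iterations=20):
    """Standard HITS on a word-sentence bipartite graph (not LiteHITS).

    sentences: list of token lists.
    Returns (word_scores dict, sentence_scores list), both L2-normalized.
    """
    vocab = {w for s in sentences for w in s}
    word = {w: 1.0 for w in vocab}
    sent = [1.0] * len(sentences)
    for _ in range(iterations):
        # A sentence is important if it contains important words.
        sent = [sum(word[w] for w in set(s)) for s in sentences]
        norm = math.sqrt(sum(v * v for v in sent)) or 1.0
        sent = [v / norm for v in sent]
        # A word is important if it appears in important sentences.
        word = {
            w: sum(sent[i] for i, s in enumerate(sentences) if w in s)
            for w in vocab
        }
        norm = math.sqrt(sum(v * v for v in word.values())) or 1.0
        word = {w: v / norm for w, v in word.items()}
    return word, sent

# Toy usage: pick the highest-scoring sentence as a one-line summary.
docs = [["apple", "iphone", "launch"], ["apple", "iphone"], ["weather", "rain"]]
word_scores, sentence_scores = hits_word_scores(docs)
best = max(range(len(docs)), key=lambda i: sentence_scores[i])
```

In the toy data, words shared by several related sentences end up with higher scores than words that occur once, and the sentence containing the most such words is selected. A fuller pipeline along the lines of the abstract would additionally weight words by their TagCluster relatedness.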

#Twitter #Summarization #Clustering

Korean Abstract 1
1. Introduction 4
2. Related Work 11
2.1 Tweet Clustering Technique 12
2.2 Tweet Summarization Technique 15
3. Folksonomy based Tweet Clustering and Summarization 22
3.1 System Architecture 24
3.2 Preprocessing 25
3.3 Semantic based K-means Clustering 26
3.3.1 Expand the semantic of tweet 26
3.3.2 Semantic Similarity 28
3.4 User Influence Analysis 31
3.5 Folksonomy based Document Summarization 34
3.5.1 Summarization Method Architecture 36
3.5.2 Preprocessing 38
3.5.3 Word Analysis 38
3.5.4 Sentence Analysis 47
3.6 Tweet Summarization 52
4. Experiments and Evaluation 57
4.1 Clustering Evaluation 57
4.1.1 Tweet DataSet and Evaluation Metric 57
4.1.2 Evaluation Tweet Clustering 58
4.2 Summarization Evaluation 63
4.2.1 Document Dataset and Evaluation Metric 63
4.2.2 Parameter Optimization 64
4.2.3 Experimental results on TAC2008 67
4.2.4 Experimental results on TAC2009 68
4.2.5 Effect of TagCluster 70
4.2.6 Comparison of Performance 73
4.2.7 Scope of Semantic Analysis of Words in WordNet and TagCluster 75
4.2.8 Evaluation Tweet Summarization 77
5. Conclusion 80
References 83
ABSTRACT 94
