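Basic Paper Information

Document type
Thesis
Author information

허지욱 (Hanyang University, Hanyang University Graduate School)

Advisor
이동호
Publication year
2016
Copyright
Theses from Hanyang University are protected by copyright.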

Abstract · Keywords

With the rapid growth of the Internet and smart multimedia devices, users can obtain general and common information from folksonomy systems and social network services such as Wikipedia, Flickr, del.icio.us, Twitter, and Facebook. However, users must manually review all of the retrieved documents without any assistance from search engines, which requires too much time and effort. Therefore, it is necessary to analyze and refine these contents and then cluster them according to the interests of the user.
In this thesis, we present a novel semantic-based tweet clustering and summarization system that uses TagCluster, a form of collective intelligence from Flickr, to support the analysis and calculation of words and sentences for Twitter, one of the social network services. The proposed system consists of 1) a semantic-based tweet clustering algorithm and 2) semantic-based tweet summarization that exploits folksonomy and the analysis of user influence.
For semantic-based tweet clustering, we propose a semantic-based K-means clustering algorithm for clustering a large volume of tweets. It measures not only the similarity between data represented in the vector space model but also the semantic similarity between data, which it obtains by exploiting TagCluster. A major challenge is that tweets are often too short and informal to provide sufficient information. Previous clustering algorithms handle multimedia that offers plenty of data for analysis, such as documents, images, and videos, but they are not adequate for SNS data, which lacks sufficient contextual information.
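As an illustration only, the idea of combining a vector-space similarity with a tag-cluster-based semantic similarity might be sketched as follows. The abstract does not give the actual formula, so the overlap measure, the weight `alpha`, the function names, and the toy `TAG_CLUSTERS` data are all assumptions made for this sketch, not the thesis's method.

```python
import math
from collections import Counter

# Hypothetical tag clusters: each set groups tags assumed to be related.
# Real Flickr TagCluster data would replace this toy example.
TAG_CLUSTERS = [
    {"iphone", "apple", "ios"},
    {"galaxy", "samsung", "android"},
]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_overlap(a_tokens, b_tokens, clusters) -> float:
    """Jaccard overlap of the tag clusters touched by each tweet (assumed measure)."""
    a_ids = {i for i, c in enumerate(clusters) if c & set(a_tokens)}
    b_ids = {i for i, c in enumerate(clusters) if c & set(b_tokens)}
    union = a_ids | b_ids
    return len(a_ids & b_ids) / len(union) if union else 0.0

def combined_similarity(a_tokens, b_tokens, clusters=TAG_CLUSTERS, alpha=0.5):
    """Weighted mix of lexical and semantic similarity; alpha is an assumed parameter."""
    lexical = cosine(Counter(a_tokens), Counter(b_tokens))
    semantic = semantic_overlap(a_tokens, b_tokens, clusters)
    return alpha * lexical + (1 - alpha) * semantic
```

In this sketch, two tweets that share no words can still receive a non-zero similarity when their words fall into the same tag cluster, which is the kind of effect a semantic extension of K-means would rely on for short texts.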
For semantic-based tweet summarization, we designed a novel document summarization system called FoDoSu that employs the TagClusters used by Flickr, a folksonomy system, to detect key sentences in multiple documents. Documents contain many proper nouns and newly coined words, such as the names of people and products. It is hard to analyze the semantics of these words using WordNet because it does not cover them. For this reason, we use the Flickr TagCluster instead of WordNet when analyzing the semantics of proper nouns and newly coined words. The proposed method consists of a word analysis step and a sentence analysis step. In the word analysis step, we create a word frequency table for analyzing the semantics and contributions of words using the LiteHITS algorithm, a modified version of the HITS algorithm. Then, by exploiting TagClusters, we analyze the semantic relationships between words in the word frequency table. In the sentence analysis step, we create a summary of multiple documents by analyzing the importance of each word and its semantic relatedness to others. To extract the most meaningful tweets in each cluster, we also propose a new tweet summarization technique that analyzes Twitter user information to measure the influence of users and exploits our document summarization method. Finally, through experimental results, we show the effectiveness of the tweet summarization technique.
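The abstract does not describe how LiteHITS modifies HITS, so the following is only a sketch of the standard HITS iteration applied to a word-sentence bipartite graph, with sentences treated as hubs and words as authorities. The function name `hits_word_sentence`, the graph construction, and the iteration count are assumptions for illustration.

```python
import math

def hits_word_sentence(sentences, iterations=20):
    """Standard HITS on a bipartite graph (not LiteHITS).

    sentences: list of token lists. Sentences act as hubs, words as authorities.
    Returns (word_score dict, sent_score list), each L2-normalized.
    """
    vocab = sorted({w for s in sentences for w in s})
    sent_score = [1.0] * len(sentences)
    word_score = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        # Authority update: a word scores highly if it occurs in high-scoring sentences.
        word_score = {
            w: sum(sent_score[i] for i, s in enumerate(sentences) if w in s)
            for w in vocab
        }
        norm = math.sqrt(sum(v * v for v in word_score.values())) or 1.0
        word_score = {w: v / norm for w, v in word_score.items()}
        # Hub update: a sentence scores highly if it contains high-scoring words.
        sent_score = [sum(word_score[w] for w in set(s)) for s in sentences]
        norm = math.sqrt(sum(v * v for v in sent_score)) or 1.0
        sent_score = [v / norm for v in sent_score]
    return word_score, sent_score
```

Under these assumptions, the highest-scoring sentences would be candidates for a summary; the tag-cluster-based relatedness and user-influence weighting mentioned in the abstract are not modeled here.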

Table of Contents

Korean Abstract 1
1. Introduction 4
2. Related Work 11
2.1 Tweet Clustering Technique 12
2.2 Tweet Summarization Technique 15
3. Folksonomy based Tweet Clustering and Summarization 22
3.1 System Architecture 24
3.2 Preprocessing 25
3.3 Semantic based K-means Clustering 26
3.3.1 Expand the semantic of tweet 26
3.3.2 Semantic Similarity 28
3.4 User Influence Analysis 31
3.5 Folksonomy based Document Summarization 34
3.5.1 Summarization Method Architecture 36
3.5.2 Preprocessing 38
3.5.3 Word Analysis 38
3.5.4 Sentence Analysis 47
3.6 Tweet Summarization 52
4. Experiments and Evaluation 57
4.1 Clustering Evaluation 57
4.1.1 Tweet DataSet and Evaluation Metric 57
4.1.2 Evaluation Tweet Clustering 58
4.2 Summarization Evaluation 63
4.2.1 Document Dataset and Evaluation Metric 63
4.2.2 Parameter Optimization 64
4.2.3 Experimental results on TAC2008 67
4.2.4 Experimental results on TAC2009 68
4.2.5 Effect of TagCluster 70
4.2.6 Comparison of Performance 73
4.2.7 Scope of Semantic Analysis of Words in WordNet and TagCluster 75
4.2.8 Evaluation Tweet Summarization 77
5. Conclusion 80
References 83
ABSTRACT 94
