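Basic Thesis Information

Document type
Thesis/dissertation
Author information

김예진 (Jeonbuk National University, Graduate School of Jeonbuk National University)

Advisor
김종익
Year of publication
2019
Copyright
Theses of Jeonbuk National University are protected by copyright.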


Abstract · Keywords

Finding similar objects is an essential operation required by many applications, such as data cleaning, near-duplicate detection, query relaxation, and spell checking. To quantify the similarity between two objects, existing similarity functions represent each object as a set of comparable elements and rely mainly on the number of elements the two sets have in common. However, these functions fail to identify two objects as similar when the objects share no common elements, even if their elements are very similar. To overcome this problem, we propose a record similarity function that considers the similarities among the elements of the objects. Because many important data objects are represented in textual form, we assume that each element of an object is a string and that the similarity between elements is measured by the edit distance. The proposed function can therefore detect a similar pair of objects that consist of similar, but not identical, elements. A drawback of the proposed function is that computing the record similarity is more expensive than computing existing similarity functions. Given a database containing many objects, it is computationally prohibitive to find the objects similar to a given query object by computing the record similarity between the query and every object in the database. To find similar objects efficiently, we also propose novel filtering and verification techniques. Exploiting a given record similarity threshold, the proposed filtering techniques effectively reduce the number of candidate objects, and the proposed verification techniques further reduce the final set of candidates by refining the initially generated ones. Through an experimental study, we show that the record similarity function finds similar objects more precisely than the existing similarity function.
We also show that the proposed filtering and verification techniques effectively reduce the number of candidate objects that require the full record similarity computation, and thus enable efficient processing of similarity search and join.
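The limitation described in the abstract can be illustrated with a small sketch. The abstract does not give the definition of the proposed record similarity function, so the `record_similarity` function below is a hypothetical stand-in and not the thesis's formula: it assumes each element is matched to its most similar element in the other record, with element similarity taken as the edit distance normalized by the longer string. All function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def element_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def jaccard(x: list[str], y: list[str]) -> float:
    """Set similarity based only on exactly matching elements."""
    sx, sy = set(x), set(y)
    if not sx and not sy:
        return 1.0
    return len(sx & sy) / len(sx | sy)


def record_similarity(x: list[str], y: list[str]) -> float:
    """Hypothetical element-aware similarity (an assumption, not the
    thesis's definition): average, over all elements of both records,
    of the best element similarity found in the other record."""
    if not x or not y:
        return 1.0 if not x and not y else 0.0
    best_x = sum(max(element_similarity(a, b) for b in y) for a in x)
    best_y = sum(max(element_similarity(a, b) for a in x) for b in y)
    return (best_x + best_y) / (len(x) + len(y))


if __name__ == "__main__":
    r1 = ["jon", "smith"]
    r2 = ["john", "smyth"]
    print(jaccard(r1, r2))            # no exactly matching elements
    print(record_similarity(r1, r2))  # similar elements still contribute
```

In this sketch the two records share no identical element, so the set-based score is zero, while the element-aware score stays high. Each call to `record_similarity` computes an edit distance for every pair of elements, which is the kind of cost that motivates filtering out candidates before the full computation.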

Table of Contents

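1. Introduction
2. Background and Related Work
2.1 Similarity Join
2.2 String Similarity
2.3 Set Similarity
2.4 Related Work
3. Record Similarity
4. Similar Record Join
4.1 Inverted Index
4.2 Filtering
4.3 Verification
5. Experiments
5.1 Evaluation of Record Similarity
5.2 Evaluation of the Inverted Index
5.3 Evaluation of Filtering
5.4 Evaluation of Verification
5.5 Scalability Evaluation
6. Conclusion
References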
