Thesis Information

Document type: Thesis (학위논문)
Author: 최현화 (충남대학교 대학원, Chungnam National University Graduate School)
Advisor: 李圭哲
Year of publication: 2013
Copyright: Theses from 충남대학교 (Chungnam National University) are protected by copyright.

Abstract

We are currently witnessing a rapid growth of image data, triggered by the popularity of the Internet and the huge amount of user-generated content from Web 2.0 applications. Given such image collections, performing a similarity search to find the objects most similar to a given object is a classical problem with many applications. In principle, there are two basic types of similarity query: the ε-range query and the k-nearest neighbors (k-NN) query. Given a set of data points represented in a high-dimensional space and a distance measure between them, an ε-range query finds all objects whose distance from a query point is smaller than or equal to a given range ε, whereas a k-NN query returns the k most similar neighbors of the query point.
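For reference, the brute-force semantics of these two query types can be written in a few lines. The sketch below is illustrative only and is not the index proposed in this thesis; it assumes Euclidean distance over NumPy arrays, and the names range_query and knn_query are ours.

```python
# Illustrative brute-force semantics of the two basic similarity queries
# (not the proposed index); assumes Euclidean distance over NumPy arrays.
import numpy as np

def range_query(data, q, eps):
    """epsilon-range query: all points within distance eps of q."""
    dists = np.linalg.norm(data - q, axis=1)
    return data[dists <= eps]

def knn_query(data, q, k):
    """k-NN query: the k points closest to q."""
    dists = np.linalg.norm(data - q, axis=1)
    return data[np.argsort(dists)[:k]]

# Example: 10,000 random 64-dimensional feature vectors.
rng = np.random.default_rng(0)
data = rng.random((10_000, 64))
q = rng.random(64)
print(len(range_query(data, q, 2.5)), len(knn_query(data, q, 10)))
```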
To support similarity search in very large collections of high-dimensional data, a number of distributed index structures for high-dimensional data spaces have been proposed. Most recently published approaches focus mainly on supporting range queries or on operating in decentralized systems. LSH Forest and LSH At Large leverage Locality Sensitive Hashing (LSH) to provide approximate results for a k-NN search. LSH has received considerable attention recently because its query time was shown to be independent of the dimensionality of the feature vectors. However, LSH, which probabilistically assigns similar data to the same bucket of a hash table, is not suitable for skewed data and can be highly inaccurate in a similarity search. SkipIndex partitions the data space with a k-d tree and maps the space onto a skip-graph overlay network by encoding each region as a unique key. DiST (Distributed Search Tree) and the VBI-tree (Virtual Binary Index tree) are also tree-based approaches, which do not scale well when the data dimensionality is high.
Several approaches to parallelizing an NN search on centralized systems have also been developed. Approaches such as the Master R-tree, Master-Client R-trees, the GPR-tree (Global Parallel R-tree), and the distributed hybrid spill-tree parallelize tree-based structures. Parallel VA-files (Vector Approximation files) and parallel CBF (Cell-Based Filtering) perform a parallel NN search based on the VA-file to achieve a linear increase in search speed as the number of servers grows. However, the query response times of these solutions are not satisfactory for a search engine on the World Wide Web.
To provide a similarity search over massive high-dimensional data in cloud computing or web search services, we propose a high-dimensional indexing structure, called the Distributed Vector Approximation tree (DVA-tree), which operates in a cluster environment. We designed a centralized indexing structure to address the search performance and accuracy problems of decentralized indexing methods. The DVA-tree partitions the feature vector space in a hierarchical tree manner and maps the leaf nodes of the tree onto separate servers in the cluster. Each server then builds a VA-file as its local index and uses it to retrieve neighbors. In short, we propose a mechanism that efficiently distributes VA-files in order to increase k-NN search performance as well as to support index scalability for large-scale data.
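To make the division of labor concrete, the following is a heavily simplified sketch of the general idea only, not the thesis's actual DVA-tree construction or k-NN algorithm: the data set is partitioned (a single median split stands in for the tree's leaf nodes), each partition receives a VA-file-style quantized approximation acting as its local index, and a query is answered by filtering candidates on each "server" and merging the partial results. All names and parameters here (build_local_index, local_knn, BITS, the 4*k candidate budget) are our own assumptions for illustration.

```python
# Heavily simplified sketch of the general idea only, not the thesis's
# DVA-tree: partition the vectors (stand-in for leaf nodes mapped to
# servers), approximate each partition with a few bits per dimension
# (VA-file style), filter candidates with the approximation, then refine
# with exact distances and merge the per-partition results.
import numpy as np

BITS = 4                      # bits per dimension in the approximation
LEVELS = 2 ** BITS

def build_local_index(partition):
    """Quantize each vector of one partition into LEVELS cells per dimension."""
    lo, hi = partition.min(axis=0), partition.max(axis=0)
    width = np.where(hi > lo, hi - lo, 1.0)
    cells = np.minimum((partition - lo) / width * LEVELS, LEVELS - 1).astype(np.uint8)
    return {"lo": lo, "width": width, "cells": cells, "vectors": partition}

def local_knn(index, q, k):
    """Approximate filtering with cell centers, then exact refinement."""
    centers = index["lo"] + (index["cells"] + 0.5) / LEVELS * index["width"]
    approx = np.linalg.norm(centers - q, axis=1)
    cand = np.argsort(approx)[: 4 * k]                 # coarse candidate set
    exact = np.linalg.norm(index["vectors"][cand] - q, axis=1)
    return index["vectors"][cand][np.argsort(exact)[:k]]

# One median split stands in for the hierarchical tree; each side plays the
# role of a separate server holding its own local approximation.
rng = np.random.default_rng(1)
data = rng.random((20_000, 32))
d = data.var(axis=0).argmax()
median = np.median(data[:, d])
servers = [build_local_index(data[data[:, d] <= median]),
           build_local_index(data[data[:, d] > median])]

q, k = rng.random(32), 5
partial = np.vstack([local_knn(s, q, k) for s in servers])
result = partial[np.argsort(np.linalg.norm(partial - q, axis=1))[:k]]
print(result.shape)           # (5, 32): the merged global k-NN answer
```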
In this thesis, we make the following contributions: (i) we present a new distributed index scheme, called the DVA-tree, that combines a tree-based index structure with a scan method based on VA-files; (ii) we present an approximate k-NN algorithm for processing the distributed k-NN search and propose cost formulas for predicting its response time; (iii) for non-uniform datasets, we enhance the k-NN search performance of the DVA-tree by building the VA-files with independent encoding of the vector data; (iv) finally, we experimentally evaluate the efficiency and effectiveness of our approach on both skewed and uniformly distributed data.
Experimental results show that the proposed DVA-tree has a significant performance advantage over existing index structures on non-uniform data. We believe that the DVA-tree is the most suitable candidate to support high search performance as well as index scalability for large-scale datasets.

Table of Contents

1. Introduction
1.1. Research Background and Objectives
1.2. Research Content and Scope
1.3. Organization of the Thesis
2. Related Work
2.1. High-Dimensional Indexing Methods on a Single Server
2.2. Requirements for Distributed High-Dimensional Indexes
2.3. Decentralized Distributed High-Dimensional Indexing Methods
2.3.1. Hash-Based Approaches
2.3.2. Partitioning-Based Approaches
2.4. Centralized Distributed High-Dimensional Indexing Methods
2.4.1. Filter-Based Approaches
2.4.2. Partitioning-Based Approaches
2.5. Limitations of Existing Work
3. Preliminaries
3.1. Hybrid spill-tree
3.2. VA-file
4. The Distributed Vector Approximation Tree (DVA-tree)
4.1. Construction of the DVA-tree
4.2. k-Nearest Neighbor Search
4.3. Cost Model for the k-NN Search
5. Improving Search Performance on Non-uniform Data
5.1. Motivation
5.2. Limitations of the DVA-tree
5.3. Performance Improvement Methods for the DVA-tree
6. Experiments
6.1. Comparison with Existing Distributed High-Dimensional Indexing Methods
6.2. Scalability Experiments
6.3. Performance Comparison of Independent VA-file Construction
7. Conclusion and Future Work
References
ABSTRACT
