Basic Paper Information

Document type
Academic journal
Authors
Murtazaev, JAziz (Department of Computer Engineering, Ajou University); Kihm, Jang-Su (Department of Computer Engineering, Ajou University); Oh, Sangyoon (Department of Computer Engineering, Ajou University)
Journal
한국인터넷방송통신학회 (The Institute of Internet, Broadcasting and Communication), International journal of internet, broadcasting and communication : IJIBC, Vol. 4, No. 1
Publication date
2012.1
Pages
13 - 17 (5 pages)

Abstract · Keywords

Indexing converts a raw document collection into an easily searchable representation. Web search engines such as Google and Yahoo provide subsecond response times, which is made possible by efficient indexing of web pages across the entire Web. The indexing process becomes more challenging as the scale grows. Parallel techniques, such as the MapReduce framework, can assist in efficient large-scale indexing. In this paper we propose PDFindexer, a system for indexing scientific papers in PDF using the MapReduce programming model. Unlike Web search engines, our target domain is scientific papers, which have a pre-defined structure, such as title, abstract, sections, and references. Our proposed system parses scientific papers in PDF, recreates their structure, and performs efficient distributed indexing with the MapReduce framework in a cluster of nodes. We provide an overview of the system, its components, and the interactions among them. We discuss some issues related to the design of the system and the use of MapReduce in parsing and indexing large document collections.
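
As a rough illustration of the idea in the abstract, the sketch below builds a field-aware inverted index with explicit map, shuffle, and reduce steps. It is not the authors' PDFindexer implementation: it runs in a single process instead of on a cluster, it skips PDF parsing entirely, and the sample papers, field names, and function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) are all assumptions made for the example.

```python
from collections import defaultdict
import re

# Hypothetical stand-ins for papers whose structure has already been parsed.
# The field names ("title", "abstract") are assumptions for illustration.
PAPERS = {
    "paper1": {"title": "Distributed indexing with MapReduce",
               "abstract": "Indexing large document collections in a cluster."},
    "paper2": {"title": "Parsing scientific papers",
               "abstract": "Recreating document structure before indexing."},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def map_phase(doc_id, fields):
    """Emit one (term, (doc_id, field)) pair per term occurrence in a paper."""
    for field, text in fields.items():
        for term in tokenize(text):
            yield term, (doc_id, field)


def shuffle(pairs):
    """Group emitted values by term, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for term, posting in pairs:
        grouped[term].append(posting)
    return grouped


def reduce_phase(term, postings):
    """Deduplicate and sort the postings collected for one term."""
    return term, sorted(set(postings))


def build_index(papers):
    """Run map, shuffle, and reduce in sequence over all papers."""
    pairs = (pair
             for doc_id, fields in papers.items()
             for pair in map_phase(doc_id, fields))
    return dict(reduce_phase(term, postings)
                for term, postings in shuffle(pairs).items())


index = build_index(PAPERS)
print(index["indexing"])
```

Keeping the field name in each posting is one way a query could be restricted to, say, titles only; whether the original system stores postings this way is not stated in the abstract.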
