Driven by recent advances in deep learning for text and image data, image captioning, a field at the intersection of the two, has been studied actively. Image captioning is a technology that generates a textual description of a given image, addressing image understanding and text generation at the same time. Because of its wide applicability, it has become a core area of artificial intelligence research, and studies to improve its performance have been conducted steadily. Despite these diverse efforts, however, research aimed at 'interpreting' images from the perspective of domain experts rather than of the general public is difficult to find. Even for the same image, the parts that a viewer focuses on differ according to that viewer's domain of expertise, and the way the image is expressed and interpreted also varies with the level of expertise. This study therefore proposes a method of transplanting an expert's expertise into a model and, through it, a way to generate image captions specialized for the corresponding domain. Specifically, the proposed methodology builds a pre-trained model by training on a large amount of general data and then transplants domain expertise by transfer learning on a small amount of specialized data. In addition, this study proposes a 'Feature-Independent Transfer Learning' method to prevent the interference-between-observations problem that can arise during training. To validate the proposed methodology, we built a pre-trained model using the MSCOCO image captioning dataset and conducted an experiment that transplants expertise using 'image-expert caption' data created based on the advice of a practicing art therapist. The experimental results confirmed that, whereas the general captions generated from a general point of view contained content irrelevant to expert interpretation, the expert captions generated through the proposed methodology contained all the content necessary for expert interpretation.
As deep learning has recently attracted attention, its application is being considered as a way of solving problems in various fields. In particular, deep learning is known to exhibit excellent performance on unstructured data such as text, images, and sound, and its effectiveness has been proven in many studies. Thanks to the remarkable advances in deep learning for images and text, interest in image captioning technology and its applications is growing rapidly. Image captioning is a technology that automatically generates adequate captions for a given image by handling image understanding and text generation simultaneously. Despite the high barrier to entry of image captioning, which requires researchers to handle both image and text data, its wide applicability has made it one of the key fields of AI research. In addition, many studies have been conducted to enhance the performance of image captioning in diverse aspects. Recent studies attempt to create advanced captions that not only describe the image accurately but also convey the information contained in the image more sophisticatedly. In spite of these many efforts, it is difficult to find studies that interpret images from the viewpoint of experts in each domain rather than from the viewpoint of the general public. Even for the same image, the points of interest may differ depending on the domain of expertise of the person viewing it. In addition, the way of interpreting and expressing the image differs depending on the level of expertise. The general public tends to perceive an image from a holistic and general point of view, that is, by identifying the components of the image and their relationships. Domain experts, on the other hand, tend to recognize the image based on their expertise, focusing on the specific components necessary to interpret it. This implies that the meaningful parts of the same image differ depending on the viewer's perspective, and image captioning needs to reflect this phenomenon. Therefore, in this study, we propose a methodology that uses the expertise of experts to generate domain-specialized captions for images. Specifically, after pre-training on a huge amount of general data, we transplant domain expertise through transfer learning with a small amount of specialized data. However, applying transfer learning as-is to expertise data may lead to another type of problem. When a caption containing a variety of features is used for learning, it can cause a so-called 'interference between observations' problem, which makes it difficult to learn each feature perspective purely. When learning with a huge amount of data, this problem is mostly self-corrected and has little effect on the results. Conversely, in the case of fine-tuning, which trains on a small amount of data, the effect of this problem on learning can be relatively significant. To solve this problem, we therefore present a novel 'Feature-Independent Transfer Learning' method that performs transfer learning independently for each feature (see the sketch following this abstract). To confirm the validity of the proposed methodology, we conducted experiments using the results of pre-training on the MSCOCO dataset, which consists of 120K images and about 600K general captions.
In addition, an experiment was conducted to transplant expertise using 'image-expert caption' data created based on the advice of an art therapist. The experimental results verified that the proposed method generates captions from the viewpoint of the transplanted expertise, whereas captions generated through general image captioning contain a number of components irrelevant to expert interpretation. In this paper, we propose a novel approach to specialized image interpretation, presenting how to utilize transfer learning to generate captions specialized for a specific domain. In the future, we expect this methodology to be widely applied to transplanting expertise in various domains, helping to address the scarcity of expertise data and to improve the performance of image captioning.
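To make the two-stage training described above concrete, the following is a minimal sketch rather than the thesis implementation. It assumes a PyTorch-style setup in which CaptionNet, make_loader, the feature names, and all hyperparameters are hypothetical stand-ins, and it realizes Feature-Independent Transfer Learning as fine-tuning an independent copy of the pre-trained model for each feature, so that gradient updates from one feature's captions cannot interfere with those of another.

```python
# Minimal sketch of pre-training + Feature-Independent Transfer Learning.
# CaptionNet, make_loader, the feature names, and all hyperparameters are
# hypothetical stand-ins for illustration, NOT the models or data of the thesis.
import copy

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

VOCAB, HIDDEN, IMG_FEAT, SEQ = 1000, 256, 2048, 12


class CaptionNet(nn.Module):
    """Toy encoder-decoder captioner standing in for a CNN+RNN/Transformer model."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(IMG_FEAT, HIDDEN), nn.ReLU())
        self.decoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, img_feats, token_embs):
        h0 = self.encoder(img_feats).unsqueeze(0)  # (1, B, H): image context
        out, _ = self.decoder(token_embs, h0)      # (B, T, H)
        return self.head(out)                      # (B, T, V): next-token logits


def make_loader(n_samples=64):
    """Random stand-in data: (image features, token embeddings, target token ids)."""
    imgs = torch.randn(n_samples, IMG_FEAT)
    embs = torch.randn(n_samples, SEQ, HIDDEN)
    tgts = torch.randint(0, VOCAB, (n_samples, SEQ))
    return DataLoader(TensorDataset(imgs, embs, tgts), batch_size=16)


def train(model, loader, epochs, lr):
    """One shared loop used for both pre-training and fine-tuning."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for imgs, embs, tgts in loader:
            logits = model(imgs, embs)
            loss = loss_fn(logits.flatten(0, 1), tgts.flatten())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model


# Stage 1: pre-train once on a large general corpus (MSCOCO in the thesis).
base = train(CaptionNet(), make_loader(n_samples=256), epochs=3, lr=1e-4)

# Stage 2: Feature-Independent Transfer Learning -- each feature gets its own
# copy of the pre-trained weights and sees only captions about that feature,
# so updates for one feature cannot interfere with observations of another.
feature_names = ["color", "shape", "placement"]  # hypothetical features
experts = {
    name: train(copy.deepcopy(base), make_loader(), epochs=2, lr=1e-5)
    for name in feature_names
}
```

Duplicating the whole model per feature is only one plausible realization of the feature-independence idea; a shared encoder with separate per-feature decoders would serve the same purpose with fewer parameters.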
Table of Contents
1. Introduction
2. Related Work
   2.1. Deep Learning Research: Applications to Text and Images
   2.2. Image Captioning Research
   2.3. Transfer Learning Research
   2.4. Art Therapy Research Based on Data Analysis
3. Proposed Methodology
   3.1. Construction of the Observation/Interpretation Map (O2I Map) and Expertise Quad (E-Quad)
   3.2. Feature-Independent Transfer Learning Model
   3.3. Expert Interpretation Caption Generation Model
4. Experiments and Results
   4.1. Experimental Overview
   4.2. Comparison of Caption Quality by Data Characteristics
   4.3. Comparison of General and Expert Captions
5. Conclusion
References
Abstract