Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

서정수

Document type: Thesis

Author: 서정수 (Kookmin University, Graduate School of Business IT)

Advisor: 안현철

Year of publication: 2021

Copyright: Kookmin University theses are protected by copyright.

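Research history of this thesis (3)

2021

Self-optimizing feature selection algorithm for enhancing campaign effectiveness

서정수, 2021.01, thesis

2020

Self-optimizing feature selection algorithm for enhancing campaign effectiveness

서정수, 안현철, 지능정보연구 (Journal of Intelligence and Information Systems), 2020.12, journal article

Self-optimizing feature selection algorithm for enhancing campaign effectiveness

서정수, 안현철, 한국지능정보시스템학회 학술대회논문집 (conference proceedings of the Korea Intelligent Information Systems Society), 2020.11, conference paper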

Abstract · Keywords

With the rapid growth of online activity, campaign channels have diversified, and companies now run a far wider variety of campaigns than in the past. From the customer's point of view, however, repeated exposure to campaigns causes fatigue, and customers increasingly perceive the campaigns themselves as spam. From the company's point of view, spending on campaigns has risen while actual campaign success rates have fallen. Because the usefulness of campaigns is declining in this way, research on raising campaign effectiveness is needed in practice. In particular, there have been recent attempts to use machine learning to make various predictions about campaign response, and because campaign data is large and has many varied features, selecting appropriate features is becoming very important. Among greedy algorithms, SFS (Sequential Forward Selection), SFFS (Sequential Floating Forward Selection), SBS (Sequential Backward Selection), and SFBS (Sequential Floating Backward Selection) have been widely used as traditional feature selection techniques. However, because they build the model after learning only the optimal features, they carry a high risk of overfitting, and when there are many features, classification performance degrades and training takes a long time (Lee, Park and Lee, 2017). This study therefore proposes an improved feature selection algorithm to make campaign execution more efficient. Its purpose is to use machine learning for the classification that selects targets for each campaign in a campaign system, and to raise efficiency by improving on the sequential procedure of SFFS when searching for feature subsets.
Specifically, the algorithm first transforms the data of each feature to establish a priority order among the features that most affect classification performance, and selects the features with a positive effect first. It then applies the sequential method to the features not yet selected until accuracy no longer improves. This raised the efficiency of the overall search and improved classification performance. A random forest was used as the classifier to allow more generalized prediction. In validation on real campaign data, the proposed algorithm selected the optimal features much faster than SFFS sequential search and also achieved better classification performance. Beyond the traditional greedy algorithms, the proposed model also showed better search and prediction performance than existing techniques such as the Genetic Algorithm (GA) and Recursive Feature Elimination (RFE). In addition, the proposed algorithm can report per-feature importance, that is, how much each derived feature affects classification performance, which helps in analyzing and interpreting prediction results. This is expected to make it possible to analyze and understand how far the features considered important on the basis of past experience differ from the features that actually influence campaign success. It is also expected to reduce the considerable time and cost of planning the various types of campaigns that companies run today, and to enable appropriate campaigns for the customers who actually need them.
Because the targets of a company's campaigns differ according to each campaign's purpose, selecting targets suited to each campaign type allows campaigns that are currently run indiscriminately to be directed only at customers with a high likelihood of success. The study is therefore significant for enhancing campaign effectiveness.
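
The two-stage search described in the abstract can be illustrated with a minimal sketch. It is not the thesis's SOFS implementation, and it rests on several assumptions: the "data transformation" of a feature is taken to be shuffling that feature's column in the validation set, a tiny nearest-centroid classifier stands in for the random forest so that the example needs only the standard library, and all names (`fit_centroid`, `perturbation_gains`, `select_features`) and the toy data are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[float]
Model = Callable[[Row], int]
Fit = Callable[[List[Row], List[int], Sequence[int]], Model]


def fit_centroid(X: List[Row], y: List[int], cols: Sequence[int]) -> Model:
    """Stand-in classifier (NOT a random forest): nearest class centroid."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for k, j in enumerate(cols):
            acc[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    cents = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(row: Row) -> int:
        # Pick the class whose centroid is closest on the chosen columns.
        return min(cents, key=lambda c: sum((row[j] - m) ** 2
                                            for j, m in zip(cols, cents[c])))
    return predict


def accuracy(model: Model, X: List[Row], y: List[int]) -> float:
    return sum(model(r) == t for r, t in zip(X, y)) / len(y)


def perturbation_gains(fit: Fit, Xtr, ytr, Xva, yva, seed: int = 0) -> Dict[int, float]:
    """Accuracy lost when one validation column is shuffled (assumed transform)."""
    rng = random.Random(seed)
    cols = list(range(len(Xtr[0])))
    model = fit(Xtr, ytr, cols)
    base = accuracy(model, Xva, yva)
    gains = {}
    for j in cols:
        shuffled = [row[j] for row in Xva]
        rng.shuffle(shuffled)
        Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(Xva, shuffled)]
        gains[j] = base - accuracy(model, Xp, yva)
    return gains


def select_features(fit: Fit, Xtr, ytr, Xva, yva,
                    seed: int = 0) -> Tuple[List[int], float, Dict[int, float]]:
    gains = perturbation_gains(fit, Xtr, ytr, Xva, yva, seed)
    ranked = sorted(gains, key=gains.get, reverse=True)
    # Stage 1: take every feature whose perturbation hurts accuracy.
    selected = [j for j in ranked if gains[j] > 0]
    rest = [j for j in ranked if gains[j] <= 0]
    if not selected:  # fall back to the top-ranked feature
        selected, rest = ranked[:1], ranked[1:]
    best = accuracy(fit(Xtr, ytr, selected), Xva, yva)
    # Stage 2: add remaining features in rank order until accuracy stops rising.
    for j in rest:
        acc = accuracy(fit(Xtr, ytr, selected + [j]), Xva, yva)
        if acc <= best:
            break
        selected.append(j)
        best = acc
    return selected, best, gains


# Toy data: column 0 separates the classes, columns 1 and 2 are pure noise.
_rng = random.Random(42)


def make_rows(n: int) -> Tuple[List[Row], List[int]]:
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([10.0 * label + _rng.uniform(-1, 1), _rng.random(), _rng.random()])
        y.append(label)
    return X, y


Xtr, ytr = make_rows(60)
Xva, yva = make_rows(40)
selected, best, gains = select_features(fit_centroid, Xtr, ytr, Xva, yva)
print(selected, best, gains)
```

The classifier is passed in as a callable so that a random forest, as used in the thesis, could be substituted for the stand-in. The per-feature `gains` dictionary corresponds loosely to the feature importance the abstract mentions, under the shuffling assumption stated above.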

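Table of contents

Ⅰ. Introduction
1.1 Research background
1.2 Campaign data in enterprises
Ⅱ. Theoretical background
2.1 Machine learning
2.2 Feature selection
2.3 Random forest
Ⅲ. SOFS feature selection algorithm
3.1 Search strategy
3.2 SOFS important-feature selection algorithm
3.3 Evaluation method
3.4 Stopping criterion
Ⅳ. SOFS-based campaign targeting system
4.1 Detailed SOFS algorithm
Ⅴ. Analysis and results
5.1 Experimental setup and environment
5.2 Experimental design
5.3 Experimental results
Ⅵ. Conclusion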
