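Basic Thesis Information

Document type
Thesis (academic degree)
Author

지경엽 (Chungnam National University, Chungnam National University Graduate School)

Advisor
권영미
Year of publication
2023
Copyright
Chungnam National University theses are protected by copyright.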

Abstract and Keywords

As email grows in importance, the volume of malicious email is growing with it, and so is the need for malicious email filtering. It is more economical to pool commodity hardware, such as media servers or PCs, with a virtual environment into a single server resource and to filter malicious email there using machine learning. This thesis therefore uses the Hadoop MapReduce framework together with Naïve Bayes. Naïve Bayes was selected because it ranks among the leading machine learning methods (Support Vector Machine (SVM), Naïve Bayes, K-Nearest Neighbor (KNN), and Decision Tree) in terms of execution time and accuracy. To shorten execution time on a Hadoop framework built from PC-based commodity hardware, 21 major configuration parameters that affect execution time were selected from more than 200 Hadoop parameters and optimized through several experimental cases; setting these 21 parameters to their optimized values shortened the execution time. Notably, in a distributed environment, increasing the block size used to store and manage data does not necessarily shorten execution time, and increasing the number of data set replicas does not increase execution time proportionally. The main categories of Hadoop configuration parameters include: block size; number of server threads; number of block replicas; number of MapReduce tasks; memory size and heap size for MapReduce; number of virtual cores allocated to each container; and compression of the output of all MapReduce jobs running on the cluster.
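The parameter categories listed above can be made concrete with a small sketch. The abstract does not name the 21 tuned parameters or their optimized values, so the selection below is an assumption: each entry is a standard Hadoop 2.x property that plausibly falls under one of the listed categories, and every value is a placeholder, not a result from the thesis. The helper names `split_by_file` and `to_property_xml` are hypothetical.

```python
from xml.sax.saxutils import escape

# Placeholder values for illustration only. The thesis's actual 21 parameters
# and their tuned values are NOT given in the abstract.
EXAMPLE_PARAMS = {
    "dfs.blocksize": 128 * 1024 * 1024,           # HDFS block size in bytes
    "dfs.namenode.handler.count": 10,             # NameNode server threads
    "dfs.replication": 3,                         # number of block replicas
    "mapreduce.job.reduces": 2,                   # number of reduce tasks
    "mapreduce.map.memory.mb": 1024,              # container memory per map task
    "mapreduce.map.java.opts": "-Xmx820m",        # JVM heap size per map task
    "mapreduce.map.cpu.vcores": 1,                # virtual cores per map container
    "mapreduce.output.fileoutputformat.compress": "true",  # compress job output
    "yarn.nodemanager.resource.cpu-vcores": 4,    # vcores a node offers to YARN
}

# Hadoop reads each property family from its own site file.
FILE_BY_PREFIX = {
    "dfs.": "hdfs-site.xml",
    "mapreduce.": "mapred-site.xml",
    "yarn.": "yarn-site.xml",
}


def split_by_file(params):
    """Group properties by the site file that conventionally holds them."""
    grouped = {}
    for name, value in params.items():
        for prefix, filename in FILE_BY_PREFIX.items():
            if name.startswith(prefix):
                grouped.setdefault(filename, {})[name] = value
                break
    return grouped


def to_property_xml(params):
    """Render a dict of properties in Hadoop's <configuration> XML layout."""
    lines = ["<configuration>"]
    for name, value in params.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(str(value))}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    for filename, props in split_by_file(EXAMPLE_PARAMS).items():
        print(f"--- {filename} ---")
        print(to_property_xml(props))
```

Keeping the candidate settings in one dictionary makes it easy to generate a separate set of site files for each experimental case and compare run times across cases.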
The Hadoop configuration parameter values were optimized experimentally to shorten execution time. On the Hadoop framework, malicious email prediction was implemented with MapReduce programming using Naïve Bayes, a supervised machine learning technique. For comparison, a spam prediction method was also implemented as a Python program using Naïve Bayes in a bare-metal server environment without the Hadoop framework. Laplace smoothing was applied in both the Hadoop MapReduce Naïve Bayes method and the bare-metal Python Naïve Bayes method to stabilize the calculation of prediction probabilities, so that the prediction error rate and accuracy of the two methods could be compared. The Hadoop MapReduce Naïve Bayes method used a smoothing value of 1, while the bare-metal Python Naïve Bayes method was tested with eleven smoothing values; of these, the result for 0.0001, which gave the best run time along with a good prediction error rate and accuracy, was compared with the Hadoop MapReduce Naïve Bayes result. In this comparison, the Hadoop MapReduce Naïve Bayes method improved the accuracy of identifying spam and ham by a factor of 1.04 and the prediction error rate by a factor of 6.04 relative to the bare-metal Python Naïve Bayes method. In conclusion, the Hadoop MapReduce Naïve Bayes method presented in this thesis shortened execution time through appropriate selection and optimized setting of Hadoop configuration parameters, and improved system stability through easy scale-out and distributed data set replication. The proposed system architecture and algorithm are therefore considered to have improved the accuracy of malicious email prediction and reduced the prediction error rate.
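To illustrate how Naïve Bayes with Laplace smoothing maps onto the map and reduce steps, here is a minimal single-process sketch in Python. It is not the thesis's implementation: the function names, the whitespace tokenizer, the handling of unseen words, and the four-message toy corpus are all assumptions made for illustration. The `alpha` argument plays the role of the smoothing value (1 or 0.0001 in the abstract).

```python
import math
from collections import defaultdict


def map_email(label, text):
    """Map step: emit ((label, word), 1) per token; key (label, None) counts documents."""
    yield (label, None), 1
    for word in text.lower().split():
        yield (label, word), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def train(emails):
    """Run the map step over all labelled emails, then reduce to count tables."""
    pairs = (kv for label, text in emails for kv in map_email(label, text))
    return reduce_counts(pairs)


def predict(counts, text, alpha=1.0):
    """Return the label with the highest log posterior under Laplace smoothing."""
    labels = sorted({label for (label, _word) in counts})
    vocab = {word for (_label, word) in counts if word is not None}
    n_docs = sum(counts[(label, None)] for label in labels)
    best_label, best_score = None, -math.inf
    for label in labels:
        total_words = sum(
            c for (l, w), c in counts.items() if l == label and w is not None
        )
        # Log prior: fraction of training documents carrying this label.
        score = math.log(counts[(label, None)] / n_docs)
        for word in text.lower().split():
            if word not in vocab:
                continue  # assumption: words never seen in training are ignored
            # Laplace smoothing: add alpha to every word count so that a word
            # unseen for this label never yields a zero probability.
            numer = counts.get((label, word), 0) + alpha
            denom = total_words + alpha * len(vocab)
            score += math.log(numer / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy corpus invented for this sketch; not data from the thesis.
    corpus = [
        ("spam", "win money now"),
        ("spam", "free money offer"),
        ("ham", "meeting schedule today"),
        ("ham", "project meeting notes"),
    ]
    model = train(corpus)
    print(predict(model, "free money", alpha=1.0))
    print(predict(model, "meeting notes", alpha=0.0001))
```

Because the map step only emits counts and the reduce step only sums them, the same logic can be distributed across nodes, while the smoothing value is applied only at prediction time and can be varied without retraining.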

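Table of Contents

Chapter 1 Introduction
1.1 Research Background
1.2 Research Direction
Chapter 2 Related Technologies and Previous Work
2.1 Hadoop Overview and Architecture
2.2 Methods for Improving Hadoop Execution Time
2.3 Main Components of Hadoop 2.0
2.4 Machine Learning Components and Overview of Naïve Bayes Theory
2.5 Analysis of Related Research on Hadoop Performance Improvement and Machine-Learning-Based Spam Filtering
Chapter 3 Background for Selecting Hadoop MapReduce and Naïve Bayes, and Technical Implementation Methods for Improving Performance and Filtering Accuracy
3.1 Background for Selecting Hadoop MapReduce and Naïve Bayes
3.2 Optimizing Configuration Parameters to Improve Hadoop Execution Speed
3.3 Analysis of Machine Learning Techniques for Improving Malicious Email Prediction Accuracy, and Comparison with Prediction Accuracy in a Non-Hadoop Environment
Chapter 4 Experiments: Improving Execution Time by Optimizing Hadoop Configuration Parameters and Implementing a Malicious Email Prediction System with MapReduce Naïve Bayes
4.1 System Environment and Architecture of This Thesis
4.2 Hadoop Configuration Parameter Settings by Test Type
4.3 Performance (Execution Time) Improvement of the Malicious Email Detection System through Hadoop Parameter Tuning
4.4 Prediction Error Rate and Accuracy Results of the Malicious Email Detection System
4.5 Analysis and Implications of the Experimental Results
Chapter 5 Conclusion
[References]
[Appendix: List of Abbreviations]
ABSTRACT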
