A Study on the Credit Evaluation Prediction Model for Logistics Companies Using Machine Learning

권승면


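Document type: Thesis

Author: 권승면 (Chung-Ang University, Graduate School of Chung-Ang University)

Advisor: 우수한

Year of publication: 2023

Copyright: Chung-Ang University theses are protected by copyright.

Usage count: 29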

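Research history of this thesis (3)

2023

A Study on Credit Rating Prediction Models for Logistics Companies Using Machine Learning and Deep Learning

권승면, 한국관세학회 conference, 2023.05, conference paper

A Study on Predicting Credit Ratings of Export Manufacturing Companies Using Machine Learning and Artificial Neural Networks

권승면, 우수한, 무역상무연구, 2023.05, journal article

A Study on the Credit Evaluation Prediction Model for Logistics Companies Using Machine Learning

권승면, Department of Trade and Logistics, 2023.01, thesis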


Abstract · Keywords


Following the 1997 financial crisis, U.S. credit rating agencies downgraded the Korean government and Korean financial institutions, which made it difficult for Korea to engage with the international community in exports, investment, exchange, cooperation, and financial transactions. Recovering from this took considerable time and effort. A representative part of that recovery was the development of the credit rating industry, which strengthened objectivity and fairness, supported the financial industry, and contributed greatly to restoring trust in the Korean economy and improving its stability and soundness.
Since the 2007 subprime crisis, there have been continuing calls to raise the credibility of credit ratings and to introduce more sophisticated evaluation techniques. There is a consensus that corporate insolvency should be predicted more objectively and that risk should be managed accordingly. To this end, professional credit rating agencies should make additional efforts to analyze the unique characteristics of each industry closely and to reflect them in their evaluations.
Corporate performance is expected to deteriorate amid recent turmoil in the international situation, including the U.S.-China trade dispute, the reorganization of global value chains (GVC), supply chain diversification, and a consumption slump, so thorough credit management of industries and companies is required. This applies in particular to the logistics industry, which generates the highest international balance of payments among all service industries and whose importance has been emphasized further since COVID-19, as it is closely tied to daily life as well as to national competitiveness. In addition, the logistics industry has the lowest equity ratio and the highest debt ratio of all industries. Because the industry runs on borrowed capital, that is, financial capital, rather than equity capital, its financial risk is very high. If a risk of default emerges, it can lead not only to confusion in the financial market but also to a chain of crises in related industries, which makes advance prediction especially important.
Therefore, this study develops a credit rating prediction model specifically for logistics companies using machine learning, divides the companies into three sectors according to the special classification of the logistics industry, and examines variable importance in the models with the best predictive performance.
The data cover 1,268 logistics companies, with 66 financial indicators for the ten years from 2011 to 2020, and the credit score was used as the dependent variable. The rating was divided into 2-grade, 3-grade, and 10-grade schemes, the optimal variables were selected for each scheme, and the prediction models were fitted and compared on performance. The prediction models used were logistic regression, decision tree, random forest, gradient boosting, XGBoost, ensemble, and artificial neural network (ANN) models. In addition, the 1,268 companies were divided into cargo transportation, logistics facility operation, and logistics-related services according to the special classification of the logistics industry by the National Statistical Office, and prediction performance by grade scheme and variable importance were examined for each sector.
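As a rough illustration of how a continuous credit score might be turned into 2, 3, or 10 rating grades, the sketch below uses equal-width bins. The abstract does not state the score scale or the cut points, so the 0-100 scale, the equal-width rule, the sample scores, and the helper names `equal_width_edges` and `to_grade` are all assumptions made only for this example.

```python
from bisect import bisect_right


def equal_width_edges(low, high, n_grades):
    """Return the n_grades - 1 interior cut points of equal-width bins on [low, high]."""
    width = (high - low) / n_grades
    return [low + width * i for i in range(1, n_grades)]


def to_grade(score, edges):
    """Map a score to a grade index: 0 for the lowest bin, len(edges) for the highest."""
    return bisect_right(edges, score)


# Hypothetical credit scores on an assumed 0-100 scale (not thesis data).
scores = [12.0, 35.5, 50.0, 67.0, 88.0, 99.0]

# Grade labels for the same scores under 2-, 3-, and 10-grade groupings.
grades = {
    n: [to_grade(s, equal_width_edges(0.0, 100.0, n)) for s in scores]
    for n in (2, 3, 10)
}

for n, labels in grades.items():
    print(n, labels)
```

The finer the grouping, the fewer companies fall into each grade, which is one plausible reason a 10-grade target is harder to predict than a 2-grade one.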
The results of the research analysis are as follows.
First, to select the final variables, five approaches were compared: all variables, RFECV, SelectKBest, PCA, and LDA. With random forest as the learning model, 46 variables from RFECV were selected for the 2-grade scheme, 51 variables from RFECV for the 3-grade scheme, and 49 variables from SelectKBest for the 10-grade scheme. This confirms that, alongside conventional statistical methods, variable selection methods based on machine learning also perform well and can be used.
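A comparison of this kind could be set up roughly as follows with scikit-learn. This is a minimal sketch, not the thesis's procedure: the data are synthetic, and the sample size, the number of features kept (`k=6`, `n_components=6`), the forest size, and the 3-fold cross-validation are arbitrary assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV, SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the financial indicators (not the thesis data).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, n_redundant=2,
    n_classes=2, random_state=0,
)


def forest():
    # A fresh random forest per pipeline so no fitted state is shared.
    return RandomForestClassifier(n_estimators=30, random_state=0)


# Each candidate is "reduction step -> random forest"; "all" skips the reduction.
candidates = {
    "all": make_pipeline(forest()),
    "rfecv": make_pipeline(RFECV(forest(), step=2, cv=3), forest()),
    "select_k_best": make_pipeline(SelectKBest(f_classif, k=6), forest()),
    "pca": make_pipeline(PCA(n_components=6), forest()),
    "lda": make_pipeline(LinearDiscriminantAnalysis(), forest()),
}

# Mean 3-fold cross-validated accuracy for each candidate.
results = {
    name: cross_val_score(pipe, X, y, cv=3, scoring="accuracy").mean()
    for name, pipe in candidates.items()
}
best = max(results, key=results.get)
print(results, best)
```

Placing the reduction step inside the pipeline means it is refitted on each training fold, so the held-out fold does not leak into the variable selection.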
Second, in the 2-grade prediction model, XGBoost performed best at 80.3%, which was 8.7% higher than logistic regression and 6% higher than the ANN. In the 3-grade model, XGBoost was again the highest at 72.9%. In the 10-grade analysis, the ensemble (Soft) was the highest at 34.4%. However, performance dropped sharply as the number of grades increased, so such models should be used with caution.
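The abstract labels the best 10-grade model only as "ensemble (Soft)". Assuming this refers to soft voting, where the class probabilities of several models are averaged, the sketch below contrasts it with hard (majority) voting. The three probability rows are invented for illustration and the function names are hypothetical.

```python
from collections import Counter


def hard_vote(prob_rows):
    """Majority vote over each model's most probable class."""
    picks = [max(range(len(p)), key=p.__getitem__) for p in prob_rows]
    return Counter(picks).most_common(1)[0][0]


def soft_vote(prob_rows):
    """Pick the class with the highest probability averaged across models."""
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    mean = [sum(p[c] for p in prob_rows) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


# Hypothetical class probabilities from three models for one company
# (class 0 and class 1 stand for the two grades of a 2-grade scheme).
probs = [
    [0.90, 0.10],  # model A is confident in class 0
    [0.40, 0.60],  # model B leans toward class 1
    [0.45, 0.55],  # model C leans toward class 1
]

hard = hard_vote(probs)
soft = soft_vote(probs)
print(hard, soft)
```

Here the two rules disagree: two of three models prefer class 1, but the averaged probability favors class 0 because model A is far more confident than the other two.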
Third, the 1,268 companies were divided into cargo transportation, logistics facility operation, and logistics-related services according to the special classification of the logistics industry by the National Statistical Office and analyzed separately. When tree-based models were used to check variable importance, four stability indicators and one profitability indicator were selected as important variables for cargo transportation, three stability indicators and two profitability indicators for logistics facility operation, and two stability indicators and three profitability indicators for logistics-related services. This points to the need to select refined evaluation indicators that reflect the business characteristics of each sector.
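One common way to read variable importance from a tree-based model is the impurity-based `feature_importances_` attribute of a fitted random forest. The sketch below applies it to one synthetic sample per sector and keeps the five highest-ranked indicators. The indicator names, the sector keys, and the way each label is generated are hypothetical; the abstract does not list the actual indicators or say which importance measure was used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical indicator names; the thesis's 66 indicators are not listed in the abstract.
features = ["debt_ratio", "equity_ratio", "current_ratio",
            "operating_margin", "roa", "sales_growth"]

rng = np.random.default_rng(0)


def top_features(X, y, names, k=5):
    """Fit a random forest and return the k (name, importance) pairs with the largest importance."""
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1][:k]
    return [(names[i], float(model.feature_importances_[i])) for i in order]


# One synthetic sample per sector; the label is driven by a different indicator in each.
sectors = {"cargo_transportation": 0, "facility_operation": 3, "related_services": 4}
top_by_sector = {}
for sector, driver in sectors.items():
    X = rng.normal(size=(400, len(features)))
    y = (X[:, driver] + 0.3 * rng.normal(size=400) > 0).astype(int)
    top_by_sector[sector] = top_features(X, y, features)

for sector, ranking in top_by_sector.items():
    print(sector, ranking[0])
```

Fitting a separate model per sector, rather than one model on all companies, is what lets the ranking differ from sector to sector.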
This study is significant in three respects: it selected the final variables using machine learning techniques; it showed that machine learning techniques are better suited than artificial neural network techniques, which are commonly regarded as superior, to the unique financial characteristics of logistics companies and to the current data; and it classified the logistics industry by sector to identify important variables for developing new indicators. Its academic contribution is that it is a credit rating prediction study using financial indicators, which had not previously been carried out for the logistics industry.

#logistics company #credit evaluation prediction #credit rating #machine learning #artificial neural network #variable importance

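Chapter 1. Introduction
Section 1. Research Background and Purpose
Section 2. Research Structure and Methods
Chapter 2. General Review of Credit Evaluation
Section 1. Overview of Credit Evaluation
1. Significance of Credit Evaluation and the Credit Evaluation System
2. Characteristics and Functions of Credit Evaluation
Section 2. Elements and Classification of Credit Evaluation
1. Elements of Credit Evaluation
2. Classification of Credit Evaluation
Section 3. Need for Credit Evaluation Prediction
1. Investors
2. Financial Institutions
3. Debtor Companies
4. Financial Supervisory Authorities
Section 4. Need for Credit Evaluation Prediction for Logistics Companies
1. Importance of the Logistics Industry
2. Financial Characteristics of the Logistics Industry
Chapter 3. Literature Review
Section 1. Studies on Corporate Bankruptcy Prediction
1. Traditional Bankruptcy Prediction Studies
2. Prediction Studies Using Machine Learning
Section 2. Studies on Credit Evaluation Prediction
1. Traditional Credit Evaluation Prediction Studies
2. Prediction Studies Using Machine Learning
Section 3. Prediction Studies in Logistics
Section 4. Research Direction and Implications
Chapter 4. Research Model Design and Analysis Models
Section 1. Research Model Design
1. Overview
2. Research Procedure
Section 2. Analysis Models
1. Logistic regression
2. Decision tree
3. Random forest
4. Gradient boost
5. XGBoost
6. Ensemble
7. Artificial neural network (ANN)
Section 3. Performance Evaluation of the Analysis Models
1. Confusion matrix
2. ROC curve
Chapter 5. Data Collection and Variable Selection
Section 1. Data Definition
1. Definition of Target Companies
2. Definition of Financial Data
Section 2. Data Collection and Preprocessing
1. Data Collection
2. Data Preprocessing and Splitting
3. Handling Data Imbalance
Section 3. Variable Selection
Chapter 6. Analysis Results
Section 1. Final Variable Selection Results by Grade
1. Performance Comparison by Variable Selection Method
2. Final Variable Selection by Grade
Section 2. Results of the Credit Evaluation Prediction Models
1. Logistic regression
2. Decision tree
3. Random forest
4. Gradient boost
5. XGBoost
6. Ensemble
7. Artificial neural network (ANN)
Section 3. Comparison of Prediction Model Performance
Chapter 7. Analysis Results by Special Classification of the Logistics Industry
Section 1. Overview of the Special Classification of the Logistics Industry
1. Significance and Purpose of the Special Classification of the Logistics Industry
2. Linkage with the Korean Standard Industrial Classification (KSIC)
Section 2. Prediction Model Analysis by Special Classification of the Logistics Industry
1. Results for Cargo Transportation
2. Results for Logistics Facility Operation
3. Results for Logistics-Related Services
4. Final Analysis Results
Chapter 8. Conclusion
Section 1. Research Results and Implications
Section 2. Limitations of the Study
References
Abstract (Korean)
Abstract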
