This study used data mining techniques to develop an insolvency prediction model for SMEs using only technological feasibility assessment information from the Korea SMEs and Startups Agency (KOSME). Corporate insolvency is generally predicted from financial statement data. Financial statements, however, serve as a means of reporting how management has used the company's assets: they quantify the company's past accounting information but do not show future performance. In addition, the financial statements of start-up SMEs are harder to collect than those of listed companies, and the information they contain is very limited. Because of these shortcomings, this study used non-financial information, namely technological feasibility evaluation information, to predict corporate insolvency. In Experiment I, companies were divided into two groups by years in business since the date of establishment: fewer than three years, and three years or more. The synthetic minority over-sampling technique (SMOTE) was used to resolve the data imbalance between healthy and insolvent companies. Prediction models were created using the two datasets and six algorithms: logistic regression, artificial neural networks, decision trees, and an ensemble of each. Among the single models, the decision tree achieved the highest prediction rate, with 68.1% for companies with fewer than three years in business and 80.6% for companies with three years or more. Likewise, among the ensemble algorithms, the decision tree models achieved the highest prediction rates, 69.1% and 82.7%, respectively. Based on the results of Experiment I, the main evaluation indicators for predicting insolvency were financing ability, the CEO's reliability, and future profitability for companies with fewer than three years in business, and credit status, financing ability, and competitive strength for companies with three years or more.
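The SMOTE step mentioned above can be illustrated with a minimal sketch. It is not the study's implementation: the function name `smote`, the choice of Euclidean distance, the random selection of the base sample, and the two-feature example scores are all assumptions made for illustration. The idea is that each synthetic minority sample is placed at a random position on the line segment between an existing minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance).

    Simplification (assumption): the base sample is drawn at random each
    time instead of looping over every minority sample a fixed number of
    times.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical evaluation scores (two features) for insolvent companies
insolvent = [(2.0, 3.0), (2.5, 3.5), (3.0, 2.5), (2.2, 2.8)]
new_points = smote(insolvent, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it always stays inside the region already occupied by the minority class, which is why SMOTE adds variety without simply duplicating records as random over-sampling does.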
In addition, insolvency rules for SMEs were derived from the decision tree-based prediction model, and ways to improve the soundness of loans to potentially insolvent companies were proposed using these rules. Experiment Ⅱ proposed the harmonic average of support and confidence (HSC) method, a new way to select important rules from the many rules in a decision tree and thereby build a core rule-based decision tree (CorDT) that more easily explains the insolvency factors of SMEs. To this end, an insolvency prediction model for SMEs was developed using a decision tree algorithm and technological feasibility assessment data as the non-financial dataset. The data were divided into three types (general type, technology development type, and toll processing type) reflecting the characteristics of manufacturing SMEs. For data balancing, six sampling techniques were applied: Random Under-Sampling, SpreadSubsample, ClusterCentroids, Random Over-Sampling, SMOTE, and Adaptive Synthetic Sampling (ADASYN). The insolvency prediction model using SMOTE, an oversampling technique, showed the highest performance, with an average prediction rate of 77.6%. Next, important rules were selected by applying HSC to the best-performing decision trees, and CorDTs for the three types of SMEs were built from the selected rules. Finally, the CorDTs were used to explain the causes of insolvency by type of SME and to present insolvency prevention strategies customized to the three types. The results show that it is possible to predict SME insolvency using data mining techniques with technological feasibility assessment information and to find meaningful rules related to insolvency. Because the original decision trees generally consisted of many rules, the study proposed the HSC method as a new way to select important rules from decision trees.
By building a CorDT consisting of only the important rules selected with HSC, the study explained more easily the key factors affecting SME insolvency by technology type and thereby suggested insolvency prevention strategies more efficiently.
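The abstract names HSC as a harmonic average of support and confidence but does not spell out the formula, so the following is only a sketch under stated assumptions. It assumes the association-rule definitions (support is the share of all training samples that satisfy a rule's conditions and belong to its predicted class; confidence is the share of samples satisfying the conditions that belong to that class), assumes HSC = 2 × support × confidence / (support + confidence), and assumes the important rules are the top-ranked ones. The `Rule` class, the function names, and the leaf counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    covered: int  # training samples that satisfy the rule's conditions
    correct: int  # of those, samples in the rule's predicted class


def hsc(rule, n_total):
    """Harmonic mean of support and confidence (assumed definitions)."""
    if rule.covered == 0 or n_total == 0:
        return 0.0
    support = rule.correct / n_total
    confidence = rule.correct / rule.covered
    if support + confidence == 0:
        return 0.0
    return 2 * support * confidence / (support + confidence)


def select_core_rules(rules, n_total, top_k):
    """Rank rules by HSC and keep the top_k (selection rule is assumed)."""
    ranked = sorted(rules, key=lambda r: hsc(r, n_total), reverse=True)
    return ranked[:top_k]


# Hypothetical leaf statistics from a tree trained on 1,000 companies
rules = [
    Rule("R1", covered=200, correct=180),  # broad and accurate
    Rule("R2", covered=10, correct=10),    # perfectly accurate but tiny
    Rule("R3", covered=300, correct=150),  # broad but only 50% accurate
]
core = select_core_rules(rules, n_total=1000, top_k=2)
core_names = [r.name for r in core]
```

With these made-up counts, R2 is dropped despite its perfect confidence because it covers very few companies. A harmonic mean penalizes a rule that is weak on either measure, which is the motivation for using it to pick rules that are both reliable and widely applicable.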
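Ⅰ. Introduction
  1. Research Background and Purpose
    1) Research Background
    2) Research Purpose
  2. Research Method and Organization
    1) Research Method
    2) Research Organization
Ⅱ. Theoretical Background and Prior Research
  1. SME Insolvency and Its Prediction
    1) Definition of Insolvency
    2) Insolvency Prediction
  2. Financial and Non-Financial Information
    1) Financial Information
    2) Non-Financial Information
      (1) Technological Capability
      (2) Business Feasibility
      (3) Management Capability
  3. Insolvency Prediction Techniques
    1) Statistical Techniques
      (1) Logistic Regression
    2) Machine Learning Techniques
      (1) Decision Tree
      (2) Artificial Neural Network
      (3) Ensemble
  4. Data Balancing Techniques
    1) Necessity
    2) Under-Sampling
      (1) RUS
      (2) SpreadSubsample
      (3) ClusterCentroids
    3) Over-Sampling
      (1) ROS
      (2) SMOTE
      (3) ADASYN
Ⅲ. Design of the Insolvency Prediction Experiments
  1. Experiment Ⅰ: Insolvency Prediction by Years in Business
    1) Data Collection and Preprocessing
    2) Target and Independent Variables
      (1) Target Variable
      (2) Independent Variables
    3) Variable Selection
    4) Research Design
      (1) Classification of Types
      (2) Experimental Design
  2. Experiment Ⅱ: Insolvency Prediction by Technology Type
    1) Data Collection and Preprocessing
    2) Target and Independent Variables
      (1) Target Variable
      (2) Independent Variables
    3) Variable Selection
    4) Research Design
      (1) Classification of Types
      (2) Experimental Design
Ⅳ. Development of CorDT
  1. Development and Validation of HSC
    1) Development of HSC
    2) Validation of HSC
  2. Derivation of CorDT
Ⅴ. Insolvency Prediction Results and Strategy Derivation
  1. Experiment Ⅰ: Insolvency Prediction by Years in Business
    1) Experimental Results
    2) Insolvency Rules and Strategies
      (1) Companies with Fewer than Three Years in Business
      (2) Companies with Three Years or More in Business
  2. Experiment Ⅱ: Insolvency Prediction by Technology Type
    1) Experimental Results
    2) Insolvency Rules and Strategies Using CorDT
      (1) General-Type Companies
      (2) Technology-Development-Type Companies
      (3) Toll-Processing-Type Companies
Ⅵ. Conclusion
  1. Research Results
  2. Research Significance and Implications
  3. Research Limitations and Future Directions
References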