A Study on the Prediction Model of Used Car Price of Domestic Brand Using Machine Learning: Focusing on the Domestic Used Car Market

임승준

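Document type: Thesis

Author: 임승준 (Hongik University, Graduate School of Hongik University)

Advisor: 류춘호

Year of publication: 2023

Copyright: Hongik University theses are protected by copyright.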

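Research history of this thesis (5)

2024

A Study on the Price Prediction Model of Domestic-Brand Used Cars Using SHAP Value: Focusing on Vehicle Class Characteristics

임승준, 이정호, 류춘호. 한국경영과학회지, 2024.05. Journal article

A Study on the Price Prediction Model of Domestic Used Cars Using Ensemble Models and SHAP Value: Focusing on Vehicle Model Characteristics

임승준, 이정호, 류춘호. 서비스 연구, 2024.03. Journal article

2023

A Study on the Price Prediction Model of Domestic Used Cars by Brand Using Machine Learning

임승준, 이정호, 류춘호. 서비스 연구, 2023.09. Journal article

A Study on the Price Prediction Model of Used Cars Using Machine Learning: Focusing on Domestic Brands

임승준, 이정호, 류춘호. 한국경영과학회지, 2023.08. Journal article

A Study on the Prediction Model of Used Car Price of Domestic Brand Using Machine Learning: Focusing on the Domestic Used Car Market

임승준. Department of Business Administration, 2023.01. Thesis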

Abstract and Keywords

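As of 2022, the Korean used car market continues to grow, and online used car platforms account for close to 50% of it. These platforms disclose not only vehicle specifications but also detailed vehicle options, so consumers can easily check the information on a given vehicle.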
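Most previous studies on used car price prediction relied on vehicle specifications, and the relationship between a vehicle's mileage or period of use and its used price tended to be nonlinear. To handle this nonlinearity, recent studies have typically applied several machine learning models and compared their cost functions, and most of them reported that the Random Forest (RF) model, a classification-type (tree-based) machine learning model, performed best. When the variables and the outcome are nonlinearly related, classification-type models achieve a better prediction error rate (MAPE) than regression-type models, but the influence of each variable differs from one classification-type model to another, and the direction of that influence cannot be determined. Regression-type models, by contrast, are well suited to checking the influence and direction of variables, but under nonlinear relationships their prediction error rate is worse than that of classification-type models. This study therefore applied regression-type and classification-type machine learning models in sequence in order to combine the advantages of both.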
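This study collected vehicle specification and vehicle option data from an online used car platform through crawling and scraping, and took some specification variables from the manufacturers' official catalogues. With these data, a Lasso regression model was used to check the influence and direction of the variables affecting used car prices and to identify the variables whose influence is zero. Next, the cost function values of classification-type models trained on all variables were compared with those of classification-type models trained after removing the zero-influence variables. The regression-type model identified the vehicle specification and vehicle option variables that affect used car prices for each individual brand and for the full dataset. The comparison between classification-type models using all variables and those with the zero-influence variables removed showed no large difference in cost function values.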
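The first step of the procedure described above (using Lasso to find variables whose influence is exactly zero) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it fits a Lasso by coordinate descent on a tiny synthetic, already standardized dataset. The variable names (`age`, `mileage`, `sunroof`), the data, and the penalty value `lam` are all hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 without an intercept.

    Assumes the columns of X and the target y have already been centred.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # inner product of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return w


# Hypothetical standardized toy data: three mutually orthogonal +/-1 columns.
names = ["age", "mileage", "sunroof"]
age = [1, 1, 1, 1, -1, -1, -1, -1]
mileage = [1, 1, -1, -1, 1, 1, -1, -1]
sunroof = [1, -1, 1, -1, 1, -1, 1, -1]
# Small disturbance that is orthogonal to all three columns.
noise = [0.5 * a * m * s for a, m, s in zip(age, mileage, sunroof)]

X = [list(row) for row in zip(age, mileage, sunroof)]
# Synthetic centred price: falls with age and mileage, unaffected by sunroof.
y = [-3 * a - 2 * m + e for a, m, e in zip(age, mileage, noise)]

coef = lasso_coordinate_descent(X, y, lam=0.1)
selected = [name for name, c in zip(names, coef) if c != 0.0]
dropped = [name for name, c in zip(names, coef) if c == 0.0]

print("coefficients:", dict(zip(names, coef)))
print("selected:", selected)  # ['age', 'mileage']
print("dropped:", dropped)    # ['sunroof']
```

The sign of each nonzero coefficient gives the direction of influence, and its size the strength. In the workflow the abstract describes, the variables in `selected` would then be passed to the tree-based models, whose cost functions are compared against the same models trained on all variables.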
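The implications of this study are as follows. First, by running the two types of machine learning models in succession, the study lays groundwork for making the most of their respective advantages. Second, comparison among machine learning models for some brands and for the full dataset confirmed the superiority of the LGBR (Light Gradient Boosting Regression) model. Third, for individual brands and for the full dataset, the study identified which detailed vehicle specification and option variables affect used car price prediction, together with their influence and direction. This is expected to offer one way to address the problems caused by information asymmetry among the parties to used car transactions.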
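The limitations of this study are as follows. First, used car prices are by nature subject to fluctuation driven by external factors. To respond to this volatility, the data need to be continuously updated with new external factors (macro-level indicators). Second, the biggest problem in used car transactions is the low level of trust between the trading parties caused by information asymmetry. If transparency of vehicle information were ensured so that maintenance and accident records could be used in the analysis, a model with higher prediction accuracy could likely be built. Finally, in view of the time consumed by hyperparameter tuning, this study used the CRT (Classification and Regression Tree), RFR (Random Forest Regression), and LGBR models to predict used car prices. Future work should add the GBR (Gradient Boosting Regression) and XGBR (eXtreme Gradient Boosting Regression) models for comparison among classification-type models.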

As of 2022, the domestic used car market continues to grow, and the share of online used car platforms in that market is approaching 50%. Online used car platforms disclose a vehicle's model year, mileage, and various detailed options, so consumers can easily check the information on a given vehicle.
Most existing studies on used car price prediction have used vehicle specifications, and the relationship between a vehicle's mileage or period of use and its used price tended to be nonlinear. To address this, recent studies have predicted used car prices with machine learning models and compared the cost function of each model.
Most of these studies reported the superiority of the Random Forest (RF) model. In classification-type machine learning models, the MAPE (Mean Absolute Percentage Error) was often better than in regression-type models, but the influence of the variables sometimes differs between classification-type models, and the direction of that influence is unknown. Regression-type models are suitable for checking the influence and direction of variables, but their prediction error rate was often worse than that of classification-type models. Accordingly, this study sought to secure the advantages of both types by running a regression-type model and classification-type models one after the other.
This study collected basic vehicle specifications and detailed option data by crawling and scraping an online used car platform. With these data, the Lasso regression model was used to check the direction and influence of the variables affecting used car prices and to identify variables with zero influence on the predicted value. Next, the cost function values of classification-type models using all variables were compared with those of classification-type models from which the zero-influence variables had been removed.
The study identified the vehicle specification and option variables that affect used car prices, by brand and for the full dataset. For the classification-type models, there was no large difference in cost function values between models using all variables and models excluding the zero-influence variables.
By running the two types of machine learning models in sequence, this study made use of the advantages of both. In addition, the superiority of the LGBR (Light Gradient Boosting Regression) model was confirmed through comparison between machine learning models for the full dataset.
Finally, the study confirmed which detailed specification and option variables affected used car prices, together with the magnitude and direction of their influence. This is expected to be one way to address the problems caused by information asymmetry among the parties to used car sales.
The limitations of this study are as follows. First, used car prices by their nature vary with external factors. To cope with this volatility, the data need to be continuously updated with new external factors. Second, the biggest problem in used car sales is the low level of trust between the trading parties caused by information asymmetry. If transparency of vehicle information is secured and maintenance and accident records are used for analysis, a more accurate model is expected to be possible.
Lastly, this study used only some classification-type machine learning models to predict used car prices. Considering the time consumed by hyperparameter tuning, the Classification and Regression Tree (CRT), RFR, and LGBR models were used. Gradient Boosting Regression (GBR) and eXtreme Gradient Boosting Regression (XGBR) models should be added in future work for comparison among classification-type models.

#Used car price #Lasso regression #Random forest regression #LGB regression

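Chapter 1. Introduction
1. Background and Purpose of the Study
2. Research Method and Organization
Chapter 2. Prior Research and Theoretical Background
1. The Used Car Market
1) Current State of the Used Car Market
2) Studies on Used Car Prices
2. Machine Learning
1) Definition and Types of Machine Learning
2) Studies on Used Car Price Prediction Using Machine Learning Models
Chapter 3. Data Collection and Variable Definition
1. Data Collection
2. Operational Definition and Measurement of Variables
1) Vehicle Specifications
2) Vehicle Options
3. Sample Characteristics
Chapter 4. Research Model
1. Machine Learning
1) Types of Machine Learning
2) Lasso Regression Model
3) CRT (Classification and Regression Tree) Model
2. Procedure for Obtaining the Optimal Model for Each Machine Learning Model
1) Overfitting and Underfitting
2) K-Fold Cross Validation
3) Hyperparameter Tuning for Machine Learning
3. Cost Functions and R^2
1) MSE (Mean Squared Error)
2) RMSE (Root Mean Squared Error)
3) MAE (Mean Absolute Error)
4) MAPE (Mean Absolute Percentage Error)
5) R^2
Chapter 5. Results
1. Comparison of Machine Learning Models by Brand
1) Kia
2) Renault
3) Chevrolet
4) SsangYong
5) Genesis
6) Hyundai
2. Comparison of Machine Learning Models on the Full Dataset
1) Full Dataset
2) Results of the Best Classification-Type Machine Learning Model by Brand and for the Full Dataset
Chapter 6. Conclusion
1. Summary of Results
2. Significance of the Study
3. Limitations of the Study
References
English Abstract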
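
The cost functions named in Chapter 4 of the contents (MSE, RMSE, MAE, MAPE) and R^2 have standard textbook definitions, which can be written out in a few lines. The sketch below uses those standard definitions; the thesis may state them with slightly different conventions. The prices are made up and the function names are illustrative.

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of MSE, in the units of the target."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent; undefined if any true value is 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up actual and predicted prices (arbitrary currency units).
actual = [1000, 2000, 3000, 4000]
predicted = [1100, 1900, 3000, 4400]

print(f"MSE  = {mse(actual, predicted):.1f}")    # 45000.0
print(f"RMSE = {rmse(actual, predicted):.2f}")   # 212.13
print(f"MAE  = {mae(actual, predicted):.1f}")    # 150.0
print(f"MAPE = {mape(actual, predicted):.2f}%")  # 6.25%
print(f"R^2  = {r2(actual, predicted):.3f}")     # 0.964
```

MAPE is scale-free, so it can be compared across brands whose prices sit in different ranges, whereas MSE, RMSE, and MAE depend on the price unit.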
