Prediction of lost customers in the telecommunications industry utilizing machine learning on large data platforms
Keywords:
boosting, feature engineering, selection techniques, churn, dataset
Abstract
Accurate churn prediction is crucial to the profitability of telecom businesses. This research
developed a customized churn prediction system for SyriaTel, a telecom company. Models were
judged by AUC, with high values essential for precise churn forecasts, and the dataset was
split into 70% training and 30% testing sets. Cross-validation was used for reliable model
assessment and hyperparameter tuning. Feature engineering and feature selection techniques
were employed to prepare the features for machine learning algorithms.
To address data imbalance, under-sampling and tree-based algorithms were used. Four tree-based
models were chosen: Decision Tree, Random Forest, Gradient Boosting Machine (GBM), and
XGBOOST. Success relied on strategic planning and the inclusion of mobile social network features.
XGBOOST performed best, with an AUC of 93.301% on the SyriaTel dataset, followed by GBM,
Random Forest, and Decision Tree. On a new dataset, XGBOOST reached an AUC of 89%.
Because the data are non-stationary, the model must be retrained regularly. Incorporating
Social Network Analysis improved churn prediction in telecom.
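
As a rough illustration of the evaluation setup summarized in the abstract, the following Python
sketch shows a 70/30 split, random under-sampling of the majority class, and an AUC computation.
It is not the SyriaTel pipeline: the function names, the (features, label) row layout, and the
fixed random seed are assumptions made for this example, and it uses only the standard library
rather than a big data platform.

```python
import random


def train_test_split_70_30(rows, seed=0):
    """Shuffle rows and split them into 70% training and 30% testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def undersample(rows, label_of, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [r for r in rows if label_of(r) == 1]  # churners (assumed label 1)
    neg = [r for r in rows if label_of(r) == 0]  # non-churners (assumed label 0)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def auc(labels, scores):
    """AUC as the fraction of (churner, non-churner) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In this sketch, under-sampling would be applied to the training portion only, so that the AUC on
the held-out 30% reflects the original class balance. The pairwise AUC is quadratic in the number
of rows and is meant for clarity, not for datasets of the size discussed here.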