Objective
This research work applies Survival Analysis to customer churn at a telecommunications company (Telco). Several techniques are analyzed and compared in terms of their characteristics, applications and results: statistical models, namely the Kaplan-Meier estimator and the Cox Proportional Hazards model, and Machine Learning models, namely Survival Tree, Random Survival Forest and Gradient Boosting.
This type of analysis estimates the probability of an event occurring over time. In this case, the event of interest is a customer leaving the company.
The dataset consists of real data from a telecommunications company in Spain, restricted to individual customers.
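As a minimal illustration of how such a probability over time can be estimated, the following sketch implements the Kaplan-Meier estimator in plain Python. The function name `kaplan_meier` and the toy tenures are hypothetical and are not taken from the thesis or its dataset.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time with at least one churn.

    durations: observed tenure of each customer (e.g. in months)
    events:    1 if the customer churned at that time, 0 if censored
               (still a customer when observation ended)
    """
    churned = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # churned or censored at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = churned.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk customers who stayed.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data: five customers, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print([(t, round(s, 2)) for t, s in curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```

Calling the same function separately on each customer group gives one curve per group, which is the basis for comparing groups.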
The analysis focused on three main applications:
First, the survival curves of different customer groups, defined by the values of specific variables, were compared, making it possible to detect which variables have a positive effect on customer retention and which increase the risk of abandonment.
Second, the importance of each characteristic for the abandonment rate was obtained, showing which aspects most influence consumers' decision to continue contracting services.
Finally, prediction models for specific consumers were developed with the different techniques, yielding the survival curve of a customer with given characteristics. Two metrics were used to assess the predictive ability of the models: Harrell’s Concordance Index and the Integrated Brier Score. The first measures whether the model correctly orders the test customers by risk, and the second measures the difference between the predicted and the actual survival curves over time.
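To make the ordering metric concrete, here is a minimal sketch of Harrell's Concordance Index in plain Python. It is a simplified version that ignores pairs with tied durations. The function name `concordance_index` and the toy values are hypothetical and are not the evaluation code or data used in the thesis.

```python
def concordance_index(durations, events, risk_scores):
    """Fraction of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable when customer i churned (events[i] == 1)
    strictly before customer j's observed time. The pair is concordant
    when i, who left earlier, has the higher predicted risk.
    Tied risk scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            continue  # a censored customer cannot anchor a pair
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: higher risk always goes with earlier churn.
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.7, 0.4, 0.5, 0.1]
print(concordance_index(durations, events, risk))  # 1.0
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ordering, and 0.0 to a fully reversed ordering.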
The Machine Learning models achieve the best results on these metrics, especially the Gradient Boosting model, which uses survival trees as base estimators and optimizes the Cox proportional hazards loss.
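For reference, the Cox objective in gradient-boosted survival models is commonly the negative log partial likelihood. The sketch below assumes that formulation, which the text does not spell out. The notation is also an assumption: $f$ is the boosted ensemble's predicted log-risk, $T_i$ the observed time of customer $i$, and $\delta_i = 1$ if that customer churned and $0$ if censored.

```latex
\ell(f) = -\sum_{i \,:\, \delta_i = 1}
  \left[ f(x_i) - \log \sum_{j \,:\, T_j \ge T_i} \exp\bigl(f(x_j)\bigr) \right]
```

Each term compares the predicted risk of a customer who churned at time $T_i$ against all customers still at risk at that time.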
In conclusion, this research work applies machine learning techniques that are mainly used in medicine to predict the abandonment of a company’s customers and to obtain information about their behavior based on their characteristics. The results can support decisions aimed at retaining these customers.
BACHELOR’S THESIS BY:
SANTIAGO JUSTE VALVERDE
Academic Experience
- Double Degree in Computer Engineering and Business Administration, Universidad Carlos III de Madrid (September 2018 – September 2024)
- ERASMUS+ at Graz University of Technology, Austria (2021 – 2022)
Work Experience
- Product Analyst – MasOrange (September 2024 – present)
- Machine Learning Researcher – Universidad Carlos III de Madrid in collaboration with Grupo MásMóvil (September 2023 – May 2024)
Technical skills
- Programming languages: Python, Java, C/C++, SQL, HTML/CSS/JS.
- Development libraries: pandas, NumPy, TensorFlow, scikit-learn.
- Cloud Platforms: Google Cloud.
- Version control: Git.