Customer churn uml

Customer Churn Analysis

MAIN TOOL

Python

SECONDARY TOOL

PowerPoint

INDUSTRY

Business Analytics

📚 About the Project

Customer churn prediction is a core task in business analytics: identifying customers who are likely to discontinue a service or product. This project uses machine learning models to predict customer churn from the Churn_Modelling.csv dataset, which contains demographic, financial, and behavioral features. The project is divided into six parts:

  1. Data Cleansing: Preprocessing, missing value handling, encoding categorical variables, and handling outliers.
  2. KNN Model: Using the k-Nearest Neighbors algorithm for churn prediction.
  3. Decision Tree, Random Forest, and Gradient Boosting Models: Comparing the performance of these models.
  4. Support Vector Machine (SVM): Using different kernels to improve prediction.
  5. Neural Networks: Implementing artificial neural networks for classification.
  6. Model Comparison: Evaluating and comparing all models for the best performance.

🚀 Key Highlights

Data Preprocessing

  • Handled missing values and dropped irrelevant columns.
  • Encoded categorical variables using one-hot encoding.
  • Addressed class imbalance using SMOTE (Synthetic Minority Oversampling Technique).

Machine Learning Models

  • k-Nearest Neighbors (KNN):
    • Achieved the best results with k = 9, selected using cross-validation.
  • Decision Tree, Random Forest, and Gradient Boosting:
    • Tuned hyperparameters with GridSearchCV.
  • Support Vector Machines (SVM):
    • Explored linear, RBF, polynomial, and sigmoid kernels.
  • Neural Networks:
    • Compared solvers for weight optimization and activation functions.
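
One way the tuning described above could be set up is a cross-validated grid search per model family. The sketch below is illustrative only: it uses synthetic data from make_classification rather than the churn dataset, and the parameter grids, the scaling pipeline, and the `scaled` helper are assumptions, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the churn features (not the real data).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=42,
)


def scaled(model):
    """Hypothetical helper: standardize features before a scale-sensitive model."""
    return Pipeline([("scale", StandardScaler()), ("model", model)])


# Each entry pairs an estimator with an assumed, deliberately small grid.
searches = {
    "knn": (
        scaled(KNeighborsClassifier()),
        {"model__n_neighbors": [3, 5, 7, 9, 11]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=42),
        {"n_estimators": [50, 100], "max_depth": [3, 6, None]},
    ),
    "svm": (
        scaled(SVC()),
        {"model__kernel": ["linear", "rbf", "poly", "sigmoid"]},
    ),
    "mlp": (
        scaled(MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=42)),
        {"model__solver": ["lbfgs", "sgd", "adam"],
         "model__activation": ["relu", "tanh"]},
    ),
}

best_params = {}
for name, (estimator, grid) in searches.items():
    # 5-fold cross-validation, scored by area under the ROC curve.
    search = GridSearchCV(estimator, grid, cv=5, scoring="roc_auc")
    search.fit(X, y)
    best_params[name] = search.best_params_
    print(f"{name}: {search.best_params_} (mean CV AUC = {search.best_score_:.3f})")
```

KNN, SVM, and the neural network are wrapped in a scaling pipeline because they are sensitive to feature magnitudes. Putting the scaler inside the pipeline means it is refit on each cross-validation fold, which avoids leaking information from the validation fold.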

Model Comparison

  • Conducted a detailed comparison using:
    • Accuracy
    • Precision
    • Recall
    • F1-Score
    • ROC Curve and AUC values
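
The metrics listed above are all available in scikit-learn. As a rough sketch of how such a comparison might be computed, the example below fits three of the model types on synthetic data and collects the same five measures. The data, the model settings other than k = 9, and the `results` dictionary are illustrative assumptions, and the printed numbers say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score,
    recall_score, roc_auc_score, roc_curve,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the churn data.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "knn": KNeighborsClassifier(n_neighbors=9),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard class labels
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    fpr, tpr, _ = roc_curve(y_test, y_score)     # points on the ROC curve
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
        "roc_curve": (fpr, tpr),
    }

for name, scores in results.items():
    summary = ", ".join(
        f"{metric}={value:.3f}"
        for metric, value in scores.items() if metric != "roc_curve"
    )
    print(f"{name}: {summary}")
```

Accuracy, precision, recall, and F1 are computed from hard predictions, while AUC and the ROC curve use predicted probabilities. On an imbalanced target, accuracy alone can be misleading, which is why the comparison reports all five.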

📊 Tools & Technologies

  • Programming Languages: Python
  • Libraries:
    • Preprocessing: pandas, numpy
    • Visualization: matplotlib, seaborn
    • Machine Learning: scikit-learn, imbalanced-learn
  • Techniques: Hyperparameter tuning, SMOTE, cross-validation, feature importance analysis

📈 Key Results

  • Best Performing Model: Random Forest with an AUC of 0.86 and strong overall predictive accuracy.
  • Feature Importance:
    • Age and NumOfProducts were consistently the most important predictors across models.
  • Insights:
    • Addressed challenges like class imbalance and overfitting with techniques such as SMOTE and pruning.
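
To illustrate the two ideas above, the sketch below ranks features by random forest importance and compares an unpruned decision tree with a pruned one. It runs on synthetic data with generic feature names, so it will not reproduce the Age and NumOfProducts finding. The write-up does not say which pruning method was used; cost-complexity pruning via scikit-learn's `ccp_alpha` parameter, and the value 0.01, are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data, with generic feature names.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=1,
    weights=[0.8, 0.2], random_state=42,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Impurity-based feature importances from a random forest.
forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(X_train, y_train)
importances = pd.Series(
    forest.feature_importances_, index=feature_names
).sort_values(ascending=False)
print(importances)

# Unpruned tree versus a cost-complexity-pruned tree (ccp_alpha is an assumed value).
unpruned = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=42).fit(X_train, y_train)

for label, tree in [("unpruned", unpruned), ("pruned", pruned)]:
    print(
        f"{label}: nodes={tree.tree_.node_count}, "
        f"train acc={tree.score(X_train, y_train):.3f}, "
        f"test acc={tree.score(X_test, y_test):.3f}"
    )
```

A large gap between training and test accuracy on the unpruned tree is the usual sign of overfitting. Pruning trades some training accuracy for a smaller tree that tends to generalize better.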

📂 Project Structure

├── Data/
│   ├── Churn_Modelling.csv
├── Analysis/
│   ├── scripts/
│   │   ├── data_cleansing.py
│   │   ├── knn_model.py
│   │   ├── decision_tree_model.py
│   │   ├── svm_model.py
│   │   ├── neural_network_model.py
├── Visualizations/
│   ├── roc_curves/
├── Reports/
│   ├── Customer_Churn_Analysis.pdf
├── README.md


📜 Detailed Report

For a comprehensive understanding of the project, including detailed methodologies, visualizations, and results, refer to the full report: Customer Churn Analysis Report.


🤝 Contributions & Feedback

If you'd like to contribute, suggest improvements, or ask questions, feel free to open an issue or reach out via LinkedIn.


Author: Syed Faizan
Master's Student in Data Analytics and Machine Learning

Python Code and the Report of the Analysis
