
Description of the dataset

Orange Telecom's Churn Dataset, which consists of cleaned customer activity data (features) along with a churn label specifying whether a customer canceled their subscription, will be used to develop our predictive model. The churn-80 and churn-20 datasets can be downloaded from the following links, respectively:

Since more data is generally desirable when developing ML models, we use the larger set (churn-80) for training and cross-validation, and the smaller set (churn-20) for final testing and model performance evaluation.
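The division of labor between the two files can be sketched in plain Python. This minimal sketch builds k-fold cross-validation splits over the row indices of the training set only; the function name `kfold_splits` and its `seed` parameter are hypothetical, and a real pipeline would normally rely on its ML library's own cross-validation utilities instead.

```python
import random


def kfold_splits(n_rows, k, seed=None):
    """Partition row indices 0..n_rows-1 into k folds for cross-validation.

    Returns a list of (train_indices, validation_indices) pairs. Each row
    lands in exactly one validation fold and in the training part of the
    other k-1 folds.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must satisfy 2 <= k <= n_rows")
    indices = list(range(n_rows))
    if seed is not None:
        # Optional reproducible shuffle so folds are not tied to file order.
        random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one: the first `extra` folds get one more row.
    base, extra = divmod(n_rows, k)
    splits, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, validation))
        start += size
    return splits
```

Only churn-80 row indices would be passed in (for example `kfold_splits(len(train_rows), 5, seed=1)`, where `train_rows` is a hypothetical list of parsed churn-80 records); churn-20 stays outside every fold and is scored once, after model selection.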

Note that the latter set is used only to evaluate the model (that is, for demonstration purposes). In a production-ready environment, telecommunication companies can use their own datasets with the necessary preprocessing and feature engineering. The dataset has the following schema:

State: String
Account length: Integer
Area code: Integer
International plan: String
Voicemail plan: String
Number vmail messages: Integer
Total day minutes: Double
Total day calls: Integer
Total day charge: Double
Total eve minutes: Double
Total eve calls: Integer
Total eve charge: Double
Total night minutes: Double
Total night calls: Integer
Total night charge: Double
Total intl minutes: Double
Total intl calls: Integer
Total intl charge: Double
Customer service calls: Integer
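To make the schema concrete, here is a minimal sketch of a typed record for one customer row, assuming the columns arrive as strings in exactly the order listed above. The class name `CustomerAccount`, its snake_case field names, and the `parse_row` helper are hypothetical, chosen for illustration; they are not part of the dataset files. Only the 19 fields listed above are modeled, so the churn label would be handled separately.

```python
from dataclasses import dataclass, fields


# Hypothetical record type mirroring the schema above
# (String -> str, Integer -> int, Double -> float).
@dataclass
class CustomerAccount:
    state: str
    account_length: int
    area_code: int
    international_plan: str
    voicemail_plan: str
    num_vmail_messages: int
    total_day_minutes: float
    total_day_calls: int
    total_day_charge: float
    total_eve_minutes: float
    total_eve_calls: int
    total_eve_charge: float
    total_night_minutes: float
    total_night_calls: int
    total_night_charge: float
    total_intl_minutes: float
    total_intl_calls: int
    total_intl_charge: float
    customer_service_calls: int


def parse_row(values):
    """Convert one row of raw CSV strings, in schema order, to a typed record."""
    specs = fields(CustomerAccount)
    if len(values) != len(specs):
        raise ValueError(f"expected {len(specs)} columns, got {len(values)}")
    # Each field's declared type (str, int, or float) doubles as its converter.
    converted = [spec.type(raw.strip()) for spec, raw in zip(specs, values)]
    return CustomerAccount(*converted)


# Invented example row (not taken from churn-80 or churn-20).
sample = "NY,100,415,No,Yes,20,180.5,95,30.69,200.0,100,17.0,210.3,90,9.46,10.0,4,2.7,2"
account = parse_row(sample.split(","))
print(account.total_day_calls)  # 95
```

Reusing each field's annotated type as its converter keeps the schema in one place; a row with the wrong number of columns, or a non-numeric value in a numeric column, raises `ValueError`.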