Machine Learning on Prediction of Customer Churn in the Telecom Industry, Lab Reports of Machine Learning

University of Engineering & Management Machine Learning

Project on machine learning: prediction of customer churn in the telecom industry

Typology: Lab Reports

2019/2020

Uploaded on 12/23/2020

saikat-dey 🇮🇳

Project Title: Prediction of Customer Churn in Telecom Company using Machine Learning

Group Members:

Abhijan Bhattacharyya, UEM Kolkata (304201700900005)

Prajwal Bhimrao Walde, UEM Kolkata (304201700900406)

Saikat Dey, UEM Kolkata (304201700900516)

Soumya Ghosh, UEM Kolkata (304201700900678)

Sreetama Chanda, UEM Kolkata (304201700900722)

Document sign date: 19 Sep, 2019

CONTENTS

1. Acknowledgement
2. Project Objective
3. Project Scope
4. Hardware and Software Requirements
5. Data Description
6. Exploratory Data Analysis
7. Data Pre-processing
8. Model Building
9. Future Scope of Improvements
10. Conclusion

PROJECT OBJECTIVE

Customer churn is the loss of clients or customers, that is, when a customer stops doing business with a company. The churn rate is the rate at which customers leave a company within a specific period of time. A churn rate above a certain threshold can have both tangible and intangible effects on a company's business success, so companies try to retain as many customers as they can. With modern data science and machine learning techniques, companies can now identify customers who are likely to stop doing business with them in the near future. This project shows how a telecom company can predict customer churn from customer attributes such as age, gender and more. The features used for churn prediction are described in a later section.
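
As a small illustration of the churn rate defined above, the sketch below computes it as customers lost divided by customers at the start of the period. The function name and the figures are hypothetical and are not taken from the project data.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return the fraction of customers lost during a period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


# Hypothetical figures: 2,000 subscribers at the start of a month, 50 of whom leave.
monthly_rate = churn_rate(2000, 50)
print(f"Monthly churn rate: {monthly_rate:.1%}")
```

Other definitions exist (for example, dividing by the average number of customers in the period); this is only the simplest one.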

PROJECT SCOPE

The broad scope of the Telecom Company Customer Churn Prediction project includes:

The system will be available on an online platform so that senior management and staff of the telecom company can observe the likelihood of customer churn associated with particular aspects of their services.
The system will provide basic information about every customer of the telecom company and the reasons behind their churning.
The system will aim for high prediction accuracy so that the telecom company can resolve its customers' issues, prevent churn and attract new customers.

DATA DESCRIPTION

Source of data: Kaggle.com

Taking a closer look, we see that the dataset contains 21 columns (also known as features or variables). The first 20 columns are the independent variables, while the last column is the dependent variable, which takes a binary value encoded as 1 or 0. Here, 1 refers to the case where the customer left the telecom operator, and 0 is the case where the customer didn't leave the telecom operator after a period of time. This is known as a binary classification problem, where the dependent variable has only two possible values: in this case, a customer either leaves the telecom operator after a period of time or doesn't.

To print the first few rows of the data with all their columns, the code is:

    import pandas as pd

    data = pd.read_csv("Telecom.csv")
    print(data.head())

To see the data type of each column and the number of non-null entries in it, we write:

    data = pd.read_csv("Telecom.csv")
    data.info()

StreamingTV – It denotes whether the customer streams media on TV.
StreamingMovies – It denotes whether the customer streams movies.
Contract – It denotes the subscription period of the telecom plan chosen by the customer.
Paperless Billing – It denotes whether the customer is billed online (paperless) or offline.
Payment Method – It denotes the method of online payment.
Monthly Charges – It denotes the amount charged to the customer each month.
Total Charges – It denotes the customer's total bill, including the other services the customer has chosen.
Churn – It denotes whether the customer has churned from the telecom company.

All twenty columns are categorical columns present in the dataset on which our project is based. The dataset contains no null values. To confirm this, we count the null values in each column with the following code:

    data = pd.read_csv("Telecom.csv")
    print(data.isnull().sum())
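
The null check can also report percentages. The sketch below is only an illustration on a toy frame with invented values standing in for Telecom.csv. It uses `pd.to_numeric(..., errors="coerce")`, a simpler alternative to the regex replacement in the model-building code, to show that blank strings are not counted as nulls until they are converted to NaN.

```python
import pandas as pd

# Toy stand-in for Telecom.csv; the values are invented for illustration.
toy = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75"],
})

# A blank string is not a null value, so it is invisible to isnull() at this point.
nulls_before = int(toy["TotalCharges"].isnull().sum())

# Coerce unparseable entries (such as the blank string) to NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Count of nulls per column and the same figure as a percentage of all rows.
null_counts = toy.isnull().sum()
null_percent = toy.isnull().mean() * 100

summary = pd.DataFrame({"null_count": null_counts, "null_percent": null_percent})
print(summary)
```

Running the same two summary lines on the real data frame after the numeric conversion would show whether any hidden blanks exist.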

Our dataset doesn't contain any outliers. Outliers are extreme values that lie far from the rest of the data and can reduce the accuracy of a prediction model. However, to be on the safe side, we checked whether any outliers are present and whether they matter for our project. The code is:

    sns.boxplot(x="Churn", y="tenure", data=data)

In the same way, we plotted box plots of the other data columns against churn.
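
A box plot marks as outliers the points beyond its whiskers, which by default extend 1.5 times the interquartile range (IQR) past the quartiles. The sketch below applies the same rule numerically. The helper name `iqr_outliers` and the tenure values are hypothetical, chosen only to show the check.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, whisker: float = 1.5) -> pd.Series:
    """Return the values lying outside the box plot whiskers (1.5 * IQR rule)."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return values[(values < lower) | (values > upper)]


# Invented tenure values in months; 400 is deliberately extreme.
tenure = pd.Series([1, 5, 12, 24, 30, 36, 48, 60, 72, 400], name="tenure")
print(iqr_outliers(tenure))
```

An empty result for a column corresponds to a box plot with no points drawn beyond the whiskers.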

DATA PRE-PROCESSING

First, we checked the data type of each column. For the total charges column we explicitly changed the data type from object to float, and for the other object-type columns we replaced every "yes" with 1 and every "no" with 0. We then counted the number of yes and no values in each column. From these observations, we can see that in each column the number of no values is greater than the number of yes values. From the count plot we observed that the "No Phone Service" and "No Internet Service" values do not affect the final outcome, so we replaced them with zero (0). For the string-type columns (Gender, Tenants etc.) we used Label Encoder to encode the strings into integer values.

Guided by the correlations, we drew a count plot of each column against churn to get a clear understanding of the more closely related columns. We then drew distplots to examine the frequency distribution of the highly correlated columns, and used box plots to identify outliers. For the columns that were not evenly distributed, we scaled the values to get a clearer picture of the distribution curve.
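
The steps described above can be sketched on a toy frame as follows. The rows are invented, the column subset is only a sample, and `Series.map` is used here as one possible way to do the yes/no replacement; it is not necessarily how the project code does it.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy rows that mimic a few columns of the dataset; values are invented.
toy = pd.DataFrame({
    "gender": ["Female", "Male", "Male", "Female"],
    "Partner": ["Yes", "No", "No", "Yes"],
    "MultipleLines": ["No phone service", "No", "Yes", "No"],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges": ["29.85", "1889.5", "108.15", "1840.75"],
    "Churn": ["No", "No", "Yes", "No"],
})

# 1. Convert the object-typed total charges column to float.
toy["TotalCharges"] = toy["TotalCharges"].astype(float)

# 2. Map "Yes"/"No" to 1/0 and treat "No phone service" as 0.
binary_map = {"Yes": 1, "No": 0, "No phone service": 0}
for col in ["Partner", "MultipleLines", "Churn"]:
    toy[col] = toy[col].map(binary_map)

# 3. Label-encode the remaining string column (classes are sorted alphabetically).
toy["gender"] = LabelEncoder().fit_transform(toy["gender"])

# 4. Standardize the numeric columns to zero mean and unit variance.
numeric_cols = ["MonthlyCharges", "TotalCharges"]
toy[numeric_cols] = StandardScaler().fit_transform(toy[numeric_cols])

print(toy)
```

In a real workflow the scaler would normally be fitted on the training split only and then applied to the test split.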

MODEL BUILDING

    import pandas as pd
    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import LabelEncoder
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, precision_score, recall_score
    from sklearn.metrics import f1_score
    from sklearn.metrics import confusion_matrix
    from sklearn.feature_selection import SelectKBest, f_classif

    # Load the data and drop the customer identifier column.
    data = pd.read_csv("Telecom.csv")
    data.drop("customerID", axis=1, inplace=True)

    # Turn whitespace-only entries in TotalCharges into NaN, then convert to numeric.
    data["TotalCharges"] = data["TotalCharges"].replace(r'\s+', np.nan, regex=True)
    data["TotalCharges"] = pd.to_numeric(data["TotalCharges"])

    # Label-encode InternetService into a new column and drop the original.
    le_is = LabelEncoder()
    data["internet_n"] = le_is.fit_transform(data["InternetService"])
    data.drop("InternetService", axis=1, inplace=True)

    # Map Yes/No to 1/0 and mark the "no service" categories with "9".
    data.replace(["Yes", "No"], [1, 0], inplace=True)
    data.replace(["No internet service", "No phone service"], ["9", "9"], inplace=True)

    le_gender = LabelEncoder()
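
The listing above stops before any model is trained. Purely as an illustration of how the imported utilities are commonly combined, the sketch below runs feature selection, a train/test split and the imported metrics on synthetic data. The logistic regression classifier, the value `k=3`, the split ratio and the data are all assumptions and are not taken from the report.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression  # assumed model, not from the report
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded feature matrix and the Churn column.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the selector and the scaler on the training split only to avoid leakage.
selector = SelectKBest(score_func=f_classif, k=3).fit(X_train, y_train)
scaler = StandardScaler().fit(selector.transform(X_train))

model = LogisticRegression()
model.fit(scaler.transform(selector.transform(X_train)), y_train)
y_pred = model.predict(scaler.transform(selector.transform(X_test)))

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1       :", f1_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```

For churn data, where leavers are usually the minority class, recall and F1 on the churn class tend to be more informative than accuracy alone.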
