

HR Forecasting using R, Python & Tableau

Submitted in partial fulfillment of the criteria for the award of the Genpact Data
Science Prodegree by Imarticus

Submitted By:
Himanshu Prajapati
Ninad Gomase
Shubham Panchal

Course and Batch:

Scope & Objective
The main objective of this project is to explore the organization’s data and derive
insights that help senior management make important decisions. Further, the scope of
this project is to build classification models that predict which employees are likely
to churn, helping the organization devise policies and attract back the right

Business Problem Statement

To support the organization’s long-term success, this project answers the following
questions, which will help the business make critical decisions.

1. Identify the overall attrition rate of the organization

2. Whether highly experienced professionals or freshers are leaving the
3. Whether the attrition rate is affected by employees who are underpaid relative to
their educational qualifications
4. Which departments most employees are leaving from
5. Whether employees working overtime are leaving the organization
6. Whether the attrition rate is affected by travel time
7. Which job roles are most affected by attrition
8. Implement a machine learning model to predict employee attrition
9. Improve the model’s accuracy using boosting techniques
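
Questions 1, 4 and 5 above reduce to computing an attrition rate overall and per group. The following is a minimal sketch of that calculation in plain Python. The records are made up for illustration, and the column names "Attrition", "Department" and "OverTime" are assumptions, not confirmed names from the project's dataset.

```python
from collections import defaultdict

# Hypothetical records; the column names "Attrition", "Department" and
# "OverTime" are assumptions, not taken from the project's dataset.
employees = [
    {"Department": "Sales", "OverTime": "Yes", "Attrition": "Yes"},
    {"Department": "Sales", "OverTime": "No", "Attrition": "No"},
    {"Department": "Sales", "OverTime": "Yes", "Attrition": "Yes"},
    {"Department": "R&D", "OverTime": "No", "Attrition": "No"},
    {"Department": "R&D", "OverTime": "Yes", "Attrition": "No"},
    {"Department": "R&D", "OverTime": "No", "Attrition": "No"},
    {"Department": "HR", "OverTime": "No", "Attrition": "Yes"},
    {"Department": "HR", "OverTime": "No", "Attrition": "No"},
]


def attrition_rate(rows):
    """Share of rows whose Attrition value is 'Yes' (0.0 for no rows)."""
    if not rows:
        return 0.0
    leavers = sum(1 for row in rows if row["Attrition"] == "Yes")
    return leavers / len(rows)


def attrition_rate_by(rows, column):
    """Attrition rate for each distinct value of `column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return {value: attrition_rate(group) for value, group in groups.items()}


overall = attrition_rate(employees)
by_department = attrition_rate_by(employees, "Department")
by_overtime = attrition_rate_by(employees, "OverTime")

print("Overall attrition rate:", overall)
print("By department:", by_department)
print("By overtime:", by_overtime)
```

The same grouping function could be pointed at any categorical column (for example job role or travel frequency), assuming such columns exist in the data.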

Data Sources
The dataset contains employee attrition records. The following is a description of our dataset:

Classes: 2 (‘Yes’ and ‘No’)

Attributes (Columns): 35
Instances (Rows): 1490

This data was extracted from the Kaggle portal, found at:

Analytics Tools
Forecasting & Predictions:
1. Python 2.7
2. Jupyter Notebook
3. RStudio

Data Visualization & Insights:

1. Tableau Desktop (student version)

Data Store
1. Microsoft Excel 2016

Analytics Approach
1. Collecting data (5% of effort)
2. Deriving insights from historic data (25% of effort)
3. Data wrangling (20% of effort)
4. Implementing machine learning models (20% of effort)
5. Evaluating model accuracy (15% of effort)
6. Boosting model performance using different techniques (15% of effort)
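
Step 5 above, evaluating model accuracy, can be sketched as follows. With a two-class target where one class is rarer, accuracy alone can look good even when few leavers are caught, so this sketch also reports precision and recall for the 'Yes' class. The label lists are invented for illustration, and the function names are hypothetical helpers, not part of the project.

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count true/false positives and negatives for the positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(actual, predicted, positive="Yes"):
    """Accuracy, precision and recall; ratios with a zero denominator are 0.0."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for ten employees (not results from the project).
actual = ["Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "No", "No"]
predicted = ["Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No", "No"]

metrics = classification_metrics(actual, predicted)
print(metrics)
```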

Machine Learning Algorithms Used

1. Logistic Regression
2. K-Nearest Neighbors
3. Support Vector Machine
4. Decision Trees
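
To illustrate one of the classifiers listed above, here is a minimal, hand-written sketch of K-Nearest Neighbors in plain Python. It is not the project's implementation, which the source does not describe. The two features are hypothetical and assumed to be already scaled to the 0 to 1 range, since distance-based methods are sensitive to feature scale.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    distances = sorted(
        (euclidean(features, query), label)
        for features, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical, pre-scaled features (e.g. tenure and pay, both in 0..1).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]
train_y = ["Yes", "Yes", "Yes", "No", "No", "No"]

print(knn_predict(train_X, train_y, (0.2, 0.2)))
print(knn_predict(train_X, train_y, (0.85, 0.8)))
```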

Boosting Techniques
1. Gradient Boosting (GBM)
2. XGBoost
3. AdaBoost
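
The idea behind boosting can be sketched with a small, hand-written AdaBoost over decision stumps. This covers AdaBoost only, not GBM or XGBoost, and it is an illustrative sketch rather than the implementation used in the project, which the source does not specify. The single feature and the labels (+1 for leaving, -1 for staying) are toy values chosen so that no single stump can separate the classes.

```python
import math


def stump_predict(stump, x):
    """A stump predicts `polarity` if x[feature] <= threshold, else -polarity."""
    feature, threshold, polarity = stump
    return polarity if x[feature] <= threshold else -polarity


def best_stump(X, y, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_error = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                error = sum(
                    w for x, label, w in zip(X, y, weights)
                    if stump_predict(stump, x) != label
                )
                if error < best_error:
                    best, best_error = stump, error
    return best, best_error


def adaboost_fit(X, y, rounds=5):
    """Train AdaBoost; labels must be +1 or -1. Returns [(alpha, stump), ...]."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, error = best_stump(X, y, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [
            w * math.exp(-alpha * label * stump_predict(stump, x))
            for x, label, w in zip(X, y, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data: one hypothetical feature; +1 = leaves, -1 = stays.
X = [(1,), (2,), (3,), (4,), (5,), (6,)]
y = [1, 1, -1, -1, 1, 1]

single = adaboost_fit(X, y, rounds=1)
boosted = adaboost_fit(X, y, rounds=3)
single_preds = [adaboost_predict(single, x) for x in X]
boosted_preds = [adaboost_predict(boosted, x) for x in X]
print("One stump:   ", single_preds)
print("Three rounds:", boosted_preds)
```

On this toy data a single stump misclassifies two points, while three boosting rounds fit all six. Results on real data depend on the features and on the number of rounds.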

KPIs, Timelines, Milestones

1. Data collection completed.
2. Derived insights from historic data using Tableau Desktop
3. Started implementing ML algorithms