
Data Scientist, Business Analytics Project
Business Analytics
Agenda
•  Competencies and character of a data scientist
•  What is a BA project?
•  Key questions to ask
•  Practice session – interactive
•  Concept of Big Data
A simple question
•  Can you do this?
Exercise on Reading – Read this
YELLOW BLUE ORANGE
BLACK RED GREEN
PURPLE YELLOW RED
ORANGE GREEN BLACK
BLUE RED PURPLE
GREEN BLUE ORANGE
Data Scientist
•  Data?
•  Scientist?
•  Data Scientist
•  “A data scientist is a person responsible for
collecting, analyzing, and interpreting large
amounts of data to identify ways to help business
improve operations and gain competitive edge
over their rivals”. – Whatis.com
Data Scientist – Technical Skills
•  Tools.
o  SAS, R, Python, Excel.
•  Technology.
o  Big Data, Hadoop, Map-Reduce, Yarn.
•  Techniques/Algorithms.
o  Linear Regression, Logistic Regression, Decision Tree, KNN, Random Forest,
etc.
•  When to use what?
•  What are benefits of each?
•  What are the costs (risks) of each?
•  What level of accuracy does the problem require?
•  Specific analysis versus quick-and-dirty analysis.
Data Scientist – Domain Expertise
•  Business
o  Banking, Financial Services, Manufacturing, Retail, Education, Aviation,
Healthcare, etc.

•  Functional
o  Marketing, Sales, Finance, Administration, Human Resources, Customer
Services, Customer Delivery, etc.

•  Marketing for a bank’s products and services versus
marketing for a consumer product company.
•  Human Resources challenges for an airline versus a
business school.
Data Scientist – Domain Expertise
•  Sales of a retail chain versus financial services.
•  Customer delivery in a bank versus IT company.
•  Geography.
o  Sales in India versus Europe.
o  Warranty in America versus Japan.

•  Regulations.
•  Statutory.
•  Legal.
•  Governmental issues and priorities.
Data Scientist – Business Expertise
•  Implementation Constraints.
•  Data Availability.
o  The more you have, the more you want!

•  Technology Availability.
•  Ease of understanding.
•  Costs versus benefits.
•  Ease of implementation.
o  Versus Accuracy.

•  Rigidity versus flexibility of organizations.
•  Openness to change and innovation.
Data Scientist – Cultural Awareness
•  Ethnicity.
o  Hispanic, Asian, African, European, etc.

•  Geography.
o  Sales in India versus Europe.
o  Warranty in America versus Japan.

•  Societal behavior.
•  Confidentiality.
•  Example: “Indica is more prone to accidents than
Santro” – Is there any reasoning behind this statement?
Data Scientist – Leadership
•  What is leadership?
•  Encourages/takes risks.
•  Leads people by example.
•  Does things personally when required.
•  Is aware of client sensitivities.
•  Is aware of team limitations.
•  Is aware of self limitations.
•  Sets a clear agenda.
•  Assertive, yet polite and understanding.
•  Persuasive.
•  Is a stable person.
o  Cannot be pushed around.
•  Enjoys considerable clout in the organization.
Data Scientist – Management
•  What is management?
•  Scope.
•  Assumptions and constraints.
•  Time.
•  Cost/Price.
•  Stakeholders.
•  Benefits.
•  Simple and efficient.
Data Scientist – Communication
•  What is communication?
o  Understanding
o  Being understood
o  Relevant
•  Verbal.
•  Written.
•  Cultural dimensions.
•  Assertive.
•  Negotiator.
•  Persuasive.
•  Impactful.
•  Listens more, talks less.
Data Scientist – Recap
DATA SCIENTIST
•  Communication savvy
•  Technically savvy
•  Management savvy
•  Domain savvy
•  Leadership savvy
•  Business savvy
•  Culturally savvy
Business Analytics Project
•  Business Analytics?
•  Project?
o  “A project can be defined as a temporary endeavor to create a unique
product or service”.

•  Temporary.
o  Definitive start and end dates.
o  Grouping of people for this definite purpose.
o  Cross-functional team.

•  Unique.
o  Clearly defined.
o  Pre-defined inputs.
o  Known outputs.
o  Changes the status-quo.
BA Project: Examples
•  Defining a customer as Good or Bad based on
some patterns.
•  Estimating the likely price of a car based on
technical parameters.
•  Maximizing the store sales by understanding sales
patterns better.
o  Market basket analysis.
o  Paired products.
•  Understanding the likely default rate.
•  Estimating the likely revenue for the next fiscal year.
•  Estimating the likely insurance cost for automobiles
based on various parameters.
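The market basket and paired-products idea above can be sketched in a few lines. This is a minimal illustration, not a full association-rule method: it only counts how often two products appear in the same basket and what fraction of baskets contain both (the pair's support). The function names and the baskets are hypothetical, invented for this example.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how often each unordered pair of products shares a basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops repeated items and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def pair_support(baskets, pair):
    """Fraction of baskets that contain both products of the pair."""
    if not baskets:
        return 0.0
    first, second = pair
    hits = sum(1 for basket in baskets if first in basket and second in basket)
    return hits / len(baskets)


# Hypothetical baskets, invented for illustration only
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "milk", "butter", "eggs"],
]

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count, pair_support(baskets, top_pair))
```

Pairs with high support are candidates for joint placement or promotion; whether they are worth acting on is a business judgment, not something the count alone decides.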
BA Project (specific): Steps
•  Problem definition.
o  Business
o  Analytics
•  Problem and constraints understanding.
o  What is the business trying to solve?
o  Technology
o  Time
o  Confidentiality
o  Location
o  Costs
o  Accuracy
•  High-level planning and task breakdown.
o  Consultative teams
o  Dedicated team
•  Integration.
•  Results interpretation.
•  Model optimization.
•  Business presentation.
This is a project with a clear outcome expectation. The data
captured is specific to the outcome needed.
BA Project (Insights): 3 key questions
1.  Can we do it with the existing data?
o  Yes
o  No

2.  How useful will this analysis be for the stakeholder?
o  High
o  Medium
o  Low

3.  What analysis is required for this?

This is a project where the data is given and any insight that comes
out of it is welcome. This is more exploratory in nature.
3 key questions: An example
•  Bank data
o  Age
o  Job
o  Marital status
o  Education
o  Default (Y/N)
o  Balance (in Rs.)
o  House loan (Y/N)
o  Contact (Landline/Mobile/Email)
o  Last contacted day
o  Last contacted month
o  Last contact duration
o  Last contact days number
o  Outcome
o  Subscribed (Y/N)
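As one illustration of an exploratory question this data could answer, the sketch below computes the subscription rate for each value of a chosen field. The field names (`job`, `marital`, `subscribed`) are assumed spellings of the columns listed above, and the records are hypothetical, not taken from any real bank data.

```python
from collections import defaultdict


def subscription_rate_by(records, field):
    """Share of customers with subscribed == "Y" for each value of `field`."""
    totals = defaultdict(int)
    subscribed = defaultdict(int)
    for record in records:
        group = record[field]
        totals[group] += 1
        if record["subscribed"] == "Y":
            subscribed[group] += 1
    return {group: subscribed[group] / totals[group] for group in totals}


# Hypothetical records, invented for illustration only
records = [
    {"job": "teacher", "marital": "married", "subscribed": "Y"},
    {"job": "teacher", "marital": "single", "subscribed": "N"},
    {"job": "engineer", "marital": "single", "subscribed": "Y"},
    {"job": "engineer", "marital": "married", "subscribed": "Y"},
]

rates = subscription_rate_by(records, "job")
print(rates)
```

The same function can be called with `"marital"` or any other categorical field, which makes it a quick way to check whether a stakeholder's question is answerable before committing to a fuller analysis.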
3 key questions: An example
•  Let us start with answering the following questions.
•  Who are the various stakeholders?
•  What questions will they (each of them) have with
this data?
•  Create a spreadsheet to clearly document.
o  Can we do this with the existing data?
o  How useful will this analysis be for the stakeholder?
o  What analysis is required for this?

•  Start with the questions that can be answered with the
existing data and that are of high value to the
stakeholder(s).
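The spreadsheet described above can also be sketched in code. The example below is one possible layout, assuming one row per stakeholder question with the answers to the three key questions as columns; the `Question` class, the `prioritize` function, and all rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Question:
    stakeholder: str
    text: str
    feasible: bool    # Q1: can we do this with the existing data?
    usefulness: str   # Q2: "High", "Medium" or "Low"
    analysis: str     # Q3: what analysis is required?


# Lower rank means the question is worked on earlier
RANK = {"High": 0, "Medium": 1, "Low": 2}


def prioritize(questions):
    """Keep the feasible questions and order them from most to least useful."""
    doable = [q for q in questions if q.feasible]
    return sorted(doable, key=lambda q: RANK[q.usefulness])


# Hypothetical rows, invented for illustration only
questions = [
    Question("Branch manager", "Which month has the most contacts?",
             True, "Low", "Count of contacts by last contacted month"),
    Question("Risk head", "What is each customer's income?",
             False, "High", "Not possible: income is not in the data"),
    Question("Marketing head", "Which customers are likely to subscribe?",
             True, "High", "Classification on Subscribed (Y/N)"),
    Question("Call-centre manager", "Does call duration relate to outcome?",
             True, "Medium", "Compare last contact duration by outcome"),
]

ordered = prioritize(questions)
for q in ordered:
    print(q.usefulness, "|", q.stakeholder, "|", q.text)
```

Questions that cannot be answered with the existing data are dropped from the work list but are still worth keeping in the spreadsheet, since they show what data the stakeholders would like to have.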
Big Data
•  Computing is cheaper now than before.
•  Commodity hardware can be used.
o  As opposed to specialized and expensive enterprise servers.
•  Organizations collect a lot of data through social
media.
•  Data is text and pictures (emoticons, thumbs
up, thumbs down, etc.).
•  The need to gauge customer ‘sentiment’ is greater
now than before.
•  There is a need to scan through large amounts of
text and picture data.
•  Context matching.
Big Data
•  3 V’s
o  Volume – Increasing exponentially (Text → MP3 → Video → High Definition
Video)
o  Velocity – 400 million tweets/day
o  Variety – structured (pre-defined), unstructured (emails, chat
conversations, pictures, HD Video, screenshots, etc.)

•  There is a need to scan high volumes of data to
extract the trend/sentiment.
•  Hadoop platform.
•  Map-Reduce.
o  Key-Value pairs.

•  Pig, Hive, Spark, Storm, etc.
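The Map-Reduce idea of key-value pairs can be illustrated with a word count. This is a single-process sketch in plain Python, assuming nothing about Hadoop's actual API: on a real cluster the map and reduce steps run in parallel across many machines, and the framework handles the grouping step. The function names and documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) key-value pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over all documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical customer comments, invented for illustration only
documents = ["great service great price", "poor service"]
result = word_count(documents)
print(result)
```

Because each document is mapped independently and each key is reduced independently, the work can be split across commodity machines, which is what makes the approach suitable for high volumes of text.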


Map-Reduce: Pictorial Depiction
Thank You
