About me
BS in Computer Science and Engineering from University of
Connecticut
In the Healthcare Industry for over 19 years
Programmer most of my career - Architect, Designer
Worked in the SOA space for a number of years
Lead engineer in the mobile application space
Now Lead engineer in the Big Data Analytics Space - Hadoop
!
In my spare time
Love to travel with the family
Video games, music, movies
Community relations work
Fan of College basketball
2
The blowback
What we
accomplished
Roadmap to the
future
Lessons Learned
Questions?
Edge'Node'For'Hadoop'Client
Logs
Web
IVR
Portal
Mobile
Realtime Feed
Flume
SQOOP/
Flume
Chronos
NoSQL
Storing
weblogs
Log files to
HDFS
Scalding
Python
SAS
Pentaho
R
?
(Scala)
Cascading
Hive/Impala
(Java)
(SQL)
Analysis/Modeling'Tools
Analysis
Live Data
Streams
Web Analytics
Jobs
Tableau
Spotfire
Platfora
Event detection(Storm)
Real%me'Data'Store'or'event'processing
Data'Science'Tools'
SQOOP
Flume
MapReduce'Distributed'Programming'Framework
*Use Spark
Streaming and
append to Hadoop
output - Realtime
events
HDFS
SQOOP
CED/
Claims
Clinical
Data
RDBMS
Cognos
Microstrategy
?
6
Visualization
filestack
Hadoop'Cluster'running'HDFS'and'MapReduce
Includes'Management,'Monitoring'and'Security
Teradata
RDBMS
External'Hadoop'Output
Or'in'HDFS
Teradata
Hadoop'
Cluster'#3
Hadoop'
Cluster'#2
8
Back'up'or'copy'data'from'HDFS'to'a'
redundant'cluster'for'quick'recovery
*'For'future'implementa%on'TBD
10
11
Use Case 1
12
Use Case 2
13
Success!
Ready to tackle tougher more
complicated problems
!
14
15
& Challenges
16
But Why?
Overuse of the words Big & Data
!
17
Building
!
!
Service
!
!
Being
!
!
Brand
Ops efficiency
Customer Centric
Product
!
!
a Customer Persona
Efficiency
Impact
18
Predictive
!
!
Data
!
!
threat modeling
Archival
Network
Efficiency
19
Hadoop is complementary
20
Hadoop is Complementary
Hadoop excels at processing and analyzing large volumes of
distributed, unstructured, structured and semi-structured
data in batch or near real-time fashion for analysis
!
NoSQL databases are adept at storing and serving up multistructured data in near-real time for web-based applications
!
Massively parallel OLAP databases are best at providing
analysis of large volumes of mainly structured data Teradata
!
SAS/R - Modeling and Business Intelligence
!
Tableau - Visualization
21
22
23
What we accomplished?
Evangelized Hadoop
!
R on Hadoop
!
24
25
What we accomplished?
ETL - Ingest, Transform and Move patterns
!
27
28
29
Lessons Learned
Overuse of the words Big & Data
!
The overlap
!
31
Vendors
!
Vendor Partnerships
32
WWYS
Difficult to see. Always in motion
is the future
Yoda
!
33
Questions?
!
atif71@gmail.com
34