by
Published with permission for restricted use in STeP-IN SUMMIT 2014 in agreement with
full copyrights from owner(s) / author(s) of material. All rights reserved. No part of this
publication may be reproduced, stored in a retrieval system or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording or otherwise without
the prior consent of the owner(s) / author(s). This edition is manufactured in India and is
authorized for distribution only during STeP-IN SUMMIT 2014 as per the applicable
conditions.
Produced By: www.stepinforum.org
Hosted By: www.qsitglobal.com
Performance testing Hadoop based Big Data Analytics solutions
STeP-IN SUMMIT 2014 Conference Proceedings
Agenda
Design Workload
Messaging Servers
Messages/sec
Bytes/sec
Hadoop
Number of jobs in parallel
NoSQL
Operations/sec
Read/Write/Update ratio
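The workload dimensions listed above can be captured in a small model before any load is generated. The following is a minimal sketch, not the presenters' tooling: the class names, field names and the sample figures are all hypothetical, chosen only to mirror the metrics on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessagingWorkload:
    """Target load for the messaging servers."""
    messages_per_sec: float
    bytes_per_sec: float

    def average_message_bytes(self) -> float:
        # Average payload size implied by the two target rates.
        return self.bytes_per_sec / self.messages_per_sec


@dataclass(frozen=True)
class HadoopWorkload:
    """Target load for the Hadoop cluster."""
    parallel_jobs: int


@dataclass(frozen=True)
class NoSqlWorkload:
    """Target load for the NoSQL store, with its operation mix."""
    operations_per_sec: float
    read_ratio: float
    write_ratio: float
    update_ratio: float

    def __post_init__(self) -> None:
        total = self.read_ratio + self.write_ratio + self.update_ratio
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"read/write/update ratios must sum to 1, got {total}")

    def per_operation_rates(self) -> dict:
        # Split the overall rate into a target rate per operation type.
        return {
            "read": self.operations_per_sec * self.read_ratio,
            "write": self.operations_per_sec * self.write_ratio,
            "update": self.operations_per_sec * self.update_ratio,
        }


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    messaging = MessagingWorkload(messages_per_sec=1000, bytes_per_sec=512_000)
    nosql = NoSqlWorkload(operations_per_sec=1000, read_ratio=0.5,
                          write_ratio=0.25, update_ratio=0.25)
    print(messaging.average_message_bytes())
    print(nosql.per_operation_rates())
```

Validating the ratio at construction time catches a mis-specified mix before a long test run starts.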
Monitoring Tools
Nagios, Zabbix, Ganglia, JMX utilities
Diagnostic Tools (APM)
VisualVM, AppDynamics, Compuware
Benchmarks
Hadoop
TestDFSIO: Distributed I/O benchmark
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the NameNode
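The three Hadoop benchmarks above ship in the Hadoop test jar and are launched through `hadoop jar`. The sketch below only assembles the command lines. It does not run them. The jar path is a placeholder (its name and location vary by Hadoop version and distribution), and the option values are illustrative assumptions; the option names should be checked against the usage output of the installed version.

```python
from typing import Dict, List

# Placeholder path: the real test jar name depends on the Hadoop version.
TESTS_JAR = "/path/to/hadoop-tests.jar"


def benchmark_command(tests_jar: str, program: str, args: List[str]) -> List[str]:
    """Build the argv for one benchmark program inside the Hadoop test jar."""
    return ["hadoop", "jar", tests_jar, program, *args]


def build_commands(tests_jar: str = TESTS_JAR) -> Dict[str, List[str]]:
    # All numeric values below are illustrative, not recommendations.
    return {
        # Write 10 files of 1000 MB each, then read them back.
        "TestDFSIO write": benchmark_command(
            tests_jar, "TestDFSIO",
            ["-write", "-nrFiles", "10", "-fileSize", "1000"]),
        "TestDFSIO read": benchmark_command(
            tests_jar, "TestDFSIO",
            ["-read", "-nrFiles", "10", "-fileSize", "1000"]),
        # Run a small job 50 times to measure per-job overhead.
        "mrbench": benchmark_command(
            tests_jar, "mrbench", ["-numRuns", "50"]),
        # Create and write many small files to load the NameNode.
        "nnbench": benchmark_command(
            tests_jar, "nnbench",
            ["-operation", "create_write", "-maps", "12",
             "-reduces", "6", "-numberOfFiles", "1000"]),
    }


if __name__ == "__main__":
    for name, argv in build_commands().items():
        print(f"{name}: {' '.join(argv)}")
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` on a cluster node once the jar path is filled in.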
NoSQL
Workload I: 90% read, 10% insert
Workload II: 50% read, 50% update
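The two NoSQL workloads above are defined purely by their operation mix, so a load driver needs a way to emit operations in those proportions. The following is a minimal sketch of one way to do that with weighted random selection. The function names and the seed parameter are assumptions made for illustration, and no particular benchmark tool is implied.

```python
import random
from collections import Counter
from typing import Dict, List

# Operation mixes as listed on the slide.
WORKLOADS: Dict[str, Dict[str, float]] = {
    "Workload I": {"read": 0.90, "insert": 0.10},
    "Workload II": {"read": 0.50, "update": 0.50},
}


def generate_operations(mix: Dict[str, float], count: int, seed: int = 0) -> List[str]:
    """Return `count` operation names drawn according to the given mix."""
    rng = random.Random(seed)  # seeded so a test run can be repeated
    operations = list(mix)
    weights = [mix[op] for op in operations]
    return rng.choices(operations, weights=weights, k=count)


def observed_ratio(operations: List[str]) -> Dict[str, float]:
    """Report the fraction of each operation type actually generated."""
    counts = Counter(operations)
    total = len(operations)
    return {op: n / total for op, n in counts.items()}


if __name__ == "__main__":
    for name, mix in WORKLOADS.items():
        ops = generate_operations(mix, count=10_000, seed=1)
        print(name, observed_ratio(ops))
```

With a large enough sample the observed ratios settle close to the target mix, which is a quick sanity check before pointing the driver at a real datastore.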
The challenges
Develop a near real-time analytics solution to analyze user feedback and interactions
Solution uses Kafka, HBase and Hive as major technologies
SLA to support 50K messages per minute
Optimize and tune the Kafka clusters for maximum throughput
Real time monitoring of test environment for bottleneck identification
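The SLA above is stated per minute, while load generators and broker metrics are usually expressed per second, so it helps to convert it once: 50,000 messages per minute is about 833 messages per second, or one message every 1.2 ms from a single steady sender. The sketch below shows that arithmetic and a simple pass/fail check on a measured run. The function names are hypothetical and the check assumes the SLA is read as a sustained average rate.

```python
SLA_MESSAGES_PER_MINUTE = 50_000  # from the stated SLA


def required_rate_per_second(per_minute: int = SLA_MESSAGES_PER_MINUTE) -> float:
    """Sustained messages per second needed to meet the per-minute SLA."""
    return per_minute / 60.0


def send_interval_seconds(per_minute: int = SLA_MESSAGES_PER_MINUTE) -> float:
    """Gap between messages for a single sender pacing evenly at the SLA rate."""
    return 60.0 / per_minute


def meets_sla(messages_processed: int, elapsed_seconds: float,
              per_minute: int = SLA_MESSAGES_PER_MINUTE) -> bool:
    """True if the measured average rate is at least the SLA rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Cross-multiplied to avoid rounding error from dividing first.
    return messages_processed * 60 >= per_minute * elapsed_seconds


if __name__ == "__main__":
    print(f"required rate: {required_rate_per_second():.1f} messages/sec")
    print(f"send interval: {send_interval_seconds() * 1000:.1f} ms")
    # Hypothetical measurement: 260,000 messages in a 5-minute run.
    print("meets SLA:", meets_sla(260_000, 300))
```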
Benefits Realized
Optimum hardware utilization for complete solution
Zero performance issues at go-live
Maximum throughput of Kafka servers
Q&A
Thank You
For more info: mbatterywala@impetus.co.in, sbhale@impetus.co.in