
STeP-IN SUMMIT 2014

11th International Conference on Software Testing


June 2014 at Bangalore, Hyderabad, Pune - INDIA

Performance testing Hadoop based big data analytics solutions

by

Mustufa Batterywala, Performance Architect, and
Shirish Bhale, Director of Engineering, Impetus Infotech

Copyright: STeP-IN Forum

Published with permission for restricted use in STeP-IN SUMMIT 2014 in agreement with
full copyrights from owner(s) / author(s) of material. All rights reserved. No part of this
publication may be reproduced, stored in a retrieval system or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording or otherwise without
the prior consent of the owner(s) / author(s). This edition is manufactured in India and is
authorized for distribution only during STeP-IN SUMMIT 2014 as per the applicable
conditions.

Practices Experience Knowledge Automation

Produced By: www.stepinforum.org
Hosted By: www.qsitglobal.com

STeP-IN SUMMIT 2014 Conference Proceedings: Performance testing Hadoop based Big Data Analytics solutions

Performance testing Hadoop based Big Data Analytics solutions

Key Big Data Technologies


Map Reduce
Apache Hadoop, Cloudera, Hortonworks, MapR etc.
NoSQL
Cassandra, MongoDB, Oracle NoSQL, Neo4j etc.
Messaging queues
Kafka, ActiveMQ, RabbitMQ, ZeroMQ etc.
Search
Lucene, Elasticsearch, Solr

Agenda

Big Data Performance Testing Focus Areas


Challenges
Performance Testing Approach
Solutions

Big Data Performance Test Focus Areas



Performance Testing Challenges


Diverse technologies
Unavailability of tools
Test scripting
Test environment
Limited monitoring solutions
Lack of diagnostic solutions

Performance Testing Approach



Performance Test Environment


Leverage cloud
Automate environment creation
Single node deployment
Optimize, Tune
Set up test cluster
Replication
Fail over configuration

Design Workload
Messaging Servers
Message/sec
Bytes/sec
Hadoop
Number of jobs in parallel
NoSQL
Operations/sec
Read/Write/Update ratio
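
The workload metrics above can be derived from raw test measurements. The following is a minimal sketch, assuming the load client records the size of every message it sends during a timed interval; the names `Throughput` and `measure_throughput` are hypothetical and not part of any tool named in this deck.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Throughput:
    messages_per_sec: float
    bytes_per_sec: float


def measure_throughput(message_sizes: Sequence[int], elapsed_sec: float) -> Throughput:
    """Summarise one test interval as messages/sec and bytes/sec."""
    if elapsed_sec <= 0:
        raise ValueError("elapsed_sec must be positive")
    return Throughput(
        messages_per_sec=len(message_sizes) / elapsed_sec,
        bytes_per_sec=sum(message_sizes) / elapsed_sec,
    )


if __name__ == "__main__":
    # Hypothetical interval: 1,000 messages of 512 bytes each, sent in 2 seconds.
    print(measure_throughput([512] * 1000, elapsed_sec=2.0))
```

Reporting both figures matters because a messaging server can be limited by message count (per-message overhead) or by payload volume (network and disk), and the two do not always move together.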

Performance Testing Solutions


Performance Test Tools
YCSB (Yahoo! Cloud Serving Benchmark), SandStorm, JMeter, Cassandra stress test

Monitoring Tools
Nagios, Zabbix, Ganglia, JMX utilities
Diagnostic Tools (APM)
VisualVM, AppDynamics, Compuware

Benchmarks
Hadoop
TestDFSIO: Distributed I/O benchmark
mrbench: A MapReduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the NameNode
NoSQL
Workload I: 90% read, 10% insert
Workload II: 50% read, 50% update
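
The two NoSQL workloads above are defined only by their operation mix. As an illustration, the sketch below draws a reproducible sequence of operations with those proportions; the `WORKLOADS` dictionary layout and the `generate_operations` helper are assumptions made for this example, not the configuration format of YCSB or any other tool.

```python
import random
from collections import Counter
from typing import List

# Operation mixes as listed above; the dictionary layout is an assumption.
WORKLOADS = {
    "I": {"read": 0.90, "insert": 0.10},
    "II": {"read": 0.50, "update": 0.50},
}


def generate_operations(workload: str, count: int, seed: int = 0) -> List[str]:
    """Return `count` operation names drawn according to the workload mix."""
    mix = WORKLOADS[workload]
    rng = random.Random(seed)  # fixed seed keeps test runs repeatable
    return rng.choices(list(mix), weights=list(mix.values()), k=count)


if __name__ == "__main__":
    print(Counter(generate_operations("I", 10_000)))
    print(Counter(generate_operations("II", 10_000)))
```

A fixed seed makes two runs issue the same operation sequence, so a change in results can be attributed to tuning rather than to a different random mix.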

Critical Performance Parameters


NoSQL
Data Storage
Commit Logs
Concurrency
Caching
JVM parameters

Critical Performance Parameters


MapReduce configurations
Number of mappers & reducers
HDFS block (chunk) size
io.sort.mb, io.sort.factor
Memory
Message queue configurations
Sync vs. async
Timeouts
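
The number of mappers and the HDFS block size listed above are linked: with default file-based input, a job gets roughly one map task per block of input. The sketch below shows that rule of thumb only; it assumes a single large input file and ignores details such as split slop and custom input formats, and `estimate_map_tasks` is a hypothetical helper name.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def estimate_map_tasks(input_bytes: int, block_bytes: int) -> int:
    """Rule-of-thumb map task count: one task per (possibly partial) block."""
    if input_bytes < 0 or block_bytes <= 0:
        raise ValueError("input_bytes must be >= 0 and block_bytes > 0")
    # Integer ceiling division avoids float rounding on very large inputs.
    return (input_bytes + block_bytes - 1) // block_bytes


if __name__ == "__main__":
    # Hypothetical 10 GiB input at two candidate block sizes.
    for block in (64 * MIB, 128 * MIB):
        print(block // MIB, "MiB blocks ->", estimate_map_tasks(10 * GIB, block), "map tasks")
```

Doubling the block size halves the estimated task count, which trades per-task startup overhead against parallelism; the right balance has to be found by measurement on the test cluster.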

Real World Experience


About the application
Online social networking website with almost 100 million registered users across the globe
Hundreds of thousands of users are online at any given time

The challenges
Develop a near-real-time analytics solution to analyze user feedback and interactions
Solution uses Kafka, HBase and Hive as major technologies
SLA to support 50K messages per minute
Optimize and tune the Kafka clusters for maximum throughput
Real time monitoring of test environment for bottleneck identification
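
To turn the per-minute SLA above into numbers a load client can use, it helps to convert it to a per-second target and a gap between sends: 50K messages per minute is about 833 messages per second. The sketch below does this arithmetic; the helper names and the producer count of 4 are illustrative assumptions, not figures from the project.

```python
def per_second_rate(messages_per_minute: float) -> float:
    """Convert a per-minute SLA into a per-second target."""
    return messages_per_minute / 60.0


def send_interval_ms(messages_per_minute: float, producers: int = 1) -> float:
    """Gap between sends for each producer when the load is split evenly."""
    if messages_per_minute <= 0 or producers < 1:
        raise ValueError("rate must be positive and producers >= 1")
    return 60_000.0 * producers / messages_per_minute


if __name__ == "__main__":
    sla = 50_000  # messages per minute, from the SLA above
    print(f"{per_second_rate(sla):.1f} messages/sec")
    print(f"{send_interval_ms(sla):.2f} ms between sends (1 producer)")
    # The producer count below is hypothetical.
    print(f"{send_interval_ms(sla, producers=4):.2f} ms between sends (4 producers)")
```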

Real World Experience


Impetus contributions
Proposed and implemented performance test strategy for analytics solution
Identified key performance components, namely Kafka and HBase, for focused testing
Proposed SandStorm as performance testing tool based on project requirements
Prepared Kafka clients to simulate expected data ingestion volumes
Optimized and tuned a single Kafka cluster in EC2 on a medium instance
Executed tests with varying message rates and reached the maximum throughput of 50K messages per minute using a 3-server cluster
Real time monitoring using SandStorm to identify performance bottlenecks
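
The deck does not describe how the Kafka clients paced their traffic. One common approach for tests with varying message rates is a fixed-rate sender that schedules each message against the test start time, so small delays do not accumulate. The sketch below is transport-agnostic: `send` is a placeholder for whatever producer call the real client would make, and `run_paced` with its `clock` and `sleep` parameters is a hypothetical design chosen so the pacing logic can be tested without real waiting.

```python
import time
from typing import Callable


def run_paced(
    send: Callable[[int], None],
    rate_per_sec: float,
    total: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Call send(i) `total` times at a target rate; return achieved msgs/sec."""
    if rate_per_sec <= 0 or total <= 0:
        raise ValueError("rate_per_sec and total must be positive")
    interval = 1.0 / rate_per_sec
    start = clock()
    for i in range(total):
        # Schedule against the start time so per-send jitter does not drift.
        delay = (start + i * interval) - clock()
        if delay > 0:
            sleep(delay)
        send(i)
    elapsed = clock() - start
    return total / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    sent = []
    # Placeholder send: a real client would publish to the message queue here.
    achieved = run_paced(sent.append, rate_per_sec=500.0, total=50)
    print(f"sent {len(sent)} messages at about {achieved:.0f} msgs/sec")
```

Running the same function at increasing `rate_per_sec` values and comparing the achieved rate with the target is one way to locate the point where the cluster stops keeping up.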

Benefits Realized
Optimum hardware utilization for complete solution
Zero performance issues on Go live
Maximum throughput of Kafka servers

Q&A

Thank You
For more information: mbatterywala@impetus.co.in, sbhale@impetus.co.in