Iran Telecommunication Research Center, Tehran, Iran

Abstract

The availability of dependable physical infrastructure is essential to the existence of big data. In fact, without access to a robust physical infrastructure, big data could hardly have emerged. To support such a huge volume of data, a physical infrastructure for big data must be large enough to handle the associated requests. This means satisfying both the quantity and the quality aspects of data handling: by quantity we refer to the substantial number of computational nodes, and by quality we refer to their high performance. Redundancy is a key factor in improving service availability in big data. On the other hand, low latency is a crucial requirement of big data. Together, these requirements contribute greatly to the energy consumption of the infrastructure that supports big data. In this paper, we briefly review some energy attributes, as well as some conservation techniques that can effectively reduce the energy consumption of big data infrastructure. We mainly focus on the data center, as it is the pivotal component in the big data ecosystem.

Keywords: Big Data, Energy Management, Data Center.
I. INTRODUCTION
Managing and analyzing data often offers many advantages across all industry sectors. Businesses strive to extract valuable information about their customers, products, and services, and to forecast the future of their operations. When a company has a small number of customers, usually with similar purchasing behavior, business analysis is simple; in contemporary markets, however, where an immense number of large corporations compete against each other, access to information about customers, products, and services is highly valuable, both economically and socially. Indeed, in modern societies with a large number of businesses, each with a sophisticated business model and a large customer base, an enormous volume of data is generated every year. According to Intel Corporation, 4 zettabytes of data were created in
2013 alone (1 zettabyte equals 1 billion terabytes) [1]. This data is generated by various sources, including emails, documents, customer records, pictures, videos, social media, and mobile communication, to name a few. These diverse data types, together with the ubiquitous sensory data emerging from the Internet of Things (IoT), create a significant data volume in a large set of forms and formats. Much of this data is unstructured, which makes it practically impossible to perform data management and analysis in the traditional ways. Big data refers to the capability to manage and analyze such a huge volume of diverse data, with acceptable processing speed and within a tolerable time frame, so as to allow real-time analysis and deliver high-quality services [2]. Accordingly, big data is characterized by three well-known attributes: Volume, which refers to the amount of data; Velocity, which refers to the speed at which data is processed and fetched; and Variety, which refers to the types of data.
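To put the cited figure in perspective, a brief unit conversion may help; it assumes standard decimal (SI) prefixes, consistent with the parenthetical note above:

\[
1\,\text{ZB} = 10^{21}\,\text{bytes} = 10^{9}\,\text{TB},
\qquad
4\,\text{ZB} = 4 \times 10^{9}\,\text{TB} \approx 4\ \text{billion terabytes}.
\]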