Trifecta
Tool for Porting SQL Queries to Hadoop
Software Requirements Specification
Version 1.0
29/1/2013
Team Guide: GuruMurthy Patrudu
Members: Dasari Megahana, Duvvi Sravani, K. Manichand
Table of Contents

1.0 Introduction
2.0 Overall Description
    2.3 Hardware Interface
    2.4 User Characteristics
    2.5 Architecture Design
    2.6 Use Case Model Description
    2.7 Class Diagram
    2.8 Sequence Diagrams
    2.9 ER Diagram
3.0 Specific Requirements
    3.1 Use Case Reports
1.0 Introduction:
Hadoop is an emerging industry standard for distributed data processing. Vast amounts of data are currently stored in SQL databases, which lack distributed processing capabilities. In this project we propose a tool which simplifies the migration from SQL databases to the Hadoop framework.
1.1 Purpose:
Large volumes of data can be processed concurrently on Hadoop because of its distributed file system and its distributed computing framework, map/reduce.
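As an illustration of the kind of map/reduce application the tool is meant to produce, the following is a minimal sketch of a map-only filter job, the Hadoop equivalent of SELECT * FROM employees WHERE salary > 50000. This is not the tool's actual output: the class name, the comma-delimited input format, and the position of the salary field are all assumptions made for the example.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Map-only job equivalent to: SELECT * FROM employees WHERE salary > 50000
    public class SalaryFilter {

        public static class FilterMapper
                extends Mapper<LongWritable, Text, NullWritable, Text> {
            @Override
            protected void map(LongWritable offset, Text row, Context context)
                    throws IOException, InterruptedException {
                String[] fields = row.toString().split(",");
                // Assumption: field 2 holds the numeric salary column.
                if (fields.length > 2 && Double.parseDouble(fields[2].trim()) > 50000) {
                    context.write(NullWritable.get(), row); // emit the matching row
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "salary filter");
            job.setJarByClass(SalaryFilter.class);
            job.setMapperClass(FilterMapper.class);
            job.setNumReduceTasks(0); // a WHERE filter needs no reduce phase
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input file
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Because the job sets zero reduce tasks, the WHERE-style filter runs entirely in parallel map tasks across the cluster, which is what makes the porting worthwhile for large inputs.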
1.2 Scope:
We believe that this effort will allow organisations to port their current applications to Hadoop with ease.
1.5 Technologies to be used:
The Apache Hadoop library framework is used for building the tool. Eclipse is used as the development platform.
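For example, if the tool were built with Maven (the document mentions only Eclipse, so the build tool and the version below are assumptions), the Hadoop library dependency could be declared as:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.1.1</version> <!-- illustrative version only -->
    </dependency>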
1.6 Requirements:
This section describes the functions of the actors, their roles in the system, and the constraints faced by the system.
Front End Client: SQL queries are given as input, and the corresponding Hadoop map/reduce applications are produced.
Database Server: Hadoop server.
Back End: Hadoop map/reduce framework.
2.4 User Characteristics:
This is a tool which ports SQL queries to Hadoop. A SQL query is given as input to the front end, and its corresponding Hadoop map/reduce application is produced. This application is then given as input to the Hadoop map/reduce framework, which processes the data in a distributed fashion and returns the result.
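For instance, assuming the generated application has been packaged as SalaryFilter.jar (a hypothetical name, matching the sketch in section 1.1), it would be submitted with the standard Hadoop launcher and its output read back from HDFS:

    hadoop jar SalaryFilter.jar SalaryFilter /data/employees.csv /output
    hadoop fs -cat /output/part-m-00000

The framework splits the input file across the cluster, runs the mapper on each split in parallel, and writes the filtered rows to the output directory.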
2.5 Architecture Design:
[Figure: architecture design diagram]
2.6 Use Case Model Description:
[Figure: use case model diagram]

2.7 Class Diagram:
[Figure: class diagram]
2.8 Sequence Diagrams:
[Figure: sequence diagrams]
2.9 ER Diagram:
[Figure: ER diagram]
3.0 Specific Requirements:
3.1 Use Case Reports:
1. Accept a SQL query from the user.
2. Parse the query and obtain the Hadoop input file for the SQL table; the field names of the Hadoop file are assumed to match the column names of the SQL table.
3. Determine the Hadoop program template from the type of query.
4. Generate the Hadoop program from the template by substituting the column names and the file name, as in the sketch below.
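A minimal sketch of these four steps follows. The grammar, the /data/<table>.csv naming convention, and the template names are illustrative assumptions; a real implementation would use a proper SQL parser rather than a single regular expression.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Sketch of the four use-case steps. The grammar is deliberately minimal
    // and covers only "SELECT <column> FROM <table> [WHERE <condition>]".
    public class QueryToTemplate {

        private static final Pattern SELECT = Pattern.compile(
                "SELECT\\s+(\\S+)\\s+FROM\\s+(\\S+)(?:\\s+WHERE\\s+(.+))?",
                Pattern.CASE_INSENSITIVE);

        public static void main(String[] args) {
            // Step 1: accept the SQL query from the user.
            String sql = "SELECT name FROM employees WHERE salary > 50000";

            // Step 2: parse it; the table name maps to a Hadoop input file.
            Matcher m = SELECT.matcher(sql);
            if (!m.matches()) throw new IllegalArgumentException("unsupported query");
            String column = m.group(1);
            String table  = m.group(2);
            String file   = "/data/" + table + ".csv"; // assumed naming convention
            String where  = m.group(3);

            // Step 3: pick a template from the query type. A WHERE-only query
            // needs a map-only filter; aggregates would need a reduce template.
            String template = (where != null) ? "FilterTemplate" : "ProjectTemplate";

            // Step 4: substitute the column name and file name into the template
            // (template expansion itself is stubbed out here).
            System.out.printf("template=%s column=%s file=%s condition=%s%n",
                              template, column, file, where);
        }
    }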