International Journal of Computer Applications (0975 8887)
National Conference on Recent Trends in Information Technology (NCRTIT-2016)
[12] depends on a shared-nothing [17] design like Bigtable. Each DB2 server is responsible for a subset of the rows in a table, which it stores in a relational database. Both databases afford a complete relational model with transactions. The limitation is that these systems do not scale well as the amount of data grows very large. Apache Hive, by contrast, supports very large amounts of data.

In this paper Apache Hive is considered for analysing large datasets stored in Hadoop's HDFS and compatible file systems such as the Amazon S3 filesystem. It provides an SQL-like language called HiveQL [9] with schema on read and transparently converts queries to MapReduce, Apache Tez [10] and Spark jobs. All three execution engines can run in Hadoop YARN. To accelerate queries, it provides indexes, including bitmap indexes [11].

3. CHALLENGES IN BIG DATA
The uses of Big Data across fields of knowledge are immense because it supports analysis of data at both micro and macro levels. For instance, Big Data tools help institutions study the quantitative and qualitative learning abilities of students from different strata of society. Behavioral learning and the psychological attitudes of students may also be estimated through these tools. Big Data can also be used to analyze cognitive abilities and the impact of health on acquiring knowledge, since the health condition of students usually affects the learning process.

Further, the scope of Big Data is so vast that it has been used in globalized urban societies for locality planning, intelligent transportation, air ambulance monitoring systems, road mapping, and environmental and natural disaster prediction.

Table 1: Airport Data Set [6]
Attribute               Description
Airport ID              Unique OpenFlights identifier for this airport.
Name                    Name of airport. May or may not contain the City name.
City                    Main city served by airport. May be spelled differently from Name.
Country                 Country or territory where airport is located.
IATA/FAA                3-letter FAA code, for airports located in Country "United States of America".
ICAO                    4-letter ICAO code.
Latitude                Decimal degrees, usually to six significant digits. Negative is South, positive is North.
Longitude               Decimal degrees, usually to six significant digits. Negative is West, positive is East.
Altitude                In feet.
Timezone                Hours offset from UTC. Fractional hours are expressed as decimals, e.g. India is 5.5.
DST                     Daylight savings time. One of E (Europe), A (US/Canada), S (South America), O (Australia), Z (New Zealand), N (None) or U (Unknown). See also: Help: Time
Tz database time zone   Timezone in "tz" (Olson) format, e.g. "America/Los_Angeles".
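The schema-on-read idea mentioned for Hive can be illustrated with the airport attributes of Table 1. The following is only a minimal sketch in Python, not Hive itself: it assumes the raw airport records are comma-separated text with fields in the Table 1 order, and the `Airport` type, the `read_airports` helper and the sample record are hypothetical names and data introduced purely for illustration.

```python
import csv
import io
from typing import NamedTuple


class Airport(NamedTuple):
    """Typed view of one record; field order follows Table 1 (assumed)."""
    airport_id: int
    name: str
    city: str
    country: str
    iata_faa: str
    icao: str
    latitude: float
    longitude: float
    altitude_ft: int
    timezone: float
    dst: str
    tz_database: str


def read_airports(raw_text):
    """Apply the schema while reading; the stored text itself stays untyped."""
    for row in csv.reader(io.StringIO(raw_text)):
        yield Airport(
            int(row[0]), row[1], row[2], row[3], row[4], row[5],
            float(row[6]), float(row[7]), int(row[8]), float(row[9]),
            row[10], row[11],
        )


# Made-up sample record for illustration only; not taken from the data set.
SAMPLE = '1,"Example Field","Example City","India","EXF","VEXF",12.5,77.25,3000,5.5,"N","Asia/Kolkata"\n'

airports = list(read_airports(SAMPLE))
print(airports[0].country, airports[0].timezone)  # prints: India 5.5
```

The point of the sketch is that types are attached only when the file is read, so the same raw file could be read again later with a different schema.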
Big Data is supported by a range of technologies such as Hadoop [4]. Traditional relational database skills are still in high demand, but so, increasingly, are the skills needed to work with the generation of non-relational databases known as NoSQL. These NoSQL databases, which are often open source, are built to handle the processing of large volumes of data and use different design strategies, architectures and query languages. One of the biggest challenges in Big Data is Big Data analytics, that is, examining, analyzing and interpreting Big Data.

In this paper, tables were first created for the data set described below [6]. The data set was loaded into the created tables on an HDFS system. Hive queries were then applied and the results were analyzed.

4. ANALYSIS OF AIRPORT DATA
The proposed method considers the following scenario.
An airport has a huge amount of data related to the number of flights, date and time of arrival and departure, flight routes, number of airports operating in each country, and the list of active airlines in each country. The problem faced until now is that only a limited amount of this data can be analyzed from databases. The intention of the proposed model is to develop a model for the airline data that provides a platform for new analytics based on the following queries.
The data description is as shown in Table 1 to Table 3.

Table 2: Airline Data Set [6]
Attribute               Description
Airline ID              Unique OpenFlights identifier for this airline.
Name                    Name of the airline.
Alias                   Alias of the airline. For example, All Nippon Airways is commonly known as "ANA".
IATA                    2-letter IATA code, if available.
ICAO                    3-letter ICAO code, if available.
Callsign                Airline callsign.
Country                 Country or territory where airline is incorporated.
Active                  "Y" if the airline is or has until recently been operational, "N" if it is defunct. This field is not reliable: in particular, major airlines that stopped flying long ago, but have not had their IATA code reassigned (e.g. Ansett/AN), will incorrectly show as "Y".
Table 3: Route Data Set [6]
Attribute               Description
Airline                 2-letter (IATA) or 3-letter (ICAO) code of the airline.
Airline ID              Unique OpenFlights identifier for airline.
Source airport          3-letter (IATA) or 4-letter (ICAO) code of the source airport.
Source airport ID       Unique OpenFlights identifier for source airport.
Destination airport     3-letter (IATA) or 4-letter (ICAO) code of the destination airport.
Destination airport ID  Unique OpenFlights identifier for destination airport.
Codeshare               "Y" if this flight is a codeshare (that is, not operated by Airline, but another carrier), empty otherwise.
Stops                   Number of stops on this flight ("0" for direct).
Equipment               3-letter codes for plane type(s) generally used on this flight, separated by spaces.

The queries considered are:
a) list of airports operating in the country India
b) list of airlines having zero stops
c) list of airlines operating with code share
d) list the highest airports in each country
e) list of active airlines in the United States

5. METHODOLOGY
In this paper, the tools used for the proposed method are Hadoop, Hive and Sqoop, which is mainly used for structured data. It is assumed that all the Hadoop tools have been installed and that semi-structured information on airport data is available [7, 8]. The above-mentioned queries have to be addressed.

Methodology used is as follows:
1. Create tables with required attributes
2. Extract semi-structured data into the tables using the load command
3. Analyze data for the following queries
a) list of airports operating in the country India
b) list of airlines having zero stops
c) list of airlines operating with code share
d) which country has the highest airports
e) list of active airlines in the United States
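
The three methodology steps can be sketched end to end as follows. This is a hedged illustration rather than the Hive scripts used in the paper: Python's built-in SQLite stands in for Hive, the SQL statements are plain SQL that is only expected to resemble HiveQL, the table and column names are assumptions derived from Tables 1 to 3, and all rows are made up. Query d) is ambiguous in the text; the sketch takes one possible reading (the country with the most airports).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite is a stand-in for Hive here

# Step 1: create tables with the attributes the queries need
# (a subset of Tables 1 to 3; names are assumptions).
conn.executescript("""
CREATE TABLE airports (airport_id INTEGER, name TEXT, city TEXT, country TEXT);
CREATE TABLE airlines (airline_id INTEGER, name TEXT, country TEXT, active TEXT);
CREATE TABLE routes   (airline TEXT, airline_id INTEGER, codeshare TEXT, stops INTEGER);
""")

# Step 2: load data. In Hive this would be done with the load command;
# the rows below are invented for illustration.
conn.executemany("INSERT INTO airports VALUES (?,?,?,?)", [
    (1, "Airport A", "City A", "India"),
    (2, "Airport B", "City B", "India"),
    (3, "Airport C", "City C", "United States"),
])
conn.executemany("INSERT INTO airlines VALUES (?,?,?,?)", [
    (10, "Airline X", "United States", "Y"),
    (11, "Airline Y", "United States", "N"),
    (12, "Airline Z", "India", "Y"),
])
conn.executemany("INSERT INTO routes VALUES (?,?,?,?)", [
    ("AX", 10, "", 0),
    ("AY", 11, "Y", 1),
    ("AZ", 12, "Y", 0),
])


def first_column(sql, params=()):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql, params)]


# Step 3: analyze the data.
# a) airports operating in India
india_airports = first_column(
    "SELECT name FROM airports WHERE country = ? ORDER BY name", ("India",))
# b) airlines having zero stops
zero_stop = first_column(
    "SELECT DISTINCT airline FROM routes WHERE stops = 0 ORDER BY airline")
# c) airlines operating with code share
codeshare = first_column(
    "SELECT DISTINCT airline FROM routes WHERE codeshare = 'Y' ORDER BY airline")
# d) one reading of the query: the country with the most airports
top_country = conn.execute(
    "SELECT country, COUNT(*) AS n FROM airports "
    "GROUP BY country ORDER BY n DESC LIMIT 1").fetchone()
# e) active airlines in the United States
active_us = first_column(
    "SELECT name FROM airlines WHERE country = ? AND active = 'Y' ORDER BY name",
    ("United States",))

print(india_airports, zero_stop, codeshare, top_country, active_us)
```

Note that query e) inherits the caveat from Table 2: the Active field is described as unreliable, so a result built on it should be read with that in mind.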
[9] HiveQL Language Manual
[10] Apache Tez
[11] Working with Students to Improve Indexing in Apache Hive
[12] Baru C. K., Fecteau G., Goyal A., Hsiao H., Jhingran A., Padmanabhan S., Copeland G. P., and Wilson W. G. DB2 parallel edition. IBM Systems Journal 34, 2 (1995), 292-322.
[13] ORACLE.COM. www.oracle.com/technology/products/database/clustering/index.html.
[14] Ratnasamy S., Francis P., Handley M., Karp R., and Shenker S. A scalable content-addressable network. In Proc. of SIGCOMM (Aug. 2001), pp. 161-172.
[15] Rowstron A., and Druschel P. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proc. of Middleware 2001 (Nov. 2001), pp. 329-350.
[16] Stoica I., Morris R., Karger D., Kaashoek M. F., and Balakrishnan H. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proc. of SIGCOMM (Aug. 2001), pp. 149-160.
[17] Stonebraker M. The case for shared nothing. Database Engineering Bulletin 9, 1 (Mar. 1986), 4-9.
[18] Zhao B. Y., Kubiatowicz J., and Joseph A. D. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Tech. Rep. UCB/CSD-01-1141, CS Division, UC Berkeley, Apr. 2001.
IJCATM : www.ijcaonline.org