Hadoop is a framework built around a distributed file system (HDFS) that stores and processes data across a cluster of machines. It is designed for batch processing of very large data sets rather than real-time access, and it is a powerful tool for storing and processing large volumes of data.
- What is Virtualization
- How the above three are inter-related to each other
- What is Big Data
- Introduction to Analytics and the need for big data analytics
- Hadoop Solutions - Big Picture
- Hadoop distributions
- Comparing Hadoop vs. Traditional Systems
- Volunteer Computing
- Data Retrieval - Random Access vs. Sequential Access
- NoSQL Databases

The Motivation for Hadoop
- Problems with traditional large-scale systems
- Requirements for a new approach

Hadoop: Basic Concepts
- What is Hadoop?
- The Hadoop Distributed File System
- How MapReduce Works
- Anatomy of a Hadoop Cluster

Hadoop Daemons
- NameNode
- DataNode
- Secondary NameNode
- JobTracker
- TaskTracker

HDFS in Detail
- Blocks and Splits
- Replication
- Data High Availability
- Data Integrity
- Cluster architecture and block placement

Programming Practices & Performance Tuning
- Developing MapReduce programs in:
  - Local mode
  - Pseudo-distributed mode
  - Fully distributed mode

Writing a MapReduce Program
- Examining a Sample MapReduce Program
- Basic API Concepts
- The Driver Code
- The Mapper
- The Reducer
- Hadoop's Streaming API

Setting Up a Hadoop Cluster
- Install and configure Apache Hadoop
- Build a fully distributed Hadoop cluster on a single laptop/desktop
- Install and configure the Cloudera Hadoop distribution in fully distributed mode
- Install and configure the Hortonworks Hadoop distribution in fully distributed mode
- Monitoring the cluster
- Getting used to the management consoles of Cloudera and Hortonworks

Hadoop Security
- Why Hadoop Security Is Important
- Hadoop's Security System Concepts
- What Kerberos Is and How It Works
- Configuring Kerberos Security
- Integrating a Secure Cluster with Other Systems

Managing and Scheduling Jobs
- Managing Running Jobs
- Hands-On Exercise
- The FIFO Scheduler
- The FairScheduler
- Configuring the FairScheduler
- Hands-On Exercise

Cluster Maintenance
- Checking HDFS Status
- Hands-On Exercise
- Copying Data Between Clusters
- Adding and Removing Cluster Nodes
- Rebalancing the Cluster
- Hands-On Exercise
- NameNode Metadata Backup

Cluster Monitoring and Troubleshooting
- General System Monitoring
- Managing Hadoop's Log Files
- Using the NameNode and JobTracker Web UIs
- Hands-On Exercise
- Cluster Monitoring with Ganglia
- Common Troubleshooting Issues
- Benchmarking Your Cluster

Hadoop Ecosystem (covered as part of the Hadoop Administrator track)

Ecosystem component: Ganglia
- Install and configure Ganglia on a cluster
- Configure and use Ganglia
- Use Ganglia for graphs

Ecosystem component: Nagios
- Nagios concepts
- Install and configure Nagios on a cluster
- Use Nagios for sample alerts and monitoring

Ecosystem component: Hive
- Hive concepts
- Install and configure Hive on a cluster
- Create a database and access it from the console
- Develop and run sample applications in Java/Python to access Hive

Ecosystem component: Sqoop
- Install and configure Sqoop on a cluster
- Import data from Oracle/MySQL into Hive

Overview of other ecosystem components: Oozie, Avro, Thrift, REST, Mahout, Cassandra, YARN, MR2, etc.
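The Mapper/Reducer roles and Hadoop's Streaming API listed above can be sketched together in a few lines: Streaming drives any executable through stdin/stdout, with keys and values separated by tabs. The word-count task and all names below are illustrative, not code from the course materials.

```python
# Word count in the Hadoop Streaming style: mapper and reducer both read
# lines and emit tab-separated key/value pairs, so the same script can be
# run by the hadoop-streaming jar or piped together locally for testing.
import sys

def mapper(lines):
    # Map phase: emit (word, 1) for every whitespace-separated token.
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # Reduce phase: Hadoop delivers pairs sorted by key, so all counts for
    # one word arrive consecutively and can be summed in a single pass.
    current, total = None, 0
    for word, count in pairs:
        if word != current:
            if current is not None:
                yield current, total
            current, total = word, 0
        total += count
    if current is not None:
        yield current, total

# Invoke with "map" or "reduce" as the first argument; with no argument
# the script defines the functions but reads nothing from stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        for key, value in mapper(sys.stdin):
            print("%s\t%d" % (key, value))
    else:
        split = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for key, value in reducer((w, int(c)) for w, c in split):
            print("%s\t%d" % (key, value))
```

Saved under a name of your choice (say, wordcount.py, a hypothetical filename), the job can be rehearsed entirely on one machine with a pipe that mimics the shuffle phase - `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` - before it is submitted through the Streaming jar.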
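The Hive section mentions developing sample Python applications to access Hive. A minimal sketch of one way to do that through HiveServer2, assuming the third-party PyHive package and a server on localhost:10000 (both are assumptions for illustration, not details from the outline):

```python
# Sketch: querying Hive from Python via HiveServer2. The PyHive package and
# the localhost:10000 endpoint are assumed, not specified by the course.

def build_query(table, limit):
    # Guard against malformed identifiers before interpolating into HiveQL,
    # since table names generally cannot be passed as bound parameters.
    if not table.replace("_", "").isalnum():
        raise ValueError("invalid table name: %r" % table)
    return "SELECT * FROM %s LIMIT %d" % (table, limit)

def fetch_sample(table, limit=10):
    # Import inside the function so build_query stays usable even when the
    # dependency is not installed (pip install 'pyhive[hive]').
    from pyhive import hive
    conn = hive.Connection(host="localhost", port=10000, database="default")
    try:
        cursor = conn.cursor()
        cursor.execute(build_query(table, limit))
        return cursor.fetchall()
    finally:
        conn.close()
```

The same pattern (open connection, execute HiveQL, fetch rows, close) carries over to the Java JDBC route the outline also mentions; only the driver and connection string differ.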