
201408

Cloudera Manager Training:


Hands-On Exercises
General Notes
In-Class Preparation: Accessing Your Cluster
Self-Study Preparation: Creating Your Cluster
Hands-On Exercise: Working with Cloudera Manager
Hands-On Exercise: Enabling High Availability and Adding Services
Hands-On Exercise: Monitoring and Using Hue, Impala, and Hive
Hands-On Exercise: Host Templating and the Cloudera Manager API
Hands-On Exercise: Parcels and Rolling Restarts
Hands-On Exercise: Working With Users

Copyright 2012-2014 Cloudera, Inc. All rights reserved.
Not to be reproduced without prior written consent.
General Notes
The Cloudera Manager training course uses Amazon Web Services (AWS) and EC2
instances to create a cluster in the cloud. There will be a total of four EC2 instances.
The first virtual machine will run the Cloudera Manager services. Using AWS
credentials, Cloudera Manager will provision the other three EC2 instances.

If you are in an in-class environment for this course, your instructor will give you
the necessary credentials and DNS name for the first EC2 instance.

If you are taking this course online, you will need to follow the AWS installation
instructions to create your cluster.

In-Class Preparation: Accessing Your Cluster
In this preparatory exercise you will configure networking for your four instances.

Accessing Your Cluster: Cloud Training Environment


If you are in a Cloudera class and have been told by your instructor to perform
this section, please do so. Otherwise, please skip to the first Hands-On Exercise.

Your instructor will give you the information you need to access the cluster. This
information will include the DNS name or IP address of the computer running
Cloudera Manager. It will include the username and password to access Cloudera
Manager.

Open your browser and go to the server running Cloudera Manager using port 7180.
The DNS name will vary. For example, if the DNS name is:

ec2-72-44-45-204.compute-1.amazonaws.com

Then the URL to type in the browser is:

ec2-72-44-45-204.compute-1.amazonaws.com:7180

The browser will show the login page. Do not log in at this time. If there is an error,
double check the DNS name or IP address and try again. If you are unable to access
the login page, please ask your instructor for help.
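
The URL rule above (DNS name plus port 7180) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the http:// scheme is assumed from the base URL shown later in the API exercise.

```python
def cloudera_manager_url(dns_name: str, port: int = 7180) -> str:
    """Build the browser URL for the Cloudera Manager login page.

    The function name is hypothetical; port 7180 comes from the exercise text.
    """
    host = dns_name.strip()
    if not host:
        raise ValueError("a DNS name or IP address is required")
    return f"http://{host}:{port}"


# Example using the sample DNS name from this exercise.
print(cloudera_manager_url("ec2-72-44-45-204.compute-1.amazonaws.com"))
```

If the login page does not load, checking the printed URL against the DNS name you were given is a quick way to catch a typo.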

This is the end of the setup activity for the cloud training environment.

Self-Study Preparation: Creating Your Cluster
In this preparatory exercise you will create your cluster.

Creating Your Cluster: Cloud Training Environment


If you are in a Cloudera class, please skip to the first Hands-On Exercise.

You will need to create and install your cluster using Cloudera Manager. The
documentation contains complete step-by-step instructions. You can find the
documentation at:

http://tiny.cloudera.com/install

The easiest method is Installation Path A using Amazon Web Services.

You will need at least four hosts. One host will run the Cloudera Manager services.
The rest will run the Hadoop services.

The version of Cloudera Manager should be 5.0.2 and the version of CDH should be
5.0.0.

When installing the cluster, you should only install the HDFS, YARN, and ZooKeeper
services. When prompted for the edition of Cloudera Manager to install, choose the
trial for the Data Hub Edition.

This is the end of the setup activity for the self-study training environment.

Hands-On Exercise: Working with Cloudera Manager
In this exercise you will start working with Cloudera Manager. This will take
you through many of the day-to-day operations you will perform on your
cluster.

Step 1: Logging In

Before you can start working with Cloudera Manager, you must use your browser to
connect to the host running Cloudera Manager.

1. Open your browser and go to the public DNS name of the host running Cloudera
Manager using port 7180. The DNS name is assigned by AWS and will vary. For
example, if the DNS name is:

ec2-72-44-45-204.compute-1.amazonaws.com

Then the URL to type in the browser is:

ec2-72-44-45-204.compute-1.amazonaws.com:7180

2. The username is admin; the password is admin.

3. Select Remember Me and then click Login.

Step 2: Viewing the Home Page



The home page in Cloudera Manager gives an overview of the health of your cluster.

1. At the home page, look at the Status section.

2. Verify that all of the services and Cloudera Management Services are in good
health.

3. Click on the context menu for the cluster and service to familiarize yourself with
the commands.

The context menu has the following icon:


4. In the Charts section, look at the charts for the cluster.

5. Move your mouse over the chart to see the exact value at that point in time.

6. Click on the point in time to expand the chart details.

7. Press the left and right arrows to get the next and previous values in the chart.

8. Click on the X in the popup to close the chart details.

9. To the right of the charts are links to change the amount of time shown in the
chart. Change the time to one hour then two hours and observe how the charts
update.

Step 3: Searching in Cloudera Manager


Cloudera Manager includes a way to search through all settings and services to
quickly find what you are looking for.

1. In the top right, click on the search box.

2. Type in HDFS and the context menu will display the relevant search items.

3. In the service section, click on HDFS-1.

4. Notice that the search brought you to the HDFS overview page.

5. This time, use your keyboard and press the / key to access the search.

6. Type in YARN and the context menu will come up with the relevant search
items.

7. In the service section, click on YARN-1.

8. Notice that the search brought you to the YARN overview page.

Step 4: Using the Timeline and Status


Using the timeline, you can look at a specific period of time. Moving the timeline will
update all of the data on the page, such as the charts and status.

1. In the timeline, move the start marker (leftmost) back an hour.

2. Move the end marker (rightmost) back an hour.

3. Notice that the services data is updated with the statistics for the selected time
period.

4. In the Health History section, click on the Show link for the various events.

5. Notice that the timeline will automatically be moved back to the time of the
event and that the charts update as well.

6. To return to the present time, click on the Now button to the right of the
timeline.

7. Notice that the timeline has moved back to the present time.

This is the end of the Exercise

Hands-On Exercise: Enabling High Availability and Adding Services
In this exercise you will enable high availability (HA) for HDFS. You will also
install and configure some new services.

Step 1: Enabling HDFS High Availability (HA)


To remove the single point of failure in HDFS, we will enable HA in the cluster. This
will change the SecondaryNameNode to run as a Standby NameNode.

1. From the Clusters tab, select the HDFS service.

2. From Actions, select Enable High Availability.

3. In the second row, click on the Standby NameNode column.

4. In all three rows, click on the JournalNode column.

5. Click on Continue.

6. Click on Continue to accept the Nameservice Name.

7. The new JournalNodes need to be configured with the directory where their
state information will be stored. Change only the JournalNode Edits Directory
setting for all of the JournalNodes to:

/data0/dfs/jn

8. Click on Continue.
Cloudera Manager will start enabling high availability. If the step Formatting
the name directories of the current NameNode fails, don't worry. It is expected
to fail, since the directory is already formatted.

9. Once the process is complete, click on Finish.

10. Click on OK.
We will perform the steps this message talks about in the next exercise.
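
The wizard above places a JournalNode on all three hosts. One way to see why an odd number is used: with quorum-based journal storage, an edit must be acknowledged by a majority of JournalNodes, so N JournalNodes tolerate (N - 1) / 2 failures. The sketch below only illustrates that arithmetic; the function names are hypothetical and nothing here talks to a cluster.

```python
def journalnode_quorum(total: int) -> int:
    """Smallest number of JournalNodes that forms a majority."""
    if total < 1:
        raise ValueError("at least one JournalNode is required")
    return total // 2 + 1


def tolerated_failures(total: int) -> int:
    """How many JournalNodes can fail while a majority is still reachable."""
    return total - journalnode_quorum(total)


for count in (3, 4, 5):
    print(count, journalnode_quorum(count), tolerated_failures(count))
```

Note that four JournalNodes tolerate no more failures than three, which is why odd counts are the usual choice.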

Step 2: Verifying Automatic Failover
When the Active NameNode fails, we want HDFS to automatically fail over to the
Standby NameNode.

1. Click on the Instances tab.

2. Find the column Automatic Failover.

3. Verify that the value is Yes.

Step 3: Performing a Manual Failover


To verify the correct setup, we will manually fail over the HDFS service.

1. Click on Manual Failover.

2. Once the failover is completed, click on Close.

Step 4: Viewing the NameNode Web UI


Most services in Hadoop have a Web UI that gives some information about the
service. Cloudera Manager takes many of the statistics and other information shown
in these Web UIs and displays them in its UI. Each service provides these links as a
convenience.

1. Click on the Web UI button and choose one of the nodes.

2. Look through the information presented by the service's Web UI. Once you are
done, close the tab or window.

Step 5: Adding Services


In this step, we will be adding some Hadoop Ecosystem services. Hive and Impala
are services that use a SQL-like language to process data. Hive is an abstraction on
top of MapReduce. Impala runs its own role instances. Oozie is a workflow manager
for Hadoop. It allows automation of entire Hadoop workflows with MapReduce, Hive,
and other ecosystem projects. Hue is a browser-based environment for graphically
interacting with a Hadoop cluster.

To install some services, a certain number of prerequisite services must be installed
first. We want to install the Hue web interface and all prerequisite services.

1. Go to the Cloudera Manager main page.

2. Click on the context menu for the cluster and click on Add a Service.

The context menu has the following icon:


3. Click on the radio button next to Hue.

4. Click on Continue.

An error will appear, showing that Hue requires services like Hive to be
installed before you can install Hue.

5. Click on Close.

6. Click on the radio button next to Hive.

7. Click on Continue.

8. Click on Continue.

9. Click on Continue.

10. Click on Test Connection.

11. Click on Continue.

12. Click on Continue.

The Hive service will start installing.

13. Once the installation has finished, click on Continue.

14. Click on Finish.

15. Use the same steps to install the Oozie service and accept all defaults.

16. Use the same steps to install the Impala service and accept all defaults.

17. Use the same steps to install the Hue service. When prompted to select the
dependencies, choose the row with the Impala service defined. Otherwise,
accept all defaults.

18. Click on the context menu for the Hue service and click on Start.

19. Once the service is started, click on Close.

Step 6: Adding Another DataNode


The HDFS service is now showing bad health. This is because there are only two
DataNodes running on the cluster. You must add another DataNode to replicate the
HDFS blocks three times, which is the Hadoop default.

1. Click on the HDFS service.

2. Click on the Instances tab.

3. Notice the validation warning saying that there are only two DataNodes running
and that the suggested number is three.

4. Click on Add.

5. In the DataNode section, click on Select hosts.

6. Click on All hosts.

7. Click on Continue.

8. Click on Finish.

9. Click on the check box for the newly added DataNode. It will be the only
one that has a status of Stopped.

10. Click on Actions for Selected and then click on Start.

11. Click on Start.

12. Once the new instance is started, click on Close.

The HDFS service will begin to replicate blocks to the new DataNode. During this
time, the HDFS service will still say that it is in Bad Health. Once all the blocks have
three-fold replication, the service will change to Good Health.
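
The reasoning behind the bad health status can be sketched as simple arithmetic: HDFS keeps at most one replica of a block on any given DataNode, so two DataNodes cannot hold three replicas. The function names below are hypothetical and the sketch does not query HDFS.

```python
def replicas_possible(datanodes: int, replication: int = 3) -> int:
    """At most one replica of a block is stored per DataNode."""
    return min(datanodes, replication)


def missing_replicas(datanodes: int, replication: int = 3) -> int:
    """Replicas per block that cannot be placed with this many DataNodes."""
    return replication - replicas_possible(datanodes, replication)


print(missing_replicas(2))  # two DataNodes, default replication of three
print(missing_replicas(3))  # after adding the third DataNode
```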

Step 7: Configuring HA for Hue


In the previous exercise, we enabled high availability. We need to make some
configuration changes to allow Hue to work with HA.

1. Click on Clusters then HDFS.

2. Click on the Instances tab.

3. Click on Add.

4. Click in the row with the fewest Added Roles. This will add the HttpFS
instance to that host.

5. Click on OK.

6. Click on Continue.

7. Click on the check box for the newly added HttpFS instance. It will be the
only one that has a status of Stopped.

8. Click on Actions for Selected and then click on Start.

9. Click on Start.

10. Once the new instance is started, click on Close.

11. Click on Clusters then Hue.

12. Click on the Configuration tab and then View and Edit.

13. In the row for HDFS Web Interface Role, choose httpfs.

14. Click on Save Changes.

15. Go to the Cloudera Manager main page.

16. The cluster needs to be restarted to pick up the configuration changes. Click on
the context menu for the cluster and click on Restart.

17. Click on Restart.

18. Click on Close.

This is the end of the Exercise

Hands-On Exercise: Monitoring and Using Hue, Impala, and Hive
In this exercise you will use Hue to run Impala and Hive queries and monitor
their services.

Step 1: Setting Up Hue


1. Click on Clusters then Hue.

2. Click on Hue Web UI. This will open a new tab or window for the Hue interface.

3. Log in with the username training and password training.


This creates the default superuser for Hue. If you need to log in to Hue again, you
will need to use this username and password.

4. Click on Next.

5. Click on All to install all application examples.

6. Once the application examples are installed, click on Next.

7. Click on Next.

8. Click on Hue Home.

Step 2: Querying Impala



You can run Impala queries from within Hue.

1. Click on Query Editors, then Impala.

2. In the query box, type:

invalidate metadata;

3. Click on Execute.

4. In the query box, type:

SELECT * FROM sample_07
WHERE total_emp > 6003930;

5. Click on Execute.

6. In the query box, type:

SELECT AVG(salary), SUM(total_emp) FROM sample_07;

7. Click on Execute.
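
To see what the filter and aggregate queries above compute without a cluster, the same SQL can be run against a tiny in-memory table. This is a rough sketch using Python's sqlite3 module rather than Impala; the column layout (code, description, total_emp, salary) is assumed, and the three rows are made-up illustration values, not data from the Hue sample table.

```python
import sqlite3

# In-memory stand-in for the sample_07 table; column layout is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_07 "
    "(code TEXT, description TEXT, total_emp INTEGER, salary INTEGER)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO sample_07 VALUES (?, ?, ?, ?)",
    [
        ("00-0001", "Example occupation A", 7000000, 40000),
        ("00-0002", "Example occupation B", 500000, 60000),
        ("00-0003", "Example occupation C", 6500000, 50000),
    ],
)

# Same filter as in the exercise: only rows above the employment threshold.
large = conn.execute(
    "SELECT * FROM sample_07 WHERE total_emp > 6003930"
).fetchall()

# Same aggregate as in the exercise: one row summarizing the whole table.
avg_salary, total_emp = conn.execute(
    "SELECT AVG(salary), SUM(total_emp) FROM sample_07"
).fetchone()

print(len(large), avg_salary, total_emp)
```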

Step 3: Monitoring Impala Queries


Cloudera Manager monitors the queries and health of the Impala service. It gives
information about each query that you run.

1. Go back to the tab or window for Cloudera Manager.

2. Click on Clusters then Impala.

3. Click on the Queries tab.

4. In the list of queries, find the last Impala query that you ran. Look at the data
that is tracked by Cloudera Manager.

5. Click on the Details button for that row.

6. This page gives even more information about Impala's execution plan and
information about the query, as well as displaying the query itself. Using this
information, you can debug slow queries.

7. Click on Clusters then Impala.

8. Click on the Best Practices tab.

9. This page shows whether the best practices for Impala are being followed. A
chart shows each best practice. Read through the descriptions of each chart.

Step 4: Creating an Impala Trigger
You can use Cloudera Manager to trigger events when a certain threshold is passed.
This can change the status of a service.

1. Click on the Charts Library tab.

2. Find the Impala Queries chart.

This chart shows the number of queries per second that Impala served.

3. Mouse over the chart and click on the context menu for it.

4. Click on Create Trigger.

5. We want to create a trigger that will change the status if the Impala
service is being used too much, indicating that we need to expand the
cluster with new nodes.

6. Give the trigger the name Impala Usage.

7. Change the Stream Threshold to 50.

8. Click on Create Trigger.

This trigger will now change the Impala service's health to Concerning
whenever there are 50 queries per second to the Impala service.
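
The effect of the trigger can be pictured as a threshold check on the query rate. The sketch below is not how Cloudera Manager expresses triggers internally; it is a hypothetical function that assumes the threshold is inclusive and that the service is otherwise in Good health.

```python
def impala_health(queries_per_second: float, threshold: float = 50.0) -> str:
    """Return the health status the Impala Usage trigger would imply.

    Assumption: reaching the threshold is enough to fire the trigger.
    """
    return "Concerning" if queries_per_second >= threshold else "Good"


for rate in (12.0, 49.9, 50.0, 75.0):
    print(rate, impala_health(rate))
```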

Step 5: Running Hive Queries


You can run Hive queries from within Hue.

1. Go back to the tab or window for Hue.

2. Click on Query Editors then Hive.

3. In the query box, type:

SELECT * FROM sample_08
WHERE description LIKE "%engineer%"
ORDER BY salary DESC;

4. Click on Execute.

5. In the query box, type:

SELECT * FROM sample_08
WHERE description NOT LIKE "%engineer%"
ORDER BY salary DESC;

6. Click on Execute.

7. In the query box, type:

SELECT isEngineer, AVG(salary) as avgsalary
FROM (
  SELECT
    INSTR(description, "engineer") != 0 as isEngineer,
    salary
  FROM sample_08
) engineersubselect
GROUP BY isEngineer;

8. Click on Execute.

Step 6: Hive and MapReduce Monitoring


Cloudera Manager monitors Hive queries and MapReduce jobs with the YARN service.

1. Go back to the tab or window for Cloudera Manager.

2. Click on Clusters then YARN.

3. Review the charts for the YARN service showing the Hive query activity.

This is the end of the Exercise

Hands-On Exercise: Host Templating and the Cloudera Manager API
In this exercise, we will create a host template for new hosts. We will use the
Cloudera Manager API to get status about the cluster.

Step 1: Creating a template


As the number of hosts in the cluster grows, we want a simple way to configure the
new hosts. Using Host Templating you can quickly add new hosts that have certain
role instances and configurations.

1. Go to the Hosts tab.

2. Click on the Templates tab.

3. Click on Click here to create a new template.

4. Give the template the name Worker Host.

5. Under HDFS, check DataNode and leave the configuration group as it is.

6. Under YARN, check NodeManager and leave the configuration group as it is.

7. Click on Create.

The next time a worker node is added, you can use the Worker Host template to
quickly set up the host to run a DataNode and NodeManager.
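
Conceptually, a host template is a named list of role instances that is applied to every host it is assigned to. The sketch below models the Worker Host template from this step as plain data; the class, function, and host names are hypothetical and are not part of Cloudera Manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostTemplate:
    """A named set of (service, role) pairs; a hypothetical model."""
    name: str
    roles: tuple


# The template created in this step: a DataNode and a NodeManager.
WORKER_HOST = HostTemplate(
    name="Worker Host",
    roles=(("HDFS", "DataNode"), ("YARN", "NodeManager")),
)


def apply_template(template: HostTemplate, hostnames: list) -> dict:
    """Return the roles each new host would run under the template."""
    return {host: list(template.roles) for host in hostnames}


# "worker-4" and "worker-5" are made-up host names.
print(apply_template(WORKER_HOST, ["worker-4", "worker-5"]))
```

Every host receives the same role list, which is the point of templating: adding ten workers takes the same effort as adding one.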

Step 2: Cloudera Manager API


Cloudera Manager has a built-in RESTful API to get and set information about the
cluster and its status. This API can be used from a browser, from curl, or from any
programming language that can issue HTTP requests.

1. Note the base URL of the Cloudera Manager server.

For example, the base URL could be:

http://ec2-54-191-61-124.us-west-2.compute.amazonaws.com:7180

2. Open a new browser tab and type in the base URL followed by:

/api/v6/tools/echo?message=hello%20world

For example, the full URL would look like:

http://ec2-54-191-61-124.us-west-2.compute.amazonaws.com:7180/api/v6/tools/echo?message=hello%20world

3. After pressing Enter, the browser will display JSON that echoes back the
message in the URL.

This uses Cloudera Manager's built-in echo to verify connectivity and
functionality.

4. Change the browser URL to the base URL followed by:

/api/v6/clusters

For example, the full URL would look like:

http://ec2-54-191-61-124.us-west-2.compute.amazonaws.com:7180/api/v6/clusters

5. After hitting enter, you will see some basic information about the cluster. The
API will return the name of the cluster and version information. The cluster
name is needed for other API calls.

6. Go back to the Cloudera Manager tab.

7. Click on Support then API Documentation.

8. This will bring up a page containing all of the documentation about Cloudera
Manager's API. Familiarize yourself with the various calls.
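
The two browser calls above can also be made from a script. The following is a minimal sketch using only the Python standard library: api_url and get_json are hypothetical helper names, the base URL is the example from this exercise, and HTTP basic authentication with your Cloudera Manager username and password is assumed. Nothing is sent over the network until get_json is called.

```python
import base64
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

API_VERSION = "v6"  # the API version used in this exercise


def api_url(base_url: str, path: str, **params: str) -> str:
    """Join the base URL, API version, and path; encode query parameters."""
    url = f"{base_url.rstrip('/')}/api/{API_VERSION}/{path.lstrip('/')}"
    if params:
        # quote_via=quote encodes spaces as %20, as in the exercise URL.
        url += "?" + urlencode(params, quote_via=quote)
    return url


def get_json(url: str, username: str, password: str, timeout: float = 10):
    """GET a URL with basic authentication (assumed) and parse the JSON body."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = Request(url, headers={"Authorization": f"Basic {token}"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


base = "http://ec2-54-191-61-124.us-west-2.compute.amazonaws.com:7180"
echo_url = api_url(base, "tools/echo", message="hello world")
clusters_url = api_url(base, "clusters")
print(echo_url)
print(clusters_url)
```

Against a running server, get_json(echo_url, username, password) would be the scripted equivalent of step 2, and get_json(clusters_url, username, password) of step 4.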

This is the end of the Exercise

Hands-On Exercise: Parcels and Rolling Restarts
In this exercise, you will update the version of CDH and deploy the update with
rolling restarts to prevent downtime.

Step 1: Downloading the Parcels


The cluster is using CDH 5.0.0, and newer minor versions have been released. Before
we can download the new parcels, we must update the URL of where to download
the parcels.

1. Click on Administration then Settings.

2. Find the setting that says Remote Parcel Repository URLs.

3. Change the setting to say:

http://archive.cloudera.com/cdh5/parcels/5.0.1/

4. Click on the Save Changes button on the top right.

5. Click on the New Parcels button.

6. Find the parcel for CDH 5.0.1 and click on Download.

The parcel will start downloading in the background. This may take a few
minutes to finish. Feel free to explore Cloudera Manager during this time.

Step 2: Distributing and Activating the Parcels


1. Once the download is finished, click on Distribute.

This will distribute the parcel to all hosts in the cluster.

2. Click on Activate.

3. Click on Rolling Restart.

4. Under both Rolling and Basic, check all of the services to restart.

Not all services support rolling restarts. Services that do not support rolling
restarts are listed as basic.

5. Under Roles to include, click on All Roles.

6. Click on Confirm.

Cloudera Manager will stop certain services, rolling restart some services, and start
all services. Once the process is done, the newer version of CDH will be active.
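
The idea behind a rolling restart is to restart role instances a few at a time so that the rest keep serving requests. The sketch below shows only that batching idea; the function names, the batch size, and the host names are hypothetical, and the restart callback stands in for whatever actually restarts a role.

```python
from typing import Callable, Iterator, List


def rolling_batches(hosts: List[str], batch_size: int = 1) -> Iterator[List[str]]:
    """Yield hosts in consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(hosts), batch_size):
        yield hosts[start:start + batch_size]


def rolling_restart(hosts: List[str], restart: Callable[[str], None],
                    batch_size: int = 1) -> None:
    """Restart one batch at a time; hosts outside the batch stay up."""
    for batch in rolling_batches(hosts, batch_size):
        for host in batch:
            restart(host)


# Made-up host names; the callback just records the restart order.
restarted = []
rolling_restart(["worker-1", "worker-2", "worker-3"], restarted.append, batch_size=2)
print(restarted)
```

A basic (non-rolling) restart corresponds to a single batch containing every host, which is why it implies downtime for that service.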

This is the end of the Exercise


Hands-On Exercise: Working With Users
In this exercise, we will create new users and see how their permissions work.

Admin User
Your current user is an admin user. Cloudera Manager creates this user by default.
Other users can be created with fewer permissions.

Step 1: Change the Admin Password


For security purposes, you should change the default password for the admin user.

1. Click on Administration then Users.

2. Find the row for the admin user and click on the Change Password button.

3. Change the password to newpassword and click on Update.

Step 2: Adding Users


1. Click on Add User.

2. Create a read-only user called readonly.

3. Create a limited administrator user called limited.

Step 3: Logging In As Different Users


1. On the top right, click on admin then Logout.

2. At the prompt, log in as the readonly user.

3. Click on the context menu for the cluster. Notice that there are no buttons to
add services or stop the cluster.

4. Go to the HDFS service.

5. Click on the Configuration tab then View. Notice that the user can view all
configurations, but cannot make any changes.

6. On the top right, click on readonly then Logout.

7. At the prompt, log in as the limited user.

8. Click on the context menu for the cluster. Notice that there are no buttons to
add services or stop the cluster.

9. Go to the HDFS service.

10. Click on the Configuration tab then View. Notice that the user can view all
configurations, but cannot make any changes.

11. Click on the Hosts tab.

12. Click on Actions for Selected. Notice that this user can decommission hosts.
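
What you observed for the three users can be summarized as a small permission table. The sketch below encodes only the behavior seen in this exercise; the action names are hypothetical labels, and the admin row assumes the default admin user can perform every listed action.

```python
# Hypothetical action labels; each set lists what the user type may do.
PERMISSIONS = {
    "admin": {"view_config", "edit_config", "add_service",
              "stop_cluster", "decommission_host"},
    "limited": {"view_config", "decommission_host"},
    "readonly": {"view_config"},
}


def can(user_type: str, action: str) -> bool:
    """Return True if the given user type is allowed to perform the action."""
    return action in PERMISSIONS.get(user_type, set())


for user_type in ("readonly", "limited", "admin"):
    print(user_type, can(user_type, "decommission_host"), can(user_type, "edit_config"))
```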

This is the end of the Exercise



