If you are in an in-class environment for this course, your instructor will give you the necessary credentials and DNS name for the first EC2 instance. If you are taking this course online, you will need to follow the AWS installation instructions to create your cluster.
Your instructor will give you the information you need to access the cluster. This information will include the DNS name or IP address of the computer running Cloudera Manager. It will include the username and password to access Cloudera Manager.
Open your browser and go to the server running Cloudera Manager using port 7180. The DNS name will vary. For example, if the DNS name is:
ec2-72-44-45-204.compute-1.amazonaws.com
Then the URL to type in the browser is:
ec2-72-44-45-204.compute-1.amazonaws.com:7180
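The rule above (DNS name plus port 7180) can be sketched in a few lines of Python. This is only an illustration: the function name `cloudera_manager_url` is hypothetical, and the `http://` scheme is assumed from the API example URLs later in this document.

```python
def cloudera_manager_url(dns_name: str, port: int = 7180) -> str:
    """Build the browser URL for the host running Cloudera Manager.

    Assumes plain HTTP on the given port; 7180 is the port named in the text.
    """
    host = dns_name.strip()  # tolerate stray whitespace from copy and paste
    return f"http://{host}:{port}"


# The DNS name below is the example from the text; yours will differ.
print(cloudera_manager_url("ec2-72-44-45-204.compute-1.amazonaws.com"))
```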
The browser will show the login page. Do not log in at this time. If there is an error, double check the DNS name or IP address and try again. If you are unable to access the login page, please ask your instructor for help.
You will need to create and install your cluster using Cloudera Manager. The documentation contains complete step-by-step instructions. You can find the documentation at:
http://tiny.cloudera.com/install
You will need at least four hosts. One host will run the Cloudera Manager services. The rest will run the Hadoop services.
The version of Cloudera Manager should be 5.0.2 and the version of CDH should be 5.0.0. When installing the cluster, you should only install the HDFS, YARN, and ZooKeeper services. When prompted for the edition of Cloudera Manager to install, choose the trial for the Data Hub Edition.
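As a rough planning aid, the host requirement above can be written as a small check. This is a sketch under stated assumptions: the function name `plan_hosts` and the choice of the first host for Cloudera Manager are hypothetical; the text only says that one host runs Cloudera Manager and the rest run Hadoop.

```python
MIN_HOSTS = 4  # minimum number of hosts stated in the text


def plan_hosts(hosts):
    """Split a list of host names into a Cloudera Manager host and Hadoop hosts.

    Picking hosts[0] for Cloudera Manager is an arbitrary assumption.
    """
    if len(hosts) < MIN_HOSTS:
        raise ValueError(f"need at least {MIN_HOSTS} hosts, got {len(hosts)}")
    return {"cloudera_manager": hosts[0], "hadoop": list(hosts[1:])}


plan = plan_hosts(["host1", "host2", "host3", "host4"])
print(plan["cloudera_manager"], plan["hadoop"])
```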
Step 1: Logging In
Before you can start working with Cloudera Manager, you must use your browser to connect to the host running Cloudera Manager.
1. Open your browser and go to the public DNS name of the host running Cloudera Manager using port 7180. The DNS name is assigned by AWS and will vary. For example, if the DNS name is:
ec2-72-44-45-204.compute-1.amazonaws.com
then the URL to type in the browser is:
ec2-72-44-45-204.compute-1.amazonaws.com:7180
2. Verify that all of the services and Cloudera Management Services are in good health.
5. Move your mouse over the chart to get the absolute value of the point in time.
7. Press the left and right arrows to get the next and previous values in the chart.
9. To the right of the charts are links to change the amount of time shown in the chart. Change the time to one hour, then two hours, and observe how the charts update.
2. Type in HDFS and the context menu will display the relevant search items.
4. Notice that the search brought you to the HDFS overview page.
5. This time, use your keyboard and press the / key to access the search.
6. Type in YARN and the context menu will come up with the relevant search items.
3. Notice that the services data is updated with the statistics for the selected time period.
4. In the Health History section, click on the Show link for the various events.
5. Notice that the timeline will automatically be moved back to the time of the event and that the charts update as well.
6. To return to the present time, click on the Now button to the right of the timeline.
7. Notice that the timeline has moved back to the present time.
4. Click on Continue.
6. The new JournalNodes need to be configured with the directory where their state information will be stored. Change only the JournalNode Edits Directory setting for all of the JournalNodes to:
/data0/dfs/jn
7. Click on Continue. Cloudera Manager will start enabling high availability. If the step Formatting the name directories of the current NameNode fails, don't worry: it is expected to fail, since the directory is already formatted.
9. Click on OK. We will perform the steps this message talks about in the next exercise.
2. Look through the information presented by the service's Web UI. Once you are done, close the tab or window.
2. Click on the context menu for the cluster and click on Add a Service.
4. Click on Continue.
An error will appear, showing that Hue requires services like Hive to be installed before you can install Hue.
5. Click on Close.
7. Click on Continue.
8. Click on Continue.
9. Click on Continue.
15. Use the same steps to install the Oozie service and accept all defaults.
16. Use the same steps to install the Impala service and accept all defaults.
17. Use the same steps to install the Hue service. When prompted to select the dependencies, choose the row with the impala service defined. Otherwise, accept all defaults.
18. Click on the context menu for the Hue service and click on Start.
3. Notice the validation warning saying that there are only two DataNodes running and that the suggested number is three.
4. Click on Add.
7. Click on Continue.
8. Click on Finish.
9. Click on the check box for the newly added DataNode. It will be the only one that has a status of Stopped.
The HDFS service will begin to replicate blocks to the new DataNode. During this time, the HDFS service will still say that it is in Bad Health. Once all the blocks have three-fold replication, the service will change to Good Health.
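The condition described above, that every block must reach three-fold replication, can be modelled with a short sketch. This is a simplified illustration, not how Cloudera Manager or HDFS computes health; the block names and the function `under_replicated_blocks` are hypothetical.

```python
TARGET_REPLICATION = 3  # three-fold replication, as stated in the text


def under_replicated_blocks(replica_counts, target=TARGET_REPLICATION):
    """Return the blocks that have fewer replicas than the target."""
    return [block for block, count in replica_counts.items() if count < target]


def fully_replicated(replica_counts, target=TARGET_REPLICATION):
    """True once no block is below the target replication."""
    return not under_replicated_blocks(replica_counts, target)


# Hypothetical snapshot taken while blocks are still being copied.
during_copy = {"blk_1": 3, "blk_2": 2, "blk_3": 2}
after_copy = {"blk_1": 3, "blk_2": 3, "blk_3": 3}
print(under_replicated_blocks(during_copy), fully_replicated(after_copy))
```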
3. Click on Add.
4. Click in the row with the fewest Added Roles. This will add the HttpFS instance to that host.
5. Click on OK.
6. Click on Continue.
7. Click on the check box for the newly added HttpFS instance. It will be the only one that has a status of Stopped.
9. Click on Start.
12. Click on the Configuration tab and then View and Edit.
13. In the row for HDFS Web Interface Role, choose httpfs.
16. The cluster needs to be restarted to pick up the configuration changes. Click on the context menu for the cluster and click on Restart.
2. Click on Hue Web UI. This will open a new tab or window for the Hue interface.
4. Click on Next.
7. Click on Next.
invalidate metadata;
3. Click on Execute.
5. Click on Execute.
7. Click on Execute.
4. In the list of queries, find the last Impala query that you ran. Look at the data that is tracked by Cloudera Manager.
6. This page gives even more information about Impala's execution plan and information about the query, as well as displaying the query itself. Using this information, you can debug slow queries.
9. This page shows whether the best practices for Impala are being followed. A chart shows each best practice. Read through the descriptions of each chart.
This chart shows the number of queries per second that Impala served.
3. Mouse over the chart and click on the context menu for it.
5. We want to create a trigger that will change the status if the Impala service is being used too much, indicating that we need to expand the cluster with new nodes.
This trigger will now change the Impala service's health to Concerning whenever there are 50 queries per second to the Impala service.
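The effect of the trigger can be pictured as a simple threshold check. The sketch below is only an illustration of the logic, not the trigger expression that Cloudera Manager stores; the function name `impala_health`, the "Good" label for the normal state, and treating exactly 50 queries per second as Concerning are all assumptions.

```python
QUERY_RATE_THRESHOLD = 50.0  # queries per second, the value used in the text


def impala_health(queries_per_second, threshold=QUERY_RATE_THRESHOLD):
    """Return the health label this sketch assigns for a given query rate.

    Assumption: the threshold itself counts as Concerning (>= comparison).
    """
    if queries_per_second >= threshold:
        return "Concerning"
    return "Good"


for rate in (10, 49.9, 50, 120):
    print(rate, impala_health(rate))
```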
4. Click on Execute.
6. Click on Execute.
8. Click on Execute.
5. Under HDFS, check DataNode and leave the configuration group as it is.
6. Under YARN, check NodeManager and leave the configuration group as it is.
7. Click on Create. The next time a worker node is added, you can use the Worker Host template to quickly set up the host to run a DataNode and NodeManager.
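Conceptually, a host template is a named set of roles that is applied to a host in one step. The sketch below models the Worker Host template from the steps above as plain data. It does not call Cloudera Manager; the names `WORKER_HOST_TEMPLATE` and `roles_for_host`, and the host name `worker5`, are hypothetical.

```python
# Roles selected in the steps above: DataNode under HDFS, NodeManager under YARN.
WORKER_HOST_TEMPLATE = {
    "HDFS": ["DataNode"],
    "YARN": ["NodeManager"],
}


def roles_for_host(hostname, template):
    """List the (host, service, role) assignments the template implies."""
    return [
        (hostname, service, role)
        for service, roles in template.items()
        for role in roles
    ]


for assignment in roles_for_host("worker5", WORKER_HOST_TEMPLATE):
    print(assignment)
```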
2. Open a new browser tab and type in the base URL followed by:
/api/v6/tools/echo?message=hello%20world
http://ec2-54-191-61-124.us-west-2.compute.amazonaws.com:7180/api/v6/tools/echo?message=hello%20world
3. After hitting enter, you will see the browser update to show JSON that echoes back the message in the URL.
/api/v6/clusters
http://ec2-54-191-61-124.us-west-2.compute.amazonaws.com:7180/api/v6/clusters
5. After hitting enter, you will see some basic information about the cluster. The API will return the name of the cluster and version information. The cluster name is needed for other API calls.
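The same two API calls can be made from a script instead of the browser. The following is a minimal sketch using only the Python standard library. The helper names (`api_url`, `fetch_json`, `cluster_names`) are hypothetical, HTTP basic authentication with your Cloudera Manager username and password is assumed, and the sample response is an assumed shape (a JSON object with an `items` list) used only to show how the cluster name could be extracted.

```python
import base64
import json
import urllib.parse
import urllib.request

API_VERSION = "v6"  # API version used in the URLs above


def api_url(base_url, path, **params):
    """Join the base URL, the API version, and an endpoint path."""
    url = f"{base_url.rstrip('/')}/api/{API_VERSION}/{path.lstrip('/')}"
    if params:
        # quote_via=quote encodes spaces as %20, matching the example URL.
        url += "?" + urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return url


def fetch_json(url, username, password):
    """GET a URL with basic authentication and decode the JSON body."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def cluster_names(clusters_response):
    """Extract cluster names from a response shaped like {"items": [...]}."""
    return [cluster["name"] for cluster in clusters_response.get("items", [])]


base = "http://ec2-54-191-61-124.us-west-2.compute.amazonaws.com:7180"
print(api_url(base, "tools/echo", message="hello world"))
print(api_url(base, "clusters"))

# Assumed sample response; real output depends on your cluster.
sample = json.loads('{"items": [{"name": "Cluster 1", "version": "CDH5"}]}')
print(cluster_names(sample))
```

To query a live server, call `fetch_json(api_url(base, "clusters"), username, password)` with your own base URL and credentials, then pass the result to `cluster_names`.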
http://archive.cloudera.com/cdh5/parcels/5.0.1/
The parcel will start downloading in the background. This may take a few minutes to finish. Feel free to explore Cloudera Manager during this time.
2. Click on Activate.
Not all services support rolling restarts. Services that do not support rolling restarts are listed as basic.
6. Click on Confirm.
Cloudera Manager will stop certain services, perform a rolling restart of some services, and start all services. Once the process is done, the newer version of CDH will be active.
Admin User
Your current user is an admin user. Cloudera Manager creates this user by default. Other users can be created with fewer permissions.
2. Find the row for the admin user and click on the Change Password button.
3. Click on the context menu for the cluster. Notice that there are no buttons to add services or stop the cluster.
5. Click on the Configuration tab, then View. Notice that the user can view all configurations, but cannot make any changes.
8. Click on the context menu for the cluster. Notice that there are no buttons to add services or stop the cluster.
10. Click on the Configuration tab, then View. Notice that the user can view all configurations, but cannot make any changes.
12. Click on Actions for Selected. Notice that this user can decommission hosts.