
FusionSphere

Maintenance
Objectives
Upon completion of this course, you will be able to:
Describe the general troubleshooting process.

List the information to collect for troubleshooting.

Illustrate general troubleshooting analysis methods.

Perform upgrade operations on FusionSphere OpenStack.

GRETA
The best change is transformation. Copyright 2017 Huawei Technologies Co., Ltd. All rights reserved.
Contents
1. Routine Maintenance
1.1 Power-On and Power-Off

1.2 Adding a Node

1.3 Removing a Node

1.4 Backup and Restoration Policies

2. Health Check and Log Collection

3. Troubleshooting

4. Upgrade and Patching

1.1 Power-On and Power-Off
Power-off sequence:
1. Stop service VMs (on the portal).
2. Stop FusionStorage Manager VMs (on the portal).
3. Power off service hosts (on the portal).
4. Power off the host where FusionManager is deployed.
5. Power off cabinets.

Power-on sequence:
1. Power on cabinets.
2. Power on the host where FusionManager is deployed.
3. Power on service hosts (on the portal).
4. Start FusionStorage Manager VMs (window).
5. Start service VMs (on the portal).

Restrictions:
Power off the FusionSphere system one data center (DC) at a time. In each DC, first power off hosts in availability zones (AZs) that host neither the Glance nor the Keystone service. Then power off hosts in AZs that host the Glance service but not the Keystone service. Finally, power off the AZ that hosts the Keystone service.

When powering on the FusionSphere system, first power on hosts in the AZ that hosts the Keystone service. Then power on hosts one DC at a time. In each DC, first power on hosts in AZs that host the Glance service and then hosts in other AZs.
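
The AZ ordering rule above can be sketched in a few lines of Python. This is only an illustration for a single DC, not a FusionSphere tool: the `AZ` class, its field names, and the AZ names in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AZ:
    """Hypothetical description of an AZ and the services it hosts."""
    name: str
    has_glance: bool = False
    has_keystone: bool = False


def power_off_rank(az):
    """Lower rank is powered off earlier."""
    if az.has_keystone:
        return 2  # the Keystone AZ goes down last
    if az.has_glance:
        return 1  # Glance-only AZs go down second
    return 0      # AZs with neither service go down first


def power_off_order(azs):
    # sorted() is stable, so AZs with the same rank keep their input order.
    return sorted(azs, key=power_off_rank)


def power_on_order(azs):
    # Reverse of power-off: Keystone AZ first, then Glance AZs, then the rest.
    return sorted(azs, key=power_off_rank, reverse=True)


azs = [
    AZ("az-keystone", has_glance=True, has_keystone=True),
    AZ("az-compute"),
    AZ("az-glance", has_glance=True),
]
print([az.name for az in power_off_order(azs)])
print([az.name for az in power_on_order(azs)])
```

With several DCs, the same ranking would be applied inside each DC, with the Keystone AZ handled last on power-off and first on power-on across the whole system.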

1.2 Adding a Node
Scenario: As services grow, computing resources may fall short of service requirements. You can add computing nodes to the system to expand computing resource capacity.

Capacity expansion process:
1. Power on the server to be added using PXE.
2. Log in to the portal.
3. Switch to the capacity expansion page.
4. Select the server to be added.
5. (Optional) Deploy roles.
6. Submit configurations.

Restriction: The system does not support management node capacity expansion.
1.3 Removing a Node
Scenario: If the computing resource utilization in an AZ is low for a long period of time, remove some nodes from the AZ to reduce the system capacity.

Capacity reduction process:
1. Log in to the FusionManager portal.
2. Migrate VMs on the node to be removed to other nodes.
3. Delete the node (host) from the host group.
4. Power off the node to be removed.
5. Log in to a node in the AZ.
6. Delete host configuration data.

Restriction: The system does not support management node capacity reduction.

1.4 Backup and Restoration Policies
Backup Policies
Maintenance engineers must back up the management data of services in each AZ before performing important operations, such as critical data modification. The backup ensures that data can be restored if an exception occurs on the system or an operation does not achieve the expected result. The management data of each service can be backed up automatically or manually.
Automatic backup is enabled (FusionSphere OpenStack at 03:00 and FusionManager at 02:00).
Third-party File Transfer Protocol (FTP) and File Transfer Protocol over SSL (FTPS) backup servers can be configured.
By default, seven copies of a set of data are stored on the third-party backup server. (The number of copies can be configured.)
All data backups (seven by default) are automatically stored on the local disks of the server.
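
To make the "seven copies by default" setting concrete, here is a minimal Python sketch of a keep-the-newest-N retention rule. It assumes that the oldest copies are the ones dropped and that backup names sort chronologically; both are assumptions for illustration, and the function and file names are hypothetical rather than part of FusionSphere.

```python
def prune_backups(backup_names, keep=7):
    """Split backup names into (kept, removed), keeping the newest `keep`.

    Assumes names sort chronologically, for example because they embed
    a timestamp. This is an illustrative rule, not the product's logic.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(backup_names)
    # Slicing with a negative index handles lists shorter than `keep`.
    return ordered[-keep:], ordered[:-keep]


# Hypothetical names for nine daily backups taken at 03:00.
names = [f"backup_201701{day:02d}_0300" for day in range(1, 10)]
kept, removed = prune_backups(names)
print(len(kept), removed)
```

Changing `keep` mirrors the configurable number of copies mentioned above.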
Restoration Policies
If an exception occurs or an important operation, such as a system upgrade or critical data modification, does not achieve the expected result, recover the data based on the restoration policies.
Before data restoration, ensure that no configuration operation is in progress.
During data restoration, do not perform any configuration operation.

1.4.1 Backup Policy Configuration and Manual Backup
The backup policy includes the third-party backup server (configured on the portal) and the number of data backups to be stored (log in to the node and run the command to configure the number).

Set backup policy (number of backups):
1. Log in to the active FusionManager node.
2. Run the command to set the number of backups.

Set backup policy (third-party server):
1. Log in to the FusionManager portal.
2. Configure the third-party server.

Three manual backup methods are available:
Run OpenStack commands to manually back up data of all services or FusionManager data.
Log in to the FusionManager portal and perform manual backup (third-party server backup is available).
Log in to the active FusionManager node and perform manual backup. (Data can be backed up to a third-party backup server.)

Manual backup:
1. Log in to the management node.
2. Run the backup command.
3. Check the backup progress and result.
4. If data needs to be backed up to a third-party backup server, check the backup result on the third-party backup server.

1.4.2 Manual Restoration
Run commands on the FusionManager node to manually restore the data.
The following two manual data restoration methods are available:
Run OpenStack commands to manually restore FusionManager data. The steps are similar to the description in the previous slides.
Log in to the active FusionManager node and run commands to perform manual restoration.

Manual restoration process:
1. Log in to the active FusionManager node.
2. Check the backup package information.
3. Log in to the standby FusionManager node.
4. Stop services on the standby FusionManager node.
5. Stop services on the active FusionManager node.
6. Run restoration commands on the active FusionManager node.
7. Start services on the active FusionManager node.
8. Start services on the standby FusionManager node.

Contents
1. Routine Maintenance

2. Health Check and Log Collection


2.1 Log Collection

2.2 Health Check

3. Troubleshooting

4. Upgrade and Patching

2.1 Log Collection
Logs are collected over an FTP connection.

Nodes must be added before you use the log collection function for the first time.

You can use FusionCare to collect logs of the following products: FusionCompute, FusionManager, FusionStorage, FusionSphere OpenStack, and FusionAccess.

FusionCare collects logs of the OS, modules, scripts, and the watchdog process.

FusionCare can also collect its own logs.

2.2 Health Check
FusionCare uses a browser/server (B/S) architecture.

Nodes must be added before you use the health check function for the first time.

FusionCare can check the health status of FusionCompute, FusionManager, FusionStorage, FusionSphere OpenStack, and FusionAccess.

FusionCare can be used to perform health checks for key processes, configuration files, hardware status, and other related items.

Contents
1. Routine Maintenance

2. Health Check and Log Collection

3. Troubleshooting
3.1 Troubleshooting Procedure
3.2 Checking Alarm Information
3.3 Viewing Monitoring Information
3.4 Querying Operation Logs
3.5 Checking Data Configuration
3.6 Checking Device Indicator Status
3.7 Fault Rectification

4. Upgrade and Patching

3.1 Troubleshooting Procedure

Troubleshooting Procedure:
Collect fault information.
Identify the fault.
Locate the fault.
Rectify the fault.

3.1.1 Fault Information Collection and Identification
Fault information provides important clues for troubleshooting. System maintenance personnel must collect as much fault information as possible.
The information to be collected includes:
Fault symptom
Fault occurrence time and frequency
Fault location
Scope and impact of a fault
Device running status before a fault occurs
Operations performed before a fault occurs and the operation results
Device indicator status when a fault occurs

Before rectifying a fault, determine the fault type and the impact scope based on the information collected.

3.1.2 Fault Locating
Common methods for locating a fault are as follows:
Check the alarm information on the management portal.
Check whether the monitoring information is normal on the management portal.
Query operation logs and analyze whether the operation process is correct.
Check whether the data configuration is correct on the management portal.
Check the device indicators to determine whether the devices are running properly.

Identify the exact cause of the fault from the possible causes by analyzing and comparing them and by using other applicable methods.

3.2 Checking Alarm Information
The fault page on the FusionManager portal displays active alarms by default. Check whether the active alarms are related to the fault.
There are four alarm severities: critical, major, minor, and warning.
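
When many active alarms are listed, it can help to review them from most to least severe. The Python sketch below orders alarms by the four severities named above; the `Severity` enum, the `triage` helper, and the alarm names are hypothetical and do not represent a FusionManager interface.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The four alarm severities, most severe first."""
    CRITICAL = 1
    MAJOR = 2
    MINOR = 3
    WARNING = 4


def triage(alarms):
    """Sort (name, severity) pairs so the most severe alarms come first."""
    return sorted(alarms, key=lambda alarm: alarm[1])


# Hypothetical alarm names, for illustration only.
active = [
    ("disk usage high", Severity.MINOR),
    ("host heartbeat lost", Severity.CRITICAL),
    ("certificate expiring", Severity.WARNING),
    ("storage link degraded", Severity.MAJOR),
]
for name, severity in triage(active):
    print(severity.name, name)
```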

3.3 Viewing Monitoring Information
On the performance monitoring page of the FusionManager portal, check the performance statistics to determine whether the fault is caused by
performance deterioration.

3.4 Querying Operation Logs
On the operation log page of the FusionManager portal, check the operations performed by users to determine whether the fault is caused by user
operations.

3.5 Checking Data Configuration
On the FusionManager device portal, check the hardware status, and check whether the data configuration is correct.

3.6 Checking Device Indicator Status
Check the hardware indicator status to see whether exceptions occur, such as indicator steady red, devices not powered on, and no data transmission. The checking items include:
Server: server blade, SMM module, switch modules, power module, fan module
Storage device: controller enclosure, disk enclosure
Switch: access switch, aggregation switch

[Figure: BH622 V2, buttons and indicators on the panel]
1. Power switch
2. Status indicator
3. UID indicator
4. Hard disk active indicator
5. Hard disk fault indicator

[Figure: MM620, indicators on the panel]
1. ALM indicator
2. ACT indicator
3. Data transmission status indicator
4. Connection status indicator

[Figure: DM, indicators on the panel]
1. Data transmission status indicator
2. Connection status indicator
3. HLY indicator
3.7 Fault Rectification
Rectify the fault based on the located fault cause.
Alarm:
If an active alarm is caused by the fault, follow the procedure in the alarm online help to handle the alarm.
You can click the alarm name on the alarm portal of FusionManager to view the alarm online help.
Follow the steps provided in the alarm online help to rectify the fault.
Monitoring: If the fault is caused by performance deterioration, expand the system capacity.
Operation error:
If the fault is caused by a misoperation, roll back this operation.
Data configuration error:
If the data is incorrectly configured, reconfigure the data.
Hardware status error:
If the fault is caused by a physical device fault shown by the indicators, rectify the fault based on the specific indicator status. For example, if the power indicator is off, power on or restart the device. If no data transmission is in progress, reconnect or replace the cables.
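
The mapping from fault cause to corrective action described above can be written as a simple lookup table. This Python sketch is a study aid only; the category keys and the `suggest_action` helper are hypothetical names, and the action texts paraphrase this slide.

```python
# Fault cause category -> corrective action, paraphrased from the slide.
RECTIFICATION = {
    "alarm": "Follow the steps in the alarm online help.",
    "performance": "Expand the system capacity.",
    "operation_error": "Roll back the operation.",
    "data_configuration_error": "Reconfigure the data.",
    "hardware_status_error": "Rectify the fault based on the indicator status.",
}


def suggest_action(cause):
    """Return the corrective action for a located fault cause."""
    try:
        return RECTIFICATION[cause]
    except KeyError:
        # An unknown category means the fault has not been located yet.
        raise ValueError(f"unknown fault cause: {cause!r}") from None


print(suggest_action("operation_error"))
```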

Contents
1. Routine Maintenance

2. Health Check and Log Collection

3. Troubleshooting

4. Upgrade and Patching


4.1 Upgrade Process

4.2 Upgrade Overview

4.3 Create Upgrade Project

4.1 Upgrade Process
Upgrade Process:

FusionManager -> FusionSphere OpenStack

Note:
FusionManager and FusionSphere OpenStack are used in NFV scenarios. During the upgrade, upgrade FusionManager first and then FusionSphere OpenStack.
FusionManager in all-in-one mode is used in NFV scenarios.

4.2 Upgrade Overview
Preparations for the upgrade: Create a project, distribute the upgrade packages, and perform a pre-upgrade check. The preparations must be completed between three days and half an hour before the upgrade. These operations have no adverse impact on the system.
Upgrade: Perform the upgrade, complete the upgrade, and submit the project. You can perform the upgrade online or offline. In offline mode, you must stop VMs before the upgrade, which interrupts tenant services, but the upgrade takes less time. In online mode, you migrate and upgrade VMs in batches, which does not interrupt tenant services, but the upgrade takes longer. Complete the upgrade on the same day that it is performed, and submit the upgrade project one week after the upgrade. During that week, you can roll back the upgrade if exceptions occur.
Rollback: Perform rollback operations and complete the rollback. Perform a rollback if the upgrade fails or exceptions occur after the upgrade. Online and offline rollback modes are supported.

Upgrade flow:
1. Create a project.
2. Distribute the software packages.
3. Perform a pre-upgrade check.
4. Perform the upgrade.
5. Upgrade the nodes in batches.
6. Submit the project.

Rollback flow (if the upgrade fails or exceptions occur):
1. Roll back the nodes in batches.
2. Complete the rollback.

Note: To ensure high service continuity during the upgrade, migrate and upgrade VMs on the board in batches. VMs are migrated in FusionManager, and other functions are migrated in the upgrade tool.
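
The timing rules in this overview can be checked with simple date arithmetic. The Python sketch below reads "three days to half an hour before the upgrade" as a window in which the preparations should finish, and "one week after the upgrade" as the earliest time to submit the project; that reading, the function names, and the example dates are assumptions for illustration.

```python
from datetime import datetime, timedelta

PREP_EARLIEST = timedelta(days=3)      # preparations finish no earlier than this
PREP_LATEST = timedelta(minutes=30)    # and no later than this before the upgrade
OBSERVATION = timedelta(weeks=1)       # rollback remains possible during this period


def preparation_window(upgrade_start):
    """Return (earliest, latest) times for finishing the preparations."""
    return upgrade_start - PREP_EARLIEST, upgrade_start - PREP_LATEST


def preparation_in_window(prep_done, upgrade_start):
    earliest, latest = preparation_window(upgrade_start)
    return earliest <= prep_done <= latest


def earliest_submit(upgrade_done):
    """The project is submitted one week after the upgrade."""
    return upgrade_done + OBSERVATION


# Hypothetical schedule.
start = datetime(2017, 6, 10, 22, 0)
print(preparation_window(start))
print(preparation_in_window(datetime(2017, 6, 9, 10, 0), start))
print(earliest_submit(datetime(2017, 6, 11, 2, 0)))
```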

4.3 Create Upgrade Project
Upgrade project stages:
1. Create a project.
2. Distribute software packages.
3. Perform a pre-upgrade check.
4. Perform the upgrade.
5. Complete the upgrade.
6. Submit the project.

Creating a project involves:
Create a project.
Select the node type.
Configure the node information.
Configure the software package directory.

Summary
Routine Maintenance

Health Check and Log Collection

Troubleshooting

Upgrade and Patching

Question
What are the power-off and power-on sequences?
