
Reference No.:
Product Name: RNC
Product Version: Refer to product versions
Target Audience: For internal use
Prepared by: WCDMA RAN Maintenance Dept.
Document Version: V1.3

Solutions to RNC Emergencies

Huawei Technologies Co., Ltd

Prepared by RAN Maintenance Dept. Date

Reviewed by Date

Reviewed by Date

Approved by Date
Solutions to RNC Emergencies INTERNAL

2018-2-18 Huawei Confidential Page 2 of 34



Revision Records

Date | Version | Description | Author
2009-1-6 | V1.1 | Revised according to the review comments of TSD. | Xu Zhijie 43183
2009-1-12 | V1.2 | Revised according to the comments of Jian Huazhao. | Xu Zhijie 43183
2009-3-13 | V1.3 | Modified the method of collecting CHR logs. | Xu Zhijie 43183


Contents

1 Overview

2 Applicable Versions and Ranges

3 Solutions to Access Emergencies

3.1 Handling Process
3.2 Querying Operation Logs
3.3 Viewing Related Alarms
3.3.1 Transmission Alarms
3.3.2 Device Alarms
3.3.3 IU Interface Signaling Plane Alarms
3.3.4 IU Interface User Plane Alarms
3.3.5 IU Link Congestion Alarms
3.3.6 Node B Alarms
3.4 Tracing and Analyzing Related Signaling
3.4.1 Tracing IU Interface Signaling
3.4.2 Tracing IOS/CDT/IFTS Signaling
3.5 Collecting and Analyzing Traffic Statistics

4 Solutions to KPI Deterioration Problems

4.1 Judging Whether Faults are Present on the Same FRM/DPU Board or the Same DSP
4.2 Judging Whether Faults are Present in the Same SPU Subsystem
4.3 Judging Whether Faults are Present on the Same Interface Board (IUB Interface)

5 Information Collection Checklist

5.1 Emergency
5.2 After Services are Restored

6 How to Use Common Tracing Tools and Obtain Logs

6.1 CDT/IFTS Tracing
6.2 Tracing IOS Signaling
6.3 Tracing IU Interface Signaling
6.4 Obtaining BAM Logs
6.5 Obtaining CHR/Text Logs
6.6 Obtaining Performance Files


1 Overview

Two types of emergencies (incidents) typically occur on the live network:
• Access emergencies
• KPI deterioration emergencies
This document provides practical solutions to these emergencies, with the following aims:
• Restoring services quickly
• Shortening the period during which services are affected (from the time the impact begins to the time services are restored)
• Improving customer satisfaction
• Improving the skills of front-line GTS and related support personnel

2 Applicable Versions and Ranges

This document applies to the following versions:
• RAN6.0
• RAN6.1
• RAN10
This document applies to the following situations:
• A customer complaint reports a problem that severely affects call access and the calls originated and received by many users on the live network.
• Network planning reports that a KPI, such as the access success ratio or call drop ratio, has deteriorated seriously on the live network, severely affecting call access and voice quality.
• The services in a subsystem, a subrack, or the entire RNC are affected (low access success ratio or high call drop ratio), all services under an interface board are affected, or a DSP is faulty. In these cases, you can quickly restore the services by following this document.
If only the services of a single base station are affected, you do not need to follow this document to restore them.


3 Solutions to Access Emergencies

In a typical access emergency, users cannot obtain services such as AMR voice or PS services; that is, UEs cannot access the network when dialing.
Except for transmission problems, services usually must be restored on site within one hour. Analyze and handle the problem in the following three steps:
• View related alarms.
• Trace and analyze related signaling.
• Analyze traffic statistics.
On-site engineers need to check the following items:
• Type of the affected services: CS or PS.
• Whether UEs can register normally.
• For CS services, whether originating calls, terminating calls, or both are affected.
• For PS services, whether UEs can attach, whether the service can be set up successfully, and whether the rate is normal.
• When the service impact began. Usually this is taken to be the time of the first user complaint.
• Range of the affected cells. Judge from the complaints whether a cluster of cells or all cells are affected.
By analyzing alarms, signaling traces, and traffic statistics, on-site engineers can determine:
• Which interface (IU/IUB/IUR) affects the service.
• The range of affected users: a single subsystem, a subrack, or the entire RNC. If a subsystem causes the failure, swap or reset the SPU, or reset the corresponding FMR/DPU board.
• Whether an interface board causes the failure. If so, swap or reset the interface board.

3.1 Handling Process


The flow of handling access emergencies is shown as follows:


3.2 Querying Operation Logs


Run the LST OPTLOG command to query the operation logs and check whether the failure was caused by a misoperation. Also ask the customer whether any operation was performed on the CN, NRNC, or intermediate transmission devices.
If parameters were modified on the RNC side and the modification clearly affects the service, restore the original parameters and check whether the service recovers.
If the customer reports that parameters were modified on the CN, NRNC, or intermediate transmission devices, ask the customer to restore the original parameters, and then check whether the service recovers.

While asking the customer about recent operations, query alarms and trace signaling in parallel so that the service can be restored in time.
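As a minimal sketch of the operation-log check, the Python below keeps only the operations executed shortly before the failure began and whose command verb typically changes configuration or state. It assumes the LST OPTLOG output has already been parsed into (time, command) records; the OpRecord type, the two-hour window, and the verb list are illustrative assumptions, not the RNC's actual log format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical parsed form of one LST OPTLOG record; the real output format differs.
@dataclass
class OpRecord:
    time: datetime
    command: str  # e.g. "MOD IPPATH:PATHCHK=ENABLED"

# Assumed set of MML verbs that change configuration or state.
CHANGING_VERBS = {"ADD", "MOD", "RMV", "SET", "DEA", "ACT", "BLK", "UBL", "RST", "INH"}

def suspicious_operations(records, failure_start, window=timedelta(hours=2)):
    """Return state-changing operations executed within `window` before the failure."""
    hits = []
    for rec in records:
        in_window = failure_start - window <= rec.time <= failure_start
        verb = rec.command.split()[0].upper() if rec.command.strip() else ""
        if in_window and verb in CHANGING_VERBS:
            hits.append(rec)
    # List the most recent change first, as it is usually the first one to review.
    return sorted(hits, key=lambda r: r.time, reverse=True)
```

Query-only commands such as LST are ignored, so the result is a short list of candidate changes to discuss with the customer or roll back.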

3.3 Viewing Related Alarms


Viewing the related alarms helps you identify the problem, isolate it quickly, and handle it. Complete this step within 15 minutes.

3.3.1 Transmission Alarms

If any of the following transmission alarms is present, its alarm time matches the failure time, and it seriously affects the IU/IUR or IUB interface (for example, many E1s under an interface board are abnormal, affecting many Node Bs connected to that board), take the measures below to restore the service:
• Optical interface alarms

Alarm ID | Alarm Name | Alarm ID | Alarm Name
ALM-901 | Optical port loss of signal | ALM-981 | The CSU reference out of lock
ALM-902 | Optical port multiplex section remote defect indication | ALM-982 | Ingress tributary unit alarm indication signal
ALM-903 | Optical port loss of frame | ALM-983 | Egress tributary unit alarm indication signal
ALM-906 | Optical port loss of cell delineation | ALM-984 | Ingress tributary unit loss of pointer
ALM-954 | Tributary unit alarm indication signal | ALM-985 | Egress tributary unit loss of pointer
ALM-955 | Tributary unit loss of pointer | ALM-988 | Optical port out of cell delineation
ALM-956 | Lower order path signal label mismatch | ALM-9198 | Receive line OOF
ALM-960 | Lower order path unequipped | ALM-9199 | Receive line LOF
ALM-961 | Higher order path signal label mismatch | ALM-9200 | Receive line LOS
ALM-962 | Higher order path unequipped | ALM-9201 | LRDI-multiplex section remote receive failure indicator
ALM-964 | SDH loop | ALM-9202 | LAIS-multiplex section alarm indicator
ALM-965 | Administrator unit alarm indication signal | ALM-9203 | Loss of pointer (LOP)
ALM-972 | Optical port multiplex section alarm indication signal | ALM-9204 | PAIS-path alarm indicator
ALM-979 | Loss of reference clock of optical | ALM-9014 | OAM AIS
ALM-980 | The DCRU data out of lock | ALM-9015 | OAM RDI

• F4F5 alarms

Alarm ID | Alarm Name | Alarm ID | Alarm Name
ALM-401 | F5 loss of continuity | ALM-403 | F5 remote alarm indication
ALM-402 | F5 alarm indication signal | ALM-406 | F5 continuity LB failed alarm

• E1/T1 alarms

Alarm ID | Alarm Name | Alarm ID | Alarm Name
ALM-1101 | E1/T1 loss of signal | ALM-1103 | E1/T1 remote alarm indication
ALM-1102 | E1/T1 loss of frame alignment | ALM-1104 | E1/T1 alarm indication signal


• Link alarms

Alarm ID | Alarm Name | Alarm ID | Alarm Name
ALM-1001 | IMA link loss of frame | ALM-1019 | Fractional IMA link remote reception failure
ALM-1002 | IMA link out of delay synchronization | ALM-1020 | Fractional IMA/FRAC ATM link loss of cell delineation
ALM-1003 | IMA link remote failure indicator | ALM-1024 | The Receive part of FRAC IMA link failure
ALM-1004 | IMA link Rx fault | ALM-1026 | IMA link is blocked
ALM-1005 | IMA link remote TX unusable | ALM-1027 | FRAC IMA link is blocked
ALM-1006 | IMA link remote Rx unusable | ALM-2303 | APS link failure
ALM-1007 | IMA/UNI link loss of cell delineation | ALM-2602 | PPP/MLPPP link down
ALM-1015 | Fractional IMA link loss of frame | ALM-2603 | PPP/MLPPP link loop
ALM-1016 | Fractional IMA link out of delay synchronization | ALM-2604 | MLPPP group down
ALM-1017 | Fractional IMA link remote reception defect indication | ALM-2605 | MLPPP band width insufficient
ALM-1018 | Fractional IMA link remote transmission failure | |

• FE port alarms

Alarm ID | Alarm Name | Alarm ID | Alarm Name
ALM-851 | FE/GE link down | ALM-853 | FE/GE link receive defect indication
ALM-852 | FE/GE link send defect indication | |

• MSP alarms

Alarm ID | Alarm Name | Alarm ID | Alarm Name
ALM-2501 | MSP K1/K2 mismatch alarm | ALM-2506 | MSP unit-bid mode mismatch alarm
ALM-2502 | MSP K2 mismatch alarm | ALM-9272 | WRSS MSP K1/K2 mismatch alarm

If MSP alarms are present, collect the MSP logs of the related boards both before and after swapping or resetting the interface board. Use the DSP MSPREP command to collect the logs and report them.

The above alarms indicate that the optical interface, an intermediate transmission device, or the optical fiber is faulty, or that the data negotiated between the local end and the peer end (with MSP enabled) is inconsistent. Determine whether the alarming unit is associated with the affected service and whether the alarm time matches the failure time. If both conditions are met, take the following measures.
Handling measures:
(1) If a backup interface board is available, swap the interface board and check whether the service is restored. If no backup board is available or the service is not restored, go to step 2.
(2) Reset the interface board. If the failure persists, go to step 3.
(3) Ask the customer to check whether the intermediate transmission device or the peer device is faulty.
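The two conditions used throughout this section (the alarming unit is associated with the affected service, and the alarm time matches the failure time) can be sketched as a simple filter. In this Python sketch, the Alarm fields, the unit labels, and the five-minute tolerance are assumptions for illustration; choose the tolerance according to how precisely the failure start time is known.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Alarm:
    alarm_id: str        # e.g. "ALM-901"
    unit: str            # hypothetical location label, e.g. "slot 14"
    raised_at: datetime

def correlated_alarms(alarms, affected_units, failure_start,
                      tolerance=timedelta(minutes=5)):
    """Keep alarms on an affected unit raised within `tolerance` of the failure start."""
    result = []
    for alarm in alarms:
        close_in_time = abs(alarm.raised_at - failure_start) <= tolerance
        if close_in_time and alarm.unit in affected_units:
            result.append(alarm)
    return result
```

Alarms that pass both checks are the ones worth acting on with the handling measures; an old alarm on the same board, or a fresh alarm on an unrelated board, is filtered out.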

3.3.2 Device Alarms

If the following alarms are present and the alarm time is consistent with the failure
time, take the following measures to restore the service:

Alarm ID | Alarm Name | Alarm ID | Alarm Name
ALM-101 | Fan fault alarm | ALM-652 | GE Link Of backplane fault
ALM-106 | Water alarm | ALM-653 | GE link of sub board fault
ALM-110 | Board voltage abnormity | ALM-654 | Inter board HiGig link fault
ALM-113 | High ambient temperature alarm | ALM-655 | Intra board HiGig link fault
ALM-115 | High subrack temperature alarm | ALM-656 | Inter board HiGig Trunk group communication fault
ALM-118 | Board fault-abnormal voltage 1.20V | ALM-657 | GE switch unit fault
ALM-119 | Board fault-abnormal voltage 1.25V | ALM-658 | Inter board HiGig communication fault
ALM-120 | Board fault-abnormal voltage 1.30V | ALM-659 | Failure of high-speed communication link on the backplane of service boards
ALM-121 | Board fault-abnormal voltage 1.50V | ALM-661 | GESW Trunk group link fault alarm
ALM-122 | Board fault-abnormal voltage 1.80V | ALM-662 | GE Trunk group link fault alarm
ALM-123 | Board fault-abnormal voltage 2.50V | ALM-665 | SCU GESW backplane GE link descend
ALM-124 | Board fault-abnormal voltage 3.30V | ALM-666 | DSP communication link failure
ALM-128 | PIU chip selftest error | ALM-667 | GESW subboard GE link descend
ALM-130 | Subboard PLL status abnormal | ALM-668 | SPU GESW panel GE link descend
ALM-201 | WMUX/WMUXb reference clock abnormity | ALM-669 | Transaction board backplane GE link descend
ALM-203 | Backplane 32M clock abnormity | ALM-670 | NP failure
ALM-205 | WMUX/WMUXb clock phase-locked loop failure | ALM-672 | Number of ME BD allocation failures exceed threshold
ALM-206 | WMUX 19M clock phase-locked loop unlocked | ALM-673 | CORE heartbeat stop alarm
ALM-207 | WMUX 32M clock phase-locked loop unlocked | ALM-701 | Backplane ATM bus interface abnormity
ALM-223 | CPLD 33M clock alarm | ALM-751 | GE/FE conversion unit fault
ALM-301 | WRBS board unavailable | ALM-752 | GE interface unit fault
ALM-302 | Board subsystem fault | ALM-801 | HPI communication failure
ALM-307 | Chip selftest failure | ALM-802 | DSP TDM abnormity
ALM-315 | WFIE/WFEE/WEIE board microcode thread abort | ALM-804 | DSP RFN abnormity
ALM-316 | WFIE/WFEE/WEIE board PCI channel abnormity | ALM-807 | DSP start failure
ALM-317 | Subboard failure | ALM-809 | Handshake failure between DSP and main CPU
ALM-317 | Subboard failure | ALM-1161 | TDM switching module failure
ALM-319 | IXF1104 component abnormity | ALM-1304 | IPC interrupted
ALM-331 | Microcode thread abort | ALM-1305 | MOT link disconnection fault
ALM-332 | Communication channel abnormal | ALM-1309 | WHPU microcode thread abort
ALM-333 | Subboard abnormal | ALM-2612 | Back board GE link down of interface board
ALM-378 | Packet switch unit fault | ALM-9027 | Fan fault alarm
ALM-379 | The board logic fault | ALM-9029 | APC chip fault alarm
ALM-380 | SubBoard status abnormal | ALM-9046 | ASX chip faulty
ALM-388 | Board chip fault | ALM-9130 | COBA board faulty
ALM-389 | Export clock fault | ALM-9141 | WRSS board unavailable
ALM-393 | Board temperature alarm | ALM-9241 | Fan monitoring board unaccessible alarm
ALM-601 | ATM switching module failure | ALM-9242 | WRSS fan fault alarm
ALM-620 | ATM switching logic alarm | ALM-9245 | SAR chip faulty
ALM-621 | Logic module not functional | ALM-9268 | WRSS chip faulty
ALM-651 | GE link of panel fault | ALM-9281 | HPU board packet discarded

The above alarms indicate that a device is faulty or the ambient environment is abnormal. Reset or replace the board, or improve the ambient environment.
Handling measures:
(1) If any of the above alarms is present, follow the handling measures in the alarm help to restore the service as soon as possible.
(2) If the measures in the alarm help do not restore the service and the board has not been reset, run the RST BRD command to reset the board.
(3) If the problem persists after the board is reset and a backup board is available, run the INH BRD command to inhibit the faulty board.
(4) Replace the board on site. If the situation is not an emergency, replace the board at night.
(5) If a fan subrack fault alarm is present, provide temporary cooling for the equipment, such as an electric fan.
(6) If a water alarm is present, improve the environment of the equipment room.

3.3.3 IU Interface Signaling Plane Alarms

If the following alarms about the IU interface signaling plane are present and the
alarm time is consistent with the failure time, take the following measures to restore
the service.

Alarm ID | Alarm Name | Alarm ID | Alarm Name
ALM-1403 | MTP-3b DSP inaccessible | ALM-1507 | SCCP subsystem prohibited
ALM-1404 | MTP-3b signaling route unavailable | ALM-1615 | AAL2 adjacent node unavailable
ALM-1406 | MTP-3b signaling route inhibited | ALM-1802 | SAAL link unavailable
ALM-1413 | MTP-3b signaling link unavailable | ALM-1851 | SCTP link down
ALM-1506 | SCCP DSP unavailable | ALM-1861 | M3UA link fault

If any of the above alarms is present, view the alarm parameters to check whether the IU interface signaling links are disconnected or intermittent. The failure may be caused by the interface board, the intermediate transmission device, or inconsistent data negotiated between the local end and the peer end (with MSP enabled).
Handling measures:
(1) If a single link is intermittent, deactivate it as a temporary workaround:
- DEA MTP3BLNK
- DEA M3LNK
(2) Swap the IU interface board, and check whether the service is restored.
(3) If not, swap the SPU board.
(4) If the problem persists, reset the active and standby IU interface boards at the same time, and then check whether the service is restored. If not, reset the active and standby SPU boards at the same time.
(5) If the problem persists, trace the messages on the faulty link (SAAL/SCTP/SCCP) and check for one-way links or dropped packets to determine whether the intermediate transmission device or the peer device is faulty. If either is confirmed to be faulty, ask the customer to troubleshoot the transmission devices.
(6) If the problem persists, collect the SAAL/SCTP/MTP3B/M3UA/SCCP-layer messages, alarm logs, and CHR logs for the failure period, and send them to R&D personnel.
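The handling measures above follow a pattern used throughout this section: apply one measure, check whether the service is restored, and escalate only if it is not. A minimal Python sketch of that loop is below. The step descriptions and the `service_restored` check are hypothetical placeholders; on a live RNC each step is a manual MML or on-site operation followed by verification.

```python
from typing import Callable, List, Optional, Tuple

# Each step is (description, action); the actions stand in for manual operations.
Step = Tuple[str, Callable[[], None]]

def escalate(steps: List[Step], service_restored: Callable[[], bool]) -> Optional[str]:
    """Run steps in order and return the description of the step that restored service.

    Returns None if every step was tried without success, which is the point at
    which logs should be collected and sent to R&D.
    """
    for description, action in steps:
        action()
        if service_restored():
            return description
    return None
```

Keeping the steps as data makes the escalation order explicit and easy to record in the incident report.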

3.3.4 IU Interface User Plane Alarms

If the following alarms about the IU interface user plane are present and the alarm
time is consistent with the failure time, take the following measures to restore the
service.

Alarm ID | Alarm Name | Alarm ID | Alarm Name
ALM-1603 | AAL2 path blocked by peer end | ALM-1711 | Path forward congested
ALM-1606 | AAL2 path unavailable | ALM-1901 | Path to SGSN faulty
ALM-1710 | IP PATH down | ALM-1902 | Link to SGSN faulty

The above alarms indicate that the IU interface user plane is faulty.

If none of the above alarms is present, check the paths using the following methods, depending on the networking type:
ATM networking:


• If the peer device supports the LB (loopback) function and the RNC version is not V29, run the LOP:VCL command (select AAL2Path) to check the path status. UP means the path is normal. If the status is DOWN, delete the faulty path and add it again, and then query the status again with LOP:VCL. If the status is still DOWN, swap or reset the interface board. If the problem persists, the intermediate transmission device or the CN device is faulty; ask the customer to troubleshoot the transmission devices.
• If the peer device does not support the LB function or the RNC version is V29, run the DSP AALVCCPFM command to query the packets sent and received on the AAL2Path. If the path both sends and receives packets, it is normal. If the path receives packets but cannot send them, the local device is faulty: delete the faulty path and add it again to restore the service. If the problem persists, swap or reset the IU interface board and observe the path's packet sending and receiving again. If the path can send packets but still cannot receive them, the intermediate transmission device or the CN device is faulty; ask the customer to troubleshoot the transmission devices.
• Once a faulty AAL2Path has been identified by either method, block it as a temporary workaround. If no faulty path is identified, use the following methods:
- Block the AAL2Paths of all IU interfaces (BLK AAL2PATH), and then unblock them (UBL AAL2PATH) one at a time. After each unblocking, check whether the CS service is normal; if it is, unblock the next AAL2Path. Activate only one additional AAL2Path on the IU interface at each step. If the CS service becomes abnormal, that AAL2Path is faulty; block it to mitigate the problem temporarily.
- If blocking all the AAL2Paths is not possible on site, identify faulty AAL2Paths by their idle CIDs and block them. Run the DSP AAL2PATH command to query CID and bandwidth usage. An abnormal AAL2Path shows close to 248 idle CIDs, that is, almost none of its CIDs are in use.
• If the IUPS user plane is faulty, run the PING IP command to check whether the IP address of the SGSN interface board (if the board supports ping) and the SGSN GTPU address are reachable. If pinging the peer interface board times out, the IPOA link is faulty; swap or reset the IU interface board to rule out the RNC. If the problem persists, ask the customer to troubleshoot the intermediate transmission device or the SGSN. If pinging the SGSN GTPU address times out, ask the customer to troubleshoot the SGSN.
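The ATM-side checks above reduce to a few rules over per-path data, sketched below in Python under stated assumptions: `tx_delta`/`rx_delta` stand for the change in sent and received packet counts between two DSP AALVCCPFM readings, `idle_cids` stands for the value read from DSP AAL2PATH, and all field and function names are hypothetical. The figure 248 matches the number of AAL2 channel identifiers available for user connections (CID values 8 to 255 in ITU-T I.363.2), so a path showing about 248 idle CIDs carries almost no connections.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

MAX_USER_CIDS = 248  # AAL2 CID values 8..255 are available for user connections

@dataclass
class PathSample:
    path_id: int
    tx_delta: int   # packets sent between two readings (assumed field)
    rx_delta: int   # packets received between two readings (assumed field)
    idle_cids: int  # idle CIDs reported for the path (assumed field)

def classify(s: PathSample) -> str:
    """Rough verdict following the packet send/receive rules in this section."""
    if s.tx_delta > 0 and s.rx_delta > 0:
        return "normal"
    if s.rx_delta > 0:
        return "local fault"            # receives but cannot send
    if s.tx_delta > 0:
        # Sends but cannot receive: points to transmission/CN once the
        # local board has been swapped or reset.
        return "transmission/CN fault"
    return "no traffic"

def idle_suspects(samples: Iterable[PathSample], margin: int = 2) -> List[int]:
    """Paths whose idle CID count is within `margin` of the maximum (almost unused)."""
    return [s.path_id for s in samples if s.idle_cids >= MAX_USER_CIDS - margin]

def isolate(path_ids: List[int], block: Callable[[int], None],
            unblock: Callable[[int], None], cs_ok: Callable[[], bool]) -> List[int]:
    """Block all paths, then unblock one at a time; re-block any path that breaks CS."""
    for pid in path_ids:
        block(pid)
    faulty = []
    for pid in path_ids:
        unblock(pid)
        if not cs_ok():
            block(pid)  # keep the faulty path blocked as a temporary workaround
            faulty.append(pid)
    return faulty
```

In `isolate`, paths that pass the CS check stay unblocked as the loop proceeds, so only one newly activated path is under test at each step.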
IP networking:


• Enable the ping check of the IPPATH (MOD IPPATH:PATHCHK=ENABLED) and check whether the destination IP address matches the PEERIPADDR parameter of ADD IPPATH. Check whether ALM-1710 IP PATH down is present. If it is, delete the faulty IPPATH, add it again, and check whether the alarm is cleared. If the alarm persists, swap or reset the IU interface board. If the problem still persists, the intermediate transmission device or the CN device is faulty; ask the customer to troubleshoot the transmission devices.
• Run the PING IP command with the IPADDR of ADD IPPATH as the source IP address and the PEERIPADDR of ADD IPPATH as the destination IP address. If the ping times out, delete the faulty IPPATH, add it again, and ping again. If the ping still times out, swap or reset the IU interface board. If the problem persists, the intermediate transmission device or the CN device is faulty; ask the customer to troubleshoot the transmission devices.
• If ping detection is enabled, faults on some of the paths at the IU interface do not affect the service. If all paths at the IU interface are faulty, ask the customer to troubleshoot the transmission devices. If ping detection is disabled and you find multiple faulty paths, delete the faulty paths to mitigate the problem temporarily.
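The IP-side decisions can be sketched in Python as well. The sketch assumes each configured IPPATH has been read into a record holding its IPADDR and PEERIPADDR values together with the addresses actually used for the ping and its result; the IpPath type and the returned action strings are hypothetical. It first flags pings that were not aimed at the configured addresses, then applies the rules above for some or all paths being faulty.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class IpPath:
    path_id: int
    ipaddr: str       # IPADDR configured in ADD IPPATH
    peeripaddr: str   # PEERIPADDR configured in ADD IPPATH
    ping_source: str  # source address actually used for the ping
    ping_target: str  # destination address actually pinged
    ping_ok: bool     # True if the ping got replies

def ip_path_actions(paths: List[IpPath], pathchk_enabled: bool) -> List[str]:
    """Suggest next steps for a set of IU IP paths (simplified from the rules above)."""
    actions = []
    # A ping aimed at the wrong addresses says nothing about the path itself.
    for p in paths:
        if (p.ping_source, p.ping_target) != (p.ipaddr, p.peeripaddr):
            actions.append(f"path {p.path_id}: ping addresses differ from IPADDR/PEERIPADDR, recheck")
    faulty = [p for p in paths if not p.ping_ok]
    if not faulty:
        return actions
    if len(faulty) == len(paths):
        actions.append("all IU paths faulty: ask customer to check transmission devices")
    elif pathchk_enabled:
        actions.append("some paths faulty, ping check enabled: service should not be affected")
    else:
        ids = ", ".join(str(p.path_id) for p in faulty)
        actions.append(f"ping check disabled: delete faulty paths {ids} as a temporary fix")
    return actions
```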

Handling measures:
(1) Block (BLK) the paths that have alarms, and then check whether the service is restored.
(2) Swap the IU interface board, and then check whether the service is restored.
(3) If the problem persists, reset the IU interface board.

3.3.5 IU Link Congestion Alarms

If the following alarms are present and the alarm time is consistent with the failure
time, take the following measures to restore the service.

Alarm ID | Alarm Name | Alarm ID | Alarm Name
ALM-1401 | MTP-3b signaling link congested | ALM-1713/1714 | Port forward congested
ALM-1402 | MTP-3b DSP congested | ALM-1714/1715 | Port backward congested
ALM-1805 | SAAL link congested | ALM-1852 | SCTP link congested
ALM-1711/1712 | Path forward congested | ALM-1862 | M3UA link congested
ALM-1712/1713 | Path backward congested | |

If many of the above alarms are present, the traffic on the link is too heavy.

In some cases the link is not actually congested: the RNC sends many SCCP CR (Connection Request) messages, but the CN returns CREF (Connection Refused) messages instead of CC (Connection Confirm) messages. If RAB assignment cannot be initiated normally on the IU interface, trace the SCCP messages of the IU interface on site to check whether the CN returns CREF messages instead of CC messages. If so, try deactivating and then reactivating the MTP3B link set to restore the service.
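To check a trace for this pattern, count the SCCP connection-oriented messages. The Python sketch below assumes the trace has been exported as a list of message-type strings ("CR", "CC", "CREF"); the function name and the 50% CREF threshold are arbitrary illustrations, not product values.

```python
from collections import Counter
from typing import Iterable

def sccp_verdict(message_types: Iterable[str], cref_share: float = 0.5) -> str:
    """Summarize how the CN answers SCCP connection requests in a trace."""
    counts = Counter(m.upper() for m in message_types)
    cr, cc, cref = counts["CR"], counts["CC"], counts["CREF"]
    if cr == 0:
        return "no CR traced"
    if cc == 0 and cref == 0:
        return "CR unanswered"            # requests sent, no reply of either kind
    if cref / cr >= cref_share:
        return "CN refusing connections"  # many CREF instead of CC
    return "normal"
```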

Handling measures:
(1) View the RNC-level traffic statistics on site. If the CS/PS equivalent Erlangs have dropped below 70% of their usual level, block some cells or NodeBs to reduce the traffic load.
(2) Enable flow control on the IU interface. IU flow control is controlled by a switch in the following versions:
- V29B072SP05 and later patches
- V210B061SP02 and later patches
- V110B061SP01 and later patches
The command is as follows:
- SET FCSW: BT=SPU, FCSW=ON, PRINTSW=OFF;
If PRINTSW is OFF, IU flow control is enabled; ON means IU flow control is disabled. The default is ON.
(3) If the SCCP DSP unavailable alarm (ALM-1506) is present, deactivate and then reactivate the link to restore the service:
- DEA MTP3BLKS
- ACT MTP3BLKS
- DEA M3LNK
- ACT M3LNK
(4) If a path congestion alarm is present, decrease the factor of the IU interface and add more paths.
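Whether the IU flow-control switch is available can be checked against the version list above. The Python sketch assumes version strings of the form V<major><release>B<build>SP<patch> (as in V29B072SP05) and that any later patch of the same build qualifies; builds not in the list are reported as unknown rather than guessed. The function and table names are hypothetical; confirm against the release notes before relying on the result.

```python
import re
from typing import Optional

# Minimum patch per (major, release, build), taken from the version list above.
MIN_PATCH = {
    (2, 9, 72): 5,    # V29B072SP05
    (2, 10, 61): 2,   # V210B061SP02
    (1, 10, 61): 1,   # V110B061SP01
}

VERSION_RE = re.compile(r"^V(\d)(\d+)B(\d+)SP(\d+)$")

def supports_iu_flow_control(version: str) -> Optional[bool]:
    """True/False for the builds listed above; None if the build is not listed."""
    m = VERSION_RE.match(version.strip().upper())
    if not m:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, release, build, patch = (int(g) for g in m.groups())
    minimum = MIN_PATCH.get((major, release, build))
    if minimum is None:
        return None
    return patch >= minimum
```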


3.3.6 Node B Alarms

If the following alarms are present and the alarm time is consistent with the failure
time, take the following measures to restore the service.

Alarm ID | Alarm Name | Alarm ID | Alarm Name
ALM-1802 | SAAL link unavailable | ALM-2006 | Cell unavailable
ALM-2010 | NCP failure | ALM-2008 | Public channel failure
ALM-2011 | CCP failure | ALM-2012 | Cell setup failure
ALM-2026 | Node B unavailable | ALM-2014 | Public channel setup failure

If many of the above alarms are present, the IUB interface is faulty. From the alarms, you can identify the interface board serving the affected Node Bs or the control subsystem to which they belong. This type of problem may be caused by a fault in the interface board, the intermediate transmission device, or the SPU. If many cell unavailable alarms (ALM-2006) are present, run the DSP CELL command to check why the cells are unavailable, and then clear the faults according to the specific causes.
Handling measures:
(1) Check whether the public channel setup failure alarm is present in many cells, and whether those cells are terminated on the same interface board or the same SPU subsystem:
- If the failure is associated with the public channels, swap or reset the interface board on which the cells are terminated.
- If the failure has other causes, swap the SPU board.
- If the two measures above do not solve the problem, reset the subrack.
(2) If the IUB interfaces of many faulty Node Bs are carried on the same interface board, swap or reset that board.
(3) If many faulty Node Bs are terminated on the same SPU, swap or reset the active and standby SPU boards.
(4) If the problem persists, trace the corresponding SAAL link to check whether packets are being dropped by the intermediate transmission device, and ask the headquarters for help.
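Deciding whether faulty Node Bs concentrate on one interface board or one SPU subsystem amounts to counting them per board and per subsystem. In this Python sketch, the board and subsystem labels, the mapping dictionaries, and the 50% concentration threshold are illustrative assumptions; the real mapping would come from the Node B and transmission configuration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

def dominant(labels: List[str], share: float) -> Optional[str]:
    """Return the label that holds at least `share` of the items, if any."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= share else None

def locate_iub_fault(faulty_nodebs: List[str],
                     board_of: Dict[str, str],
                     spu_of: Dict[str, str],
                     share: float = 0.5) -> Tuple[Optional[str], Optional[str]]:
    """Return (interface board, SPU subsystem) on which the faulty Node Bs concentrate."""
    boards = [board_of[n] for n in faulty_nodebs]
    spus = [spu_of[n] for n in faulty_nodebs]
    return dominant(boards, share), dominant(spus, share)
```

A non-None board points to swapping or resetting that interface board; a non-None subsystem points to the active and standby SPU boards.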

3.4 Tracing and Analyzing Related Signaling


If the emergency cannot be resolved by viewing alarms, trace signaling for further analysis.


3.4.1 Tracing IU Interface Signaling

(1) If no signaling at all can be traced on the IU interface, all CS/PS services of the RNC are interrupted. Reset the RNC.
(2) If only initial direct transfer messages can be traced and no messages from the CN are traced, trace the SCCP messages on the IU interface.
- If many CR messages sent to the CN are traced but no CC message is traced, swap the SPU board, and then deactivate and reactivate the MTP3B link set.
- If many CR messages sent to the CN and many CREF messages are traced, the LAC, SAC, or RAC on the CN side is not configured correctly. Contact the CN personnel to locate the problem.
(3) If, while tracing IU interface messages, you see that the RNC receives an ERR IND message from the CN after sending the initial direct transfer message, run the RST IU command. If the problem persists, reset the IU interface board.
(4) If many messages are traced, check the direct transfer messages on the IU interface to determine whether the registration or attach procedure completes.
- If the registration or attach procedure is rejected by the CN, check whether the LAC/SAC/RAC is configured and activated.
(5) Check whether a RAB ASSIGNMENT REQ message is present.
- If not, work with the CN personnel to locate the problem.
- If so, check the result in the RAB ASSIGNMENT RSP message.
  ▪ If the assignment succeeds and the RNC does not release the connection within five seconds, the failure is not related to the RNC. Use the user ID contained in the messages to identify which messages belong to the same user. If the RNC releases the connection immediately, the AAL2Path/IPPATH of the IU interface may not be connected; troubleshoot the paths as described in IU Interface User Plane Alarms.
  ▪ If the assignment fails because the IU interface transmission resources could not be established, the IU interface transmission resources are faulty. If no alarm is reported, reset the AAL2Path (for example, RST AAL2PATH: PATHID=0;). If the service is still not restored, block all AAL2Paths and then unblock them one at a time. If the assignment fails because of IUB resources, the air interface, or an unspecified failure, continue by tracing the IUB interface or IOS/CDT/IFTS.

3.4.2 Tracing IOS/CDT/IFTS Signaling

(32) If IOS/CDT/IFTS tracing shows that the IUB interface is faulty, the RL does not respond, or RL setup fails, the SPU board may be faulty. Swap the SPU board. If the service is not restored, swap the IUB interface board. If the problem persists, reset the faulty subrack.
(33) If IOS/CDT/IFTS tracing shows that the air interface is faulty and the RB does not respond, first deactivate and then reactivate the cell. If the problem persists, reset all the FMR/DPU boards in the faulty subrack. If the problem still persists, reset the entire subrack.

Before resetting FMR/DPU boards, determine which boards to reset based on the subsystem that serves the faulty cells. The correspondence between FMR/DPU boards and SPU boards differs between versions.
V1 (V18 and V110):
• Subsystem 0 corresponds to FMR boards in odd slots.
• Subsystem 1 corresponds to FMR boards in even slots.
V2 (V29):
• Only SPM 2 and SPM 4 exist in the RSS subrack. The following figure shows the corresponding SPU/DPU board slots:

• Only SPM 0, SPM 2, and SPM 4 exist in the RBS subrack. The following figure shows the corresponding SPU/DPU board slots:

V2 (V210):
• To share resources inside the RNC, V2 uses the MPU to manage the mapping between SPU subsystems and DPUs. Users of any SPU subsystem may therefore be assigned to the DSPs of any DPU.
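For V1, the subsystem-to-slot rule above is simple enough to apply mechanically. Below is a minimal Python sketch. It assumes you already have the slot numbers of the FMR boards installed in the faulty subrack. That input list is hypothetical; read it from your own configuration.

```python
def v1_fmr_slots_for_subsystem(subsystem, fmr_slots):
    """Return the FMR slots that correspond to an SPU subsystem in V1 (V18/V110).

    Subsystem 0 -> FMR boards in odd slots; subsystem 1 -> FMR boards in even slots.
    fmr_slots: slot numbers of the FMR boards installed in the subrack.
    """
    if subsystem not in (0, 1):
        raise ValueError("the V1 rule only covers subsystems 0 and 1")
    want_odd = subsystem == 0
    return sorted(s for s in fmr_slots if (s % 2 == 1) == want_odd)
```

This parity rule is specific to V1. V29 uses the slot layouts shown in the figures. In V210 the MPU may assign users of any SPU subsystem to any DPU, so a fixed slot rule is not expected to hold there.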

3.5 Collecting and Analyzing Traffic Statistics


If none of the above steps restores the service, and the fault is not caused by an intermediate transmission device or a peer device, the failure is typically caused by a faulty FMR/SPU board.
Use Nastar or M2000 to analyze the traffic statistics for the fault period and view the traffic volume of each subsystem:
• VS.CSLoad.Erlang.Equiv.SPU
• VS.PSLoad.DLThruput.SPU
Identify the subsystem or subsystems whose traffic volume changed significantly. If the subrack of a faulty subsystem has not yet been reset, reset it. If the traffic volume of all subsystems changed and the RNC has not yet been reset, reset the RNC within ten minutes.
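Deciding which subsystems show a significant change can be scripted once the two counters have been exported per subsystem. The sketch below compares pre-fault values with values from the fault period. The 50% relative threshold and the dictionary input format are assumptions for illustration; neither value comes from this document.

```python
def changed_subsystems(before, during, rel_threshold=0.5):
    """Return subsystems whose traffic changed by more than rel_threshold.

    before, during: {subsystem_id: traffic}, e.g. VS.CSLoad.Erlang.Equiv.SPU
                    values exported for comparable periods (hypothetical format).
    rel_threshold:  assumed cut-off for a significant change (0.5 = 50%).
    """
    flagged = []
    for subsys, base in before.items():
        now = during.get(subsys, 0.0)
        if base == 0:
            changed = now > 0  # traffic appeared where there was none
        else:
            changed = abs(now - base) / base > rel_threshold
        if changed:
            flagged.append(subsys)
    return sorted(flagged)


def suggested_reset(before, during, rel_threshold=0.5):
    """Some subsystems changed -> reset their subracks; all changed -> reset the RNC."""
    flagged = changed_subsystems(before, during, rel_threshold)
    if not flagged:
        return "no significant change", flagged
    if len(flagged) == len(before):
        return "reset RNC", flagged
    return "reset subracks of flagged subsystems", flagged
```

The script does not track whether a subrack or the RNC has already been reset. Check that condition manually before following its suggestion.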
If the problem persists after the entire RNC is reset, collect valid logs and contact R&D personnel at the headquarters as soon as possible.

4 Solutions to KPI Deterioration Problems

KPI deterioration problems cover low RRC establishment rate, low RAB
establishment rate, or high call drop ratio, which greatly affects customer satisfaction.

The on-site personnel must be able to:
• View traffic statistics by using M2000 or Nastar, understand common traffic statistics, and analyze anomalies in RRC/RAB/CDR.
• View CHR logs by using Insight Plus or OMStar, and analyze the basic call establishment and call drop flows.
• Query information from the live network by using LMT.

4.1 Judging Whether Faults are Present on the Same FMR/DPU Board or the Same DSP
View the error codes related to MACD/RLC by analyzing CHR logs, as shown in the
following figure:

View the ID of the DSP where faults occurred, as shown in the following figure:

Handling measures:

(34) If MACC/MACD/RLC error codes appear many times with the same CPU ID, or if the CPU IDs all belong to the same FMR board, reset the DSP or the FMR/DPU board.
(35) If most faults occur on the same DSP or the same board, reset the DSP or the FMR/DPU board.
CPU IDs can be converted with the following dedicated tools:
- For V1 (V17/V18/V110), use the following tool to convert CPU/DSP IDs:

CPUid.exe

- For V2 (V29/V210), use the following tool to convert CPU/DSP IDs:

cpuid_V29.exe

(36) If only one DSP is faulty, reset it. If the problem still persists, disable the DSP.
(37) If only one FMR/DPU board is faulty, reset it. If the problem still persists, disable
the FMR/DPU board.
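Steps (34) to (37) come down to one question: do the MACC/MACD/RLC errors in the CHR logs concentrate on one DSP or on one board? Below is a rough Python sketch. It assumes you have already extracted the CPU/DSP ID of each error record and used CPUid.exe or cpuid_V29.exe to build a DSP-to-board mapping. The 50% share and the input formats are hypothetical.

```python
from collections import Counter


def dominant(keys, share=0.5):
    """Return the key that accounts for more than `share` of all records, else None."""
    keys = list(keys)
    if not keys:
        return None
    key, count = Counter(keys).most_common(1)[0]
    return key if count / len(keys) > share else None


def locate_faulty_unit(dsp_ids, board_of, share=0.5):
    """Check DSP-level concentration first, then board-level concentration.

    dsp_ids:  DSP/CPU ID of each MACC/MACD/RLC error record (from CHR logs).
    board_of: {dsp_id: board}, e.g. built from the CPU ID conversion tool output.
    """
    dsp = dominant(dsp_ids, share)
    if dsp is not None:
        return "DSP", dsp  # steps (34)/(36): reset the DSP, then disable it
    board = dominant((board_of.get(d, "unknown") for d in dsp_ids), share)
    if board is not None:
        return "board", board  # steps (34)/(37): reset the board, then disable it
    return "none", None  # faults are spread out
```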

4.2 Judging Whether Faults are Present in the Same SPU Subsystem
Analyze traffic statistics or CHR logs to identify the TOP N cells, and check whether the KPI deterioration is caused by problems in these cells.
Handling measures:
(38) If the KPI deterioration is caused by problems in the TOP N cells, query the SPU subsystems of these cells through LMT or by using scripts. Then determine whether the faults are in the same SPU subsystem or in different SPU subsystems of the same subrack.
- If the faults are in the same SPU subsystem, swap the SPU board.
- If the faults are in different SPU subsystems of the same subrack, swap the SCU board.
- If the faults are in different subracks, swap the NET board or the SCU board in the RSS subrack.
(39) If the KPI deterioration is not caused by problems in the TOP N cells:
- Reset the entire RNC if the faults greatly affect customer satisfaction.
- Otherwise:

  • Trace valid CDT data (containing L2 statistics and L2 data reported within 100 seconds, or FP data).
  • Trace valid IOS data.
  • Collect traffic statistics data and CHR logs.
  • Send all of the above logs to the headquarters.
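The board-swapping rule in step (38) depends only on where the TOP N cells are located. Below is a minimal sketch. It assumes each cell has already been mapped, through LMT or a script, to a hypothetical (subrack, SPU subsystem) pair, and that subsystem IDs are unique within a subrack.

```python
def spu_swap_action(cell_locations):
    """Apply step (38) to the (subrack, spu_subsystem) pairs of the TOP N cells."""
    locations = set(cell_locations)
    if not locations:
        raise ValueError("no TOP N cells given")
    if len(locations) == 1:
        return "swap SPU board"  # all cells in the same SPU subsystem
    if len({subrack for subrack, _ in locations}) == 1:
        return "swap SCU board"  # same subrack, different subsystems
    return "swap NET board or SCU board in RSS subrack"  # different subracks
```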

4.3 Judging Whether Faults are Present on the Same Interface Board (IUB Interface)
Analyze traffic statistics or CHR logs to identify the TOP N cells, and check whether the KPI deterioration is caused by problems in these cells.
Handling measures:
(40) If the KPI deterioration is caused by problems in the TOP N cells, query the interface boards that carry the IUB interfaces of these cells through LMT or by using scripts. Then determine whether the faults are on the same interface board.
(41) If so, the interface board or the intermediate transmission device connected to it is probably dropping packets. Swap the interface board.

5 Information Collection Checklist

5.1 Emergency
In an emergency, that is, before the services are restored, the on-site personnel must collect the necessary logs and send them to the headquarters for fault location. The on-site personnel must filter the logs before sending them. Keep the logs as small as possible, but make sure they contain the fault points. For example, CDT/IOS tracing logs must contain the call drop points, and CHR logs and the other logs listed in this document must cover the fault period. Small, focused logs can be sent to the headquarters quickly. Compress all logs with RAR or ZIP to reduce the file size.
For access emergencies:
• CHR logs recorded during the fault period, with the subrack number and the exact fault time noted. (Required)
• Alarm logs recorded during the fault period, with the exact fault time noted. (Required)
• IFTS/CDT/IOS tracing logs (CDT/IFTS logs must contain the L2 user plane statistics and the L2 data reported within 100 seconds). The on-site personnel must check that the logs to be sent contain the fault information. (Required)
• Traffic statistics data from two hours before the fault occurred through the end of the fault period. (Optional)
• RNC configuration script. (Optional)
• Text logs recorded during the fault period, with the subrack number and the exact fault time noted. (Optional)
For KPI deterioration emergencies:
• Traffic statistics data from two hours before the fault occurred through the end of the fault period. (Required)
• CHR logs recorded during the fault period, with the subrack number and the exact fault time noted. (Required)
• IFTS/CDT/IOS tracing logs (CDT/IFTS logs must contain the L2 user plane statistics and the L2 data reported within 100 seconds). The on-site personnel must check that the logs to be sent contain the fault information. (Required)
• RNC configuration script. (Required)
• Text logs recorded during the fault period, with the subrack number and the exact fault time noted. (Optional)

5.2 After Services are Restored


If the services have been restored, the log size is not limited. Large logs can be sent through FTP. Prepare the following logs:
• Busy-hour logs (one to two hours) backed up locally on the day before the faults occurred, all logs for the fault period, and part of the busy-hour logs (one to two hours) after the faults are removed. (Required)
• All locally backed-up text logs recorded from the time the faults occurred to the time the services were restored. (Required)
• All locally backed-up traffic statistics logs (files in the MeasResult directory) recorded from the time the faults occurred to the time the services were restored. (Required)
• CDT/IFTS/IOS tracing and any other tracing conducted during the fault period. CDT/IFTS must contain the L2 user plane statistics and the L2 data reported within 100 seconds. (Required)
• BAM logs, operation logs, and alarm logs collected by running COL LOG:TP=BAM; (Required)

6 How to Use Common Tracing Tools and Obtain Logs

6.1 CDT/IFTS Tracing


(42) RNC call tracing falls into two types: CDT and IFTS. CDT traces specific users by IMSI/TMSI, and IFTS traces the users in specified cells. Click CDT in the tracing tree. The following window appears.
- Tracing specific IMSI/TMSI users

- Tracing IFTS of some cells

When tracing IFTS, select the SPU subsystem that serves the cell being traced. Run the LST CELL command to obtain the corresponding SPU subsystem ID. Set RRC EST Cause as required. For CS services, select Originating Conversational Call/Terminating Conversational Call. For PS services, select Originating Interactive Call/Originating Background Call/Terminating Interactive Call/Terminating Background Call. Traffic Type is optional.
(43) CDT tracing with internal print messages
On the PC where LMT is installed, find the following directory:
D:\HW LMT\adaptor\clientadaptor\RNC\BSC6810V200R010C01B051\style\defaultstyle\locale\en_US\rnctest
In this path, BSC6810V200R010C01B051 indicates the RNC version and en_US indicates the language type. The example uses the English version of V210B051.
Find the RncTestConfig.xml file, open it with UltraEdit (UE) or Notepad, and find the following part:
<DESC descname="CDTMSGTYPE">
  <PARAS>
    <PARA name="UI_FAM_UT_STANDARD_MSG" value="1"/>
    <PARA name="UI_FAM_UT_INTRA_MSG" value="0"/>
    <PARA name="UI_FAM_UT_CTRL_TBL" value="0"/>
    <PARA name="UI_FAM_UT_STATE_TRANS" value="0"/>
    <PARA name="UI_FAM_UT_PRINT_INFO" value="0"/>
    <PARA name="UI_FAM_UT_FUNC_CALL" value="0"/>
    <PARA name="UI_FAM_UT_L2_DATA_FWD_MSG" value="0"/>
    <PARA name="UI_FAM_UT_L2_TXT_FWD_MSG" value="0"/>
    <PARA name="UI_FAM_UT_GTPU_DATA_FWD_MSG" value="1"/>
    <PARA name="UI_FAM_UT_REAL_TIME_INFO" value="0"/>
    <PARA name="UI_FAM_UT_FMR_SIG_DT_FWD_MSG" value="0"/>
    <PARA name="UI_FAM_UT_FMR_UP_DT_FWD_MSG" value="0"/>
    <PARA name="UI_FAM_UT_FMR_INBAND_DT_FWD_MSG" value="0"/>
    <PARA name="UI_FAM_UT_RADIO_PERF_INFO" value="1"/>
    <PARA name="UI_FAM_UT_CELL_INFO" value="1"/>
    <PARA name="UI_FAM_UT_ALPATH_PVC_INFO" value="0"/>
  </PARAS>
</DESC>

In this part, change every value="0" to value="1":

<DESC descname="CDTMSGTYPE">
  <PARAS>
    <PARA name="UI_FAM_UT_STANDARD_MSG" value="1"/>
    <PARA name="UI_FAM_UT_INTRA_MSG" value="1"/>
    <PARA name="UI_FAM_UT_CTRL_TBL" value="1"/>
    <PARA name="UI_FAM_UT_STATE_TRANS" value="1"/>
    <PARA name="UI_FAM_UT_PRINT_INFO" value="1"/>
    <PARA name="UI_FAM_UT_FUNC_CALL" value="1"/>
    <PARA name="UI_FAM_UT_L2_DATA_FWD_MSG" value="1"/>
    <PARA name="UI_FAM_UT_L2_TXT_FWD_MSG" value="1"/>
    <PARA name="UI_FAM_UT_GTPU_DATA_FWD_MSG" value="1"/>
    <PARA name="UI_FAM_UT_REAL_TIME_INFO" value="1"/>
    <PARA name="UI_FAM_UT_FMR_SIG_DT_FWD_MSG" value="1"/>
    <PARA name="UI_FAM_UT_FMR_UP_DT_FWD_MSG" value="1"/>
    <PARA name="UI_FAM_UT_FMR_INBAND_DT_FWD_MSG" value="1"/>
    <PARA name="UI_FAM_UT_RADIO_PERF_INFO" value="1"/>
    <PARA name="UI_FAM_UT_CELL_INFO" value="1"/>
    <PARA name="UI_FAM_UT_ALPATH_PVC_INFO" value="1"/>
  </PARAS>
</DESC>
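Editing the file by hand is error-prone because the same value="0" text must be changed on many lines. The following Python sketch makes the same change automatically. It assumes the block looks exactly as shown above, with double-quoted attributes and no nested <DESC> elements. It also assumes the file uses an ASCII-compatible encoding. The script keeps a .bak copy of the file and touches only the CDTMSGTYPE block.

```python
import re
import shutil
from pathlib import Path

# Matches the CDTMSGTYPE block shown above; assumes it contains no nested <DESC>.
CDT_BLOCK = re.compile(r'<DESC descname="CDTMSGTYPE">.*?</DESC>', re.S)


def enable_all_cdt_msg_types(xml_text):
    """Set every value="0" to value="1" inside the CDTMSGTYPE block only."""
    new_text, n = CDT_BLOCK.subn(
        lambda m: m.group(0).replace('value="0"', 'value="1"'), xml_text, count=1
    )
    if n == 0:
        raise ValueError("CDTMSGTYPE block not found")
    return new_text


def patch_file(path):
    """Back up RncTestConfig.xml, then rewrite it with the block enabled."""
    p = Path(path)
    shutil.copy2(p, p.with_name(p.name + ".bak"))  # keep the original file
    # latin-1 maps each byte to one character, so bytes outside the block round-trip unchanged
    text = p.read_bytes().decode("latin-1")
    p.write_bytes(enable_all_cdt_msg_types(text).encode("latin-1"))
```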

(44) CDT tracing with L2 statistics


Check the Periodically Data Report checkbox, and select 2s.
Check the AI Collect Period checkbox, and use the default 2.
Type 100 in the L2 Data Report Time box.

For data transmission problems, make sure that the problem recurs within 100 seconds after tracing starts.

6.2 Tracing IOS Signaling


Click IOS in the tracing tree. The following dialog box appears. Set the call count as required; 10 is recommended. Type the ID of the faulty cell in the Cell ID box. You can type the IDs of multiple cells. Keep the default settings for the other parameters.

6.3 Tracing IU Interface Signaling


Click IU in the tracing tree. The following dialog box appears. Select a CN node as required. For a CS service fault, type the DPC of the MSC. For a PS service fault, type the DPC of the SGSN. The destination point code (DPC) must be in hexadecimal format. You can query the specific value by running LST N7DPC. SCCP tracing is optional, but on-site personnel are recommended to trace SCCP.

6.4 Obtaining BAM Logs


The collected BAM log package usually contains BAM logs, operation logs, and alarm logs. Run the following command to collect it:
COL LOG: TP=BAM;
After the command is executed, the logs are saved in the /BAM/FTP/FixInfo_BAM.zip file. On-site personnel can log in to the BAM through FTP to obtain this file.
To feed back operation logs or alarm logs, download the FixInfo_BAM.zip file and decompress it. For operation logs, compress only the files in the OperateLog directory and send the zip file to the headquarters. For alarm logs, compress all the files in the Warn directory and send the zip file to the headquarters.
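The repacking step can be scripted with Python's standard zipfile module. The sketch below copies only the entries under one directory (OperateLog or Warn, as named above) from FixInfo_BAM.zip into a new, smaller archive. It assumes those directory names appear as path components inside the archive. Check this against your own file.

```python
import zipfile


def repack_subdir(src_zip, subdir, dst_zip):
    """Copy the files under `subdir` from src_zip into a new deflate-compressed dst_zip.

    src_zip, dst_zip: paths or file-like objects. Returns the number of files copied.
    """
    copied = 0
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", compression=zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.is_dir():
                continue
            # Some Windows tools store "\" separators; normalize before splitting.
            parts = info.filename.replace("\\", "/").split("/")
            if subdir in parts[:-1]:  # match directory components, not file names
                dst.writestr(info.filename, src.read(info))
                copied += 1
    return copied
```

For example, `repack_subdir("FixInfo_BAM.zip", "Warn", "Warn.zip")` produces an archive that contains only the alarm logs.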

6.5 Obtaining CHR/Text Logs


The methods of obtaining CHR logs differ between V1 and V2. The following figure shows how CHR and text logs are obtained in V1. CHR logs are saved in the FamLogFmt directory, while text logs are saved in the Famlog directory.

The following figure shows how CHR and text logs are obtained in V29. CHR logs are saved in the famlog directory, while text logs are saved in the FamLogFmt directory.

The following figure shows how CHR and text logs are obtained in V210. CHR logs are saved in the fmt directory, while text logs are saved in the txt directory.

The CHR/text log file name format of V1 is as follows:
• 01Log20071014000034_20071014235834.log.zip
01 indicates the subrack number. 20071014000034_20071014235834 indicates the start time and end time in the format YYYYMMDDhhmmss (year, month, day, hour, minute, second).
The CHR/text log file name format of V2 is as follows:
• RNC0000_00Log20081117174447_20081117180123.log.zip
RNC0000 indicates the RNC ID, and 00 indicates the subrack number. 20081117174447_20081117180123 indicates the start time and end time in the same YYYYMMDDhhmmss format.

To feed back text logs and CHR logs quickly, on-site personnel need to know the exact time when the faults occurred and the number of the faulty subrack. This information is used to select the valid logs. Keep the log files as small as possible to shorten the delivery time.
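Each file name encodes the subrack number and the time span, so the valid files can be selected with a script instead of by eye. The following Python sketch parses both the V1 and the V2 naming rules shown above. It keeps only the files of the faulty subrack whose time span overlaps the fault window. The regular expression is derived from the two example names only; for example, it assumes a two-digit subrack number. Check it against your own file names.

```python
import re
from datetime import datetime

# Patterns derived from the examples above:
#   V1: 01Log20071014000034_20071014235834.log.zip
#   V2: RNC0000_00Log20081117174447_20081117180123.log.zip
LOG_NAME = re.compile(
    r"^(?:RNC(?P<rnc>\d+)_)?(?P<subrack>\d{2})Log"
    r"(?P<start>\d{14})_(?P<end>\d{14})\.log\.zip$"
)
TS_FMT = "%Y%m%d%H%M%S"  # year, month, day, hour, minute, second


def parse_log_name(name):
    """Return the RNC ID, subrack number, and start/end time encoded in a log file name."""
    m = LOG_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized CHR/text log name: {name}")
    return {
        "rnc": m["rnc"],  # None for V1 names, which carry no RNC ID
        "subrack": int(m["subrack"]),
        "start": datetime.strptime(m["start"], TS_FMT),
        "end": datetime.strptime(m["end"], TS_FMT),
    }


def select_logs(names, subrack, fault_start, fault_end):
    """Keep only the files of `subrack` whose time span overlaps the fault window."""
    picked = []
    for name in names:
        try:
            info = parse_log_name(name)
        except ValueError:
            continue  # skip files that do not follow the naming rule
        if (info["subrack"] == subrack
                and info["start"] <= fault_end and info["end"] >= fault_start):
            picked.append(name)
    return picked
```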

6.6 Obtaining Performance Files


The following figure shows how to obtain performance files in V1 and V2:

The performance file name format is as follows:

A20081231.0930+0800-1000+0800_EMS-NORMAL.mrf.bz2
20081231 indicates the year, month, and day. 0930+0800-1000+0800 indicates the recording period, from 09:30 to 10:00, where +0800 is the time zone offset (UTC+8). On-site personnel can select valid data based on this period.
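The file name can also be parsed to select the files that cover the fault period. The name appears to follow the 3GPP performance measurement file naming convention, in which each time is followed by its UTC offset. The Python sketch below assumes that convention and the example name above. One further rule is also an assumption: if the end time is not later than the start time (for example, 2345-0000), the end time rolls over to the next day.

```python
import re
from datetime import datetime, timedelta, timezone

# A<YYYYMMDD>.<hhmm><+/-hhmm>-<hhmm><+/-hhmm>_..., as in the example above
PERF_NAME = re.compile(
    r"^A(?P<date>\d{8})\.(?P<start>\d{4})(?P<tz1>[+-]\d{4})"
    r"-(?P<end>\d{4})(?P<tz2>[+-]\d{4})_"
)


def _utc_offset(text):
    """Convert an offset such as '+0800' into a datetime.timezone."""
    sign = 1 if text[0] == "+" else -1
    return timezone(sign * timedelta(hours=int(text[1:3]), minutes=int(text[3:5])))


def parse_perf_name(name):
    """Return (start, end) of the recording period as timezone-aware datetimes."""
    m = PERF_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized performance file name: {name}")
    start = datetime.strptime(m["date"] + m["start"], "%Y%m%d%H%M")
    end = datetime.strptime(m["date"] + m["end"], "%Y%m%d%H%M")
    start = start.replace(tzinfo=_utc_offset(m["tz1"]))
    end = end.replace(tzinfo=_utc_offset(m["tz2"]))
    if end <= start:  # assumed: the period ends on the following day
        end += timedelta(days=1)
    return start, end
```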
