OTF702301 OptiX RTN 980L Troubleshooting ISSUE 1.00

OptiX RTN 980L Troubleshooting P-0
Contents
- Methods of Analyzing and Locating Faults
- Classified Troubleshooting Analysis
Confidential Information of Huawei. No Spreading Without Permission

- This course describes the general troubleshooting procedure and the methods of rectifying common faults.
- Reference:
  - RTN 900 Maintenance Guide


- Observe and record the fault phenomena:
  - When recording the fault phenomena, make an accurate and detailed record of the entire fault process. Record the exact time when the fault occurs and the operations performed before and after it occurs. Save the alarms, performance events, and other important information.
- Exclude external causes:
  - Rule out faults caused by external factors, including the power supply, cables, environment, and terminal equipment (such as switching equipment).
- Make experience-based judgments and theory-based analysis:
  - Based on the fault phenomena and other fault-related information, analyze the probable causes using experience and the related theory.
- Rectify the fault:
  - For the probable causes, make a plan to verify each of them, find the most likely cause, and rectify the fault.
- Check whether the fault is rectified:
  - After confirming a cause, check whether the fault is rectified and whether any new fault has occurred.
- Work with Huawei engineers:
  - If you cannot rectify the fault, contact Huawei technical support engineers and work with them to find a solution. If remote maintenance is required, help Huawei engineers obtain remote access.
- Write the fault handling report:
  - After rectifying a fault, record the work done to handle it in a timely manner. Summarize the experience so that it can serve as a reference for handling similar faults.

- The general principles for fault locating can be summarized as "external first, then internal; station first, then board; high-severity alarms first, then low-severity alarms." These principles cannot be applied in isolation; use all three together.
  - External first, then internal
    - During fault locating, first confirm that the external conditions are normal, for example, that the line fibers are normal and that there is no power failure or switching equipment fault.
  - Station first, then board
    - Most faults are caused by failures of boards in the subrack. First find the affected NE, and then locate the fault to a specific board.
  - High-severity alarms first, then low-severity alarms
    - First analyze high-severity alarms, such as critical and major alarms. Then analyze low-severity alarms, such as minor alarms and warnings.
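The "high-severity alarms first" principle amounts to ordering the alarm list by severity before analysis. A minimal Python sketch, assuming the alarms have been exported from the NMS as simple (NE, name, severity) records; the `Alarm` type, the `analysis_order` function, and the sample alarm names are hypothetical and used only for illustration:

```python
from dataclasses import dataclass
from typing import List

# Severity ranks: critical and major are high-severity, minor and warning are low-severity.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "warning": 3}


@dataclass
class Alarm:
    ne: str        # NE that reports the alarm
    name: str      # alarm name (hypothetical names in the example below)
    severity: str  # one of the keys of SEVERITY_RANK


def analysis_order(alarms: List[Alarm]) -> List[Alarm]:
    """Return the alarms sorted so that high-severity alarms are analyzed first.

    sorted() is stable, so alarms of equal severity keep their original order.
    """
    return sorted(alarms, key=lambda alarm: SEVERITY_RANK[alarm.severity])


if __name__ == "__main__":
    # Hypothetical alarm records, not taken from a real network.
    alarms = [
        Alarm("NE2", "ALARM_B", "minor"),
        Alarm("NE1", "ALARM_A", "critical"),
        Alarm("NE1", "ALARM_C", "warning"),
        Alarm("NE3", "ALARM_D", "major"),
    ]
    for alarm in analysis_order(alarms):
        print(alarm.severity, alarm.ne, alarm.name)
```

Sorting rather than filtering keeps the low-severity alarms in the list, so they are still analyzed afterwards, as the principle requires.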

- The most common method of locating hardware faults can be summarized as "analyze first, then loop back, and finally replace the board."
  - That is, when a fault occurs, first determine the possible fault points by analyzing the alarm events, performance data, and signal flow. Then locate the fault to a particular NE by looping back station by station. Finally, clear the fault by replacing the faulty board.
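Looping back station by station can be viewed as a walk along the service path: the fault lies between the last loopback point at which the service passes and the first one at which it fails. A minimal Python sketch, assuming a hypothetical `loopback_ok(station)` callback that stands for setting a loopback at that station and checking the service (on real equipment this is done through the NMS or on site); `locate_by_loopback` is also a hypothetical name:

```python
from typing import Callable, Optional, Sequence, Tuple


def locate_by_loopback(
    stations: Sequence[str],
    loopback_ok: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Loop back station by station, starting from the station nearest the test point.

    Returns (last_passing_station, first_failing_station); the fault is located
    between these two. first_failing_station is None if every loopback passes,
    and last_passing_station is None if the very first loopback fails.
    """
    last_ok: Optional[str] = None
    for station in stations:
        if loopback_ok(station):
            last_ok = station
        else:
            return last_ok, station
    return last_ok, None


if __name__ == "__main__":
    # Hypothetical loopback results along a chain NE1 -> NE2 -> NE3.
    results = {"NE1": True, "NE2": False, "NE3": False}
    print(locate_by_loopback(["NE1", "NE2", "NE3"], results.__getitem__))
```

With these sample results the sketch returns ("NE1", "NE2"), so the fault lies between the NE1 and NE2 loopback points, and board replacement would start on that section.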

- Besides alarms, querying the transmit and receive power is also important and useful in the RTN 980L system.
- The advantages and disadvantages of locating faults by querying fault information through the NMS are as follows:
  - Advantages:
    - Comprehensive: fault information can be obtained network-wide.
    - Accurate: the current alarms and their generation times, as well as history alarms, can be obtained, together with the specific values of performance events.
  - Disadvantages:
    - If there are too many alarms and performance events, it is difficult to find a clue for analysis.
    - The method depends on the normal operation of the computer, software, and communication equipment. If any of the three is faulty, the fault information query capability is reduced or even lost.
- On the OptiX RTN 980L, running and alarm indicators in different colors reflect the current running status of the equipment or the severities of existing alarms.

- The HARD_BAD alarm indicates a hardware error. The board that reports the alarm fails to work. If the board is configured with 1+1 protection, protection switching may be triggered.
- The NESF_LOST alarm indicates that the NE software is lost. It is reported when the system control, cross-connect, and timing board detects that the NE software is lost.
- The NO_BD_SOFT alarm indicates that the board software is lost. Without its software, the board cannot work normally.
- The FAN_FAIL alarm indicates that a fan is faulty. When it occurs, heat dissipation of the system is affected.
- The POWER_ALM alarm indicates that a power module is abnormal.
  - If the alarm is reported by a board of the IDU, the possible causes are: (1) the input power or the PIU is abnormal; (2) the power module is abnormal.
  - If the alarm is reported by the RFU/ODU, the cause is that the power module of the RFU/ODU is faulty.
- The BD_STATUS alarm indicates that a board is not in position. When it occurs, the board for which the alarm is reported does not work.

- The MW_LOF alarm indicates that the radio frame is lost. MW_LOF interrupts services. If the system is configured with protection, protection switching may be triggered.
- The MW_CFG_MISMATCH alarm indicates a configuration mismatch on a radio link. It occurs when an NE detects that the configurations at the two ends of a radio link do not match, for example, in the number of E1 signals, the number of STM-1 signals, AM enabling, 1588 overhead enabling, or the modulation mode.
- The CONFIG_NOSUPPORT alarm indicates that the configuration is not supported. It is reported if the ODU detects that the specified parameters do not meet the requirements of the ODU.
- The RADIO_RSL_LOW alarm indicates that the radio receive power from the opposite end is too low. It is reported if the detected receive power is equal to or lower than the lower threshold of the ODU (-90 dBm).
- The RADIO_RSL_HIGH alarm indicates that the radio receive power from the opposite end is too high. It is reported if the detected receive power is equal to or higher than the upper threshold of the ODU (-20 dBm). Service transmission is affected. If the system is configured with 1+1 protection, protection switching may be triggered.
- The RADIO_MUTE alarm indicates that the radio transmitter is muted. The ODU transmitter does not transmit services.
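The RADIO_RSL_LOW and RADIO_RSL_HIGH conditions described above are simple threshold comparisons. A minimal Python sketch using the -90 dBm and -20 dBm thresholds stated in the text; the function name `rsl_alarm` is hypothetical, and the actual thresholds may depend on the ODU type, so check them against the equipment documentation:

```python
from typing import Optional

# Thresholds as stated in the text; they may differ for other ODU types.
RSL_LOWER_THRESHOLD_DBM = -90.0
RSL_UPPER_THRESHOLD_DBM = -20.0


def rsl_alarm(rsl_dbm: float) -> Optional[str]:
    """Return the RSL alarm implied by a receive power reading, or None if it is in range."""
    if rsl_dbm <= RSL_LOWER_THRESHOLD_DBM:
        return "RADIO_RSL_LOW"   # equal to or lower than the lower threshold
    if rsl_dbm >= RSL_UPPER_THRESHOLD_DBM:
        return "RADIO_RSL_HIGH"  # equal to or higher than the upper threshold
    return None


if __name__ == "__main__":
    for reading in (-95.0, -90.0, -45.5, -20.0, -15.0):
        print(reading, rsl_alarm(reading))
```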

- The IF_CABLE_OPEN alarm indicates that the IF cable is open. When it occurs, the service on the IF port that reports the alarm is interrupted.
- The MW_LIM alarm indicates that a mismatched radio link identifier is detected. It is reported if an IF board detects that the link ID in the radio frame overhead is inconsistent with the configured link ID.
- The MW_RDI alarm indicates a defect at the remote end of the radio link. It is reported when the IF board detects an RDI in the radio frame overhead.
- The RPS_INDI alarm indicates that radio protection switching has occurred.
- The LOOP_ALM alarm indicates that a loopback exists. When it occurs, the looped-back port or path cannot carry services.
- The TEMP_ALARM alarm indicates that the board temperature crosses the threshold.

- On SDH boards, the R_LOS alarm indicates that the signals on the receive line side are lost. On IF boards, it indicates that the radio frames on the receive line side are lost. Services are interrupted. If the system is configured with protection, protection switching may be triggered.
- The ETH_LOS alarm indicates loss of the Ethernet port connection. When it occurs, the service on the port that reports the alarm is interrupted.
- The T_ALOS alarm indicates that the 2 Mbit/s analog signal is lost at a specific port. The 2 Mbit/s services cannot be accessed by the RTN 900.
- The TU_AIS alarm indicates that the TU path is interrupted. It is reported if a board detects that the TU pointer is all 1s (alarm indication signal).

- Analysis
  - There are three alarms in the system. The highest-severity alarm among them is MW_LOF on NE1, which means that NE1 has lost the frame of the received radio signal, much as if it were receiving no signal. The MW_RDI alarm is evidently caused by the MW_LOF alarm. Finally, the RPS_INDI alarm indicates that 1+1 protection switching has taken place on the microwave link or equipment. Because no other alarms were reported on the service, the services were most probably restored after the automatic protection switching.
  - Based on the above analysis, the key to this fault is the cause of the MW_LOF alarm on NE1. From the alarm definition, the possible causes are as follows:
    - Abnormal fading occurred on the microwave propagation path from NE2 to NE1, so the received radio power on NE1 is too low. This can be confirmed by querying the receive power of the ODU on NE1 through the NMS.
    - The IF cable or the IF board on NE1 is faulty. This can be checked by the loopback operations described later.
    - The transmit part of the IF board or ODU on NE2 is faulty.

- Loopbacks can be performed by software or by hardware.
  - Hardware loopback is more reliable than software loopback. However, it always requires on-site operation. In addition, take care that the receive optical power is not overloaded during the operation.
  - Software loopback is easier but less reliable than hardware loopback. For example, during single-station testing, software loopback cannot determine whether an optical board works normally; the board must be tested by hardware loopback.

- Replace components to locate and rectify faults.
  - The replacement method is widely used to locate external faults in fibers, cables, or power supply devices, as well as faults in boards.
  - The replacement method is practical and simple. The spare component used for replacement must be intact. Follow the rules for replacing components; otherwise, the components may be damaged or other problems may arise.
  - Note:
    - When the replacement method is used to locate faults, the original data about the fault cause may be lost. To avoid affecting the fault analysis, collect fault data before replacing the component.

- This method is usually used to rule out external problems or to locate interconnection problems.
  - If the power supply is suspected to be abnormal, use a multimeter to measure the input voltage. If you suspect that poor interconnection between the microwave equipment and other equipment is caused by grounding, use a multimeter to measure the voltage between the shielding layers of the coaxial transmit and receive ports of the interconnection path. If the voltage exceeds 0.5 V, there is a grounding problem. If you suspect that the poor interconnection is caused by incorrect signals, use an appropriate analyzer to check whether the frame signals and overhead bytes are normal and whether any alarms exist.
  - This method provides highly accurate results. However, it relies heavily on meters and professional knowledge.
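The grounding check above reduces to comparing each shield-to-shield voltage reading with the 0.5 V limit. A minimal Python sketch; the function name `suspect_grounding` and the sample readings are hypothetical, and comparing the absolute value (so that probe polarity does not matter) is an assumption, not something the text specifies:

```python
from typing import Dict, List

GROUNDING_LIMIT_V = 0.5  # limit stated in the text


def suspect_grounding(readings_v: Dict[str, float]) -> List[str]:
    """Return the interconnection paths whose shield-to-shield voltage exceeds the limit.

    Assumption: the magnitude of the reading is compared, so probe polarity is ignored.
    """
    return [path for path, volts in readings_v.items() if abs(volts) > GROUNDING_LIMIT_V]


if __name__ == "__main__":
    # Hypothetical multimeter readings between the transmit and receive port shields.
    readings = {"E1 path 1": 0.12, "E1 path 2": 0.80, "E1 path 3": -0.65}
    print(suspect_grounding(readings))
```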

- Sometimes a running board enters an abnormal state because of transient power supply fluctuations, low voltage, strong external electromagnetic interference, and so on. Service interruption and inband DCN communication interruption may or may not be accompanied by corresponding alarms, and the configuration data may be correct. In this case, the fault can be cleared and services restored quickly by resetting the board, restarting the NE, re-delivering the configuration, or switching the services to the standby path.
- The main disadvantage of this method is uncertainty: the cause of the problem is not fully known, and the alarm may persist even after a board reset or a power reset. This method is therefore not recommended.
- Note:
  - Normally, a warm reset of a board does not affect running services, whereas a cold reset does.
  - A cold reset takes longer than a warm reset. The board data is not lost after the reset.

- Based on the preceding purposes, RMON defines a series of statistics formats and functions for exchanging data between management stations and monitoring stations (probes) that comply with the RMON standards. To meet the requirements of different networks, RMON provides flexible monitoring modes and control mechanisms. In addition, RMON provides error diagnosis, planning, and collection of performance events across the entire network. RMON complies with standards such as RFC 1757 and RFC 2819.



- The transmit power is abnormal in either of two cases: the transmit power exceeds the range that the ODU supports, or, when ATPC is disabled, the transmit power differs from the set value by more than 2 dB. The related alarms and performance events are as follows:
  - RADIO_TSL_HIGH
  - RADIO_TSL_LOW
  - TSL_CUR
  - TSL_MAX, TSL_MIN
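The two abnormal transmit power cases can be expressed as two independent checks. A minimal Python sketch; the function name `tsl_problems` and its parameters are hypothetical, the ODU's supported range must be taken from the ODU specifications, and the sample values below are illustrative only:

```python
from typing import List


def tsl_problems(
    tsl_dbm: float,
    odu_min_dbm: float,
    odu_max_dbm: float,
    set_dbm: float,
    atpc_enabled: bool,
) -> List[str]:
    """Return the reasons why the transmit power (TSL) is abnormal; an empty list means normal."""
    problems = []
    # Case 1: the transmit power exceeds the range that the ODU supports.
    if not odu_min_dbm <= tsl_dbm <= odu_max_dbm:
        problems.append("outside the ODU range")
    # Case 2: with ATPC disabled, the transmit power differs from the set value by more than 2 dB.
    if not atpc_enabled and abs(tsl_dbm - set_dbm) > 2.0:
        problems.append("differs from the set value by more than 2 dB")
    return problems


if __name__ == "__main__":
    # Illustrative values: set value 18 dBm, measured 15 dBm, ATPC disabled.
    print(tsl_problems(15.0, -6.0, 26.0, 18.0, atpc_enabled=False))
```

The 2 dB check is skipped when ATPC is enabled because ATPC deliberately moves the transmit power away from the set value.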
- The RSL is abnormal in either of two cases: the receive power is lower than the ideal value (ideal value = planned value - 3 dB), or, because of fading, the receive power is lower than the receiver sensitivity or higher than the free-space receive power. The related alarms and performance events are as follows:
  - RADIO_RSL_HIGH
  - RADIO_RSL_LOW
  - RSL_CUR
  - RSL_MAX, RSL_MIN
  - For a radio link with AM enabled, the receiver sensitivity is the receiver sensitivity at the guaranteed capacity.
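The RSL criteria combine the ideal-value rule (planned value minus 3 dB) with the sensitivity and free-space limits. A minimal Python sketch; the function name `rsl_problems` and its parameters are hypothetical, and the planned value, receiver sensitivity (at the guaranteed capacity when AM is enabled), and free-space receive power must come from the link planning data:

```python
from typing import List


def rsl_problems(
    rsl_dbm: float,
    planned_dbm: float,
    sensitivity_dbm: float,
    free_space_dbm: float,
) -> List[str]:
    """Return the reasons why the receive power (RSL) is abnormal; an empty list means normal.

    With AM enabled, pass the receiver sensitivity at the guaranteed capacity.
    """
    problems = []
    ideal_dbm = planned_dbm - 3.0  # ideal value = planned value - 3 dB
    if rsl_dbm < ideal_dbm:
        problems.append(f"lower than the ideal value ({ideal_dbm:.1f} dBm)")
    if rsl_dbm < sensitivity_dbm:
        problems.append("lower than the receiver sensitivity")
    if rsl_dbm > free_space_dbm:
        problems.append("higher than the free-space receive power")
    return problems


if __name__ == "__main__":
    # Illustrative values: planned RSL -45 dBm, measured -50 dBm.
    print(rsl_problems(-50.0, planned_dbm=-45.0, sensitivity_dbm=-80.0, free_space_dbm=-44.0))
```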
- External interference is generally classified into co-channel interference and adjacent-channel interference.
  - Co-channel interference is crosstalk between two different radio transmitters that use the same frequency channel. Hence, the entire spectrum may be affected.
  - Adjacent-channel interference is impairment of the signal on one frequency caused by another signal on a nearby frequency. Hence, only part of the spectrum is affected.
  - Interference is closely related to frequency. Hence, if interference exists on a radio link, the link may be faulty in only one direction.



- Experience and Summary
  - During commissioning, ensure that the antenna is properly aligned to prevent potential faults.
  - Periodically collect and analyze the changes in transmit power and receive power so that incipient faults can be detected and rectified in time.

- IF bit errors are the bit errors that the Hybrid IF board detects through the self-defined overhead byte in the microwave frame. The related alarms and performance events are as follows:
  - MW_BER_EXC, MW_BER_SD, IFBBE, IFES, IFSES, IFCSES, IFUAS
- RS bit errors are the bit errors that the line processing unit, or an IF board working in SDH mode, detects through the B1 byte in the RS overhead. The related alarms and performance events are as follows:
  - B1_EXC, B1_SD, RS_CROSSTR, RSBBE, RSES, RSSES, RSCSES, RSUAS
  - An IF board working in PDH mode may also report the preceding RS bit error alarms and performance events. In this case, the IF board detects bit errors in the PDH microwave frame through the self-defined B1 byte.
- MS bit errors are the bit errors that the line board detects through the B2 byte in the MS overhead. The related alarms and performance events are as follows:
  - B2_EXC, B2_SD, MS_CROSSTR, MSBBE, MSES, MSSES, MSCSES, MSUAS

- HP bit errors are the bit errors that the line processing unit, or an IF board working in SDH mode, detects through the B3 byte in the HP overhead. The related alarms and performance events are as follows:
  - B3_EXC, B3_SD, HP_CROSSTR, HPBBE, HPES, HPSES, HPCSES, HPUAS
- LP bit errors are the bit errors that the tributary board or the Hybrid IF board detects through the V5 byte in the VC-12 overhead. The related alarms and performance events are as follows:
  - BIP_EXC, BIP_SD, LP_CROSSTR, LPBBE, LPES, LPSES, LPCSES, LPUAS
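When a bit error alarm or performance event is reported, the first step is to identify the layer and the overhead byte that detected it. A minimal Python sketch of that lookup, built only from the lists in the text; the names `BIT_ERROR_LAYERS` and `bit_error_layer` are hypothetical. Note that, as stated above, an IF board in PDH mode reports the RS events through a self-defined B1 byte, which this simple table does not distinguish:

```python
from typing import Dict, List, Optional, Tuple

# Layer -> (overhead byte used for detection, related alarms and performance events),
# as listed in the text.
BIT_ERROR_LAYERS: Dict[str, Tuple[str, List[str]]] = {
    "IF": ("self-defined microwave frame overhead byte",
           ["MW_BER_EXC", "MW_BER_SD", "IFBBE", "IFES", "IFSES", "IFCSES", "IFUAS"]),
    "RS": ("B1", ["B1_EXC", "B1_SD", "RS_CROSSTR", "RSBBE", "RSES", "RSSES", "RSCSES", "RSUAS"]),
    "MS": ("B2", ["B2_EXC", "B2_SD", "MS_CROSSTR", "MSBBE", "MSES", "MSSES", "MSCSES", "MSUAS"]),
    "HP": ("B3", ["B3_EXC", "B3_SD", "HP_CROSSTR", "HPBBE", "HPES", "HPSES", "HPCSES", "HPUAS"]),
    "LP": ("V5", ["BIP_EXC", "BIP_SD", "LP_CROSSTR", "LPBBE", "LPES", "LPSES", "LPCSES", "LPUAS"]),
}


def bit_error_layer(event: str) -> Optional[Tuple[str, str]]:
    """Return (layer, overhead byte) for a bit error alarm or performance event, or None."""
    for layer, (byte, events) in BIT_ERROR_LAYERS.items():
        if event in events:
            return layer, byte
    return None


if __name__ == "__main__":
    for name in ("B2_SD", "LPUAS", "MW_BER_EXC", "ETH_LOS"):
        print(name, bit_error_layer(name))
```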


- The VC-12 numbering method of OptiX equipment differs from that of some other vendors' equipment. OptiX equipment uses the timeslot numbering method, with the following formula:
  - VC-12 number = TUG-3 number + (TUG-2 number - 1) x 3 + (TU-12 number - 1) x 21
  - This method is also called numbering by order.
- Certain other equipment uses the line numbering method, with the following formula:
  - VC-12 number = (TUG-3 number - 1) x 21 + (TUG-2 number - 1) x 3 + TU-12 number
  - This method is also called the interleaved method.
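Because the two formulas number the same 63 VC-12s of a VC-4 in different orders, converting between them is useful when interconnecting with other vendors' equipment. A minimal Python sketch of both formulas and of the conversion from an OptiX timeslot number to the line number; the function names are hypothetical, and the index ranges (3 TUG-3s, 7 TUG-2s per TUG-3, 3 TU-12s per TUG-2) follow the standard SDH multiplexing structure:

```python
from typing import Tuple

# Valid index ranges in a VC-4: 3 TUG-3s, 7 TUG-2s per TUG-3, 3 TU-12s per TUG-2.
TUG3_COUNT, TUG2_COUNT, TU12_COUNT = 3, 7, 3


def _check(tug3: int, tug2: int, tu12: int) -> None:
    if not (1 <= tug3 <= TUG3_COUNT and 1 <= tug2 <= TUG2_COUNT and 1 <= tu12 <= TU12_COUNT):
        raise ValueError(f"invalid position TUG-3={tug3}, TUG-2={tug2}, TU-12={tu12}")


def timeslot_number(tug3: int, tug2: int, tu12: int) -> int:
    """OptiX timeslot numbering: TUG-3 + (TUG-2 - 1) x 3 + (TU-12 - 1) x 21."""
    _check(tug3, tug2, tu12)
    return tug3 + (tug2 - 1) * 3 + (tu12 - 1) * 21


def line_number(tug3: int, tug2: int, tu12: int) -> int:
    """Line numbering: (TUG-3 - 1) x 21 + (TUG-2 - 1) x 3 + TU-12."""
    _check(tug3, tug2, tu12)
    return (tug3 - 1) * 21 + (tug2 - 1) * 3 + tu12


def position_from_timeslot(n: int) -> Tuple[int, int, int]:
    """Invert timeslot_number(): return (TUG-3, TUG-2, TU-12) for VC-12 number n (1..63)."""
    if not 1 <= n <= 63:
        raise ValueError(f"VC-12 number out of range: {n}")
    k = n - 1  # k = (TUG-3 - 1) + 3 x (TUG-2 - 1) + 21 x (TU-12 - 1)
    return k % 3 + 1, (k // 3) % 7 + 1, k // 21 + 1


def timeslot_to_line(n: int) -> int:
    """Convert an OptiX timeslot number to the line number of the same VC-12."""
    return line_number(*position_from_timeslot(n))


if __name__ == "__main__":
    print(timeslot_number(2, 1, 1), line_number(2, 1, 1))
    print(timeslot_to_line(22))
```

For example, the VC-12 at TUG-3 2, TUG-2 1, TU-12 1 is number 2 in the OptiX timeslot numbering but number 22 in the line numbering, and OptiX timeslot 22 (TUG-3 1, TUG-2 1, TU-12 2) is line number 2.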
- The overhead bytes (J0, J1, C2, J2, V5) at the two ends are inconsistent. Pay special attention to the following alarms:
  - J0_MM, HP_TIM, LP_TIM, HP_SLM, LP_SLM
- The indexes of the SDH interfaces do not meet the requirements. Common indexes of optical interfaces are as follows:
  - Mean launched optical power, receiver sensitivity, overload optical power, permitted frequency deviation of the input interface
- Fault Locating Methods
  - Analyze the fault phenomena and the alarms generated on the equipment. Check the possible fault causes one by one.
- Experience and Summary
  - To rectify an interconnection fault, you must be familiar with the characteristics of the interfaces on the interconnected equipment.
- Based on the fault causes, perform the following checks:
  - Check the impedance of the E1 path. Ensure that it is consistent with the cable type.
  - Check whether all the equipment and the DDF in the equipment room share a common ground.
  - Check whether the shielding layers of the coaxial cable connectors on the DDF are connected to the protection ground.
  - Check whether the shielding layers of the coaxial cables are grounded in the same manner.
  - Check whether the wires of the cable are correctly connected.
  - Check whether the cable is broken or crushed.
  - Check whether the cable signal suffers interference (for example, when the trunk cable is bundled with the power cable, the cable signal is interfered with by the power signal).
  - Checking the cables involves checking both the cables from the DDF to the client side and the cables from the DDF to the transmission equipment side.
  - Check the following indexes:
    - Input jitter tolerance
    - Permitted frequency deviation of the input interface
    - Output jitter and output frequency deviation

- Ethernet service interruption means that the Ethernet service is completely interrupted.
- Ethernet service deterioration means that the Ethernet service is abnormal, for example, network access is slow, the equipment delay is long, packets are lost, or the received or transmitted data contains errored packets.


- Check whether a loopback is set on the Ethernet port or the transmission line.
- Check whether the parameter settings of the Ethernet port, such as the port enabled state, working mode, and flow control, are the same as those of the Ethernet port on the interconnected equipment.
- Check whether the Ethernet protocol and Ethernet service configurations (especially the Ethernet port attributes) are correct.
- Pay special attention to the following equipment alarms:
  - POWER_ALM, FAN_FAIL, HARD_BAD, BD_STATUS, NESF_LOST, TEMP_ALARM, RADIO_RSL_HIGH, RADIO_RSL_LOW, RADIO_TSL_HIGH, RADIO_TSL_LOW, IF_INPWR_AB, AM_DOWNSHIFT
- Pay special attention to the following line alarms:
  - MW_LIM, MW_LOF, MW_BER_EXC, MW_BER_SD, MW_RDI, MW_FEC_UNCOR
- Check the RMON performance events and alarms.

- Fault Causes:
  - Incorrect operations are performed.
  - The transmission link is looped back.
  - Service configuration data is inconsistent between the local end and the opposite end.
  - Service configuration is incorrect.
  - The local NE is faulty.
  - The transmission link is faulty or has bit errors.
  - Service bandwidth decreases due to an AM downshift.
  - The opposite NE is faulty.
  - External electromagnetic interference is severe.



- Fault Locating Methods:
  - Check whether the data has been modified, whether the line is looped back, and whether any boards have been replaced.
  - Use the PW ping function to check whether the PW works properly. If the PW is faulty, use the LSP ping function to check whether the MPLS tunnel works properly. If the MPLS tunnel works properly, check whether the PW configuration is the same at both ends. If it is the same, replace the board on the NNI side.
  - If the PW works properly, check whether the PE data configured at both ends is the same. If it is different, make the PE data consistent.
  - Check whether the UNI-side data and the CE-side data are consistent.
  - Analyze the RMON performance events of the CES services.
  - Check whether there is impedance mismatch on the channels and whether any electrical cables are connected incorrectly.
  - Replace the Smart E1 processing board.
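The PW and tunnel checks above form a small decision flow, which the Ethernet service procedure also follows. A minimal Python sketch, assuming the PW ping and LSP ping results have already been obtained on the NMS and are passed in as booleans; the function `pw_next_action` and its action strings are hypothetical, and the branches for a faulty tunnel and for an inconsistent PW configuration are inferred rather than stated in the text:

```python
def pw_next_action(pw_ping_ok: bool, lsp_ping_ok: bool,
                   pw_config_same: bool, pe_data_same: bool) -> str:
    """Return the next troubleshooting action for a PW-carried service (hypothetical helper)."""
    if not pw_ping_ok:
        # PW ping failed: check the MPLS tunnel with LSP ping.
        if not lsp_ping_ok:
            # Assumption: the text only covers a healthy tunnel; a faulty tunnel is handled first.
            return "rectify the MPLS tunnel"
        if not pw_config_same:
            # Assumption: an inconsistent PW configuration is corrected before replacing hardware.
            return "make the PW configuration the same at both ends"
        return "replace the NNI-side board"
    # PW ping succeeded: compare the PE data configured at both ends.
    if not pe_data_same:
        return "make the PE data the same at both ends"
    return "check UNI-side/CE-side data, RMON events, and cabling"


if __name__ == "__main__":
    # Hypothetical results: PW ping fails, LSP ping passes, PW configuration matches.
    print(pw_next_action(False, True, True, True))
```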



- Fault Locating Methods:
  - Check whether the data has been modified, whether the link is looped back, and whether any boards have been replaced.
  - Use the PW ping function to check whether the PW works properly. If the PW is faulty, use the LSP ping function to check whether the MPLS tunnel works properly. If the MPLS tunnel works properly, check whether the PW configuration is the same at both ends. If it is the same, replace the board on the NNI side.
  - If the PW works properly, check whether the PE data configured at both ends is the same. If it is different, make the PE data consistent.
  - Check whether the UNI-side data and the CE-side data are consistent.
  - Analyze the RMON performance events of the Ethernet services.
  - Check whether there is impedance mismatch on the channels and whether any electrical cables are connected incorrectly.
  - Replace the Ethernet interface board.



- Check whether the ringing switch "RING" on the phone set is set to "ON".
- Check whether the dialing mode switch is set to "T", namely, the dual-tone multi-frequency (DTMF) mode.
- Check whether the orderwire phone set is on-hook. When not in a call, an orderwire phone set should be on-hook, and the red indicator at the upper right of its front panel should be off. If the red indicator is on, the phone set is off-hook; press the "TALK" button on the front of the phone set to put it on-hook. Maintenance personnel sometimes press the "TALK" button by mistake. As a result, the phone set stays off-hook and orderwire calls from other NEs cannot get through.
- Check whether all orderwire phone numbers on a subnet are of the same length.
- Check whether all orderwire phone numbers on a subnet are unique.
- Check whether the overhead bytes of all the NEs on a subnet are the same.
- Check whether the orderwire port is set correctly.
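The phone-number checks (same length, unique on the subnet) are easy to automate once the orderwire numbers of all NEs are collected. A minimal Python sketch; the function `orderwire_number_problems` and the sample subnet data are hypothetical, and numbers are kept as strings so that their length is checked exactly as configured:

```python
from typing import Dict, List


def orderwire_number_problems(numbers: Dict[str, str]) -> List[str]:
    """Check that the orderwire phone numbers on one subnet have the same length and are unique.

    numbers maps an NE name to its orderwire phone number (as a string).
    """
    problems = []
    if len({len(number) for number in numbers.values()}) > 1:
        problems.append("phone numbers have different lengths")
    owner: Dict[str, str] = {}
    for ne, number in numbers.items():
        if number in owner:
            problems.append(f"{ne} duplicates number {number} of {owner[number]}")
        else:
            owner[number] = ne
    return problems


if __name__ == "__main__":
    # Hypothetical subnet configuration.
    subnet = {"NE1": "101", "NE2": "102", "NE3": "101", "NE4": "1040"}
    for problem in orderwire_number_problems(subnet):
        print(problem)
```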


