
AXE810 - APG40

IN BSS
AXE810 & APG40: MAIN CHARACTERISTICS

[Figure: BSS topology with RBS2000 sites, the AXE810 BSC and the MSC; interfaces Abis, Ater and A; labels "BYB 501 BSC-TRC", "2 SS7" and "BSC-MSC 4 channels".]

First AXE810 configured with:


• APZ 212 30 (3.5 times the capacity of the APZ 212 20)
• The I/O module is the APG40 instead of the IOG20C. OSS supervision was initially planned over IP instead of X.25
• 64 ETCs (Abis, Ater, Gb)
• 16 RPG3 (TRH, STC, C7)
• 4 RPP and 1 GSS (SRS) 16K

1
Y&E
AXE810 BSCs IN THE VF SPAIN NETWORK

• Unlike the Core Department, which decided to reject the APG40 in the HLR, the BSS Department had no alternative.
• BSS tested it during Nov’03.

• Over the coming months, we are going to deploy this new Ericsson product:
• 9 AXE810 before April’04.
• More than 15 AXE810 from Apr’04 to Mar’05.

• There is no option to use BYB501, BYB202 or any competing product other than the AXE810...
• ...so we expect continuous improvement of its weak points!

AXE810 TEST PERIOD

• TEST PERIOD:

• First phase: Test labs (3 weeks)

• Second phase: Real network (3 more weeks).

• We tested the APG40 for one week with an Ericsson APG40 expert.
• In BSS we found some of the same problems seen in the HLR, related to security and instability, so we asked Ericsson for a response.

• Third phase:

• We worked with Ericsson to solve the main problems before the Christmas period.
• We are still studying the AXE behaviour.

MAIN PROBLEMS FOUND

• SOME PROBLEMS STILL PENDING UPDATE

• (this is not the final table)
MAIN PROBLEMS FOUND

• Key points to consider:

1. Telnet and FTP (plus TFTP for software upgrades) were used for remote administration access and software upgrades of the APG40, along with PC-Anywhere.

2. Telnet and FTP sessions, including user IDs and passwords, are transmitted unencrypted. It is therefore possible for a malicious user to "sniff" the network and obtain access to critical systems and resources.

3. SSH is a secure remote logon method (all transferred data is encrypted).

• The Telnet and FTP services (between the OSS TMOS and the APG40) were stopped, uninstalled and replaced by Secure Shell (SSH) using the F-Secure software package.

• Any type of Telnet or FTP communication is now forbidden and rejected.

MAIN PROBLEMS FOUND: SECURITY

• Although the Telnet and FTP server functionality in the APG40 can be configured to accept or deny addresses or address ranges, a firewall has to be used for further security (operating system: Windows NT4).

• Ericsson would not provide a SW firewall until R12!

• We decided to install a HW firewall for each AXE810 to guarantee security until a software IDS or firewall is included in the APG40.

MAIN PROBLEMS FOUND: HW FIREWALL

• We blocked all ports for all sources, allowing only O&M IP addresses.
• With the HW firewall, all traffic from the “unsecured zone” has to go through this firewall to reach the “secured zone”.
[Figure: corporate network and switch on the unsecured side; BSC VLAN; HW firewall (FW); Ethernet links to APG40 Node A and APG40 Node B of the BSC AXE810 on the secured side.]
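The rule set on this slide (block every port for every source, then allow only O&M addresses) is a default-deny allowlist. A minimal Python sketch of that decision logic follows. It assumes hypothetical O&M networks taken from the RFC 5737 documentation ranges, and it assumes SSH (TCP/22) is the only management service left on the APG40 once Telnet and FTP are removed; it is not the configuration of the actual HW firewall.

```python
from ipaddress import ip_address, ip_network

# Hypothetical O&M source networks (RFC 5737 documentation ranges); replace with the real plan.
OAM_NETWORKS = [ip_network("192.0.2.0/28"), ip_network("198.51.100.10/32")]

# Assumption: after Telnet/FTP were removed, SSH (TCP/22) is the only management service.
ALLOWED_TCP_PORTS = {22}


def permit(src: str, dst_port: int, proto: str = "tcp") -> bool:
    """Default-deny check: True only for an O&M source reaching an allowed TCP port."""
    addr = ip_address(src)
    from_oam = any(addr in net for net in OAM_NETWORKS)
    return proto == "tcp" and from_oam and dst_port in ALLOWED_TCP_PORTS


if __name__ == "__main__":
    for src, port in [("192.0.2.5", 22), ("192.0.2.5", 23), ("203.0.113.7", 22)]:
        print(src, port, "permit" if permit(src, port) else "deny")
```

Because the function returns True only on an explicit match, anything not listed, including Telnet on TCP/23 or TFTP on UDP/69 from an O&M host, falls through to deny.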

WHY ARE WE SO CONSCIOUS ABOUT SECURITY?

• Related to the firewall solution:

• To avoid Distributed Denial of Service (DDoS) attacks.

• To avoid IP spoofing: filter spoofed IP packets, since an attacker can forge any field in the IP header.

• Related to software upgrades (antivirus, PCAnywhere):

• To avoid Trojan horse attacks. You absolutely MUST make sure you have the very latest update files for your programs, or else they will miss the latest Trojans.

• In BSS, you lose control of each BSC... In the Core Network... what about losing control of our billing system?
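Filtering spoofed packets at a firewall usually means checking each packet's source address against the side it arrived on (network ingress filtering, as described in BCP 38 / RFC 2827). The Python sketch below assumes a hypothetical secured-zone prefix, 10.20.30.0/24, for the BSC VLAN. It only rejects sources that are impossible for that interface, so it cannot detect every forged header field.

```python
from ipaddress import ip_address, ip_network

# Hypothetical address plan for the secured BSC VLAN behind the HW firewall.
SECURED_ZONE = ip_network("10.20.30.0/24")


def is_spoofed(src: str, arrived_on: str) -> bool:
    """Source-address sanity check in the spirit of BCP 38 (RFC 2827).

    arrived_on is the firewall side the packet entered from: "unsecured" or "secured".
    """
    addr = ip_address(src)
    if arrived_on == "unsecured":
        # Traffic from outside must not claim an inside address or a
        # loopback, unspecified or multicast source.
        return (addr in SECURED_ZONE or addr.is_loopback
                or addr.is_unspecified or addr.is_multicast)
    if arrived_on == "secured":
        # Traffic leaving the secured zone must carry an inside source address.
        return addr not in SECURED_ZONE
    raise ValueError(f"unknown interface side: {arrived_on!r}")
```

Applying the check in both directions also stops a compromised inside host from sending forged sources outward.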

QUESTIONS

1) How could we wait until BSS R11 or R12 to upgrade the antivirus?

• Must we manage our network security ourselves because of the new products?
• What upgrade response time does Ericsson commit to for virus attacks? How do you plan to upgrade the Service Pack?

2) Why should we wait for a SW firewall solution (R12)?

• Doesn't that suggest a certain product instability?

QUESTIONS

• What about a UNIX platform for such a product?

• How could an operator deal with a DoS attack on the HLR that blocks billing?
• Why doesn’t Ericsson use a proprietary system (as in the IOG20) or MIPS RISC microprocessors and a NonStop UX OS (as in the APG30)?

AXE810 - APG40
IN HLR

WHY APG40 IN THE HLR?

• Limited space on the HD and OD with the IOG20:

– Backups are very large in the HLR. With the IOG20, there is not enough space to store 3 backups on the HD (Ericsson recommendation).
– One backup cannot be stored on one OD without compression.

• Backup and reload times from the OD are too long:

– Before dumping a backup to the OD, the data must be compressed.
– Before reloading the CP with a backup from the OD, the data must be uncompressed.

• Problems with TCP/IP connections on the Ethernet board.

• New AXE810 products come with the APG40.
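The space argument on this slide is capacity arithmetic: a volume holds floor(capacity / backup size) complete copies, and a backup larger than one optical disk needs a compression ratio of at least backup size / disk size to fit. The Python sketch below illustrates this under stated assumptions; the function names are hypothetical, and the sizes in the demo are invented, since the slide gives no figures.

```python
RECOMMENDED_COPIES = 3  # Ericsson recommendation quoted on the slide


def backups_that_fit(volume_mb: float, backup_mb: float) -> int:
    """Number of complete backup copies that fit on a volume."""
    if volume_mb < 0 or backup_mb <= 0:
        raise ValueError("volume must be >= 0 and backup size > 0")
    return int(volume_mb // backup_mb)


def meets_recommendation(volume_mb: float, backup_mb: float,
                         copies: int = RECOMMENDED_COPIES) -> bool:
    """True if the volume can hold the recommended number of backups."""
    return backups_that_fit(volume_mb, backup_mb) >= copies


def min_compression_ratio(backup_mb: float, medium_mb: float) -> float:
    """Smallest uncompressed:compressed ratio that fits one backup on one medium.

    1.0 means the backup already fits without compression.
    """
    if medium_mb <= 0:
        raise ValueError("medium size must be positive")
    return max(1.0, backup_mb / medium_mb)


if __name__ == "__main__":
    # Hypothetical sizes for illustration only; the slide gives no figures.
    print(backups_that_fit(2000, 800), meets_recommendation(2000, 800))
    print(round(min_compression_ratio(800, 650), 2))
```

With a hypothetical 800 MB backup, a 2,000 MB volume holds only two copies, one short of the three recommended.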

APG PROGRAM IN VF SPAIN
• Planned dates:
• Validation Test: 03/02/03 – 21/02/03 (3 weeks)
• Acceptance Test: 27/02/03 – 12/03/03 (2 weeks)

• Actual dates for the APG:

• Validation Test: 03/02/03 – 28/03/03 (8 weeks)
Interruptions: 14/02/03 – 20/02/03 Upgrade to AC-A7 in the test room.
26/02/03 – 12/03/03 Ericsson work to restore the APG.
17/03/03 – 18/03/03 Ericsson work to restore the APG.
• Acceptance Test: 01/04/03 – 21/07/03 (16 weeks)
Interruption: 23/06/03 – 27/06/03 Upgrade to AC-A10 in the live HLRs.
• Critical problems before the service date: 21/07/03 – 22/09/03.

• Final decision (02/10/03): NOT TO IMPLEMENT THE APG IN HLRs, because VF does not consider it mature enough to offer commercial service.
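The week counts on this slide match elapsed days divided by seven and rounded to the nearest week. The Python sketch below reproduces them with the standard `datetime` module. It assumes the actual validation start is 03/02/03, since the 8-week figure and the 2003 interruption dates imply that year. The hypothetical helper rejects a window whose end precedes its start, which is how a mistyped year would show up.

```python
from datetime import date


def elapsed_weeks(start: date, end: date) -> int:
    """Elapsed time between two dates, rounded to the nearest whole week."""
    if end < start:
        raise ValueError(f"end {end} is before start {start}")
    return round((end - start).days / 7)


# Test windows transcribed from the slide (dd/mm/yy, all in 2003).
PHASES = {
    "validation (planned)": (date(2003, 2, 3), date(2003, 2, 21)),
    "acceptance (planned)": (date(2003, 2, 27), date(2003, 3, 12)),
    "validation (actual)": (date(2003, 2, 3), date(2003, 3, 28)),
    "acceptance (actual)": (date(2003, 4, 1), date(2003, 7, 21)),
}

if __name__ == "__main__":
    for name, (start, end) in PHASES.items():
        print(f"{name}: {(end - start).days} days ~ {elapsed_weeks(start, end)} weeks")
```

Put side by side, validation ran about 8 weeks against 3 planned, and acceptance about 16 weeks against 2 planned.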

PROBLEMS SUMMARY

• VALIDATION TEST:
16 TROUBLE REPORTS:
3 with critical service impact
1 with critical OAM impact
1 with security impact

• ACCEPTANCE TEST:
52 TROUBLE REPORTS:
17 with critical service impact
10 with critical OAM impact
1 with security impact
3 with statistics impact

MAIN PROBLEMS I
• Critical problems in the APG40 without a solution:
– APG out of service for more than 24 hours, for an unknown reason. Ericsson’s solution was to initialize the APG.
– Data corruption on one shared disk of the APG, for an unknown reason. Ericsson’s solution was to initialize the APG.
o Ericsson’s answer: faulty APG in the test environment.

• Instability problems:
– APG in state “unknown”: manual intervention was required. The APG entered this state after a test, or suddenly for no reason.
– Communication lost between the APG and the CP for an unknown reason.
– Instability in the CP (error interrupt) due to a problem in the APG.
– Loss of service on one of the sides after switching the APG off and on.
o Ericsson’s solutions were patches, manual restarts or replacing the APG.
• Statistics / maintenance problems:
– Frequent loss of connection with TMOS.
– Incorrect time in statistics data received in TMOS.
– TMOS doesn't receive the statistics from the HLR even though the connection is kept up.

MAIN PROBLEMS II
• Documentation problems:
– Recovery procedures for critical situations in the APG40 available in ALEX were faulty or incomplete.
– Many APG40 alarms were not found in ALEX.
– Most of the OPIs referenced by APG alarms end with the phrase “Contact next level of support”, and no other solution was given.

• Problems with backup:

– Fault code 6 when executing the command “SYBUP”. Solved by CNI 10922-APZ 212 20/5-965.
– “Command log” execution after reload fails.
• Miscellaneous:
– Wrong partitioning of the shared disks: the L partition is too small to hold 3 backups (Ericsson recommendation).
– Loading patches for the APG SW is not automatic, as it is in the IOG.
– Loading patches takes a very long time.

CONCERNS:

• Today:
– The procedure for updating Windows NT (v4.0) with urgent packages is not clearly defined by Ericsson.
– The main antivirus SW will not be updated until R11.
– It is not possible to choose the antivirus product in the APG.
– NTBackup runs without any response or activity check.

• Future:
– APG in MSCs, with billing blocks exposed to DoS attacks.
– A Windows-based product means instability.
– Until R12, the operator needs a HW firewall per APG.

NEXT STEPS

• Resume the validation test for the APG in HLRs after 6 months.

– Try to put it into service in the second quarter of this year.

• Carry out the test for the APG in MSCs.

BACKUP SLIDES
Problem description | Impact | Comments
There is no switch-over (active-passive) if we turn off the Active node. The Passive node changes state to “unknown”. There is no control of the APG or the HLR. | Critical | Solved with AC-A4 to 7.
There is no switch-over (active-passive) if the Ethernet cable for external communications fails on the Active node of the APG. In this situation there is only communication with the Passive node of the APG, so we cannot reach the CP. | Critical | Solved with AC-A4 to 7.
Data lost after reload during the test “Command log execution after reload”, which fails due to an incorrect definition of the Command Log file. | Critical | The Command Log file was badly defined.
Disk corruption in the test bed. Unknown reason. | Critical | While testing, we got disk corruption. We had to stop testing for almost a week while Ericsson specialists solved the problem.
APG out of service for more than 24 hours. Unknown reason. | Critical | While testing, we found the APG out of service. We had to stop testing again.
Vodafone APG documentation (ALEX) is not updated. | Documentation |
  OPI | Vodafone doc. | Type acceptance doc.
  “AP, System Restore, Initiate” | Rev. H | Rev. M
  “AP, System Backup and Verify, Initiate” | Rev. J | Rev. M
  “AP, System Data Disk Restore” | Rev. B | Rev. F
  “AP, System Disaster Recovery” | Rev. E | Rev. L
  “APG40, Node Change” | Rev. K | Rev. T
  “Central Processor Store, Size Change” | Rev. D | Rev. E
  “Command Log, Activate” | Rev. A | Rev. C
  “Command Log, Initiate” | Rev. A | Rev. B
Incorrect OPI “AP, SYSTEM DISASTER RECOVERY” (Rev. L) | Documentation | Bad steps: 89, 102, 146, 195.
Incorrect OPI “AP, SYSTEM DATA DISK RESTORE” (Rev. F) | Documentation | Bad steps: 81, 90.
Incomplete OPI “AP FAULT” | Documentation | After turning off the APG, we get the alarm "AP Fault" with cause "Node is down". We follow the OPI and the result is "Contact next level of support". There is no "power check" step in the OPI.
APG “AM_LOG_EVENTLOG_TYPE” and CP “AP FAULT. GENERAL ERROR” alarm not specified in Ericsson documentation | Documentation | The alarm appears while doing tests in the test room. Ericsson support staff don't know anything about it (cause, solution).
APG “fcc_save_to remove. EVENTLOG_ERROR_TYPE_INTERNAL_DESCRIPTION_FOR_MAINTENANCE_PURPOSES” and CP “AP FAULT. GENERAL ERROR” alarm not specified in Ericsson documentation | Documentation | The alarm appears while doing tests in the test room. Ericsson support staff don't know anything about it (cause, solution).
“RDT_SERVICE. PROCESS DEATH” alarm not specified in Ericsson documentation | Documentation | The alarm appears while doing tests in the test room. Ericsson support staff don't know anything about it (cause, solution).
Problems verifying tape backup execution if you run it from the command line | Documentation | If you want to monitor the progress of a backup to tape, you can't start it from the command line. You must use the graphical tool instead.
An administrator profile is necessary to execute backups | Documentation |
In the "NETWORK SURVEILLANCE" functionality, the parameter “ACS_NSF_ROUTERREPONSE” doesn't work properly | Documentation | This parameter specifies the time the APG should wait before a switch-over in case of network connectivity problems. It does not work properly.
“AP REBOOT” alarm with wrong cause | Documentation | After an APG reboot caused by the Network Surveillance functionality, we get an "AP Reboot" alarm with cause "Command Initiate". This is not correct.
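The two switch-over rows and the ACS_NSF_ROUTERREPONSE row describe the behaviour one would expect from an active/passive pair. The passive node should take over when the active node is switched off. It should also take over when the active node loses its external Ethernet link for longer than a configured wait. The Python sketch below models only that expected decision. The class, field and function names are hypothetical, and it makes no claim about how the APG40 cluster software actually implements switch-over.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActiveNodeView:
    """What the passive node can observe about the active node (hypothetical model)."""
    heartbeat_ok: bool                       # active node still answers the cluster heartbeat
    link_down_since: Optional[float] = None  # time (s) its external Ethernet link failed, if it has


def should_take_over(view: ActiveNodeView, now: float, router_response_wait_s: float) -> bool:
    """Hypothetical switch-over rule for an active/passive pair."""
    if not view.heartbeat_ok:
        # Active node powered off or hung: take over at once.
        return True
    if view.link_down_since is not None:
        # Active node alive but unreachable from outside: wait the configured
        # time first, so a short network glitch does not trigger a switch-over.
        return now - view.link_down_since >= router_response_wait_s
    return False
```

Keeping the wait as an explicit parameter mirrors the role the table gives to ACS_NSF_ROUTERREPONSE: it is the single knob that trades failover speed against needless switch-overs.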

Problem description | Impact
Fault code 6 in SYBUP execution. It's impossible to do backups. | Critical
Data lost after reload during the test “Command log execution after reload”, which fails. | Critical
Communication lost between the APG and the CP without reason. Communication with the HLR is impossible in MML mode, only in CPT mode. | Critical
APG Node B goes down without reason. A local reset is necessary to recover it. | Critical
APG Node A goes down without reason. A local reset is necessary to recover it. | Critical
I-Module: wrong definition of APG files. Some of the files weren't included in the I-Module (Statistics, etc.). | Critical
Instability in the CP due to a problem in the APG. | Critical
One APG side passes to the "Unknown" state after a restart, without reason. Manual work is needed to recover the APG. | Critical
Some instabilities in the APG without reason. | Critical
Loss of service on one of the APG sides after switching both sides of the APG off and on. | Critical
IPU (CP) boards with version ROJ 212 238/2 R1B have a design problem. Remove them and replace them with new IPU boards. | Critical
Wrong synchronism connection wiring. | Critical
Clocks out of range in the HLR. | Major
TMOS doesn't receive the statistics from the HLR. | Critical
Incorrect time in statistics data received in TMOS. | Critical
Wrong distribution of the L disk size. There isn't enough space in the HLR for 3 backups. | Major
The file HPSDFOAFILE is incorrectly defined. | Critical for provisioning and O&M
EHIP stops without reason. It's impossible to open Command Handling from TMOS. | Major
The "mml" command (used to access the CP), entered from the CHA interface, blocks the connection. | Major
The APG software module needed to define the alarm panel isn't included in the APG. | Critical
The FTP functionality is incorrectly implemented. The "FTP Area" isn't clearly defined. | Major
The HLR lost the connection with TMOS because the "ossuser" password expired. | Major
"AP FAULT" alarm with unknown cause. | Major
"AP SYSTEM ANALYSIS" alarm with unknown cause. ALEX doesn't indicate the cause or the recovery procedure. | Documentation
"AP DIAGNOSTIC FAULT" alarm with unknown cause. ALEX doesn't indicate the cause or the recovery procedure. | Documentation
Ericsson hasn't provided the command file to recover the APG after a disaster fault in HW and SW. | Documentation
SAACTIONS inconsistent. The system indicates that we must increase an SAE to a value lower than the present value. | Critical for O&M
The parameter TSMO-0 appears in subscribers with NAM=0 (GPRS subscribers). | Critical for provisioning
Problems with the HGPFI command. | Major
Incorrect definition of the procedure to update the antivirus. | Critical for O&M
Manual work is necessary to delete the Command Log files. | Major
AD-0 and AD-4 are routed to a file where all printouts are written. There is a danger of the disks filling up. | Major
Problems with the antivirus configuration. When we load an APG software correction, the antivirus and the update procedure are deconfigured. | Major
Problems verifying a tape backup execution if you execute it from the command line. | Critical for O&M
The OPI to solve the alarm "INTELLIGENT NETWORKS MANAGEMENT INTERFACE FILE FAULT" forces you to contact the next support level. | Critical for O&M
Billing alarms in the HLR. | Major
No alarm panel defined in the I-Module. | Major
Wrong execution of password change with the APG command "NET USER". It must be done from the graphical interface. | Major

