Assistance Request 1-5531112

Customer Ticket N/A

Short Description TEC support to check urgently as we have outage on the RNC231

Current Summary Fix delivered to load LR13.3.9

State Closed : Solution/Service Provided

Outage Yes

Severity 1

Priority 1

Service Request Remote Technical Support

Request Type Defect

Sub-Type Software

Category Software

Sub-Category Software Defect

Internal No

Outage Report

Assignment Events
Product

Product 9370 RNC

Version LR13.3.W

No Change Request is linked to this AR

Solution Source 1434917

Product Location

Site ETISALAT ABU DHABI

Instance 9370RNC-Etisalat : In Service

Site Company Emirates Telecommunications Corporation

City Abu Dhabi

Country United Arab Emirates

Contact

Name Karim IBRAHIM ABD EL NABY

Company Alcatel-Lucent : Open on behalf of Customer

Phone 20 127 109 0616

Email Karim.Ibrahim@alcatel-lucent.com

Request Method Email-CR

Additional Contact Info 971 56 354 4568

Dates

Occurred 12-Jan-2015 13:06 Time now 29-Aug-2015 13:10 (GMT+4)

Reported 12-Jan-2015 13:06 AR Created 12-Jan-2015 13:09

Service Start 12-Jan-2015 13:06

Outage Start 12-Jan-2015 13:06

Next Customer Contact 20-Feb-2015 03:00

Responded 12-Jan-2015 13:25 Respond Target 12-Jan-2015 13:36 (SA - Calculated)

Restored 12-Jan-2015 21:15 Restore Target 12-Jan-2015 21:35 (SA - Calculated)

Resolved 03-Feb-2015 12:02 Resolve Target 22-Mar-2015 08:37 (SA - Calculated)

Targets for Restore & Resolve were extended by 8 days 20 hours to account for Pending time

Closed 03-Feb-2015 12:05

Last Modified 03-Jun-2015 16:08 Modified By archive

Entitlement

Agreement 242895 (OXIA 627085)

Covered Service TS 24x7 (Gold, Wireless Mobile Access)

Script This request is entitled to service.

People

Owner TSCr-MoA-WCDMA-SK : rories

Assignee TSCr-MoA-WCDMA-SK : rories

Copy To rories

Referred 1 TEC-MoA-WCDMA_RNC-FR : remiburie

Submitter dlempick

Description

Attachments

Several TMU resets on RNC231:

Lp/12 Ap/1; 2015-01-12 13:59:15.30
MSG minor equipment softwareError 00009000
ADMIN: unlocked OPER: enabled USAGE: active
AVAIL: PROC: reporting CNTRL:
ALARM: STBY: notSet UNKNW: false
Id: 0C000DAA Rel: Shelf Card/12
Com: TMU -- EXCEPTION:Memory Manager error[memPartFree] at block 0x52708
Prev Hdr [0xffffffff] Curr Hdr [0x526f8] Next Hdr [0xffffffff]
both start and end addr is invalid.
both start and end addr is invalid.
Task:0x7d550000: UeCallp
DAR: 0x00000000
PC: 0x003a94f0 taskUnlock +0x174
LR: 0x6b27d844 _ZN20UeCallUeRrmInterface12handoverAlgoEP16Uecall_Context_t +0x114
003a94f0 taskUnlock
Int: 0/0/0/114; /clearlib/vobs/I-Node/sw/PS/Base/PsHAL/psFltSwr/src/psFaultHandling.cc; 2925; PATCH: RID37300E1953


Investigation Log and Proprietary Log

1. 12-Jan-2015 13:19
dlempick

Bridge successfully reserved.

Access Code: 4092509

Chairperson Code: 2009

AR Number: 1-5531112

ALU Bridge (2801-2801) a/c 4092509


United Arab Emirates Toll-Free: 8000174537

2. 12-Jan-2015 13:41 ALCATEL-LUCENT PROPRIETARY


stbalko

ACT link:

http://umts-er.ca.alcatel.com/activecall.php?callId=1765#live-area

3. 12-Jan-2015 14:48
gmohamed

Update to Current Summary: One RNC is down after IP fluctuation.

4. 12-Jan-2015 16:14
stbalko

Update to Current Summary: TMU resets caused by a wrong message sent over Iur. TEC recommendation is to lock all Iur links to stabilize the RNC.

5. 12-Jan-2015 17:41
stbalko

Update to Current Summary: The Iur link which caused the problem has been identified (Ss7 M3ua/1 PMP/53 Assoc/*) and locked; the RNC looks stable now.

6. 12-Jan-2015 17:47
aelmidan

Update to Current Summary: Problematic IuR identified and locked, no more TMU resets. Logs/traces analysis ongoing by TSO/TEC for root cause.

7. 12-Jan-2015 18:44 ALCATEL-LUCENT PROPRIETARY


stbalko

Outage timeline:

January 12, 2015

09:40:04 GMT (10:40:04) : ER engineers connected to the bridge at 10:25

09:41:04 GMT (10:41:04) : ER engineers connected to RNC231

09:42:55 GMT (10:42:55) : all TMUs except two are down (pending state)

09:43:55 GMT (10:43:55) : recovery action: TMU reset not successful

09:44:38 GMT (10:44:38) : next recovery action: PMC swact

09:44:57 GMT (10:44:57) : PMC swact did not help

09:46:07 GMT (10:46:07) : customer tried a switchover of Lp/0. After the switchover the situation returned to the previous state

09:47:50 GMT (10:47:50) : ER team performed a switchover and RNC231 is back in service

09:50:57 GMT (10:50:57) : RNC under monitoring

09:56:39 GMT (10:56:39) : RNC is running OK so far

10:02:57 GMT (11:02:57) : some TMUs restarted with softwareError (Lp/2 Ap/1, Lp/12 Ap/1, Lp/6 Ap/1): TMU -- EXCEPTION:Memory Manager error[memPartFree] at block 0x53f20

10:11:25 GMT (11:11:25) : new TMU resets on Lp/13 Ap/5 and Lp/12 Ap/1

10:11:56 GMT (11:11:56) : ER team requested an RNC reset

10:14:53 GMT (11:14:53) : RNCLogCollection started in Emergency Mode

10:25:09 GMT (11:25:09) : CP switchover helped, calls are processed; however, TMUs are resetting with EXCEPTION:Memory Manager error[memPartFree]

10:25:37 GMT (11:25:37) : customer approved shelf reset

10:26:18 GMT (11:26:18) : shelf reset done at 11:25

10:31:55 GMT (11:31:55) : RNC is back

10:36:11 GMT (11:36:11) : only one TMU reset seen (Lp/12 Ap/1) after RNC reset

10:36:23 GMT (11:36:23) : RNC under monitoring

10:44:55 GMT (11:44:55) : new TMU reset on Lp/9 Ap/1

10:45:09 GMT (11:45:09) : paging TEC

10:55:01 GMT (11:55:01) : two TMU resets seen after RNC restart

10:55:57 GMT (11:55:57) : Remi Burie from TEC joined the bridge

11:05:09 GMT (12:05:09) : new occurrence of the TMU reset; the issue is back even after the RNC reset

11:16:59 GMT (12:16:59) : TEC update: the problem is on an Iur link, which causes the TMU resets; recommendation is to lock the Iur link

11:21:56 GMT (12:21:56) : some message coming from an Iur is causing a softwareError on the TMU, which leads to the TMU reset

11:22:39 GMT (12:22:39) : TEC recommendation is to lock all Iur links to stabilize the RNC

11:39:04 GMT (12:39:04) : TEC recommendation rejected by customer; customer refused to lock all Iur links

11:41:05 GMT (12:41:05) : ALU team negotiating with customer to convince them to lock all Iur links to stabilize the RNC

11:45:17 GMT (12:45:17) : around 8 Iur links exist; RNCs from another vendor are connected to RNC231 via Iur, which could cause the problem

11:53:18 GMT (12:53:18) : customer still refuses to lock all Iur links

12:17:29 GMT (13:17:29) : customer refuses to lock the Iur links one by one to identify the link causing the issue

12:31:04 GMT (13:31:04) : CTG in progress

12:46:47 GMT (13:46:47) : customer finally agreed to lock the Iur links one by one

13:00:12 GMT (14:00:12) : first Iur locked (Ss7 M3ua/1 PMP/53 Assoc/*)

13:10:54 GMT (14:10:54) : Remark: customer was made aware that locking Iur links one by one may not identify the corrupted Iur if the bad message is sent via two or more different Iur links at the same time.

13:19:05 GMT (14:19:05) : no more TMU resets after locking Ss7 M3ua/1 PMP/53 Assoc/* (link locked for more than 20 min)

13:21:05 GMT (14:21:05) : Iur link unlocked to prove that this link was the one sending the bad message

13:22:15 GMT (14:22:15) : new TMU reset occurred just after the Iur link (Ss7 M3ua/1 PMP/53 Assoc/*) was unlocked

13:32:31 GMT (14:32:31) : link (Ss7 M3ua/1 PMP/53 Assoc/*) locked again to restore service

13:34:49 GMT (14:34:49) : CTG, RNCLogCollection in emergency mode and RNCLogCollection in full mode being downloaded

13:35:16 GMT (14:35:16) : RNC under monitoring

13:40:06 GMT (14:40:06) : logs are on CUBA server 135.120.182.10 => /traces/OUTAGE_RCA_ER_GPS/1-5531112

13:47:42 GMT (14:47:42) : issue identified by TEC; it is covered by CR 0949298. Outage triggered by IUR instability between the ALU RNC and the other-vendor RNC

13:48:22 GMT (14:48:22) : customer is currently running LR13.3.7 and the fix is in LR13.3.9

13:53:39 GMT (14:53:39) : no response on the bridge; only the ER team remains on the bridge, silence from customer

13:54:05 GMT (14:54:05) : ER team requested Restore of this outage

14:14:43 GMT (15:14:43) : no more TMU resets so far after Iur locking; still waiting for Restore approval from customer

14:31:46 GMT (15:31:46) : customer disagrees with closing the outage.
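The one-by-one isolation approach in this timeline, and the limitation raised in the 13:10:54 remark, can be modelled with a small simulation. This is a hypothetical sketch, not an RNC procedure. The link names and the `resets_stop` oracle, which stands in for "lock the given links, then watch for TMU resets for roughly 20 minutes", are illustrative assumptions.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical oracle: given the set of locked Iur links, report whether
# TMU resets stop during the monitoring window.
Oracle = Callable[[Set[str]], bool]


def isolate_one_by_one(links: Iterable[str], resets_stop: Oracle) -> List[str]:
    """Test each Iur link on its own (only that link locked) and return the
    links whose locking alone stops the TMU resets."""
    culprits = []
    for link in links:
        if resets_stop({link}):
            culprits.append(link)
    return culprits


def simulated_network(faulty: Set[str]) -> Oracle:
    """Simulation only: resets stop once every faulty link is locked."""
    return lambda locked: faulty <= locked


# "Around 8 Iur links exist"; the names are hypothetical.
links = [f"Iur-{i}" for i in range(1, 9)]

# One link sending the bad message: the one-by-one search finds it.
print(isolate_one_by_one(links, simulated_network({"Iur-3"})))           # ['Iur-3']

# Bad messages on two links at once: no single lock stops the resets,
# so the search returns nothing, which is the risk noted in the remark.
print(isolate_one_by_one(links, simulated_network({"Iur-3", "Iur-6"})))  # []
```

Under the same model, locking all links at once (TEC's initial recommendation) always stops the resets, because every faulty link is then included. The cost is that the whole Iur is taken out of service.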

8. 12-Jan-2015 19:00 ALCATEL-LUCENT PROPRIETARY


rories

From: BURIE, REMI (REMI)

Sent: Monday, 12 January 2015 15:29

To: BALKO, STANISLAV (STANISLAV); RIES, Robert (Robert)

Cc: RIGAULT, LAURENT (LAURENT)

Subject: FW: STC reset TMU

Hello,

Below are the statistics compiled by Laurent regarding the TMU reset issue.

From the callp point of view, it is due to an ALCAP timeout, which means there is instability on the transport side of the Iur link.

Also, CR 949298 is mandatory for any IUR in front of H// RNCs.

The first recommendation is to upgrade the RNC to LR13.3.9, to be discussed with the product manager.

The second one concerns the investigation of this ALCAP issue on the IUR: it appears this is not new, as we have had resets for at least 4 weeks, so the local team has to check that the configuration is correctly set on this IUR link.

Not sure; we need to go forward with a specific action plan. I understand they are already aware of a configuration issue, could you confirm?

Regards,

Rémi BURIE

9. 12-Jan-2015 21:30 ALCATEL-LUCENT PROPRIETARY


rories

Time Tracking Entry Added - IR01-TSA3

10. 12-Jan-2015 21:38 ALCATEL-LUCENT PROPRIETARY


tibosley

Time Tracking Entry Added - IR01-TSA1

11. 12-Jan-2015 21:42 ALCATEL-LUCENT PROPRIETARY


dberky

From: Berky, Dusan (Dusan)

Sent: 12 January 2015 18:42

To: IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; RIES, Robert (Robert)

Cc: BURIE, REMI (REMI); Kosorin, Rastislav (Rastislav); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; ABDEL-HALIM, SAYED (SAYED); OKASHA, TAHER (TAHER); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); EL-MIDANY, AHMED (AHMED); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); ABDEL-HALIM, SAYED (SAYED); ZAHABI, RAMSEY (RAMSEY); OKASHA, TAHER (TAHER); MORVAN, FREDERIC (FREDERIC); JACKYRA, GREGORY (GREGORY); MOHAMED, GEHAD (GEHAD); BIALOBRODA, ROLAND (ROLAND); COPOS, BOGDAN (BOGDAN); REDA, RAMY (RAMY); KHEDR, MAHMOUD (MAHMOUD); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); BURIE, REMI (REMI); Kosorin, Rastislav (Rastislav); BERIDY, AHMED (AHMED); SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT)

Subject: MoM from the meeting about outage: AR 1-5531112

Gents,

Just sending a short summary of the actions that we discussed on the call earlier today.

Please forward to whoever is missing in the loop.

1) The fix mentioned below is good to have on an RNC in front of an HWI RNC; however, it would not help in this particular scenario

2) We will set Restore for the outage ticket

3) We will keep the IuR locked until further decision (unlocking would lead to TMU storms)

4) Local team to

a. Engage engineering to validate the IuR configuration on the impacted RNC

i. Configuration check

ii. Activated features check (compatibility with HWI + cross-check settings with a working RNC)

iii. Utilization check (there seems to be very high utilization)

b. Open a new ticket for TMU storms due to IuR resource shortage. The investigation will be aimed at understanding

i. Why TMU storms happen

ii. How the other RNC is different (please provide a snapshot of the impacted "bad" RNC and the not-impacted "good" RNC that are connected to the same HWI RNC)

iii. How to avoid defence of TMU

5) Christophe C: Investigate who from NEA can help with the investigation

6) TPS: to continue the investigation of this ticket to understand the initial outage issue (all TMUs down until RNC restart)... There are two distinct issues, hence two different tickets are needed

7) Local team:

a. To correlate KPIs from today with recovery actions (Robert to supply timestamps of recovery actions)

b. Objective is to understand what the KPIs are like with the IuR unlocked, and how big the impact of TMU restarts is on KPIs (the issue went unnoticed in KPIs even though it has been present since 10 Dec.)

8) Local team: to set a meeting at 12:0 Paris time to review action progress

Rgds,

Dusan

12. 13-Jan-2015 03:09 ALCATEL-LUCENT PROPRIETARY


tibosley

Time Tracking Entry Added - IR01-TSA3

13. 13-Jan-2015 16:11


rories

Update to Current Summary: Escalated to TPS L3

14. 13-Jan-2015 16:11 ALCATEL-LUCENT PROPRIETARY


rories

Dear TEC team

We need to perform a post-outage investigation of this issue.

It is connected with unexplained TMU pending states. The customer ETC (UAE) noticed strong KPI degradation on 12 Jan 2015 (he was already asked to provide the appropriate NPO KPI outputs; once he provides them, we will update the ticket).

d lp/* ap/* rncap
Lp/* Ap/* RncAp
Use -noTabular to see the many hidden attributes.
+==+==+------+----------+------+----------+----------+----------+----------
|Lp|Ap| role | instance |apSpar|sparedCell|unSparedCe| numCall |sparedPath
| | | | | | | ll | |
+==+==+------+----------+------+----------+----------+----------+----------
| 2| 0|aMaste| 0|spared| na| na| na| na
| 2| 1|tmuPen| 0|na | na| na| na| na
| 2| 2|rab | 0|na | 0| 0| 20| 0
| 2| 3|rab | 1|na | 3| 0| 16| 0
| 2| 4|pc | 0|na | 0| 0| 0| 0
| 2| 5|rab | 2|na | 2| 0| 20| 0
| 3| 0|sMaste| 1|spared| na| na| na| na
| 3| 1|tmuPen| 1|na | na| na| na| na
| 3| 2|rab | 3|na | 2| 0| 18| 0
| 3| 3|rab | 4|na | 3| 0| 20| 0
| 3| 4|pc | 1|na | 0| 0| 0| 0
| 3| 5|rab | 5|na | 3| 0| 25| 0
| 4| 0|rab | 6|na | 2| 0| 15| 0
| 4| 1|tmuPen| 2|na | na| na| na| na
| 4| 2|aNi | 0|spared| na| na| na| na
| 4| 3|rab | 7|na | 0| 0| 22| 0
| 4| 4|pc | 2|na | 0| 0| 0| 0
| 4| 5|aOmu | 0|spared| na| na| na| na
| 5| 0|rab | 8|na | 2| 0| 16| 0
| 5| 1|tmuPen| 3|na | na| na| na| na
| 5| 2|sNi | 1|spared| na| na| na| na
| 5| 3|rab | 9|na | 0| 0| 20| 0
| 5| 4|pc | 3|na | 0| 0| 0| 0
| 5| 5|sOmu | 1|spared| na| na| na| na
| 6| 0|rab | 10|na | 0| 0| 19| 0
| 6| 1|tmuPen| 4|na | na| na| na| na
| 6| 2|rab | 11|na | 2| 0| 17| 0
| 6| 3|rab | 12|na | 1| 0| 20| 0
| 6| 4|pc | 4|na | 0| 0| 0| 0
| 6| 5|rab | 13|na | 2| 0| 24| 0
| 7| 0|rab | 14|na | 0| 0| 18| 0
| 7| 1|tmuPen| 5|na | na| na| na| na
| 7| 2|rab | 15|na | 1| 0| 26| 0
| 7| 3|rab | 16|na | 1| 0| 23| 0
| 7| 4|pc | 5|na | 0| 0| 0| 0
| 7| 5|rab | 17|na | 1| 0| 16| 0
| 8| 0|rab | 32|na | 1| 0| 23| 0
| 8| 1|tmuPen| 12|na | na| na| na| na
| 8| 2|rab | 33|na | 1| 0| 21| 0
| 8| 3|rab | 34|na | 2| 0| 27| 0
| 8| 4|pc | 10|na | 0| 0| 0| 0
| 8| 5|rab | 35|na | 1| 0| 23| 0
| 9| 0|rab | 36|na | 2| 0| 23| 0
| 9| 1|tmuPen| 13|na | na| na| na| na
| 9| 2|rab | 37|na | 2| 0| 20| 0
| 9| 3|rab | 38|na | 1| 0| 22| 0
| 9| 4|pc | 11|na | 0| 0| 0| 0
| 9| 5|rab | 39|na | 2| 0| 18| 0
|10| 0|rab | 18|na | 1| 0| 17| 0
|10| 1|tmu | 6|na | na| na| na| na
|10| 2|rab | 19|na | 3| 0| 19| 0
|10| 3|rab | 20|na | 1| 0| 26| 0
|10| 4|pc | 6|na | 0| 0| 0| 0
|10| 5|rab | 21|na | 1| 0| 21| 0
|11| 0|rab | 22|na | 2| 0| 20| 0
|11| 1|tmuPen| 7|na | na| na| na| na
|11| 2|rab | 23|na | 3| 0| 23| 0
|11| 3|rab | 24|na | 2| 0| 21| 0
|11| 4|pc | 7|na | 0| 0| 0| 0
|11| 5|rab | 25|na | 2| 0| 18| 0
|12| 0|rab | 26|na | 2| 0| 25| 0
|12| 1|tmu | 8|na | na| na| na| na
|12| 2|rab | 27|na | 2| 0| 15| 0
|12| 3|rab | 28|na | 3| 0| 25| 0
|12| 4|pc | 8|na | 0| 0| 0| 0
|12| 5|tmuPen| 9|na | na| na| na| na
|13| 0|rab | 29|na | 1| 0| 17| 0
|13| 1|none | na|na | na| na| na| na
|13| 2|rab | 30|na | 2| 0| 24| 0
|13| 3|rab | 31|na | 2| 0| 20| 0
|13| 4|pc | 9|na | 0| 0| 0| 0
|13| 5|tmu | 11|na | na| na| na| na
ok 2015-01-12 13:40:31.12
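To triage captures like the `d lp/* ap/* rncap` output above, the TMU rows can be pulled out programmatically. The sketch below is a hypothetical helper, not an RNC or CARES tool. It assumes the pipe-separated layout shown in this capture, with Lp, Ap and role as the first three fields, and simply groups TMU slots by role (`tmu` versus `tmuPen`).

```python
from collections import defaultdict
from typing import Dict, List


def tmu_slots(rncap_output: str) -> Dict[str, List[str]]:
    """Group TMU slots from `d lp/* ap/* rncap` output by role (e.g. tmu, tmuPen).

    Assumes data rows are pipe-separated as in the capture above:
    '| 2| 1|tmuPen| 0|na | ...' -> Lp=2, Ap=1, role=tmuPen.
    """
    slots: Dict[str, List[str]] = defaultdict(list)
    for line in rncap_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Data rows start with '|' (empty first field) and have a numeric Lp;
        # header, separator and prompt lines are skipped.
        if len(fields) < 4 or fields[0] or not fields[1].isdigit():
            continue
        lp, ap, role = fields[1], fields[2], fields[3]
        if role.startswith("tmu"):
            slots[role].append(f"Lp/{lp} Ap/{ap}")
    return dict(slots)


# A few rows copied from the capture above.
SAMPLE = """\
|Lp|Ap| role | instance |apSpar|sparedCell|unSparedCe| numCall |sparedPath
| 2| 0|aMaste| 0|spared| na| na| na| na
| 2| 1|tmuPen| 0|na | na| na| na| na
|10| 1|tmu | 6|na | na| na| na| na
|12| 5|tmuPen| 9|na | na| na| na| na
|13| 1|none | na|na | na| na| na| na
"""

print(tmu_slots(SAMPLE))
# {'tmuPen': ['Lp/2 Ap/1', 'Lp/12 Ap/5'], 'tmu': ['Lp/10 Ap/1']}
```

Applied to the complete capture, the same grouping gives 10 `tmuPen` slots and 3 `tmu` slots (Lp/10 Ap/1, Lp/12 Ap/1, Lp/13 Ap/5). Lp/13 Ap/1 reports role `none` and is not counted.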

During the emergency recovery activity we performed a TMU reset and a PMC master swact, with no change in TMU states. Only the CP switchover brought all TMUs to the operational state.

We need to investigate the root cause (RC) of this incorrect TMU pending state.


Logs are available on CUBA server 135.120.182.10 => /traces/OUTAGE_RCA_ER_GPS/1-5531112

Thank you

Best Regards

Robert Ries

15. 13-Jan-2015 19:26 ALCATEL-LUCENT PROPRIETARY


tibosley

Time Tracking Entry Added - IR01-TSA1

16. 13-Jan-2015 20:36 ALCATEL-LUCENT PROPRIETARY


remiburie

CR 1434917 escalated to design.

17. 14-Jan-2015 11:21


rories

From: IBRAHIM ABD EL NABY, Karim (Karim)** CTR **

Sent: Monday, January 12, 2015 6:30 PM

To: EL-MIDANY, AHMED (AHMED); ABDEL-HALIM, SAYED (SAYED); OKASHA, TAHER (TAHER); KHEDR, MAHMOUD (MAHMOUD); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); EL-MIDANY, AHMED (AHMED); ABDEL-HALIM, SAYED (SAYED); ZAHABI, RAMSEY (RAMSEY); OKASHA, TAHER (TAHER); MORVAN, FREDERIC (FREDERIC); JACKYRA, GREGORY (GREGORY); KHEDR, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD)

Subject: RE: RNC231 cannot carry any calls and we have a stream of TMUs reset, OUTAGE issue // 1-5531112

Hello all,

Please find below the summary of the issue and the latest updates:

Problem description:

Traffic degradation on RNC231

25 NodeBs are impacted

Continuous TMU resets leading to traffic outage on the NodeBs

Actions done:

CP switchover => no improvement

RNC shelf reset => slight improvement observed (TMU resets still observed, but at a lower frequency)

Locked the IUR towards RNC206 => the issue was solved and no alarms.

Impact:

CS/PS traffic degradation/outage on random cells

IuR interface fluctuation => OK after RNC reset

Investigation:

TEC suspects some rejection occurring on the IuR interface, leading to TMU instability

Investigation ongoing in parallel by TSO/TEC

Next step/Action plan:

Waiting for TEC & TSO feedback with AP and corrective action; as per the customer, we cannot keep the IUR link locked because it is affecting the KPIs.

Additional traces to be collected and shared with TEC; all logs available @

/srv/data202256/data/server/Traces/from_france/ETISALAT_UAE/ 1-5531112

Waiting for your feedback ASAP.

Server details, as the GPS server is not working:

SFTP://172.25.80.47

user : ftraces

password : Elv24trf

Thanks,

--

BR,

Karim

18. 14-Jan-2015 11:21


rories

From: KHEDR, MAHMOUD (MAHMOUD)


Sent: Monday, January 12, 2015 3:49 PM

To: RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); BURIE, REMI (REMI); Berky, Dusan (Dusan); Kosorin, Rastislav (Rastislav)

Cc: IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; ABDEL-HALIM, SAYED (SAYED); OKASHA, TAHER (TAHER); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); EL-MIDANY, AHMED (AHMED); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); ABDEL-HALIM, SAYED (SAYED); ZAHABI, RAMSEY (RAMSEY); OKASHA, TAHER (TAHER); MORVAN, FREDERIC (FREDERIC); JACKYRA, GREGORY (GREGORY); MOHAMED, GEHAD (GEHAD); BIALOBRODA, ROLAND (ROLAND); COPOS, BOGDAN (BOGDAN); REDA, RAMY (RAMY)

Subject: RE: RNC231 cannot carry any calls and we have a stream of TMUs reset, OUTAGE issue // 1-5531112

Importance: High

Dear Dusan and team,

Looking for your usual support to find the RCA and a solution for the current critical situation.

Please note that locking the IuR interface is not acceptable as a WA, so we need to provide an urgent final solution.


Thanks in advance

Best regards,

Mahmoud Khedr

19. 14-Jan-2015 11:22


rories

From: RIES, Robert (Robert)

Sent: Monday, January 12, 2015 7:27 PM

To: IBRAHIM ABD EL NABY, Karim (Karim)** CTR **

Cc: BURIE, REMI (REMI); Kosorin, Rastislav (Rastislav); Berky, Dusan (Dusan); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **

Subject: AR 1-5531112 TEC support to check urgently as we have outage on the RNC231

Hi Karim

Issue summary:

There are continuously repeated TMU resets on RNC231.

Through the TEC investigation carried out by Remi Burie, we found that the issue is closely connected with the IuR link.

By locking PMP/53 associations 0 and 1, the issue disappeared => the TMU resets stopped.

Further investigation found that the issue is known and that a fix for it is delivered in load LR13.3.9.

Based on this emergency intervention, we consider the issue explained.

Kindly, as the issue has a known RC and a fix is provided, could we set this ticket as Restore? Of course, the issue will still be investigated further by TEC in this case.

One more question: during the communication on the bridge, you (or someone from your team) mentioned that you are aware of some configuration issue. What did you mean by this? Could you explain it in more detail, please?

Thanks

Best Regards

Robert Ries

20. 14-Jan-2015 11:23


rories

From: IBRAHIM ABD EL NABY, Karim (Karim)** CTR **

Sent: Monday, January 12, 2015 4:39 PM

To: RIES, Robert (Robert)

Cc: BURIE, REMI (REMI); Kosorin, Rastislav (Rastislav); Berky, Dusan (Dusan); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; ABDEL-HALIM, SAYED (SAYED); OKASHA, TAHER (TAHER); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); EL-MIDANY, AHMED (AHMED); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); ABDEL-HALIM, SAYED (SAYED); ZAHABI, RAMSEY (RAMSEY); OKASHA, TAHER (TAHER); MORVAN, FREDERIC (FREDERIC); JACKYRA, GREGORY (GREGORY); MOHAMED, GEHAD (GEHAD); BIALOBRODA, ROLAND (ROLAND); COPOS, BOGDAN (BOGDAN); REDA, RAMY (RAMY); KHEDR, MAHMOUD (MAHMOUD); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); BURIE, REMI (REMI); Berky, Dusan (Dusan); Kosorin, Rastislav (Rastislav); BERIDY, AHMED (AHMED)

Subject: RE: AR 1-5531112 TEC support to check urgently as we have outage on the RNC231

Importance: High

Hi Ries,

Thanks for your feedback.

Please find my comments:

[RIES] Further investigation found that the issue is known and that a fix for it is delivered in load LR13.3.9.

[Karim] Does that mean we need a patch or a SW upgrade to solve the issue? Is there a WA to solve the issue for now? Is there a released alert for this issue?

[Karim] What are our next steps?

[RIES] Kindly, as the issue has a known RC and a fix is provided, could we set this ticket as Restore? Of course, the issue will still be investigated further by TEC in this case.

[Karim] As discussed, we cannot restore the outage for now! The customer is pushing us and cannot accept keeping the IUR link locked; also, if we unlock it, we will have an outage again.

[RIES] One more question: during the communication on the bridge, you (or someone from your team) mentioned that you are aware of some configuration issue. What did you mean by this? Could you explain it in more detail, please?

[Karim] Ahmed Beridy, who is responsible for the Radio part, will provide more details about this issue.

Thanks,

--

BR,

Karim
21. 14-Jan-2015 11:23
rories

From: Berky, Dusan (Dusan)

Sent: Monday, 12 January, 2015 19:42

To: IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; RIES, Robert (Robert)

Cc: BURIE, REMI (REMI); Kosorin, Rastislav (Rastislav); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; ABDEL-HALIM, SAYED (SAYED); OKASHA, TAHER (TAHER); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); EL-MIDANY, AHMED (AHMED); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); ABDEL-HALIM, SAYED (SAYED); ZAHABI, RAMSEY (RAMSEY); OKASHA, TAHER (TAHER); MORVAN, FREDERIC (FREDERIC); JACKYRA, GREGORY (GREGORY); MOHAMED, GEHAD (GEHAD); BIALOBRODA, ROLAND (ROLAND); COPOS, BOGDAN (BOGDAN); REDA, RAMY (RAMY); KHEDR, MAHMOUD (MAHMOUD); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); BURIE, REMI (REMI); Kosorin, Rastislav (Rastislav); BERIDY, AHMED (AHMED); SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT)

Subject: MoM from the meeting about outage: AR 1-5531112

Gents,

Just sending a short summary of the actions that we discussed on the call earlier today.

Please forward to whoever is missing in the loop.

1) The fix mentioned below is good to have on an RNC in front of an HWI RNC; however, it would not help in this particular scenario

2) We will set Restore for the outage ticket

3) We will keep the IuR locked until further decision (unlocking would lead to TMU storms)

4) Local team to

a. Engage engineering to validate the IuR configuration on the impacted RNC

i. Configuration check

ii. Activated features check (compatibility with HWI + cross-check settings with a working RNC)

iii. Utilization check (there seems to be very high utilization)

b. Open a new ticket for TMU storms due to IuR resource shortage. The investigation will be aimed at understanding

i. Why TMU storms happen

ii. How the other RNC is different (please provide a snapshot of the impacted "bad" RNC and the not-impacted "good" RNC that are connected to the same HWI RNC)

iii. How to avoid defence of TMU

5) Christophe C: Investigate who from NEA can help with the investigation

6) TPS: to continue the investigation of this ticket to understand the initial outage issue (all TMUs down until RNC restart)... There are two distinct issues, hence two different tickets are needed

7) Local team:

a. To correlate KPIs from today with recovery actions (Robert to supply timestamps of recovery actions)

b. Objective is to understand what the KPIs are like with the IuR unlocked, and how big the impact of TMU restarts is on KPIs (the issue went unnoticed in KPIs even though it has been present since 10 Dec.)

8) Local team: to set a meeting at 12:0 Paris time to review action progress

Rgds,

Dusan

22. 14-Jan-2015 11:24


rories

From: OKASHA, TAHER (TAHER)

Sent: Monday, 12 January, 2015 20:24

To: MAHER, RAFIK (RAFIK); EL-SAEED, AHMED (AHMED); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; MONNAIE, DANIEL (DANIEL)

Cc: KHEDR, MAHMOUD (MAHMOUD); REDA, RAMY (RAMY); AHMED BERIDY (AHMED) (ahmed.beridy@alcatel-lucent.com); EL-MIDANY, AHMED (AHMED); RIES, Robert (Robert); BURIE, REMI (REMI); Berky, Dusan (Dusan)

Subject: FW: MoM from the meeting about outage: AR 1-5531112

Importance: High

Dear Daniel,
We have suffered severe TMU resets on RNC231, which is currently live and carrying 25 sites.

The feedback from TEC is that the issue is related to the IUR link with RNC206, which suffers from an ALCAP issue and a resource allocation issue. We have been asked to check the IUR link with engineering on the points below:

i. Configuration check

ii. Activated features check (compatibility with HWI + cross-check settings with a working RNC)

iii. Utilization check (there seems to be very high utilization)

We would appreciate your support on who can help with this, and on which counters/indicators are needed to check the utilization of the IUR interface in case it needs to be re-engineered.

I'm adding Robert Ries from TSO and Remi Burie from TEC for any further details required on what needs to be checked.

Thanks & BR,

Taher OKASHA

23. 14-Jan-2015 11:25


rories
From: RIES, Robert (Robert)

Sent: Tuesday, January 13, 2015 8:45 AM

To: IBRAHIM ABD EL NABY, Karim (Karim)** CTR **

Cc: BURIE, REMI (REMI); Kosorin, Rastislav (Rastislav); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; ABDEL-HALIM, SAYED (SAYED); OKASHA, TAHER (TAHER); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); EL-MIDANY, AHMED (AHMED); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); ABDEL-HALIM, SAYED (SAYED); ZAHABI, RAMSEY (RAMSEY); OKASHA, TAHER (TAHER); MORVAN, FREDERIC (FREDERIC); JACKYRA, GREGORY (GREGORY); MOHAMED, GEHAD (GEHAD); BIALOBRODA, ROLAND (ROLAND); COPOS, BOGDAN (BOGDAN); REDA, RAMY (RAMY); KHEDR, MAHMOUD (MAHMOUD); BURIE, REMI (REMI); BERIDY, AHMED (AHMED); SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT); Berky, Dusan (Dusan)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hello all

Below is the history of the recovery team activities connected to the ETC (UAE) TMU issue on January 12, 2015:

09:38 (13:38 RNC local time) we found 11 TMUs out of 14 in Pending state

09:40 reset lp/13 ap/1 (one of the pending TMUs)

09:41 waiting for restore - result => not successful, 11 TMUs still Pending

09:41 reset lp/13 ap/0 (RAB)

09:41 waiting for restore - result => not successful, no change

09:41 reset lp/3 ap/0 (Stand-by PMC-master)

09:43 waiting for restore - result => not successful, 11 TMUs still Pending

09:43 reset lp/2 ap/0 (Active PMC-master)

09:44 waiting for restore - result => not successful, 11 TMUs still Pending

09:44 switchover lp/0 (CP) - command sent by customer

09:44 waiting for restore - result => not successful, 11 TMUs still Pending

09:44 switchover lp/0 - command sent by us

09:46 waiting for restore - result => successful, all 14 TMUs working

09:48 alarms "TMU -- EXCEPTION:Memory Manager error[memPartFree] at block" back (TMU reset)

10:23 (14:23 RNC local time) we did "reset shelf"

10:30 The first C-Bearer

10:31 alarms "TMU -- EXCEPTION:Memory Manager error[memPartFree] at block" back

12:54 (16:54 RNC local time) Ss7 M3ua/1 PMP/53 Assoc/0&1 locked - command sent by customer
- no "TMU -- EXCEPTION:Memory Manager error[memPartFree] at block" alarms present

13:19 (17:19 RNC local time) Ss7 M3ua/1 PMP/53 Assoc/0&1 unlocked - command sent by customer

13:20 alarms "TMU -- EXCEPTION:Memory Manager error[memPartFree] at block" back

13:30 (17:30 RNC local time) Ss7 M3ua/1 PMP/53 Assoc/0&1 locked - command sent by customer
- no "TMU -- EXCEPTION:Memory Manager error[memPartFree] at block" alarms present

=====================================================================

Furthermore, we inspected the hfb files for "Memory Manager error" alarms. The history is as follows:

October 2014 => 0

November 2014 => 2 (found only on RNC232 & RNC235)

December 2014 => 2487 (all of them on RNC231)

January 2015 => 2391 (all of them on RNC231)

The first "TMU -- EXCEPTION:Memory Manager error[memPartFree] at block" alarm appeared on RNC231 on 10 Dec 2014 at 2:15 PM.
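The monthly breakdown above amounts to filtering alarm records by text, grouping them by month and RNC, and taking the earliest match. A minimal Python sketch follows. It assumes the hfb alarm records have already been parsed into hypothetical `(rnc, timestamp, text)` tuples, because the hfb file format itself is not described in this ticket. The sample records are illustrative only and are not taken from the hfb files.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Hypothetical record shape after parsing the hfb files: (RNC name, timestamp, alarm text).
Alarm = Tuple[str, datetime, str]

PATTERN = "Memory Manager error"


def monthly_counts(alarms: Iterable[Alarm]) -> Counter:
    """Count alarms containing PATTERN per (YYYY-MM, RNC)."""
    counts: Counter = Counter()
    for rnc, ts, text in alarms:
        if PATTERN in text:
            counts[(ts.strftime("%Y-%m"), rnc)] += 1
    return counts


def first_occurrence(alarms: Iterable[Alarm], rnc: str) -> Optional[datetime]:
    """Earliest alarm containing PATTERN on the given RNC, or None if there is none."""
    times = [ts for r, ts, text in alarms if r == rnc and PATTERN in text]
    return min(times) if times else None


# Illustrative records only (not taken from the hfb files).
sample = [
    ("RNC232", datetime(2014, 11, 20, 8, 0), "TMU -- EXCEPTION:Memory Manager error[memPartFree] at block 0x1000"),
    ("RNC231", datetime(2014, 12, 10, 14, 15), "TMU -- EXCEPTION:Memory Manager error[memPartFree] at block 0x2000"),
    ("RNC231", datetime(2014, 12, 11, 9, 30), "TMU -- EXCEPTION:Memory Manager error[memPartFree] at block 0x3000"),
    ("RNC231", datetime(2015, 1, 12, 9, 59), "MSG minor equipment softwareError"),
]

print(monthly_counts(sample))
print(first_occurrence(sample, "RNC231"))  # 2014-12-10 14:15:00
```

Matching on a plain substring keeps the sketch simple. A real parser would likely also need to handle the multi-line alarm layout shown at the top of this AR.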

Best Regards

Robert Ries

24. 14-Jan-2015 11:25


rories

From: CAHOUR, CHRISTOPHE (CHRISTOPHE)

Sent: Tuesday, January 13, 2015 9:15 AM

To: DURANCEAU, FRANCOIS-XAVIER (FRANCOIS-XAVIER)

Cc: BURIE, REMI (REMI); Kosorin, Rastislav (Rastislav); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; ABDEL-HALIM, SAYED (SAYED); OKASHA, TAHER (TAHER); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); EL-MIDANY, AHMED (AHMED); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); EL-FAKHARANY, HISHAM (HISHAM); ABDEL-HALIM, SAYED (SAYED); ZAHABI, RAMSEY (RAMSEY); OKASHA, TAHER (TAHER); MORVAN, FREDERIC (FREDERIC); JACKYRA, GREGORY (GREGORY); MOHAMED, GEHAD (GEHAD); BIALOBRODA, ROLAND (ROLAND); COPOS, BOGDAN (BOGDAN); REDA, RAMY (RAMY); KHEDR, MAHMOUD (MAHMOUD); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); BURIE, REMI (REMI); Kosorin, Rastislav (Rastislav); BERIDY, AHMED (AHMED); SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT); Berky, Dusan (Dusan); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; RIES, Robert (Robert)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hello François-Xavier,

Would you have somebody who could contribute to this one, more from a configuration point of view?

We have an RNC experiencing TMU resets.

According to the analysis, it seems to be related either to bandwidth/capacity for ALCAP allocation on the IuR or to a feature activated on the IuR.

We would need somebody with the skills to investigate those areas.

Thanks.
Best Regards

Christophe.

25. 14-Jan-2015 11:26


rories

From: MOHAMED, GEHAD (GEHAD)

Sent: Tuesday, January 13, 2015 11:47 AM

To: RIES, Robert (Robert); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **

Cc: BURIE, REMI (REMI); Kosorin, Rastislav (Rastislav); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; ABDEL-HALIM, SAYED (SAYED); OKASHA, TAHER (TAHER); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); EL-MIDANY, AHMED (AHMED); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); ABDEL-HALIM, SAYED (SAYED); ZAHABI, RAMSEY (RAMSEY); OKASHA, TAHER (TAHER); MORVAN, FREDERIC (FREDERIC); JACKYRA, GREGORY (GREGORY); BIALOBRODA, ROLAND (ROLAND); COPOS, BOGDAN (BOGDAN); REDA, RAMY (RAMY); KHEDR, MAHMOUD (MAHMOUD); BURIE, REMI (REMI); BERIDY, AHMED (AHMED); SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT); Berky, Dusan (Dusan)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hello Robert,

I think in our report to the customer we will need to explain the reason for "11 TMUs out of 14 in Pended state", which was fixed only by the switchover.

Do we have a root cause for that?

We know that we had an issue in the IP network, but is it normal to get stuck in that state?

Regards

Gehad

26. 14-Jan-2015 11:26


rories

From: EL-MIDANY, AHMED (AHMED)

Sent: Tuesday, January 13, 2015 1:14 PM

To: EL-MIDANY, AHMED (AHMED); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; ABDEL-HALIM, SAYED (SAYED); OKASHA, TAHER (TAHER); KHEDR, MAHMOUD (MAHMOUD); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); Kosorin, Rastislav (Rastislav); Berky, Dusan (Dusan); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); BURIE, REMI (REMI); BOSLEY, Tim (Tim)** CTR **; Hall, Gail Culver (Gail)** CTR **; SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT); NABIL GEORGES, MARTIN (MARTIN)** CTR **

Subject: RE: Follow-up//Etisalat-UAE//RNC231 Outage -- AR 1-5531112//1-5532074

Hello all,

Thanks for your attendance; kindly find the MoM below:

- The relation between the IuR & the TMU resets is not clear yet

- The RCA for the TMUs in pending state has not been identified

- ARs to be escalated to the RNC design team => AP. TSO/TEC

- Snapshot configuration comparison between RNC231 & RNC235 (in the scope of AR# 1-5532074) => AP. TSO

- Snapshot configuration checks, IuR utilization and activated features => AP. Engineering team (feedback expected today)

- KPIs to be checked for any degradation in the period between the RNC reset & the IuR lock => AP. Local team

- Next synchro call Wednesday 14-Jan@14:00 Paris time => AP. Local team to send the invitation

BR,

Ahmed El-Midany

27. 14-Jan-2015 11:28


rories

From: BENIGHIL, SOUFIANE (SOUFIANE)** CTR **


Sent: Tuesday, January 13, 2015 2:51 PM

To: MONNAIE, DANIEL (DANIEL); OKASHA, TAHER (TAHER); MAHER, RAFIK (RAFIK)

Cc: KHEDR, MAHMOUD (MAHMOUD); REDA, RAMY (RAMY); EL-SAEED, AHMED (AHMED); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); RIES, Robert (Robert); BURIE, REMI (REMI); Berky, Dusan (Dusan); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; ABDEL-HALIM, SAYED (SAYED)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hi Taher, can you give me a call please? I can't join you from my phone.

As a first analysis, and in my understanding of the issue, it is related to a TMU load problem caused by the IuR link between RNC231 & Huawei RNC206.

I don't understand why you are referring to ALCAP in your mail below; the network is full IP. ALCAP exists only when ATM is used, which is not the case here.

Can I get the CPU load of all the eDCPS boards for the last 2 weeks, please? Make sure this period covers both the time when the IuR was unlocked and the time after it was locked.

According to the TEC guys and the history of this email, the issue also seems to be known and a fix has been provided; can I get it? It may be related to the IuR mapping of SCTP links, where one PDC is used more than the others... to be checked on a fresh snapshot, so thanks also for providing a new snapshot of the network.

Regards,

Soufiane B

28. 14-Jan-2015 11:29


rories

From: BENIGHIL, SOUFIANE (SOUFIANE)** CTR **

Sent: Tuesday, January 13, 2015 3:03 PM

To: BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; MONNAIE, DANIEL (DANIEL); OKASHA, TAHER (TAHER); MAHER, RAFIK (RAFIK)

Cc: KHEDR, MAHMOUD (MAHMOUD); REDA, RAMY (RAMY); EL-SAEED, AHMED (AHMED); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); RIES, Robert (Robert); BURIE, REMI (REMI); Berky, Dusan (Dusan); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; ABDEL-HALIM, SAYED (SAYED)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

The counter that provides the data at hourly granularity is VS_ApCpuUtilizationAvg, for all boards.
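The request is to compare board CPU load while the IuR was unlocked and after it was locked. This sketch assumes the hourly VS_ApCpuUtilizationAvg values have been exported as (board, hour start, percent) rows. The row layout, the board labels, the lock time and the sample values are all hypothetical illustrations, not measured data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

LOCK_TIME = datetime(2015, 1, 12, 13, 0)  # hypothetical lock time, for illustration
SAMPLES = [  # synthetic (board, hour_start, VS_ApCpuUtilizationAvg %) rows
    ("TMU/2", datetime(2015, 1, 12, 10), 80.0),
    ("TMU/2", datetime(2015, 1, 12, 11), 70.0),
    ("TMU/2", datetime(2015, 1, 12, 14), 30.0),
    ("TMU/8", datetime(2015, 1, 12, 10), 10.0),
]


def cpu_by_period(samples, lock_time):
    """Mean hourly CPU per board, split at the IuR lock time.

    Hours starting at or after lock_time count as "locked"; an hour that straddles
    the lock is counted as "unlocked", which is a simplification.
    """
    buckets = defaultdict(lambda: {"unlocked": [], "locked": []})
    for board, hour_start, cpu in samples:
        period = "locked" if hour_start >= lock_time else "unlocked"
        buckets[board][period].append(cpu)
    return {
        board: {p: (round(mean(v), 1) if v else None) for p, v in periods.items()}
        for board, periods in buckets.items()
    }


if __name__ == "__main__":
    for board, periods in cpu_by_period(SAMPLES, LOCK_TIME).items():
        print(board, periods)
```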

29. 14-Jan-2015 11:29


rories

From: OKASHA, TAHER (TAHER)

Sent: Tuesday, January 13, 2015 6:14 PM


To: BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; MONNAIE, DANIEL (DANIEL); MAHER, RAFIK (RAFIK); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **

Cc: KHEDR, MAHMOUD (MAHMOUD); REDA, RAMY (RAMY); EL-SAEED, AHMED (AHMED); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); RIES, Robert (Robert); BURIE, REMI (REMI); Berky, Dusan (Dusan); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; ABDEL-HALIM, SAYED (SAYED)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hello Soufiane,

As per our discussion, the issue is that the TMUs were restarting randomly, at high frequency, for a reason that is still unknown.

TEC has concluded that it is due to something on the IuR link with RNC206, but they have not been able to determine the correlation so far. The issue has been escalated to RNC design. The TMU restarts stopped after this IuR link was locked.

On the other hand, they want a check from the engineering side for any differences between the IuR RNC231<>RNC206 & the IuR RNC235<>206, as the latter is working fine, and a check of whether the IuR link utilization is OK or not.

The ALCAP topic is discarded, of course. It was already strange, but that was TEC's feedback; they discarded this topic since the network is full IP.

The fix mentioned below was found to be unrelated to the issue seen here; it was a wrong correlation made by TEC at the beginning, as mentioned in Dusan's MoM.

We will provide you with a fresh snapshot and the CPU loads for the last week, as requested during our call, as well as the timestamps of the actions taken yesterday.

Thanks & BR,

Taher OKASHA

30. 14-Jan-2015 11:30


rories

From: IBRAHIM ABD EL NABY, Karim (Karim)** CTR **

Sent: Tuesday, January 13, 2015 3:35 PM

To: OKASHA, TAHER (TAHER); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; MONNAIE, DANIEL (DANIEL); MAHER, RAFIK (RAFIK)

Cc: KHEDR, MAHMOUD (MAHMOUD); REDA, RAMY (RAMY); EL-SAEED, AHMED (AHMED); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); RIES, Robert (Robert); BURIE, REMI (REMI); Berky, Dusan (Dusan); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; ABDEL-HALIM, SAYED (SAYED)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hello Soufiane,

Kindly find attached the network snapshot, the required indicator and the current TMU mapping for RNC231.

Let us know if you have any other requirements.

Thanks,

--

BR,

Karim

31. 14-Jan-2015 11:30


rories

From: RIES, Robert (Robert)

Sent: Tuesday, January 13, 2015 4:05 PM

To: IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; OKASHA, TAHER (TAHER); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; MONNAIE, DANIEL (DANIEL); MAHER, RAFIK (RAFIK)

Cc: KHEDR, MAHMOUD (MAHMOUD); REDA, RAMY (RAMY); EL-SAEED, AHMED (AHMED); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); BURIE, REMI (REMI); Berky, Dusan (Dusan); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; ABDEL-HALIM, SAYED (SAYED)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hello Karim/all

Thank you for the data provided; however, to see the overall behaviour (degradation), we are also waiting for further KPIs:

CSSR

CDR

RRC

RAB

All for the CS & PS domains, on a 15-min basis at least, since 11th Jan 2015.
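For reference, CSSR is commonly computed as the RRC setup success rate multiplied by the RAB setup success rate, and CDR as abnormal RAB releases divided by successful RAB establishments. Exact counter names and formulas are operator- and vendor-specific, so the fields below are hypothetical and the formulas are textbook forms rather than the ones used in this AR. The sketch computes the KPIs for one 15-min period and flags periods that fall below a chosen baseline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuarterHour:
    """Hypothetical 15-min counter snapshot for one domain (CS or PS)."""
    rrc_attempts: int
    rrc_successes: int
    rab_attempts: int
    rab_successes: int
    rab_abnormal_releases: int


def ratio(num: int, den: int) -> Optional[float]:
    """Safe division: None when there were no attempts in the period."""
    return num / den if den else None


def kpis(q: QuarterHour) -> dict:
    """Textbook forms; operator/vendor formulas may differ."""
    rrc_sr = ratio(q.rrc_successes, q.rrc_attempts)
    rab_sr = ratio(q.rab_successes, q.rab_attempts)
    cssr = rrc_sr * rab_sr if rrc_sr is not None and rab_sr is not None else None
    cdr = ratio(q.rab_abnormal_releases, q.rab_successes)
    return {"RRC_SR": rrc_sr, "RAB_SR": rab_sr, "CSSR": cssr, "CDR": cdr}


def degraded(values, baseline: float, tolerance: float):
    """Indexes of periods whose KPI is below baseline - tolerance (None is skipped)."""
    return [i for i, v in enumerate(values) if v is not None and v < baseline - tolerance]


if __name__ == "__main__":
    q = QuarterHour(1000, 990, 980, 970, 10)  # synthetic counts
    print({k: round(v, 4) for k, v in kpis(q).items()})
```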

Thank you

Best Regards

Robert Ries

32. 14-Jan-2015 11:31


rories

From: DURANCEAU, FRANCOIS-XAVIER (FRANCOIS-XAVIER)

Sent: Tuesday, January 13, 2015 6:25 PM

To: CAHOUR, CHRISTOPHE (CHRISTOPHE); JAOUANI, NOELLE (NOELLE)

Cc: BURIE, REMI (REMI); Kosorin, Rastislav (Rastislav); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; ABDEL-HALIM, SAYED (SAYED); OKASHA, TAHER (TAHER); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); EL-MIDANY, AHMED (AHMED); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); EL-FAKHARANY, HISHAM (HISHAM); ABDEL-HALIM, SAYED (SAYED); ZAHABI, RAMSEY (RAMSEY); OKASHA, TAHER (TAHER); MORVAN, FREDERIC (FREDERIC); JACKYRA, GREGORY (GREGORY); MOHAMED, GEHAD (GEHAD); BIALOBRODA, ROLAND (ROLAND); COPOS, BOGDAN (BOGDAN); REDA, RAMY (RAMY); KHEDR, MAHMOUD (MAHMOUD); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); BURIE, REMI (REMI); Kosorin, Rastislav (Rastislav); BERIDY, AHMED (AHMED); SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT); Berky, Dusan (Dusan); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; RIES, Robert (Robert)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hello Christophe,

I went through the AR description and the mail thread below, and I think the NEA support requested is more about RNC configuration related to the IuR interface.

Therefore I am adding Noelle Jaouani to the loop to check whether her team can provide such support.

BR,

fx

33. 14-Jan-2015 11:31


rories
From: BENIGHIL, SOUFIANE (SOUFIANE)** CTR **

Sent: Tuesday, January 13, 2015 6:54 PM

To: OKASHA, TAHER (TAHER); MONNAIE, DANIEL (DANIEL); MAHER, RAFIK (RAFIK); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **

Cc: KHEDR, MAHMOUD (MAHMOUD); REDA, RAMY (RAMY); EL-SAEED, AHMED (AHMED); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); RIES, Robert (Robert); BURIE, REMI (REMI); Berky, Dusan (Dusan); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; ABDEL-HALIM, SAYED (SAYED)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hi Taher,

I had a look at the network configuration; below is my feedback:

1. For the IuR 231-206 there is only 1 association defined, as below,

whereas for the IuR 235-206 there are 4 associations defined, as below:

with port 7000 instead of 2905 in the IuR for the peer (206).

I propose to align with the 235-206 IuR by creating the missing associations & the right ports 7000, 7001, 7002, 7003.

2. There are 5 IuCS links + 8 IuR links + 3 IuPS links:

a) 8 IuR links: 8*2 SCTP (2/2) = 16 associations are mapped to TMU/2 & TMU/3

b) 2 IuCS links: 2*2 SCTP (2/4) = 4 associations are mapped to TMU/2 & TMU/3

c) 3 IuPS links: 3*2 SCTP (2/4) = 6 associations are mapped to TMU/2 & TMU/3

d) A total of 26 associations are mapped only to TMU/2/3, whereas we have 12 DCPS available and 6 of them are free: only slots 2/3/6/7 are used, and 8/9/10/11/12/13 are free.

Looking at all of these points, I conclude that the TMU mapping should be reviewed as below:

1. As a first action, in order to have less impact on the network, let's try to modify only the IuR link associations:

- Map the SCTP associations to slots 8/9 (unused so far) instead of slots /2 & /3, and unlock the IuRs.
- Create the missing associations with ports 7000 to 7003 & with PMP IP @ 10.241.31.53.

2. As a 2nd action, after the outcome of the 1st action, we align all the network interfaces as configured on RNC235 for IuR, IuCS & IuPS, with the addition of slots 12 & 13, which are free:

- Keep the IuR associations on DCPS slots /2/3
- Map the IuCS associations to DCPS slots /8/9/10/11 instead of 2/3/6/7
- Map the IuPS associations to DCPS slots /6/7/12/13 instead of 2/3/6/7
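The per-slot effect of the 2nd action can be estimated from the association counts in points a) to c). This sketch assumes each interface's associations are dealt round-robin over its target slots; that placement rule is an illustration, and the WO may place them differently.

```python
from collections import Counter
from itertools import cycle

ASSOCIATIONS = {"IuR": 16, "IuCS": 4, "IuPS": 6}   # from points a) to c)
PROPOSED_SLOTS = {                                   # 2nd action above
    "IuR": [2, 3],
    "IuCS": [8, 9, 10, 11],
    "IuPS": [6, 7, 12, 13],
}


def round_robin_mapping(associations, slots_per_if):
    """Per-slot association count when each interface is dealt round-robin over its slots."""
    load = Counter()
    for name, count in associations.items():
        slots = cycle(slots_per_if[name])
        for _ in range(count):
            load[next(slots)] += 1
    return dict(sorted(load.items()))


if __name__ == "__main__":
    print(round_robin_mapping(ASSOCIATIONS, PROPOSED_SLOTS))
```

Under this assumption the result is 8 associations each on slots 2 and 3, 2 each on slots 6 and 7, and 1 on each of slots 8 to 13, so slots 2 and 3 would carry the IuR C-plane alone.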
Very curious behaviour of random TMU resets.

So the fact that the IuR 231-206 is locked seems to give the loaded TMUs some breathing room, because the SCTP associations of all IuR links in this RNC are linked to TMU/2 & /3. I guess locking any other IuR would have the same effect, I mean offloading the TMUs.

Let's see the impact of these changes for the first action, then look at the TMU behaviour.

Attached is the WO1, to be applied asap, and then unlock the IuR 239-206.

Regards,

Soufiane B

34. 14-Jan-2015 11:32


rories

From: OKASHA, TAHER (TAHER)

Sent: Tuesday, January 13, 2015 10:37 PM

To: KHEDR, MAHMOUD (MAHMOUD); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; EL-SAEED, AHMED (AHMED)

Cc: MONNAIE, DANIEL (DANIEL); MAHER, RAFIK (RAFIK); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; REDA, RAMY (RAMY); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); RIES, Robert (Robert); BURIE, REMI (REMI); Berky, Dusan (Dusan); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; ABDEL-HALIM, SAYED (SAYED)


Subject: RE: MoM from the meeting about outage: AR 1-5531112

Importance: High

Thanks Soufiane for your feedback,

But we can't change the IuR interface with Huawei from our side only. We have to check with the Etisalat operations team.

One association for the IuR with RNC206 is what Huawei and Etisalat specified when the IuR link RNC231<>RNC206 was created.

Kindly find attached the negotiation data for the IuR.

For the IuR created for RNC235, I see that the configuration of the IuR with RNC206 is different.

Maybe we can proceed with the other recommended changes without changing the IuR associations until we confirm with Etisalat & Huawei. What do you think?

Can you please provide the WO for these changes only?

Thanks & BR,

Taher OKASHA

35. 14-Jan-2015 18:27 ALCATEL-LUCENT PROPRIETARY


tibosley
Time Tracking Entry Added - IR01-TSA1

36. 15-Jan-2015 18:04 ALCATEL-LUCENT PROPRIETARY


tibosley

Time Tracking Entry Added - IR01-TSA1

37. 15-Jan-2015 19:49 ALCATEL-LUCENT PROPRIETARY


remiburie

Hi,

Here is the design feedback for the outage:

1) In general, the boards would be in pending state when the MIB is minimal (reduced MIB).

However, after the CP switchover, I could see that the MIB was in nominal state.

12/01/2015 13:09:07.544 num:000ed346 cmca_trace.h.66 [0x6c42f000] CM_CA @ OMU_0(Lp/4,Ap/5) (PERM): CM_CA : MIB build number = 14

12/01/2015 13:09:12.507 num:000ed52a cmca_trace.h.66 [0x6c42f000] CM_CA @ OMU_0(Lp/4,Ap/5) (PERM): CM_CA : Type (Mib State) : 1

A non-zero MIB build number and MIB State 1 signify a nominal MIB.
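The rule just stated (non-zero MIB build number and MIB State 1 mean a nominal MIB) can be applied to CM_CA trace lines with two regular expressions. This sketch assumes each trace entry sits on one line, as in the two lines quoted above; the function name is hypothetical, and it returns None when either field is missing, as for the period before the switchover where no logs are available.

```python
import re
from typing import Optional

BUILD_RE = re.compile(r"MIB build number = (\d+)")
STATE_RE = re.compile(r"Type \(Mib State\) : (\d+)")

# The two CM_CA trace lines quoted above.
TRACE = [
    "12/01/2015 13:09:07.544 num:000ed346 cmca_trace.h.66 [0x6c42f000] CM_CA @ "
    "OMU_0(Lp/4,Ap/5) (PERM): CM_CA : MIB build number = 14",
    "12/01/2015 13:09:12.507 num:000ed52a cmca_trace.h.66 [0x6c42f000] CM_CA @ "
    "OMU_0(Lp/4,Ap/5) (PERM): CM_CA : Type (Mib State) : 1",
]


def mib_is_nominal(trace_lines) -> Optional[bool]:
    """Non-zero build number and MIB State 1 => nominal; None if a field is missing.

    The last value seen for each field wins.
    """
    build = state = None
    for line in trace_lines:
        if (m := BUILD_RE.search(line)):
            build = int(m.group(1))
        if (m := STATE_RE.search(line)):
            state = int(m.group(1))
    if build is None or state is None:
        return None
    return build != 0 and state == 1


if __name__ == "__main__":
    print(mib_is_nominal(TRACE))
```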

As the logs from before the CP switchover are not available, I cannot confirm that the MIB was in a proper state prior to the CP switchover.

Also, the following errors were seen on the active OMU while accessing the MIB, after it came up from the switchover.

12/01/2015 13:17:41.735 num:000eeaf0 DAS_trace_display.cc.73 [0x6a39f000] DAS @ OMU_0(Lp/4,Ap/5) (PERM): ##ERROR## DAS_Dst: Cannot write in Mib, Mib has been deleted (Mib, Data) "SAVE_DB_TEST.do"

12/01/2015 13:17:41.735 num:000eeaf1 DAS_trace_display.cc.60 [0x6a39f000] DAS @ OMU_0(Lp/4,Ap/5) (PERM): ### DAS WARNING FAULT ###: "MIB has been deleted"

...

12/01/2015 13:20:06.267 num:000eec10 DAS_trace_display.cc.50 [0x6c42f000] CM_CA @ OMU_0(Lp/4,Ap/5) (PERM): ##ERROR## DAS_Messages::decodeSimpleReadRep : Length of received pointer < minimum expected length. Cannot decode complete message.

12/01/2015 13:20:06.267 num:000eec11 cmca_FaultNotification.cc.144 [0x6c42f000] CM_CA @ OMU_0(Lp/4,Ap/5) (PERM): *** Error: Error reading MIB

It looks like the MIB was not in a proper state when the switchover was triggered.

2) I could also see a couple of NFS errors. These were seen at 13:45:13 on OMU-1:

fms_path_error: path=/OMU/share/rw_data/Assoc_4_9 errno=3145733, errstr=S_nfsLib_NFSERR_IO

-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-

Date (d/m/y) : 12/01/2015

Time (h:m:s) : 13:45:13

BaseOS NOTIFICATION: APE(0x6a298000) - fms.cc#381 0x13f6000f = Unexpected error

-> Decoded Call Stack

00a457a0 _Z9notif_log9Boolean_t15Q3_Event_Type_t15Q3_Prob_Cause_t13Q3_Severity_tiPKciiimjPKvmjS6_mjS6_ +0x154
00a45618 _Z9notif_log9Boolean_t15Q3_Event_Type_t15Q3_Prob_Cause_t13Q3_Severity_tiPKcimjPKvmjS6_mjS6_ +0xcc
00a7c8fc _Z20fms_initialize_errorv +0x404
00a7cbf4 _Z20fms_initialize_errorv +0x6fc
00a7d3c8 _Z10fms_removePKc +0x1a8
00a37768 _Z17fcirc_check_spaceiR18FCIRC_FileHeader_tjPK13FCIRC_mData_tRK14FCIRC_ptFile_t9Boolean_tRS4_RS7_ +0x2b74
01dda69c _Z14ape_fich_activv +0x100
01de3680 _Z26ape_admin_init_class_activPhP13ape_admin_c_t +0x48
01de06e4 _GLOBAL__D_ape_fich_DESCR +0x274
010c0250 _Z7transithPhP8au_rid_t +0x160
01dd3c54 _Z18ape_admin_exe_autojtjthhPvttS_S_PhPb +0x6c
00c17298 _Z8gob_maintPK11gob_class_tttPKjtP11rootParam_tPvPFvhjjS6_ttE13gob_startup_tiPKPKc11MSG_scope_t +0x1fa0

DUMP 00
=======
"/OMU/share/rw_data/Assoc_4_9"

DUMP 01
=======
"S_nfsLib_NFSERR_IO"

DUMP 02
=======
574

-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-_+_-

3) As for the tSmStarter resets, this is expected behaviour.

The board will be reset if it fails to initialize within 5 minutes.

Lp/13 Ap/1 ; 2015-01-12 11:13:33.22
SET minor communications 70701010
ADMIN: unlocked OPER: enabled USAGE: active
AVAIL: PROC: reporting CNTRL:
ALARM: STBY: notSet UNKNW: false
Id: D000CC4
Com: AP[1] is not initialized within 5 minutes.Initialization timer expired.

Lp/13 Ap/1 ; 2015-01-12 11:13:33.56
MSG minor equipment 00009000
ADMIN: unlocked OPER: enabled USAGE: active
AVAIL: PROC: reporting CNTRL:
ALARM: STBY: notSet UNKNW: false
Id: D000CC5
Com: TMU -- EXCEPTION:Reset from /ccase_rnccn/ControlNode/BaseOS/Error/bexception/src/bexception_vxworks.cc:1231
Reason:CNode application [tSmStarter] requests a reset
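The defence described in point 3 behaves like an initialization watchdog. This is a simplified model for illustration only, not the RNC's tSmStarter code: it assumes the timer starts when the AP starts and that a reset is requested once 5 minutes pass without the AP reporting initialization. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

INIT_TIMEOUT = timedelta(minutes=5)  # "AP[1] is not initialized within 5 minutes"


def init_timer_expired(started: datetime, initialized: Optional[datetime], now: datetime) -> bool:
    """True when the modelled init timer has expired and a board reset would follow.

    started:     when the AP began initializing (assumed timer start)
    initialized: when the AP reported itself initialized, or None if it has not yet
    now:         the time at which the check is made
    """
    deadline = started + INIT_TIMEOUT
    if initialized is not None and initialized <= deadline:
        return False  # initialized in time: no reset
    return now > deadline  # still not initialized after 5 minutes


if __name__ == "__main__":
    t0 = datetime(2015, 1, 12, 11, 8, 0)  # hypothetical start time
    print(init_timer_expired(t0, None, t0 + timedelta(minutes=6)))
```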


4) The following TMU Memory Manager error was seen thousands of times, even before the CP switchover.

Com: TMU -- EXCEPTION:Memory Manager error[memPartFree] at block 0x4f6d8

In conclusion, it seems that this behaviour appeared because of the huge number of TMU resets: they put the RNC into several defense cases until it got stuck and the TMUs went into pending state.

We believe that if we fix the TMU resets (remember: up to 400 per day), we will not fall into such an outage situation.

Regards,

Rémi

38. 26-Jan-2015 13:29


rories

From: OKASHA, TAHER (TAHER)

Sent: Wednesday, January 14, 2015 10:04 AM

To: KHEDR, MAHMOUD (MAHMOUD); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; EL-SAEED, AHMED (AHMED); RIES, Robert (Robert); BURIE, REMI (REMI)

Cc: MONNAIE, DANIEL (DANIEL); MAHER, RAFIK (RAFIK); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; REDA, RAMY (RAMY); BALKO, STANISLAV (STANISLAV); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); Berky, Dusan (Dusan); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; Kosorin, Rastislav (Rastislav); ABDEL-HALIM, SAYED (SAYED)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hello Soufiane,

The configuration of the IuR link on the Huawei side matches the attached data we requested from them, based on the design. Do you think it should be changed for RNC231 to add the other two SCTPs? What impact would that have?

Dear Ahmed El-Saeed,

Is there a reason why only the IuR RNC235<>RNC206 has 4 SCTPs, while all the other IuR links, for RNC235 or for other RNCs, have two?

Thanks & BR,

Taher OKASHA

39. 26-Jan-2015 13:29 ALCATEL-LUCENT PROPRIETARY


rories

From: BURIE, REMI (REMI)

Sent: Wednesday, January 14, 2015 10:24 AM

To: JAOUANI, NOELLE (NOELLE); CAHOUR, CHRISTOPHE (CHRISTOPHE); MENJAOUI, NABIL (NABIL); MERLIN, YANN (YANN)

Cc: DELMAS, PHILIPPE PD (PHILIPPE); Kosorin, Rastislav (Rastislav); DURANCEAU, FRANCOIS-XAVIER (FRANCOIS-XAVIER); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; ABDEL-HALIM, SAYED (SAYED); OKASHA, TAHER (TAHER); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); EL-MIDANY, AHMED (AHMED); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); EL-FAKHARANY, HISHAM (HISHAM); ABDEL-HALIM, SAYED (SAYED); ZAHABI, RAMSEY (RAMSEY); OKASHA, TAHER (TAHER); MORVAN, FREDERIC (FREDERIC); JACKYRA, GREGORY (GREGORY); MOHAMED, GEHAD (GEHAD); BIALOBRODA, ROLAND (ROLAND); COPOS, BOGDAN (BOGDAN); REDA, RAMY (RAMY); KHEDR, MAHMOUD (MAHMOUD); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); Kosorin, Rastislav (Rastislav); BERIDY, AHMED (AHMED); SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT); Berky, Dusan (Dusan); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; RIES, Robert (Robert); ROY, Paul (Paul)** CTR **

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hi All,

Some clarifications and a possible root cause:

1/ RNC231 is full IP, so there is no ALCAP here; however, design uses the same primitive to allocate resources, which caused some confusion.

2/ From the call trace, there are a lot of RNSAP radio link setup failures due to an SRB 6.8 configuration, which is not supported by our product (already seen and tracked by AR 1-5254837).

Even though it is not supported, we still attempt to allocate U-plane resources for those radio link attempts, which may lead to the resource exhaustion observed.

I believe this is the main cause of the faulty scenario that leads to the TMU resets.

Design should reject this radio link without attempting to allocate resources.

In the meantime, is it possible to deactivate this SRB rate on the H// side and then attempt to re-open the IuR link?

3/ The CR investigation on AR 1-5532074 is ongoing, and they are currently working on a workaround to avoid the bad defense on the TMU that leads to the reset.

4/ The CR investigation on AR 1-5531112 is still ongoing as well; no update.

Regards,

Rémi BURIE

Time Tracking Entry Added - IR01-TSA3

40. 26-Jan-2015 13:31


rories

From: BENIGHIL, SOUFIANE (SOUFIANE)** CTR **

Sent: Wednesday, January 14, 2015 10:52 AM

To: OKASHA, TAHER (TAHER); KHEDR, MAHMOUD (MAHMOUD); EL-SAEED, AHMED (AHMED); RIES, Robert (Robert); BURIE, REMI (REMI)

Cc: MONNAIE, DANIEL (DANIEL); MAHER, RAFIK (RAFIK); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; REDA, RAMY (RAMY); BALKO, STANISLAV (STANISLAV); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); Berky, Dusan (Dusan); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; Kosorin, Rastislav (Rastislav); ABDEL-HALIM, SAYED (SAYED)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hello Taher,

To me, it should be the same as on the 235-206 IuR, since we are pointing to the same neighbouring RNC (206), if we follow the logic, unless the Huawei RNC works differently. I think it is better in this case to ask the H* team & the customer for confirmation.

Meanwhile, please apply a change on our side only, I mean remap the SCTP associations to slots /8 & /9 instead of /2 & /3.

We still have to insist on the fact that the number of associations is not the same as on 235 (4 vs. 1 on 231); this could offload the C-plane of the IuR.

Regards,

Soufiane B

41. 26-Jan-2015 13:32


rories

From: EL-SAEED, AHMED (AHMED)


Sent: Wednesday, 14 January, 2015 11:54

To: OKASHA, TAHER (TAHER); KHEDR, MAHMOUD (MAHMOUD); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; RIES, Robert (Robert); BURIE, REMI (REMI)

Cc: MONNAIE, DANIEL (DANIEL); MAHER, RAFIK (RAFIK); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; REDA, RAMY (RAMY); BALKO, STANISLAV (STANISLAV); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); Berky, Dusan (Dusan); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; Kosorin, Rastislav (Rastislav); ABDEL-HALIM, SAYED (SAYED)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Dear Taher,

For both IuR interfaces, on RNC231 & RNC235, the configuration on the ALU RNC side is the same.

We have 2 SCTP endpoints on each RNC.

The difference is in the configuration on the RNC206 side, which is provided by HUAWEI:

For RNC206<->RNC235, 4 SCTP endpoints are defined on RNC206.

For RNC206<->RNC231, 1 SCTP endpoint is defined on RNC206.

Regards,

Ahmed

42. 26-Jan-2015 13:32


rories

From: OKASHA, TAHER (TAHER)


Sent: Wednesday, January 14, 2015 12:02 PM

To: EL-SAEED, AHMED (AHMED); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **

Cc: MONNAIE, DANIEL (DANIEL); MAHER, RAFIK (RAFIK); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; REDA, RAMY (RAMY); BALKO, STANISLAV (STANISLAV); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); Berky, Dusan (Dusan); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; KHEDR, MAHMOUD (MAHMOUD); BURIE, REMI (REMI); RIES, Robert (Robert); Kosorin, Rastislav (Rastislav); ABDEL-HALIM, SAYED (SAYED)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Dear Soufiane, Ahmed,

So, as long as both links have two SCTP endpoints on the RNC231 side, having associations to 4 SCTP endpoints on the Huawei side should not make a difference in signaling load, am I right?

Dear Soufiane,

Do you see from the counters that the C-plane of the IuR needs to be offloaded?

Thanks & BR,

Taher OKASHA

43. 26-Jan-2015 13:32


rories

From: EL-SAEED, AHMED (AHMED)


Sent: Wednesday, 14 January, 2015 12:22

To: OKASHA, TAHER (TAHER); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **

Cc: MONNAIE, DANIEL (DANIEL); MAHER, RAFIK (RAFIK); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; REDA, RAMY (RAMY); BALKO, STANISLAV (STANISLAV); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); Berky, Dusan (Dusan); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; KHEDR, MAHMOUD (MAHMOUD); BURIE, REMI (REMI); RIES, Robert (Robert); Kosorin, Rastislav (Rastislav); ABDEL-HALIM, SAYED (SAYED)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Dear Taher,

From the ALU RNC processor load point of view, both configurations are the same.

In both cases, two processors handle all the IuR SCTP messages.

Regards,

Ahmed

44. 26-Jan-2015 13:32


rories

From: OKASHA, TAHER (TAHER)

Sent: Wednesday, January 14, 2015 12:25 PM

To: EL-SAEED, AHMED (AHMED); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **

Cc: MONNAIE, DANIEL (DANIEL); MAHER, RAFIK (RAFIK); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; REDA, RAMY (RAMY); BALKO, STANISLAV (STANISLAV); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); Berky, Dusan (Dusan); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; KHEDR, MAHMOUD (MAHMOUD); BURIE, REMI (REMI); RIES, Robert (Robert); Kosorin, Rastislav (Rastislav); ABDEL-HALIM, SAYED (SAYED)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

So what is recommended now is to redistribute the mapping of SCTP associations over the DCPS boards, right?

Dear Soufiane,

The WO you provided will impact only the IuR links, correct? The IuCS/IuPS SCTP mapping to DCPS will remain the same, correct?

Thanks & BR,

Taher OKASHA

45. 26-Jan-2015 13:33


rories

From: EL-SAEED, AHMED (AHMED)

Sent: Wednesday, January 14, 2015 11:28 AM

To: OKASHA, TAHER (TAHER); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **

Cc: MONNAIE, DANIEL (DANIEL); MAHER, RAFIK (RAFIK); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; REDA, RAMY (RAMY); BALKO, STANISLAV (STANISLAV); BERIDY, AHMED (AHMED); EL-MIDANY, AHMED (AHMED); Berky, Dusan (Dusan); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; KHEDR, MAHMOUD (MAHMOUD); BURIE, REMI (REMI); RIES, Robert (Robert); Kosorin, Rastislav (Rastislav); ABDEL-HALIM, SAYED (SAYED)

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Yes, it is recommended to have the SCTP endpoints on the RNC distributed among the available eDCPS cards, to distribute the load.

I can't tell whether this action will solve the existing IUR issue or not
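As a generic illustration of spreading SCTP endpoints over the available eDCPS cards, the sketch below uses the greedy "heaviest first, least-loaded card" heuristic (longest-processing-time-first). This is a swapped-in balancing technique, not the RNC's own placement logic. The card labels, association names and per-association load estimates are hypothetical.

```python
import heapq


def spread_associations(assoc_load, cards):
    """Greedy LPT spreading of SCTP associations over cards.

    assoc_load: association name -> estimated signalling load (any consistent unit)
    cards:      card labels
    Each association, heaviest first, goes to the card with the least load so far.
    Returns (placement, per_card_load).
    """
    heap = [(0.0, card) for card in cards]
    heapq.heapify(heap)
    placement = {}
    for name, load in sorted(assoc_load.items(), key=lambda kv: kv[1], reverse=True):
        total, card = heapq.heappop(heap)   # currently least-loaded card
        placement[name] = card
        heapq.heappush(heap, (total + load, card))
    per_card = {card: total for total, card in heap}
    return placement, per_card


if __name__ == "__main__":
    loads = {"a": 5, "b": 4, "c": 3, "d": 3, "e": 2}  # hypothetical estimates
    cards = ["eDCPS/2", "eDCPS/3", "eDCPS/8", "eDCPS/9"]  # hypothetical labels
    placement, per_card = spread_associations(loads, cards)
    print(placement)
    print(per_card)
```

The heuristic does not guarantee an optimal split, but it keeps the busiest card close to the average load, which is the intent of distributing the endpoints.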

Regards,

Ahmed

47. 26-Jan-2015 13:37 ALCATEL-LUCENT PROPRIETARY


rories

Time Tracking Entry Added - IR01-TSA3

48. 26-Jan-2015 13:37


rories

From: MENJAOUI, NABIL (NABIL)

Sent: Thursday, January 15, 2015 09:23

To: BURIE, REMI (REMI); CAHOUR, CHRISTOPHE (CHRISTOPHE); DELMAS, PHILIPPE PD (PHILIPPE)

Cc: MERLIN, YANN (YANN); JAOUANI, NOELLE (NOELLE); Kosorin, Rastislav (Rastislav); DURANCEAU, FRANCOIS-XAVIER (FRANCOIS-XAVIER); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; ABDEL-HALIM, SAYED (SAYED); OKASHA, TAHER (TAHER); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); EL-MIDANY, AHMED (AHMED); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); EL-FAKHARANY, HISHAM (HISHAM); ABDEL-HALIM, SAYED (SAYED); ZAHABI, RAMSEY (RAMSEY); OKASHA, TAHER (TAHER); MORVAN, FREDERIC (FREDERIC); JACKYRA, GREGORY (GREGORY); MOHAMED, GEHAD (GEHAD); BIALOBRODA, ROLAND (ROLAND); COPOS, BOGDAN (BOGDAN); REDA, RAMY (RAMY); KHEDR, MAHMOUD (MAHMOUD); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); Kosorin, Rastislav (Rastislav); BERIDY, AHMED (AHMED); SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT); Berky, Dusan (Dusan); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; RIES, Robert (Robert); ROY, Paul (Paul)** CTR **

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hello All,

Please find below the analysis done on our side:

From the PPC alarms/debug, we have the following alarms linked to the issue:

These alarms show that the drift RNC receives a Qaal2 Establishment Request from the callP to reserve and allocate UDP port ipInf/6208 (this port corresponds to IuR/8, neighboring RNC_206).

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:22.803 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.741 tbmNFqaal2.cc CAC History: [06208:00001:00011:IP] [NOCR]

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:23.003 Tbm 3000 0x00000001 302
Jan 12 2015 14:04:36.888 tbmNFqaal2.cc TbmNF::receiveEstablishRequest - Msg_traceKey: 0x801084c8 Failed to find the best Bwpool [NOCR]

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:23.203 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.888 tbmNFqaal2.cc sendReleaseConfirm - Msg_traceKey: 0x801084c8 FcHistory CID failure cause FINAL VALUE: 21 (IN_FAIL_PATHGROUP_FAILURE)

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:23.403 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.888 tbmNFqaal2.cc TRM_IF_Estreq_t Message Received

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:23.603 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.888 tbmNFqaal2.cc Msg_traceKey[x0801084c8] userType[2] userId[1] IfId[8] rbType[0] cnDomainInd[2]

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:23.803 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.888 tbmNFqaal2.cc Msg_traceKey[x0801084c8] qosInfo:trmTc[3] callpQos[1] thp[1] arpPl[3]

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:24.003 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.888 tbmNFqaal2.cc isA2ea FALSE

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:24.203 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.888 tbmNFqaal2.cc Msg_traceKey[x0801084c8] [uplink:downlink] --maximumBitRate[18240:0] equivalentBitRate[9088:0] equivalentSSSARSduSz[182:1] pcCACType[1]

But the TBM is not able to allocate the UDP port on the BandwidthPool/ipInf:

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:24.403 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.888 tbmNFqaal2.cc Msg_traceKey[x0801084c8] plmnId [0:0:0]

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:24.603 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.888 tbmNFqaal2.cc CAC History: [06208:00001:00011:IP] [NOCR]

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:24.803 Tbm 3000 0x00000001 302
Jan 12 2015 14:04:36.914 tbmNFqaal2.cc TbmNF::receiveEstablishRequest - Msg_traceKey: 0x8010c61a Failed to find the best Bwpool [NOCR]

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:25.003 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.914 tbmNFqaal2.cc sendReleaseConfirm - Msg_traceKey: 0x8010c61a FcHistory CID failure cause FINAL VALUE: 19 (IN_FAIL_INODE_INTERNAL_ERROR)

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:25.203 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.914 tbmNFqaal2.cc TRM_IF_Estreq_t Message Received

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:25.403 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.914 tbmNFqaal2.cc Msg_traceKey[x08010c61a] userType[3] userId[1] IfId[82] rbType[0] cnDomainInd[1]

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:25.603 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.914 tbmNFqaal2.cc Msg_traceKey[x08010c61a] qosInfo:trmTc[4] callpQos[1] thp[NONE] arpPl[1]

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:25.803 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.914 tbmNFqaal2.cc isA2ea FALSE

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:26.003 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.914 tbmNFqaal2.cc Msg_traceKey[x08010c61a] [uplink:downlink] --maximumBitRate[8682:8288] equivalentBitRate[4341:4172] equivalentSSSARSduSz[159:152] pcCACType[0]

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:26.203 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.914 tbmNFqaal2.cc Msg_traceKey[x08010c61a] plmnId [36:244:32]

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:26.403 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.914 tbmNFqaal2.cc CAC History: [00082:00001:00007:IP] [NOCR]

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:26.601 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.954 tbmNFqaal2.cc HSPA Audit Rslt, TMU-44, NumIub-8,(Iub-Stat) 2-A,49-A,70-A,72-A,81-A,94-A,98-A,111-A,

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:26.803 Tbm 3000 0x00000001 302
Jan 12 2015 14:04:36.976 tbmNFqaal2.cc TbmNF::receiveEstablishRequest - Msg_traceKey: 0x800509b0 Failed to find the best Bwpool [NOCR]

DBG Lp/3 Ap/0 ; 2015-01-12 14:05:27.003 Tbm 1767 0x00000001 302
Jan 12 2015 14:04:36.976 tbmNFqaal2.cc sendReleaseConfirm - Msg_traceKey: 0x800509b0 FcHistory CID failure cause FINAL VALUE: 21
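When a trace is long, the Bwpool failures in the excerpt above can be tallied mechanically. The following is a minimal sketch that assumes each log message has already been joined onto a single line. The regular expressions were written against this excerpt only and do not describe a documented TBM log format. The sketch pairs every Msg_traceKey that hit "Failed to find the best Bwpool" with its final CID failure cause. The cause name is treated as optional because a line can be truncated, as the last line of the excerpt is.

```python
import re
from collections import Counter

# Regexes written against this AR's excerpt only (assumption: other
# builds may format these messages differently).
BWPOOL_FAIL = re.compile(r"Msg_traceKey: (0x[0-9a-f]+) Failed to find the best Bwpool")
FINAL_CAUSE = re.compile(
    r"Msg_traceKey: (0x[0-9a-f]+) FcHistory CID failure cause FINAL VALUE: (\d+)"
    r"(?: \((\w+)\))?"
)


def summarize(log_text: str) -> dict:
    """Map each trace key with a Bwpool failure to (cause value, cause name)."""
    failed_keys = set(BWPOOL_FAIL.findall(log_text))
    causes = {}
    for key, value, name in FINAL_CAUSE.findall(log_text):
        if key in failed_keys:
            # findall returns '' for an unmatched optional group.
            causes[key] = (int(value), name or None)
    return causes


sample = """
TbmNF::receiveEstablishRequest - Msg_traceKey: 0x801084c8 Failed to find the best Bwpool [NOCR]
sendReleaseConfirm - Msg_traceKey: 0x801084c8 FcHistory CID failure cause FINAL VALUE: 21 (IN_FAIL_PATHGROUP_FAILURE)
TbmNF::receiveEstablishRequest - Msg_traceKey: 0x8010c61a Failed to find the best Bwpool [NOCR]
sendReleaseConfirm - Msg_traceKey: 0x8010c61a FcHistory CID failure cause FINAL VALUE: 19 (IN_FAIL_INODE_INTERNAL_ERROR)
"""

result = summarize(sample)
print(result)
print(Counter(value for value, _ in result.values()))
```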

Comparing the RNC 231 IuR configuration towards the neighboring HWI RNC that is causing the issue with that of RNC 235, which does not suffer from the same issue, reveals some inconsistencies between the RNC 235 and RNC 231 configurations, especially in the number of BWpool/x IpFlow/x:

RNC 231 (RNC with the TMU/Iur issue):

In RNC 231, the IuR corresponding to neighboring RNC 206 is IuR/8.

For the U-plane, IuR/8 uses UDP port 6208.

This UDP port has only one association.

RNC 235:

Doing the same mapping for RNC 235, which has the same neighboring RNC 206 (HWI) as the impacted RNC 231, gives the following summary:

RNC 235 / Neighboring RNC/206 -> Iur/3

Iur/3 -> IpIf/6203

IpIf/6203 has 4 associations.

From my point of view, having fewer associations means that only the few TMUs linked to this port carry its traffic, so they become more stressed and overloaded than the others. When such a TMU can no longer answer requests, it resets; the RNC then becomes unbalanced and other TMUs reset in turn, creating a snowball effect across the RNC.
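A toy model makes the load-concentration argument above concrete. It is a sketch under stated assumptions and does not describe how the RNC actually places IuR associations on TMUs. Each association is assumed to be pinned to a single TMU, associations are placed round-robin, and the port's traffic is split evenly across them. Only the association counts come from the analysis: 1 for IpIf/6208 on RNC 231 and 4 for IpIf/6203 on RNC 235. The TMU count of 8 is arbitrary.

```python
from typing import List


def per_tmu_share(num_associations: int, num_tmus: int) -> List[float]:
    """Fraction of one port's traffic landing on each TMU (toy model).

    Hypothetical assumptions: each association is pinned to one TMU,
    associations are placed round-robin, and traffic is split evenly
    across associations.
    """
    shares = [0.0] * num_tmus
    for assoc in range(num_associations):
        shares[assoc % num_tmus] += 1.0 / num_associations
    return shares


# Association counts from the analysis; 8 TMUs is an arbitrary example.
for rnc, ipif, assocs in (("RNC 231", "IpIf/6208", 1), ("RNC 235", "IpIf/6203", 4)):
    shares = per_tmu_share(assocs, num_tmus=8)
    loaded = sum(1 for s in shares if s > 0)
    print(f"{rnc} {ipif}: {loaded} TMU(s) carry the port, busiest takes {max(shares):.0%}")
```

With a single association the whole port lands on one TMU, whereas four associations cap any single TMU at a quarter of the port's traffic.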

TMU resets have happened more than 5268 times since 18 December!

To conclude our analysis, we suspect that the missing associations on the IuR interface contribute to the TMU resets.

SRB 6.8 could be the trigger that brings the problem up and makes the TMU relocate resources, which leads to memory errors on the TMU (PPC alarm: EXCEPTION:Memory Manager error[memPartFree] at block 0x4f6d8).

As RNC 235 does not have the same issue even though it is also a neighbor of RNC 206 (HWI), we suspect that the different IuR configuration, with its missing BWpool/IpFlow, could be part of the root cause of this issue.

Philippe Delmas will look into the missing associations and provide his point of view from a transport perspective.

Regards,

Nabil

49. 26-Jan-2015 13:37


rories
From: DELMAS, PHILIPPE PD (PHILIPPE)

Sent: Thursday, January 15, 2015 10:02 AM

To: MENJAOUI, NABIL (NABIL); BURIE, REMI (REMI); CAHOUR, CHRISTOPHE (CHRISTOPHE)

Cc: MERLIN, YANN (YANN); JAOUANI, NOELLE (NOELLE); Kosorin, Rastislav (Rastislav); DURANCEAU, FRANCOIS-XAVIER (FRANCOIS-XAVIER); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; ABDEL-HALIM, SAYED (SAYED); OKASHA, TAHER (TAHER); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); EL-MIDANY, AHMED (AHMED); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); EL-FAKHARANY, HISHAM (HISHAM); ABDEL-HALIM, SAYED (SAYED); ZAHABI, RAMSEY (RAMSEY); OKASHA, TAHER (TAHER); MORVAN, FREDERIC (FREDERIC); JACKYRA, GREGORY (GREGORY); MOHAMED, GEHAD (GEHAD); BIALOBRODA, ROLAND (ROLAND); COPOS, BOGDAN (BOGDAN); REDA, RAMY (RAMY); KHEDR, MAHMOUD (MAHMOUD); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); Kosorin, Rastislav (Rastislav); BERIDY, AHMED (AHMED); SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT); Berky, Dusan (Dusan); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; RIES, Robert (Robert); ROY, Paul (Paul)** CTR **

Subject: RE: MoM from the meeting about outage: AR 1-5531112

Hello,

The number of ipFlows under a bwPool depends on the TransportMap table associated with the bwPool.

By default, the TransportMap associated with the Iur bwPool is TransportMap/5 from the IubTEG. This TransportMap/5 classifies the egress traffic into 4 ipFlows and therefore requires 4 QoS ipFlows under the bwPool.

You should identify and read the transportMap associated with the Iur bwPool of the ALU RNC under discussion, and check how many ipFlows are indicated in that transportMap.
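The check Philippe describes can be modelled as comparing the ipFlows required by a bwPool's TransportMap with the ipFlows actually provisioned under it. The following is a minimal sketch on a hypothetical data model. The TransportMap and BwPool classes, the missing_ip_flows helper and the "flow-N" labels are all illustrative and do not correspond to RNC MIB attributes. Only the figure of 4 ipFlows for TransportMap/5 comes from the mail. The single provisioned ipFlow is an example value, not RNC 231's actual configuration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TransportMap:
    """Hypothetical model: the ipFlow classes a TransportMap sorts egress traffic into."""
    name: str
    ip_flow_classes: List[str]


@dataclass
class BwPool:
    """Hypothetical model of a bwPool and the ipFlows actually provisioned under it."""
    name: str
    transport_map: TransportMap
    provisioned_ip_flows: List[str] = field(default_factory=list)


def missing_ip_flows(pool: BwPool) -> List[str]:
    """ipFlow classes required by the pool's TransportMap but not provisioned."""
    provisioned = set(pool.provisioned_ip_flows)
    return [c for c in pool.transport_map.ip_flow_classes if c not in provisioned]


# TransportMap/5 classifies egress traffic into 4 ipFlows (per the mail).
# The class labels are placeholders; real values must be read from the node.
tm5 = TransportMap("TransportMap/5", ["flow-1", "flow-2", "flow-3", "flow-4"])
iur_pool = BwPool("Iur bwPool (example)", tm5, provisioned_ip_flows=["flow-1"])

gaps = missing_ip_flows(iur_pool)
if gaps:
    print(f"{iur_pool.name}: {len(gaps)} ipFlow(s) missing: {gaps}")
```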

Regards

Philippe

50. 26-Jan-2015 13:39


rories

From: OKASHA, TAHER (TAHER)

Sent: Monday, January 19, 2015 9:03 AM

To: ROY, Paul (Paul)** CTR **; EL-MIDANY, AHMED (AHMED); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; ABDEL-HALIM, SAYED (SAYED); KHEDR, MAHMOUD (MAHMOUD); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); Kosorin, Rastislav (Rastislav); Berky, Dusan (Dusan); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); BURIE, REMI (REMI); BOSLEY, Tim (Tim)** CTR **; Hall, Gail Culver (Gail)** CTR **; SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; EL-SAEED, AHMED (AHMED); MENJAOUI, NABIL (NABIL)

Cc: DAS, Sunil (Sunil)** CTR **; T, Sreenivas Murthy (Sreenivas Murthy)** CTR **; CARLSON, Keith (Keith)** CTR **; RATHI, Rajneesh (Rajneesh)** CTR **; LOUVIER, CHRISTOPHE (CHRISTOPHE); GILMOUR, CARL R (CARL)** CTR **; ESCANDE, PHILIPPE (PHILIPPE); CHARBONNIER, DOMINIQUE (DOMINIQUE); PUJAR, Girish (Girish)** CTR **

Subject: RE: Follow-up//Etisalat-UAE//RNC231 Outage -- AR 1-5531112//1-5532074

Hello All,

Last night, the NEA team's recommendation to add the needed IPflows to the IUR U-plane was implemented. The IUR was unlocked this morning, and so far it is stable and no TMU resets have been observed.

It is under close monitoring, and we will keep you updated.

Thanks & BR,

Taher OKASHA

51. 26-Jan-2015 13:39


rories

From: OKASHA, TAHER (TAHER)

Sent: Tuesday, January 20, 2015 11:07

To: BURIE, REMI (REMI); MENJAOUI, NABIL (NABIL); ROY, Paul (Paul)** CTR **

Cc: DAS, Sunil (Sunil)** CTR **; T, Sreenivas Murthy (Sreenivas Murthy)** CTR **; CARLSON, Keith (Keith)** CTR **; ROY, Paul (Paul)** CTR **; EL-MIDANY, AHMED (AHMED); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; ABDEL-HALIM, SAYED (SAYED); KHEDR, MAHMOUD (MAHMOUD); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); Kosorin, Rastislav (Rastislav); Berky, Dusan (Dusan); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; Hall, Gail Culver (Gail)** CTR **; SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; EL-SAEED, AHMED (AHMED); RATHI, Rajneesh (Rajneesh)** CTR **; LOUVIER, CHRISTOPHE (CHRISTOPHE); GILMOUR, CARL R (CARL)** CTR **; ESCANDE, PHILIPPE (PHILIPPE); CHARBONNIER, DOMINIQUE (DOMINIQUE); PUJAR, Girish (Girish)** CTR **; RAAOUF, SAMEH (SAMEH)

Subject: RE: Follow-up//Etisalat-UAE//RNC231 Outage -- AR 1-5531112//1-5532074

Dears,

No TMU reset alarms have been observed since the IUR IPflows were added. Are there any traces or logs you would like to check to confirm proper behavior?

Thanks & BR,

Taher OKASHA

52. 26-Jan-2015 13:39


rories

From: MENJAOUI, NABIL (NABIL)

Sent: Tuesday, January 20, 2015 01:59 PM

To: OKASHA, TAHER (TAHER); BURIE, REMI (REMI); ROY, Paul (Paul)** CTR **

Cc: DAS, Sunil (Sunil)** CTR **; T, Sreenivas Murthy (Sreenivas Murthy)** CTR **; CARLSON, Keith (Keith)** CTR **; ROY, Paul (Paul)** CTR **; EL-MIDANY, AHMED (AHMED); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; ABDEL-HALIM, SAYED (SAYED); KHEDR, MAHMOUD (MAHMOUD); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); Kosorin, Rastislav (Rastislav); Berky, Dusan (Dusan); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; Hall, Gail Culver (Gail)** CTR **; SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; EL-SAEED, AHMED (AHMED); RATHI, Rajneesh (Rajneesh)** CTR **; LOUVIER, CHRISTOPHE (CHRISTOPHE); GILMOUR, CARL R (CARL)** CTR **; ESCANDE, PHILIPPE (PHILIPPE); CHARBONNIER, DOMINIQUE (DOMINIQUE); PUJAR, Girish (Girish)** CTR **; RAAOUF, SAMEH (SAMEH)

Subject: RE: Follow-up//Etisalat-UAE//RNC231 Outage -- AR 1-5531112//1-5532074

Hello Taher,

Can I have the RNC CPU load, along with the RNC RAB/TMU configuration (counter: VS_ApCpuUtilizationAvg (U20202))?

I would like the counters for the last two days with one-hour granularity.

Regards,

Nabil

53. 26-Jan-2015 13:40


rories

From: NABIL GEORGES, MARTIN (MARTIN)** CTR **

Sent: Tuesday, January 20, 2015 4:29 PM

To: OKASHA, TAHER (TAHER); MENJAOUI, NABIL (NABIL); BURIE, REMI (REMI); ROY, Paul (Paul)** CTR **

Cc: DAS, Sunil (Sunil)** CTR **; T, Sreenivas Murthy (Sreenivas Murthy)** CTR **; CARLSON, Keith (Keith)** CTR **; EL-MIDANY, AHMED (AHMED); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; ABDEL-HALIM, SAYED (SAYED); KHEDR, MAHMOUD (MAHMOUD); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); Kosorin, Rastislav (Rastislav); Berky, Dusan (Dusan); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; Hall, Gail Culver (Gail)** CTR **; SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; EL-SAEED, AHMED (AHMED); RATHI, Rajneesh (Rajneesh)** CTR **

Subject: RE: Follow-up//Etisalat-UAE//RNC231 Outage -- AR 1-5531112//1-5532074

Hello Nabil,

Kindly find the requested data attached.

Best Regards,

Martin Nabil

54. 26-Jan-2015 13:40


rories

From: MENJAOUI, NABIL (NABIL)

Sent: Wednesday, January 21, 2015 2:15 PM

To: NABIL GEORGES, MARTIN (MARTIN)** CTR **; OKASHA, TAHER (TAHER); BURIE, REMI (REMI); ROY, Paul (Paul)** CTR **

Cc: DAS, Sunil (Sunil)** CTR **; T, Sreenivas Murthy (Sreenivas Murthy)** CTR **; CARLSON, Keith (Keith)** CTR **; EL-MIDANY, AHMED (AHMED); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; ABDEL-HALIM, SAYED (SAYED); KHEDR, MAHMOUD (MAHMOUD); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); Kosorin, Rastislav (Rastislav); Berky, Dusan (Dusan); RIES, Robert (Robert); BALKO, STANISLAV (STANISLAV); BOSLEY, Tim (Tim)** CTR **; Hall, Gail Culver (Gail)** CTR **; SUSARRET, Andres (Andres)** CTR **; MERCHAUT, VINCENT (VINCENT); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; EL-SAEED, AHMED (AHMED); RATHI, Rajneesh (Rajneesh)** CTR **

Subject: RE: Follow-up//Etisalat-UAE//RNC231 Outage -- AR 1-5531112//1-5532074

Hello all,

Thanks, Martin, for the data provided.

Please find below the analysis of the RNC 231 load.

The RNC is not loaded at all: the average CPU per process is under 10%. This RNC may not host many sites.

For the TMU load, one thing caught my attention: on board 9, one TMU (LP9/Ap1) is not reporting data.

The rest of the TMUs are all under 25%; most TMU loads oscillate between 5 and 20%.

There is nothing alarming about the RNC load. Even though the TMU load is not balanced, this is not surprising because the RNC is not handling a significant load.

We can also see a small increase in the average RNC load after the ipflows were added.

The TMU on board 9 (Lp9/Ap1) is not reporting data (since 11 Jan 2015 in the counters available to me, possibly earlier). You should clarify whether this is a counter data issue or whether the TMU is out of service.
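The review above amounts to two checks on the hourly VS_ApCpuUtilizationAvg samples. The first finds TMUs with no data at all, such as Lp9/Ap1. The second finds TMUs whose peak exceeds a threshold, using the 25% level quoted above. The following is a minimal sketch that assumes the counters have already been loaded into a dict keyed by TMU name, with None marking an hour with no sample. The function name, the data layout, the Lp10/Ap0 entry and all sample values are hypothetical and are not taken from the attached data.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: hourly VS_ApCpuUtilizationAvg samples (percent) per TMU;
# None marks an hour with no counter data.
Samples = Dict[str, List[Optional[float]]]


def review_tmu_load(
    samples: Samples, threshold: float = 25.0
) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Return (TMUs with no data at all, TMUs whose peak exceeds the threshold)."""
    silent: List[str] = []
    over: List[Tuple[str, float]] = []
    for tmu, values in samples.items():
        reported = [v for v in values if v is not None]
        if not reported:
            silent.append(tmu)  # counter gap or TMU out of service: needs checking
        elif max(reported) > threshold:
            over.append((tmu, max(reported)))
    return silent, over


# Illustrative values only, not the data attached to this AR.
samples: Samples = {
    "Lp9/Ap0": [6.0, 12.5, 18.0],
    "Lp9/Ap1": [None, None, None],  # mirrors the TMU that stopped reporting
    "Lp10/Ap0": [5.0, 9.0, 20.0],
}
silent, over = review_tmu_load(samples)
print("not reporting:", silent)
print("peak above threshold:", over)
```

Note that a TMU with no samples only shows up as "not reporting". The sketch cannot distinguish a counter collection gap from an out-of-service TMU, which is exactly the question raised above.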

Regards,

Nabil

55. 26-Jan-2015 13:41


rories

From: RIES, Robert (Robert)

Sent: Monday, January 26, 2015 10:11 AM

To: NABIL GEORGES, MARTIN (MARTIN)** CTR **; OKASHA, TAHER (TAHER)

Cc: DAS, Sunil (Sunil)** CTR **; T, Sreenivas Murthy (Sreenivas Murthy)** CTR **; CARLSON, Keith (Keith)** CTR **; EL-MIDANY, AHMED (AHMED); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; ABDEL-HALIM, SAYED (SAYED); KHEDR, MAHMOUD (MAHMOUD); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); Kosorin, Rastislav (Rastislav); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; EL-SAEED, AHMED (AHMED); RATHI, Rajneesh (Rajneesh)** CTR **; BURIE, REMI (REMI); ROY, Paul (Paul)** CTR **; MENJAOUI, NABIL (NABIL); ESCANDE, PHILIPPE (PHILIPPE)

Subject: RE: Follow-up//Etisalat-UAE//RNC231 Outage -- AR 1-5531112//1-5532074

Hello Martin, Taher, all

The original issue behind this outage AR was solved by correcting the provisioning.

The follow-on issues resulting from this outage are being investigated in other ARs:

AR 1-5532074 => opened to find the RC of the frequently repeated TMU resets - CR 1434789 in design

AR 1-5542978 => KPI degradation after unlocking the IUR link between RNC231 & H RNC206 - TPS L3

As no other issues connected with the original outage have appeared even after a two-week monitoring period, could we kindly close this AR?

Thank you

Best Regards

Robert Ries

56. 03-Feb-2015 11:53


rories

From: OKASHA, TAHER (TAHER)

Sent: Tuesday, January 27, 2015 9:03 AM

To: RIES, Robert (Robert); NABIL GEORGES, MARTIN (MARTIN)** CTR **

Cc: DAS, Sunil (Sunil)** CTR **; T, Sreenivas Murthy (Sreenivas Murthy)** CTR **; CARLSON, Keith (Keith)** CTR **; EL-MIDANY, AHMED (AHMED); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; ABDEL-HALIM, SAYED (SAYED); KHEDR, MAHMOUD (MAHMOUD); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); Kosorin, Rastislav (Rastislav); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; EL-SAEED, AHMED (AHMED); RATHI, Rajneesh (Rajneesh)** CTR **; BURIE, REMI (REMI); ROY, Paul (Paul)** CTR **; MENJAOUI, NABIL (NABIL); ESCANDE, PHILIPPE (PHILIPPE)

Subject: RE: Follow-up//Etisalat-UAE//RNC231 Outage -- AR 1-5531112//1-5532074

Hello Robert,

For me, the provisioning correction solved the TMU reset issue, but the RC of the main outage itself for AR 1-5531112 is not clear yet. Is it confirmed that the outage was correlated with the TMU resets?

Thanks & BR,

Taher OKASHA

57. 03-Feb-2015 11:53 ALCATEL-LUCENT PROPRIETARY


rories

Time Tracking Entry Added - IR01-TSA3

58. 03-Feb-2015 11:56


rories

From: RIES, Robert (Robert)

Sent: Tuesday, January 27, 2015 10:03 AM

To: OKASHA, TAHER (TAHER); NABIL GEORGES, MARTIN (MARTIN)** CTR **

Cc: DAS, Sunil (Sunil)** CTR **; T, Sreenivas Murthy (Sreenivas Murthy)** CTR **; CARLSON, Keith (Keith)** CTR **; EL-MIDANY, AHMED (AHMED); IBRAHIM ABD EL NABY, Karim (Karim)** CTR **; ABDEL-HALIM, SAYED (SAYED); KHEDR, MAHMOUD (MAHMOUD); JACKYRA, GREGORY (GREGORY); MORVAN, FREDERIC (FREDERIC); BERIDY, AHMED (AHMED); REDA, RAMY (RAMY); KUMAR, NAVNEET (NAVNEET); DAOU, CYRIL (CYRIL); ELHAKIM, Mohamed (Mohamed)** CTR **; ZAHABI, RAMSEY (RAMSEY); WAGDY MANSOUR, AYMAN (AYMAN)** CTR **; IBRAHIM, MAHMOUD (MAHMOUD); MOHAMED, GEHAD (GEHAD); CAHOUR, CHRISTOPHE (CHRISTOPHE); EL-FAKHARANY, HISHAM (HISHAM); Kosorin, Rastislav (Rastislav); BENIGHIL, SOUFIANE (SOUFIANE)** CTR **; EL-SAEED, AHMED (AHMED); RATHI, Rajneesh (Rajneesh)** CTR **; BURIE, REMI (REMI); ROY, Paul (Paul)** CTR **; MENJAOUI, NABIL (NABIL); ESCANDE, PHILIPPE (PHILIPPE)

Subject: RE: Follow-up//Etisalat-UAE//RNC231 Outage -- AR 1-5531112//1-5532074

Hello Taher

Yes. Based on the analysis performed by Nabil Menjaoui (sent on 15 Jan at 9:23 AM), we know that the TMU resets were triggered by insufficient IuR bandwidth (alarm "Failed to find the best Bwpool").

This is caused by an under-dimensioned IuR interconnection => just one association from RNC231 towards RNC206.

So, if you agree, may I close AR 1-5531112?

Best Regards

Robert Ries
59. 03-Feb-2015 11:59
rories

From: RIES, Robert (Robert)

Sent: Tuesday, January 27, 2015 4:03 PM

To: OKASHA, TAHER (TAHER)

Cc: EL-MIDANY, AHMED (AHMED); NABIL GEORGES, MARTIN (MARTIN)** CTR **

Subject: FW: Follow-up//Etisalat-UAE//RNC231 Outage -- AR 1-5531112//1-5532074

Hello Taher

I consulted the TEC contacts regarding the POA we talked about via Lync last time.

Nabil Menjaoui forwarded me the attached mail thread (on which I was not originally in copy). As you can see, he had already explained the outage RC to you in detail.

Here is a summary:

The TMU resets were caused by some missing IpFlows and by the fact that the IpFlows were not distributed evenly across all the TMUs.

Some TMU interfaces handle more traffic than the others, which causes memory errors on those TMUs. Each such TMU then triggers its own reset as a defense mechanism to recover from the memory issue. But this creates an imbalance, and all the TMUs of the RNC start falling one after the other (a snowball effect).

After many resets (at least 5268 per month), the RNC's self-defense mechanism decides to reset all the boards to recover, which leads to the outage.

Is this a sufficient explanation for you?

Best Regards

Robert Ries

60. 03-Feb-2015 12:02


rories

Update to Current Summary: Fix delivered to load LR13.3.9

Resolution

The issue is known, and a fix for it has been delivered in load LR13.3.9.

End of Assistance Request 1-5531112

